The role of data in development has been articulated at the highest policy levels. The Post-2015 Development Agenda report called for a “data revolution for sustainable development, with a new international initiative to improve the quality of statistics and information available to citizens”. However, for the potential of a data revolution to be realised, serious challenges in relation to control, capacity, access, efficacy and privacy must be addressed.
The amount of data generated daily is growing at an astounding rate, so quickly that 90% of the world’s data has been produced in just the past two years. In our daily interactions with products and services, from financial transactions and Facebook likes to GPS signals emitted by mobile phones, we create massive amounts of data. This mass of data, typically described as “Big Data”, can be “cleaned up” and analysed to provide new insights into human behaviour, and used as a tool to monitor phenomena in a faster, more efficient and less costly manner.
While businesses see this unprecedented access to data as an opportunity to target and customise marketing messages, data has an increasingly central role to play in the development context, where new technologies, particularly mobile technology, are used as alternative means of reaching and helping communities, overcoming weak infrastructure or telecommunications, and delivering services.
The data and development agenda was further cemented by the report of the High-Level Panel of Eminent Persons on the Post-2015 Development Agenda, which called for a “data revolution for sustainable development, with a new international initiative to improve the quality of statistics and information available to citizens”.
National governments and international governmental organisations have started to explore the potential that this data deluge carries. The United Nations initiative Global Pulse was set up specifically to look at the application of big data to development; the United Kingdom has allocated part of its 2014 budget to the creation of a Big Data research centre; the Kenya Open Data Initiative has published more than 430 government datasets online; and the Open Government Partnership now includes 64 participating countries.
Big Data, Government Data and Open Data
The different terms used to describe data can sometimes create confusion. As the diagram (fig. 1) exemplifies, there are areas of overlap and intersection between the concepts of Big Data, Open Data and Government Data.
Despite “Big Data” becoming an increasingly popular term, it still has no fixed definition. As Global Pulse points out, “Big Data is an umbrella term referring to the large amounts of digital data continually generated by the global population”. Rather than just its size, practitioners emphasise that the most valuable feature of big data is the insight into human behaviour that “comes from the patterns that can be derived by making connections between pieces of data, about an individual, about individuals in relation to others, about groups of people, or simply about the structure of information itself” (Boyd and Crawford, 2012).
Government data is data created and held by public authorities, which could previously be made available on request through Freedom of Information Acts or public-record laws. On the other hand, Open Data is defined by its use. Gurin (2014) defines open data as “accessible public data that people, companies, and organisations can use to launch new ventures, analyse patterns and trends, make data-driven decisions, and solve complex problems”.
It is these qualities that suggest that while Big Data has captured the most attention, the most promising opportunity for the proponents of the Post-2015 “data revolution” is open data. To this end there is a shift towards Open Development, which promotes the use of open data to make development initiatives more accountable, and to promote a more inclusive, bottom-up approach.
Data and policy-making
The potential for data to inform decisions and result in public policies that are shaped by facts and reflect people’s needs has generated significant interest in the fields of research and public policy.
Gathering data to inform policies has traditionally been a complex, time-consuming and expensive process, often requiring researchers to choose between in-depth studies of small population groups and superficial inquiries into large population groups. Consider, for instance, surveys and national censuses.
Data is anticipated to open up information on an unprecedented scale and offer new ways of conceiving solutions and holding governments to account. Some examples include:
- Transparency and accountability: initiatives like the International Aid Transparency Initiative (IATI) and Follow The Money call for data on aid, development and humanitarian spending to be made publicly available and accessible, to increase the effectiveness of programmes;
- Analysis and information: e-Health systems can be used to develop policy models that can be visualised and monitored;
- Insight and communication: crowdsourcing and analysing social networking data can be an effective way to engage with and listen to the public. For example, a study by researchers at Harvard and MIT demonstrated that the 2010 cholera outbreak in Haiti could have been mapped faster by mining Twitter and online news reports.
Challenges and Issues
In the policy-making and development field, the enthusiasm around Open Data stems from the expectation that greater quantities of accessible data will help researchers, advocates and decision-makers gain new insights and empower people. However, for the potential of a data revolution to be realised, serious challenges in relation to control, capacity, access, efficacy, privacy and consent must be addressed.
Control, Access and Capacity
When it comes to Big Data, much of the data is the property of private companies – such as social network firms – which sometimes make it available for a fee, or offer small datasets cost-free to university academics. However, Boyd and Crawford highlight that “Large data companies have no responsibility to make their data available, and they have total control over who gets to see them”.
This restricted access increases the risk of a “new kind of digital divide: the Big Data rich and the Big Data poor”. While the North/South digital divide is narrowing as the number of digital users grows across the globe, the skills, technology and resources for data analytics are not evenly spread. Actors working in development contexts know they should be using data, but they don’t necessarily have access to the resources. The sector needs a major training and skills programme to understand how to deliver data-driven projects and to understand the limitations of data.
Moreover, to counter the risk of reinforcing a hierarchical dynamic in which experts in the developed world have the means to do the observing, while the “observed” remain passive data-subjects, more steps need to be taken to build skills, capacity and infrastructure, to encourage civil society involvement. For instance, initiatives like hackathons – which gather computer programmers, open data experts and civil society advocates on collective projects, to produce software for a specific focus, such as governance or development – provide opportunities for a more inclusive participation in development initiatives.
Do the numbers speak for themselves?
“Facts all come with points of view, Facts don’t do what I want them to.” – David Byrne, Talking Heads
The availability of data through new technologies has led Big Data advocates to talk of a new era of research in which “numbers will speak for themselves”. However, as noted by Letouzé, “‘New’, ‘Big’, ‘official’ or ‘traditional’: data is data. It has its flaws and its value. … Only those who wrongfully assume that the data is an accurate picture of reality can be deceived”. Data must be examined and interpreted to extract meaningful information, and traditional methodological issues still apply. For instance, in the case of sentiment or opinion mining – scraping social media content to translate what people express in text online into hard data – the limits are set by different languages, varying attitudes to online presence, and above all the difficulty of relating reported feelings to facts. “New” data analysis should be approached with the same critical awareness applied to traditional research, not considered as the bearer of fact-based truth.
Privacy & Consent
In 2013, whistleblower Edward Snowden revealed the extent of NSA surveillance. Snowden’s revelations triggered worldwide debate on privacy, security and consent. It is against this backdrop that conversations about how to open up more data while protecting individuals’ privacy and safety are taking place.
Advocates of Big Data in policy-making urge private companies, NGOs, governments and authorities to engage in “data philanthropy”, sharing their datasets for analysis. This raises issues of consent. Studies have demonstrated a phenomenon, sometimes referred to as the “Mosaic Effect”, whereby combining different datasets can de-anonymise the data they contain. Moreover, without a contract specifying consent, there is no guarantee that a different actor will not use the data for other purposes.
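To make the “Mosaic Effect” concrete, the following is a minimal sketch in Python of how two datasets that each look harmless can be linked. Everything in it is an assumption made for illustration: the records are invented, and the field names (`postcode`, `birth_year`, `sex`) and the helper `link_records` are hypothetical rather than drawn from any real study or system.

```python
# Invented records for illustration only; no real data is used.
# An "anonymised" health dataset: names removed, but indirect identifiers kept.
health_records = [
    {"postcode": "10115", "birth_year": 1984, "sex": "F", "diagnosis": "asthma"},
    {"postcode": "10115", "birth_year": 1990, "sex": "M", "diagnosis": "diabetes"},
    {"postcode": "20095", "birth_year": 1984, "sex": "F", "diagnosis": "tuberculosis"},
]

# A separate, public register that carries names and the same indirect identifiers.
public_register = [
    {"name": "Person A", "postcode": "10115", "birth_year": 1984, "sex": "F"},
    {"name": "Person B", "postcode": "10115", "birth_year": 1990, "sex": "M"},
    {"name": "Person C", "postcode": "20095", "birth_year": 1984, "sex": "F"},
    {"name": "Person D", "postcode": "20095", "birth_year": 1984, "sex": "F"},
]

# Hypothetical choice of fields shared by both datasets.
QUASI_IDENTIFIERS = ("postcode", "birth_year", "sex")


def link_records(anonymised, register, keys=QUASI_IDENTIFIERS):
    """Return (name, diagnosis) pairs where exactly one register entry matches."""
    # Index the register by the combination of shared fields.
    index = {}
    for person in register:
        index.setdefault(tuple(person[k] for k in keys), []).append(person["name"])

    linked = []
    for record in anonymised:
        names = index.get(tuple(record[k] for k in keys), [])
        if len(names) == 1:  # a unique match means the record is re-identified
            linked.append((names[0], record["diagnosis"]))
    return linked


reidentified = link_records(health_records, public_register)
for name, diagnosis in reidentified:
    print(f"{name} -> {diagnosis}")
```

In this toy example, two of the three “anonymous” health records are attributed to a named person. The third is not, because two people in the register share the same combination of fields. Neither dataset reveals this on its own; the risk arises only when they are combined.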
Moreover, consent can only be meaningful if it can be refused. Questions need to be raised when individuals are required to share personal data to access basic services, particularly when this involves vulnerable individuals or communities, and when corporations are involved as third parties in the process. In the youth field, one striking example is Mexico’s Personal Identity Card for minors, a database managed by multinational technology firm Unisys, which includes digital records of fingerprints, a photograph, a signature and, for the first time in the world, iris scans.
Open Government initiatives are laudable, but caution should be used to avoid the risk of indiscriminately opening up sensitive government data. For instance, the introduction of e-Health systems to improve the delivery of healthcare in poorer regions has yielded positive results. However, the digitisation and sharing of health-related data without a clear policy framework for the protection of privacy could result in a collection of information that exposes patients to new risks, potentially leading to stigma, social exclusion or persecution.
While there is no straightforward answer to these debates, a clear policy framework for the protection of privacy is essential. Collective efforts by actors in the field of open data are taking shape: in July, participants at the Open Knowledge Festival crowd-sourced ideas to include in a draft Open Data Manifesto, and the Sunlight Foundation maintains a living set of Open Data Policy Guidelines. But are these initiatives taking place at the fringe?
To meet the expectations around the “data revolution” in development and policy-making, we must critically consider which steps are necessary to make data a tool for positive change without resulting in increased surveillance.
The need for greater transparency in development work should be matched by transparency and accountability in the use of new forms of data for research, and by the creation of legal frameworks and good practice guidelines to protect the right to privacy. All data collection efforts should be underpinned by the ‘do no harm’ principle. For instance, in extreme contexts where individuals might be exposed to potential harm through data insecurity, data should be collected responsibly and anonymised. If re-identification is possible, then data should not be collected at all.
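One way to test whether re-identification is possible before releasing a dataset is to count how many records share each combination of indirect identifiers. The sketch below uses k-anonymity, a technique this article does not itself name, as an illustrative stand-in for such a check. The records are invented, and the field names and the helpers `risky_records` and `generalise` are hypothetical.

```python
from collections import Counter

# Hypothetical indirect identifiers; real projects must choose their own.
QUASI_IDENTIFIERS = ("postcode", "birth_year", "sex")

# Invented survey records for illustration only.
raw_records = [
    {"postcode": "10115", "birth_year": 1984, "sex": "F"},
    {"postcode": "10117", "birth_year": 1987, "sex": "F"},
    {"postcode": "10119", "birth_year": 1990, "sex": "M"},
    {"postcode": "20095", "birth_year": 1962, "sex": "M"},
]


def risky_records(records, quasi_identifiers=QUASI_IDENTIFIERS, k=2):
    """Return records whose identifier combination is shared by fewer than k records."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return [r for r in records
            if counts[tuple(r[q] for q in quasi_identifiers)] < k]


def generalise(record):
    """Coarsen a record: keep only the postcode prefix and the decade of birth."""
    return {
        **record,
        "postcode": record["postcode"][:2] + "***",
        "birth_year": record["birth_year"] // 10 * 10,
    }


generalised_records = [generalise(r) for r in raw_records]

print(len(risky_records(raw_records)))          # records that are unique as collected
print(len(risky_records(generalised_records)))  # records still unique after coarsening
```

In this toy example every raw record is unique, and coarsening the fields leaves two records that still stand alone. Under the principle stated above, those remaining records would need to be withheld or not collected. A check of this kind is only a first filter: it says nothing about what other datasets an outside actor might combine with the release.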
Talk of a data revolution is promising, but creating the environment for the inclusive access to data required to meet the aspirations of the Post-2015 agenda will require more than just words. As Ben Taylor writes, “It’s when data goes beyond reporting on poor people’s lives and starts to provide those people with the data and information to shape change for themselves that it starts to get interesting”.