“‘Big data’ is a term that has come into vogue only in the last couple of years, and it refers to the tremendous explosion in volume and velocity and variety of digital data that is being produced around the world,” said Robert Kirkpatrick, Director of UN Global Pulse. “The statistics are somewhat astonishing: there was more data produced in 2011 alone than in all of the rest of human history combined back to the invention of the alphabet.”
Mr. Kirkpatrick’s department at the United Nations—Global Pulse—deals almost exclusively with big data, and its existence speaks volumes as to how some multilateral organizations are working to use this information.
“Global Pulse is an initiative that came out of the global financial crisis, at which point there was a recognition that we now live in this hyper-connected world where information moves at the speed of light, and a crisis can be all around the world very, very quickly, but we’re still using two- to three-year-old statistics to make most policy decisions.”
“A lot of this data is so new that even the private sector, which is the source of much of the data, is still struggling to learn how to use it,” Mr. Kirkpatrick said.
Big data makes some stunning claims. It can predict, with 90 percent accuracy, household incomes just by the frequency and amount people use mobile phones. It can predict unemployment spikes by examining online conversations about work. And health crises can be detected from spikes in Google searches for various symptoms.
However, this data mining has its critics. “Right now, the conversation around big data is very polarized,” said Mr. Kirkpatrick. “You might call it ‘Germany vs. Mark Zuckerberg.’ You have the very conservative prohibition against reuse without explicit permission that has become pervasive in the European Union; it’s a very guarded approach. At the opposite end of the spectrum, you have companies that live on big data, which are saying privacy is dead, profit is king. We’re trying to insert a third pole into this debate, which is to say, big data is a raw public good.”
The interview was conducted by Marie O’Reilly, Publications Officer at the International Peace Institute.
Marie O’Reilly: I’m here today with Robert Kirkpatrick, Director of UN Global Pulse. Thank you very much for speaking with us today, Robert. I’m wondering if you can briefly explain for our listeners, what is big data and what is UN Global Pulse? We’re particularly interested in hearing more about your strategy of combining research and practical field-based pulse labs. So why have you chosen this strategy, and what are you hoping to achieve as Global Pulse?
Robert Kirkpatrick: Sure, so big data is a term that has come into vogue only in the last couple of years, and it refers to the tremendous explosion in volume and velocity and variety of digital data that is being produced around the world. The statistics are somewhat astonishing: there was more data produced in 2011 alone than in all of the rest of human history combined back to the invention of the alphabet. This data is digital in nature; it is the data that is produced as people buy and sell goods, as they search for information, as they browse the web, as they share their day-by-day experiences with friends and family on social networks.
This data is already being used by the private sector to transform how they make decisions, how they understand their customers, how they identify new markets, and how they track their own operations. So, we began to ask at Global Pulse, how can we use this data to understand when people are getting sick, when they’re losing their jobs, when they’re struggling to afford food and medicine?
Global Pulse is an initiative that came out of the global financial crisis, at which point there was a recognition that we now live in this hyper-connected world where information moves at the speed of light, and a crisis can be all around the world very, very quickly, but we’re still using two- to three-year-old statistics to make most policy decisions. The irony is, we’re swimming in this ocean of digital data, which is being produced for free all around us.
Global Pulse is essentially an R & D [research and development] lab for the UN system. We’re focused on three objectives: one, learning how to harness it for development; two, building free and open-source tools that help practitioners do that; and then three, supporting mainstream adoption of these approaches into policy-making.
Our strategy is to form global partnerships with the organizations that have the data, the technology, and the human expertise to do this analysis. So that means private sector, that means academia, and at the same time, it is to take this work to the field, to the country level, on the ground in developing countries to try out new approaches. So we’re launching labs where data scientists and engineers and policy experts work side by side on R & D projects to learn how to take this forward.
MO: So a lot of your work investigates the role of real-time data in development. I wonder, could you give us some examples of how real-time data has already contributed practically to the fight against poverty, hunger, and disease? Or indeed, speak a little bit about your own proof of concept projects that you initially carried out, or cases that have already been seen to make an impact.
RK: Well a lot of this data is so new that even the private sector, which is the source of much of the data, is still struggling to learn how to use it. I think if we list poverty, hunger, and disease, the area where the greatest evidence of efficacy has already been attained is in the area of health.
There have been tremendous demonstrations that when people get sick, their behavior online changes. They search for their symptoms on Google. They share their symptoms and their own hypothesis about what’s affecting them through social media. I think there’s been a lot of use for example in showing how population movements in areas where there is a pandemic under way can be used to project where the pandemic will be spread to next. So we’ve seen huge evidence that essentially health-related human behavior change creates digital echoes that can be used both to track what’s happening and to predict what’s going to happen next.
In food security, in poverty, this is still a fairly untested area, but we’ve done some research, for example, in Indonesia, where we showed that we can begin to approximate the consumer price index for basic food stuffs by looking at key words and online mood related to food through social media. Indonesia is probably one of the best places in the world to study content of social media–Jakarta produces more tweets than any other city in existence today. So we’ve seen some very interesting work in that area.
In poverty, mobile carriers, for example, can tell us that they can predict with 90 percent accuracy your household income simply by the frequency and amount by which you top off your air time on your mobile phones. And we did a project last year, where we found that by looking at work-related conversations online through blogs and forums in the US and Ireland, we were able to predict unemployment spikes, because people know something’s wrong at the office, and in the months before they lose their jobs, they talk about work differently.
MO: You mentioned Indonesia, and you mentioned Ireland and the US, and all of these countries have a very high Internet penetration rate. I’m wondering, what about least developed countries and countries without significant Internet access? What kind of approach is needed there? Can mobile phones fill the gap? Or is something more than that needed?
RK: I think there are two points here: one, we’re trying to develop now the beginnings of what will be a toolkit for a post-2015 world, so this is a forward-looking R & D initiative. We recognize that the digital divide is far from closed, but it’s going to take time to do this research and learn how to use these different kinds of data. We want to be ready, as Internet access, as smart phones, as social media penetrate into populations where it is unimaginable that it will be available today.
But that said, yes, it’s astonishing what you can do with mobile phones. The work that already is happening in the private sector in the mobile industry is a testament to how powerful that analysis is. Even if you are looking at purely anonymized data on the use of mobile phones, carriers could predict your age to within in some cases plus or minus one year with over 70 percent accuracy. They can predict your gender with between 70 and 80 percent accuracy. One carrier in Indonesia told us they can tell what your religion is by how you use your phone. You can see the population moving around.
Now think about this, this is astonishing: the ability to see in real time where beneficiaries are can allow us to understand exactly where the population is that we need to reach, and if you combine that with information on the size of air-time purchases, you can tell how much money these people have. You start to be able to extract basic demographic information, population movement, and behavior data from this information while fully protecting privacy in the process.
What we’re focused on now is working with mobile carriers around the world, including in Indonesia, to get access to archives of anonymized call records and purchase records, because what we do is essentially correlate that data with official statistics. You look at the movement patterns, the mobile service consumption patterns, the social-network patterns that you can derive from how people interact and compare that to food prices, fuel prices, unemployment rates, disease outbreaks, earthquakes, and look at how a population was affected. Or, you compare it to when a program was initiated in the field or when a policy initiative got off the ground: did it actually work? The potential for monitoring and evaluation here as well is quite remarkable.
MO: A lot of the UN system is also dedicated to conflict prevention, as we know. I wonder if you can tell us a little bit about the role that you see for big data in conflict prevention. Could real-time data be used to examine, for example, behavioral changes in a group or population that could turn into violence? And what role do you see for the Global Pulse in assisting UN agencies involved in conflict prevention?
RK: Conflict prevention isn’t an area we’ve looked at to date. I think there are two possible tracks one can pursue here: one is simply understanding the precursors–if we go back and look at historical situations where conflicts emerge within a population, being able to use the archive of digital data and the actual historical record that was collected on the ground through practitioners and doing again these correlation studies could be very interesting for being able to understand what the precursors were, what the early warning signs were in the digital world that no one was listening for. So I do think it would be potentially a fruitful direction.
At the same time, there is already some research going on in the US and other countries looking at discrimination and looking at hate pages, for example, and ways you can quantify and geographically place those kinds of conversations through social media. So I think the potential is there, it’s not something we’ve explored to date, but the signals are probably out there.
MO: To what extent is all of this reliant on large multinational corporations sharing the data? I’m thinking for example of the mobile phone companies in particular, or using data that a company like Google might have. Is the Global Pulse working with companies like these, and how willing are they to share, and with whom are they willing to share?
RK: So, we’ve recognized from the start that while there’s a wealth of information available publicly, a tremendous amount is being collected behind firewalls and is not publicly shared. We started approaching companies about a year ago around this concept we call data philanthropy, which is essentially a vision of companies that have this very valuable data, sharing it in ways that don’t compromise the privacy of their customers, don’t compromise their own competitiveness in the market, and yet could be used by development to understand what’s happening while it’s still happening.
The response has been overwhelming and surprising. The companies don’t actually view this as simply a form of corporate-social responsibility–we thought data philanthropy, it’s a new form of giving–but actually what they see is an opportunity to mitigate business risk, because if you’re operating in an emerging market where you’re counting on a population to be able to afford your goods and services over the long haul, and they fall back into poverty due to exogenous shocks, and it was your data all along that could have alerted policy makers earlier in ways that could let them adjust social protection and make other interventions, there goes your consumer base. There is your opportunity, and the possibility of actually buying what you might call business-risk insurance through data is actually pretty appealing.
MO: There are clearly a lot of benefits that could be had by using this kind of data for timelier insights into what communities are experiencing and then in turn for better informed interventions. What are the risks? What do you see as the big challenges that remain, and given these constraints, what do you expect Global Pulse to achieve in the coming years? Do you have a specific goal that you’re working towards?
RK: We do. I think one of the fundamental challenges with big data and global development is the challenge that’s simply intrinsic to all data: validation; verification; how do you use it; how do you handle the volume of data; how do you know where there is causality and where there isn’t? This is an area where a lot of research is needed, so this is where we’re focused really as kind of separating out where there are real opportunities in understanding what data sources contain genuine signals: how do you identify them; what technologies do we need; how do we find ways to make those affordable; how could open-source software help?
So the basic challenge of working with the data is there. At the same time, you have to have the privacy. This is I think, if we look at the big data phenomenon, it’s probably the single biggest human rights issue we’re going to see over the next decade because of the power to re-identify people when they thought they were anonymized, by bringing together these different data sets. Now the private sector looks at big data and they see privacy risks, but at the UN we can and must look at big data and see a human rights issue. We’re very concerned that at the same time that you see these risks to privacy in big data, there’s also this opportunity.
Right now, the conversation around big data is very polarized. You might call it “Germany vs. Mark Zuckerberg.” You have the very conservative prohibition against reuse without explicit permission that has become pervasive in the European Union; it’s a very guarded approach. At the opposite end of the spectrum, you have companies that live on big data, which are saying privacy is dead, profit is king. We’re trying to insert a third pole into this debate, which is to say, big data is a raw public good. But to do that we have to create a kind of R & D sandbox where we can experiment with it and learn how to use it safely.
MO: Thank you very much for your insights.
RK: Thank you.