Data from social media and Ushahidi-style crowdsourcing platforms have emerged as possible ways to leverage cellphones to prevent conflict. But in the world of Big Data, the amount of information these sources generate is too small for advanced data-mining and “machine-learning” techniques (where algorithms adjust themselves based on the data they receive).
But there is another way cellphones could be leveraged in conflict settings: through the various types of data passively generated every time a device is used. “Phones can know,” said Professor Alex “Sandy” Pentland, head of the Human Dynamics Laboratory and a prominent computational social scientist at MIT, in a Wall Street Journal article. He says data trails left behind by cellphone and credit card users—“digital breadcrumbs”—reflect actual behavior and can tell objective life stories, as opposed to what is found in social media data, where intents or feelings are obscured because they are “edited according to the standards of the day.”
The findings and implications of this, documented in several studies and press articles, are nothing short of mind-blowing. Take a few examples. Researchers have shown that it is possible to infer whether two people were talking about politics from cellphone data alone, with no knowledge of the actual content of their conversation. Changes in movement and communication patterns revealed in cellphone data were also found to be good predictors of getting the flu days before it was actually diagnosed, according to MIT research featured in the Wall Street Journal. Cellphone data were also used to reproduce census data, study human dynamics in slums, and examine community-wide financial coping strategies in the aftermath of an earthquake or crisis.
This is because cellphone companies record the time, duration, approximate locations, and numbers of the initiator and recipient of each call. The same is true for text messages (substituting word count for duration). The amount and timing of all prepaid SIM-card top offs, as well as credit use, are similarly trackable. All features of mobile-banking transactions, such as payments to a supplier or remittance receipts, can also be recorded, as can those of online activities using a smartphone.
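To make the shape of these records concrete, here is a minimal sketch of what a single call record might look like and how such records could be counted per tower and per day. The class name, field names, and the `calls_per_tower_per_day` helper are hypothetical illustrations, not the schema of any actual operator, and the identifiers are assumed to be pseudonymized rather than real phone numbers.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class CallRecord:
    """One passively generated call record (hypothetical field names)."""
    caller_id: str         # pseudonymized identifier of the initiator
    recipient_id: str      # pseudonymized identifier of the recipient
    started_at: datetime   # time the call began
    duration_seconds: int  # for a text message, this could be a word count
    tower_id: str          # cell tower handling the call (approximate location)


def calls_per_tower_per_day(records):
    """Count calls for each (tower_id, calendar day) pair."""
    counts = Counter()
    for record in records:
        counts[(record.tower_id, record.started_at.date())] += 1
    return counts
```

Aggregating by tower and day in this way discards who called whom, which is one reason aggregate counts are often considered less sensitive than the raw records.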
And as prices dropped and coverage increased across the developing world, cellphones became ubiquitous in post-conflict, conflict-affected and conflict-prone countries, providing data in places where other types of data-collection methods are almost impossible. As it turns out, cellphone data may prove to be as consequential in the second decade of the 21st century as cellphone devices have been in the first, especially in conflict settings.
It is relatively easy to see how these advances could help future conflict prevention efforts, both structural and operational. In simple terms, “structural prevention” is concerned with understanding and addressing what are believed to be contributing factors of conflict, such as poverty and inequality. Here, cellphone data could help by enhancing real-time awareness, that is, by painting a fine-grained, real-time picture of human ecosystems through the digital patterns or signatures unveiled in the data.
For instance, observing a community that tops off SIM cards by $1 a week when commodity prices are low, then switches to topping off 10 cents every other day a few weeks after these prices rise, may carry important information about the local poverty impact of inflation and help design better and more targeted policies.
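The arithmetic behind this example can be sketched as follows. The `summarize_top_offs` function and its output keys are hypothetical, and the dates and amounts in the usage example are invented to mirror the two patterns described above, not drawn from real data.

```python
from datetime import date, timedelta
from statistics import mean


def summarize_top_offs(events):
    """Summarize a list of (date, amount_usd) top-off events.

    Returns the average amount, the average gap between top-offs in days,
    and the implied spending per week.
    """
    events = sorted(events)
    if len(events) < 2:
        raise ValueError("need at least two top-offs to measure a gap")
    amounts = [amount for _, amount in events]
    gaps = [(later[0] - earlier[0]).days
            for earlier, later in zip(events, events[1:])]
    mean_amount = mean(amounts)
    mean_gap = mean(gaps)
    return {
        "mean_amount_usd": mean_amount,
        "mean_gap_days": mean_gap,
        "spend_per_week_usd": mean_amount / mean_gap * 7,
    }


# Invented illustration: $1 a week, then 10 cents every other day.
before = [(date(2012, 1, 1) + timedelta(days=7 * i), 1.00) for i in range(4)]
after = [(date(2012, 3, 1) + timedelta(days=2 * i), 0.10) for i in range(8)]
print(summarize_top_offs(before))
print(summarize_top_offs(after))
```

Under these invented numbers, the second pattern means more frequent purchases but only about a third of the weekly spending, which is the kind of shift a signature would need to capture.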
The bulk of “operational prevention,” in contrast, focuses on early warning and early response, and is concerned with detecting and acting upon early signs of mounting tensions or initial violence to prevent it from spreading or escalating. Seeing a major drop in calls coming out of a given area may be indicative of mass movements, purposeful destruction of cell towers, or some other unexplained factor, and may act as a digital smoke signal warranting further investigation, similar to what has been done for decades in the field of syndromic surveillance.
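One simple way to picture such a smoke signal is to compare each day's call count in an area with the median of the preceding days. This is only a sketch under stated assumptions: the function name, the seven-day window, and the 50 percent threshold are all hypothetical choices, and a flagged day is a prompt for investigation rather than a conclusion.

```python
from statistics import median


def flag_call_drops(daily_calls, window=7, drop_ratio=0.5):
    """Return indices of days whose call count falls below
    drop_ratio times the median of the preceding `window` days."""
    flagged = []
    for i in range(window, len(daily_calls)):
        baseline = median(daily_calls[i - window:i])
        if baseline > 0 and daily_calls[i] < drop_ratio * baseline:
            flagged.append(i)
    return flagged
```

The median is used here rather than the mean so that one earlier unusual day does not distort the baseline too much.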
One may even argue that cellphone data may not just help, but fundamentally change conflict prevention. Detecting a digital smoke signal is dependent on having previously identified digital signatures, since the former is only defined in relation to the latter. In turn, a digital smoke signal could help improve the characterization of digital signatures in that particular context by providing an example of how the ecosystem behaves faced with an extreme event. As the SIM-card top off example suggests, fast cellphone data streams may contain early warnings of impending or growing harm, which traditionally falls under structural prevention, not operational prevention. The difference is that the standard dichotomy and underpinning models of structural vs. operational prevention were based on low velocity and relatively small volumes of historical data.
There has of course been a move away from relatively static macro-predictive models of early warning and response with the 3rd and 4th generation early warning systems, but not to the point suggested and permitted here. With massive, continuous streams of digital data, time shrinks, and categories may consequently collide. Sequential processes may turn into loops characterized by incremental, iterative processes that are central to agile policymaking.
What matters then is monitoring and responding to volatility, not so much “preventing conflict,” either structurally or operationally. Prevention is something we know we have failed to achieve when conflict occurs, but cannot claim to have achieved when it does not, since there is no counterfactual. In a ‘volatility-centered’ approach, a pattern is identified and adjusted at any given point in time to reflect new data coming in. When a large-enough spike suddenly appears, it warrants further investigation, which may or may not result in what is then believed to be the appropriate corrective action being taken, and which in turn helps refine the model.
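The loop described here, a baseline that is adjusted with every new observation and a spike that is flagged against it, can be sketched with an exponentially weighted mean and variance. This is one possible technique among many, not a method named in the text; the class name, the smoothing factor `alpha`, the threshold, and the warm-up length are all assumptions.

```python
import math


class VolatilityMonitor:
    """Tracks an exponentially weighted baseline and flags large deviations."""

    def __init__(self, alpha=0.2, threshold=3.0, warmup=5):
        self.alpha = alpha          # weight given to the newest observation
        self.threshold = threshold  # deviations beyond this many std devs are spikes
        self.warmup = warmup        # observations required before flagging
        self.mean = None
        self.var = 0.0
        self.n = 0

    def update(self, value):
        """Return True if `value` is a spike, then fold it into the baseline."""
        if self.mean is None:
            self.mean = float(value)
            self.n = 1
            return False
        deviation = value - self.mean
        std = math.sqrt(self.var)
        is_spike = (self.n >= self.warmup and std > 0
                    and abs(deviation) > self.threshold * std)
        # Adjust the pattern to reflect the new data point.
        self.mean += self.alpha * deviation
        self.var = (1 - self.alpha) * (self.var + self.alpha * deviation ** 2)
        self.n += 1
        return is_spike
```

In this sketch every observation, including a flagged one, updates the baseline. An analyst who confirms that a spike was a genuine anomaly might instead exclude it, which is one place where local insight would feed back into the model.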
This may appear, or even be, out of reach or out of touch, but it does point to important considerations that are just as valid within the more traditional boundaries of current conflict-prevention efforts. One is that identifying digital signatures, and thus smoke signals, in the data requires a sound understanding of local dynamics, including access to more traditional data sources and local insights. The truth is not in the amount of data, but in the insights to be revealed through the contextual and combined analysis of small and large data streams. How and why individuals use their cellphones in the first place, and what a change in these patterns may mean, is fundamentally context-specific. The aforementioned failure of earlier generations of early warning systems also serves as a reminder of the scarcity, unreliability, and infrequency of traditional data in most poor, unstable countries. Certainly, cellphone data streams are not perfect data. Depending on technological penetration, they will always be biased to some degree. But imperfect information is usually better than no information.
Fundamentally, leveraging these kinds of data streams requires recognizing the difficulties, risks, and associated requirements. The difficulties are numerous and tend to be well known: getting access to the data from telecom companies, processing it, making sense of it, etc. The risks are great and many. Re-identification is perhaps the most salient of all, as what is considered a privacy issue can soon become a security concern. There are dark sides to every technology, but Big Data in general, and cellphone data analytics in particular, may be extremely risky. But the fact is that a government that is willing to re-identify citizens will most likely be able to do so whether or not the international community and other local actors build Big Data capacities of their own. In contrast, embracing the Big Data potential may help devise legal and technical frameworks that are more likely to allow privacy- and security-preserving analysis.
Leveraging cellphone data for conflict prevention purposes is only one example of what Big Data for development may achieve. None of it is easy. The potential not only appears tremendous in theory, but is also regularly affirmed by a fast-growing flow of evidence coming from academia and the private sector. Based on my experience at Global Pulse in 2011, the single greatest impediment to building the right intent (i.e., setting the right expectations, objectives, and principles) and capacities (i.e., setting up the necessary legal, institutional, and technical systems) to fulfill this potential has been the unwillingness or inability of many donors to embrace and support Big Data for development. Another has been the reluctance of private corporations to share their data. “Data philanthropy” is a promising way to overcome some of these obstacles.
Five years ago, crowdsourcing had just been created; Facebook was only a few years old; and Twitter did not exist. Today, they are an integral part of our universe and work. Big Data for development could be the same, and probably more. It is high time that governments, policymakers, private corporations, academia and civil society work together to develop and mainstream Big Data for development.
Emmanuel Letouzé is a PhD candidate at UC Berkeley and a former senior development economist at UN Global Pulse. He is the author of Big Data for Development: Opportunities and Challenges, May 2012, and an illustrator who did the above illustration. He can be reached at [email protected].