Wednesday, December 3, 2014

Big data, MPT, Ooredoo, and Telenor



I first read about big data in 2013, on the Web. When I talked about it in our small community of moderately curious nuts, I was the only one who had heard of it. Here, it is easy to blame the slow internet connection for missing many of the things happening in the world.

So what is big data and is it important? How important?

These were exactly the questions I asked myself, and the answers were revealed to me bit by bit as I kept looking. My first encounter with big data was a business report, "Big Data Gets Personal," from MIT Technology Review (2013), available for download.
 





I quickly read through it and was left with the fear that this big technology would rob me of the little privacy I have. This fear seemed justified as I searched and read on. I found that in developed countries big data is used in private business and every big business is joining the bandwagon; in contrast, big data is not yet known to businesses in the developing world, though governments have noticed it, taken an interest in it, and have probably been exploring its potential as a tool for intelligence and control.

On second thought, I realized they, whoever they are, would be after much bigger fish than me, and I would have the natural protection of safety in numbers. At this stage of my understanding, big data means that each of us small fry would be assigned some numbers, and lots and lots of us would then be scooped up and shoveled into some kind of analytic machine. The logic of this machine, mostly implicit, is that big data = all data. Then out we come, collectively as neat patterns of predictions that would shape our future consumption behavior, or individually as perfectly shaped pawns.

From what has been expounded in this collection of articles, I wasn't much impressed with what big data could do for the individual, because those benefits seem too far removed from us. But what is said in The Dictatorship of Data felt too real to ignore:

Big data is poised to transform society, from how we diagnose illness to how we educate children, even making it possible for a car to drive itself. Information is emerging as a new economic input, a vital resource. Companies, governments, and even individuals will be measuring and optimizing everything possible.

But there is a dark side. Big data erodes privacy. And when it is used to make predictions about what we are likely to do but haven’t yet done, it threatens freedom as well. Big data also exacerbates a very old problem: relying on the numbers when they are far more fallible than we think. Nothing underscores the consequences of data analysis gone awry more than the story of Robert McNamara.
McNamara was a numbers guy. Appointed the U.S. secretary of defense when tensions in Vietnam rose in the early 1960s, he insisted on getting data on everything he could. Only by applying statistical rigor, he believed, could decision makers understand a complex situation and make the right choices. ...

Among the numbers that came back to him was the “body count.” ... A mere 2 percent of America’s generals considered the body count a valid way to measure progress. “A fake—totally worthless,” wrote one general in his comments. “Often blatant lies,” wrote another. “They were grossly exaggerated by many units primarily because of the incredible interest shown by people like McNamara,” said a third.
The use, abuse, and misuse of data by the U.S. military during the Vietnam war is a troubling lesson about the limitations of information as the world hurls toward the big-data era. The underlying data can be of poor quality. It can be biased. It can be misanalyzed or used misleadingly. And even more damningly, data can fail to capture what it purports to quantify.

Even today, the best of top executives may not be able to evade the dictatorship of data. Take what one top executive at Google did:

... To determine the best color for a toolbar on the website ... once ordered staff to test 41 gradations of blue to see which ones people used more. In 2009, Google’s top designer, Douglas Bowman, quit in a huff because he couldn’t stand the constant quantification of everything. “I had a recent debate over whether a border should be 3, 4 or 5 pixels wide, and was asked to prove my case. ... "

Cases such as these could easily be dismissed as the whims of the rich and powerful toying with ideas, but when those with authority and power become obsessed with the power and promise of big data, it is another matter altogether.

Big data will be a foundation for improving the drugs we take, the way we learn, and the actions of individuals. However, the risk is that its extraordinary powers may lure us to commit the sin of McNamara: to become so fixated on the data, and so obsessed with the power and promise it offers, that we fail to appreciate its inherent ability to mislead.

Is that all big data has to offer humanity? All this seems to be confined to the other half of the digital divide (and logically the big data divide), the half with perfect connectivity, smart gadgets, smart homes, and smart people living in an opulent world, a repository of the world's knowledge. Though not necessarily of its wisdom, I would timidly add. Gandhi once said: earth provides enough to satisfy every man’s need but not for every man’s greed. We find greed on both sides of the digital divide, and yet it could be more severe on our side because of the lack of a mechanism to check it or because the existing mechanism malfunctions. Unsatisfied with the realm of big data as I had discovered it so far, I tried looking for its potential in development. That is how I found the big data challenge by Orange.

Orange, a mobile telecommunications provider in Africa, offered to make its Call Detail Record data available to the participants in its challenge to find the best way to use this data for development.

The Orange "Data for Development" (D4D) challenge is an open data challenge on anonymous call patterns of Orange's mobile phone users in Ivory Coast. The goal of the challenge is to help address society development questions in novel ways by contributing to the socio-economic development and well-being of the Ivory Coast population. Participants to the challenge are given access to four mobile phone datasets ... The datasets are based on anonymized Call Detail Records (CDR) of phone calls and SMS exchanges between five million of Orange's customers in Ivory Coast between December 1, 2011 and April 28, 2012. The datasets are: (a) antenna-to-antenna traffic on an hourly basis, (b) individual trajectories for 50,000 customers for two week time windows with antenna location information, (3) individual trajectories for 500,000 customers over the entire observation period with sub-prefecture location information, and (4) a sample of communication graphs for 5,000 customers.
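To get a feel for what dataset (a) might look like, here is a small Python sketch of how individual call records could be rolled up into antenna-to-antenna traffic on an hourly basis. The record layout, the antenna names, and the function `hourly_traffic` are my own assumptions for illustration; the actual D4D file format is not described in the extract above.

```python
from collections import Counter
from datetime import datetime

# Hypothetical anonymized call records:
# (timestamp, originating antenna, receiving antenna).
records = [
    ("2011-12-01 08:05:00", "A1", "A2"),
    ("2011-12-01 08:40:00", "A1", "A2"),
    ("2011-12-01 09:10:00", "A2", "A3"),
]

def hourly_traffic(records):
    """Count calls per (hour, originating antenna, receiving antenna)."""
    counts = Counter()
    for stamp, src, dst in records:
        moment = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        # Truncate the timestamp to the start of its hour.
        hour = moment.replace(minute=0, second=0)
        counts[(hour, src, dst)] += 1
    return counts

traffic = hourly_traffic(records)
for (hour, src, dst), n in sorted(traffic.items()):
    print(hour, src, dst, n)
```

Note that no caller identity survives the roll-up: only counts per antenna pair per hour remain, which is part of why such aggregated data is easier to share.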




The organizers expected 40 or 50 project applications and got 260 instead. The D4D winners were announced in the first week of May 2013. Of the four winners, two seem most relevant to the Myanmar situation: one addressing mobility and transport, the other addressing disease containment and information campaigns. Extracts from both are given below by way of introduction. For more information, you may want to follow the links provided.

Best Visualization prize winner: “Exploration and Analysis of Massive Mobile Phone Data: A Layered Visual Analytics Approach”


Best Development prize winner: “AllAboard: a System for Exploring Urban Mobility and Optimizing Public Transport Using Cellphone Data”

With large scale data on mobility patterns, operators can move away from the costly and resource intensive four-step transportation planning processes prevalent in the West, to a more data-centric view, that places the instrumented user at the center of development. In this framework, using mobile phone data to perform transit analysis and optimization represents a new frontier with significant societal impact, especially in developing countries.
AllAboard is a system to optimize the planning of a public transit network using mobile phone data with the goal to improve ridership and user satisfaction.
Mobile phone location data is used to infer origin-destination flows in the city, which are then converted to ridership on the existing transit network.
Sequential travel patterns from individual call location data is used to propose new candidate transit routes. An optimization model evaluates how to improve the existing transit network to increase ridership and user satisfaction, both in terms of travel and wait time.
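The idea of inferring origin-destination flows from call locations can be sketched in a few lines of Python. This is only a simplified illustration and not the AllAboard method itself: the trajectories, the antenna labels, and the function `od_flows` are all hypothetical, and a real system would also have to deal with noise, sparse call activity, and mapping antennas onto transit stops.

```python
from collections import Counter

def od_flows(trajectories):
    """Count moves between consecutive distinct antennas, summed over users.

    trajectories maps a user id to a list of (time, antenna) observations.
    """
    flows = Counter()
    for visits in trajectories.values():
        ordered = sorted(visits)  # order each user's observations by time
        for (_, origin), (_, destination) in zip(ordered, ordered[1:]):
            if origin != destination:  # ignore calls made from the same place
                flows[(origin, destination)] += 1
    return flows

# Hypothetical data: times are arbitrary integers, antennas are labels.
trajectories = {
    "u1": [(1, "A"), (2, "A"), (3, "B"), (4, "C")],
    "u2": [(5, "B"), (1, "A")],
}
flows = od_flows(trajectories)
for (origin, destination), n in flows.most_common():
    print(origin, "->", destination, n)
```

The pairs with the highest counts would be the natural first candidates to check against existing bus routes.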

Best Scientific prize winner: “Analyzing Social Divisions Using Cell Phone Data”


First prize winner: “Exploiting Cellular Data for Disease Containment and Information Campaigns Strategies in Country-Wide Epidemics”

... human mobility is one of the key factors at the basis of the spreading of diseases in a population. Containment strategies are usually devised on movement scenarios based on coarse-grained assumptions. Mobility phone data provide a unique opportunity for building models and defining strategies based on precise information about the movement of people in a region or in a country. Another important aspect is the underlying social structure of a population, which might play a fundamental role in devising information campaigns to promote vaccination and preventive measures, especially in countries with a strong family (or tribal) structure. Among the issues that developing countries are facing today, healthcare is probably the most urgent. The effectiveness of health campaigns is often reduced due to low availability of data, inherent limits in the infrastructure and difficult communication with citizens.
... We present a model that describes how diseases spread across the country by exploiting mobility patterns of people extracted from the available data. Then, we simulate several epidemics scenarios and we evaluate mechanisms to contain the spreading of diseases, based on the information about people mobility and social ties.
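To make the idea of mobility-driven disease spread a little more concrete, here is a minimal Python sketch of a textbook-style SIR (susceptible, infectious, recovered) model over two regions linked by travel. It is not the prize winners' model: the mobility matrix, the rates `beta` and `gamma`, the population figures, and the function `sir_step` are assumptions chosen only for illustration. In practice the mobility matrix is the part one would hope to estimate from mobile phone data.

```python
def sir_step(S, I, R, mobility, beta, gamma):
    """Advance a discrete-time SIR model over several regions by one day.

    mobility[i][j] is the fraction of region i's residents who spend
    the day in region j (each row sums to 1).
    """
    n = len(S)
    # People present in each region today, and how many of them are infectious.
    present = [sum((S[i] + I[i] + R[i]) * mobility[i][j] for i in range(n))
               for j in range(n)]
    infectious = [sum(I[i] * mobility[i][j] for i in range(n))
                  for j in range(n)]
    new_S, new_I, new_R = [], [], []
    for i in range(n):
        # Infection pressure on residents of i, weighted by where they spend the day.
        force = sum(mobility[i][j] * beta * infectious[j] / present[j]
                    for j in range(n) if present[j] > 0)
        infections = min(S[i], force * S[i])
        recoveries = gamma * I[i]
        new_S.append(S[i] - infections)
        new_I.append(I[i] + infections - recoveries)
        new_R.append(R[i] + recoveries)
    return new_S, new_I, new_R

# Hypothetical scenario: an outbreak starts in region 0 only.
S, I, R = [990.0, 1000.0], [10.0, 0.0], [0.0, 0.0]
mixing = [[0.9, 0.1], [0.2, 0.8]]      # some daily travel between regions
isolated = [[1.0, 0.0], [0.0, 1.0]]    # no travel at all

_, I_mixing, _ = sir_step(S, I, R, mixing, beta=0.3, gamma=0.1)
_, I_isolated, _ = sir_step(S, I, R, isolated, beta=0.3, gamma=0.1)
print("Region 1 infectious with travel:", I_mixing[1])
print("Region 1 infectious without travel:", I_isolated[1])
```

Comparing the two runs shows the basic point of the extract: with no travel the second region stays free of infection, while even modest travel seeds it, so better knowledge of movement changes what a containment strategy should look like.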
If you go on to look for related information, you will find a lot on the prospects of using mobile phone data for development. One document that will interest the general public, administrators, and researchers, and that is a "must read" for MPT, Ooredoo, and Telenor, our mobile communication providers, if they haven't read it already, is the Mobile Data for Development Primer by Global Pulse, available at: http://www.unglobalpulse.org/sites/default/files/Mobile%20Data%20for%20Development%20Primer_Oct2013.pdf

