I first read about big data in 2013, on the Web. When I talked about it
in our small community of moderately
curious nuts, I was the only one who had heard of it. Here, it is easy to blame
the slow internet connection for missing many of the things happening in the
world.
So what is big data and is it important? How important?
These were exactly the questions I asked myself, and the answers
were revealed to me bit by bit as I kept looking.
My first encounter with big data was the business report "Big Data Gets Personal" from MIT Technology
Review (2013), which is available for download.
I quickly read through it and was left with the fear that this
big technology would rob me of the little privacy I have. This fear seemed justified
as I searched and read on. I found that in developed countries big data is used
in private business, and every big business is jumping on the bandwagon; in
contrast, big data is not yet known to businesses in the developing world,
though governments have noticed it, taken an interest in it, and have probably been
exploring its potential as a tool for intelligence and control.
On second thought, I realized that they, whoever they are,
would be after much bigger fish than me, and I would have the natural protection
of safety in numbers. At this stage of my understanding, big data
meant that each of us small fry would be assigned some numbers, and lots and lots of
us would then be scooped up and shoveled into some kind of analytic machine,
the mostly implicit logic of this machine being big data = all data. Then out we would come, collectively as neat
patterns of predictions that would shape our future consumption behavior, or
individually as perfectly shaped pawns.
From what was expounded in this collection of articles,
I wasn't much impressed with what big data could do for the individual, because
those benefits seemed too far removed from us. But what was said in
The Dictatorship of Data felt too real to ignore.
Big data is poised to transform society,
from how we diagnose illness to how we educate children, even making it
possible for a car to drive itself. Information is emerging as a new economic
input, a vital resource. Companies, governments, and even individuals will be
measuring and optimizing everything possible.
But there is a dark side. Big data erodes
privacy. And when it is used to make predictions about what we are likely to do
but haven’t yet done, it threatens freedom as well. Big data also exacerbates a
very old problem: relying on the numbers when they are far more fallible than
we think. Nothing underscores the consequences of data analysis gone awry more
than the story of Robert McNamara.
McNamara was a numbers guy. Appointed the U.S.
secretary of defense when tensions in Vietnam rose in the early 1960s, he
insisted on getting data on everything he could. Only by applying statistical
rigor, he believed, could decision makers understand a complex situation and
make the right choices. ...
Among the numbers that came back to him was
the “body count.” ... A mere 2 percent of America’s generals considered the
body count a valid way to measure progress. “A fake—totally worthless,” wrote
one general in his comments. “Often blatant lies,” wrote another. “They were
grossly exaggerated by many units primarily because of the incredible interest
shown by people like McNamara,” said a third.
The use, abuse, and misuse of data by the
U.S. military during the Vietnam war is a troubling lesson about the
limitations of information as the world hurls toward the big-data era. The
underlying data can be of poor quality. It can be biased. It can be misanalyzed
or used misleadingly. And even more damningly, data can fail to capture what it
purports to quantify.
Even today, the best of top executives may not be able
to escape the dictatorship of data. Take, for example, what one top executive at Google
did:
... To determine the best color for a
toolbar on the website ... once ordered staff to test 41 gradations of blue to
see which ones people used more. In 2009, Google’s top designer, Douglas
Bowman, quit in a huff because he couldn’t stand the constant quantification of
everything. “I had a recent debate over whether a border should be 3, 4 or 5
pixels wide, and was asked to prove my case. ... "
Cases like these could easily be dismissed as whims of
the rich and powerful toying with ideas, but when those with authority and
power become obsessed with the power and promise of big data, it is another
matter altogether.
Big data will be a foundation for improving
the drugs we take, the way we learn, and the actions of individuals. However,
the risk is that its extraordinary powers may lure us to commit the sin of
McNamara: to become so fixated on the data, and so obsessed with the power and
promise it offers, that we fail to appreciate its inherent ability to mislead.
Is that all big data has to offer humanity? All this
seems to be confined to the other half of the digital divide (and, logically, of the
big data divide): the half with perfect connectivity, smart gadgets,
smart homes, and smart people living in an opulent world, a repository of the world's knowledge, though not necessarily of its
wisdom, I would timidly add. Gandhi once said that the earth provides enough to satisfy every man’s need but not
every man’s greed. We find greed on both sides of the digital
divide, and yet it could be more severe on our side because of the lack of
mechanisms to check it or because the existing mechanisms malfunction. So, unsatisfied with the realm of big data
as I had discovered it, I went looking for its potential in development. That is how I
found the big data challenge by Orange.
Orange, a mobile telecommunications provider in Africa,
offered to make its Call Detail Record
data available to the participants in its challenge to find the best ways to use this data
for development.
The Orange "Data for Development" (D4D)
challenge is an open data challenge on anonymous call patterns of Orange's
mobile phone users in Ivory Coast. The goal of the challenge is to help address
society development questions in novel ways by contributing to the
socio-economic development and well-being of the Ivory Coast population.
Participants to the challenge are given access to four mobile phone datasets
... The datasets are based on anonymized Call Detail Records (CDR) of phone
calls and SMS exchanges between five million of Orange's customers in Ivory
Coast between December 1, 2011 and April 28, 2012. The datasets are: (a)
antenna-to-antenna traffic on an hourly basis, (b) individual trajectories for
50,000 customers for two week time windows with antenna location information,
(c) individual trajectories for 500,000 customers over the entire observation
period with sub-prefecture location information, and (d) a sample of
communication graphs for 5,000 customers.
The organizers expected 40 or 50 project applications and
got 260 instead. The D4D winners were announced in the first week of May 2013. Of
the four winners, two seem most relevant to the situation in Myanmar: one addressing mobility
and transport, and the other addressing disease
containment and information campaigns. Extracts from them are provided
here by way of introduction. For
more information, you may want to follow the links provided.
Best Visualization prize winner: “Exploration
and Analysis of Massive Mobile Phone Data: A Layered Visual Analytics Approach”
Best Development prize winner: “AllAboard:
a System for Exploring Urban Mobility and Optimizing Public Transport Using
Cellphone Data”
With large scale data
on mobility patterns, operators can move away from the costly and resource
intensive four-step transportation planning processes prevalent in the West, to
a more data-centric view, that places the instrumented user at the center of
development. In this framework, using mobile phone data to perform transit
analysis and optimization represents a new frontier with significant societal
impact, especially in developing countries.
AllAboard is a system
to optimize the planning of a public transit network using mobile phone data
with the goal to improve ridership and user satisfaction.
Mobile phone location data is used to infer origin-destination flows in the city, which are then converted to ridership on the existing transit network.
Sequential travel patterns from individual call location data is used to propose new candidate transit routes. An optimization model evaluates how to improve the existing transit network to increase ridership and user satisfaction, both in terms of travel and wait time.
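To make the idea of origin-destination flows a little more concrete, here is a minimal sketch of how such flows might be counted from call records. It is not the AllAboard method; the record layout (user, timestamp, antenna) and the function name `od_flows` are my own assumptions for illustration, and the sample data is made up.

```python
from collections import Counter, defaultdict

def od_flows(records):
    """Count movements between antennas.

    records: iterable of (user_id, timestamp, antenna_id) tuples
             (a hypothetical layout, not the actual D4D format).
    Returns a Counter mapping (origin, destination) -> number of moves.
    """
    # Group each user's call events together.
    by_user = defaultdict(list)
    for user, timestamp, antenna in records:
        by_user[user].append((timestamp, antenna))

    flows = Counter()
    for events in by_user.values():
        events.sort()  # order each user's events by time
        # Compare every event with the next one by the same user.
        for (_, origin), (_, destination) in zip(events, events[1:]):
            if origin != destination:  # ignore calls made from the same antenna
                flows[(origin, destination)] += 1
    return flows

# Made-up sample: two users moving between antennas A, B and C.
sample = [
    ("u1", 1, "A"), ("u1", 2, "A"), ("u1", 3, "B"),
    ("u2", 1, "A"), ("u2", 5, "B"), ("u2", 9, "C"),
]
flows = od_flows(sample)
for (origin, destination), count in sorted(flows.items()):
    print(origin, "->", destination, count)
```

A real system would also have to deal with sparse and noisy records, since people only reveal their location when they call or send an SMS.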
Best Scientific prize winner: “Analyzing
Social Divisions Using Cell Phone Data”
First prize winner: “Exploiting
Cellular Data for Disease Containment and Information Campaigns Strategies in
Country-Wide Epidemics”
... human mobility is
one of the key factors at the basis of the spreading of diseases in a
population. Containment strategies are usually devised on movement scenarios
based on coarse-grained assumptions. Mobile phone data provide a unique
opportunity for building models and defining strategies based on precise
information about the movement of people in a region or in a country. Another
important aspect is the underlying social structure of a population, which
might play a fundamental role in devising information campaigns to promote
vaccination and preventive measures, especially in countries with a strong
family (or tribal) structure. Among the issues that developing countries are
facing today, healthcare is probably the most urgent. The effectiveness of
health campaigns is often reduced due to low availability of data, inherent
limits in the infrastructure and difficult communication with citizens.
... We present a model that describes
how diseases spread across the country by exploiting mobility patterns of
people extracted from the available data. Then, we simulate several epidemics
scenarios and we evaluate mechanisms to contain the spreading of diseases,
based on the information about people mobility and social ties.
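For readers who want a feel for what such a model looks like, below is a toy sketch of a disease spreading between regions linked by movement of people. It is a textbook-style SIR (susceptible, infected, recovered) model, not the prize winners' actual model; the function names, the rates `beta` and `gamma`, and the mobility fractions are all assumptions chosen for illustration.

```python
def move(groups, mobility):
    """Shift people between regions.

    mobility[i][j] is the assumed fraction of region i's people
    who move to region j in one time step.
    """
    n = len(groups)
    moved = [0.0] * n
    for i in range(n):
        leaving = 0.0
        for j in range(n):
            if i != j:
                flow = groups[i] * mobility[i][j]
                moved[j] += flow
                leaving += flow
        moved[i] += groups[i] - leaving  # the rest stay where they are
    return moved

def simulate_sir(populations, infected, mobility, beta=0.3, gamma=0.1, steps=50):
    """Discrete-time SIR model over several regions (illustrative only)."""
    n = len(populations)
    S = [float(p - i) for p, i in zip(populations, infected)]
    I = [float(i) for i in infected]
    R = [0.0] * n
    for _ in range(steps):
        # Infection and recovery happen locally within each region.
        for k in range(n):
            total = S[k] + I[k] + R[k]
            new_infections = beta * S[k] * I[k] / total if total > 0 else 0.0
            new_recoveries = gamma * I[k]
            S[k] -= new_infections
            I[k] += new_infections - new_recoveries
            R[k] += new_recoveries
        # Then people travel according to the mobility fractions.
        S, I, R = move(S, mobility), move(I, mobility), move(R, mobility)
    return S, I, R

# Two regions of 1,000 people; the outbreak starts in region 0 only.
isolated = simulate_sir([1000, 1000], [10, 0], [[0, 0], [0, 0]])
connected = simulate_sir([1000, 1000], [10, 0], [[0, 0.05], [0.05, 0]])
print("Region 1 ever infected, no travel:  ", isolated[1][1] + isolated[2][1])
print("Region 1 ever infected, with travel:", connected[1][1] + connected[2][1])
```

With no travel the second region stays untouched, while even a small flow of people carries the disease across. This is why mobility patterns estimated from phone data matter for containment planning.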
If you look for related information, you will find a
lot on the prospects of using mobile phone data for development.
One document that will interest the general public, administrators, and
researchers, and is a "must read" for MPT, Ooredoo, and Telenor, our mobile
communication providers, if they haven't read it already, is the Mobile Data for Development Primer by Global Pulse, available at: http://www.unglobalpulse.org/sites/default/files/Mobile%20Data%20for%20Development%20Primer_Oct2013.pdf