Saturday, January 31, 2015

Little data: facing the one-legged little wind


Big data has been called a data tsunami. It has been described as data exhaust or found data. Perhaps the key distinction between big data and little data is that with the latter you have the option to make your data represent the population your research is targeting, as in a sample survey, for example.

A lazy way to get some idea of how little data measure up to big data (with all the hype) is to do a Google search, I suppose.

Google Search: little data vs big data

All time
David Vs. Goliath: Why Little Data Will Win Over Big Data
Market Research - Little Data vs. Big Data: Nine Types of ...
Small data vs big data: the battle that never was ...
You May Not Need Big Data After All - HBR
Forget big data, small data is the real revolution | News | The ...
Little Data vs. Big Data: Does Size Matter? | 6Sense
Is Little Data The Next Big Data? | Jonah Berger | LinkedIn
Is Little Data The Next Big Data? | Jonah Berger
Big Data vs. Small Data - Is there a Difference ...
Big data - Wikipedia, the free encyclopedia

Past year
David Vs. Goliath: Why Little Data Will Win Over Big Data
Market Research - Little Data vs. Big Data: Nine Types of ...
Small data vs big data: the battle that never was ...
Little Data vs. Big Data: Does Size Matter? | 6Sense
Big Data vs. Small Data - Is there a Difference ...
Why Companies Need to Focus on 'Little Data' - WSJ Blogs
Little privacy in the age of big data - The Guardian
Forget Big Data. Use Little Data for Incremental Self ...
Big Data vs. CRM: How Can They Help Small Businesses?
Big Data vs Little Data - Sales Initiative

Past month
Little Data vs. Big Data: Does Size Matter? | 6Sense
What's Holding Us Back From Big Data? Daniel Burrus ...
Big data - Wikipedia, the free encyclopedia
The Big Buzz About Big Data | UKFast Blog
Microsoft vs US.gov, Internet of Stuff, Big Data - The Channel
Our Future: Free Will vs. Predictions with Data - Lutz Finger
AllAnalytics - Matthew Brodsky - Big Chief Data, Little Chief ...
6sense | LinkedIn
Hype vs Reality regarding Big Data? | James McGovern ...
Data Informed | Big Data and Analytics in the Enterprise

These were the first pages of search results for three different time frames. Without looking at their contents, I felt that the idea that little data could hold its ground would be quite the dominant opinion. That impression could have been quite wrong, based as it was on just the titles from the first pages of information that is itself a product of big data! So I had better stay non-committal and say "use both, suitably".

For that matter, the title "Small data vs big data: the battle that never was" of the June 2, 2014 post by Pam Baker on the FierceBigData site makes me feel like I've found a sympathizer with this view. However, she was thinking about little data as subsets of big data:

Every so often media reports come blasting the message that little data wins over big data. Give it a minute and more media reports will come out saying the opposite. So which is winning in the business arena--big or small data? Neither. This is the battle that never was. There is a time for big data and a time for little data. Further, big data is made of little data and it's ridiculous to pit the piece against the whole and declare one the all-occasion winner. Further still, one almost always drills down to little data after gaining the big data, big picture insight. Why would one step be superior to another in the same process?

To use another metaphor to make the point: when you pit small data against big data you are not comparing apples to oranges but a bushel of apples against a planet of orchards.

And all the search results as well as her post were talking about business applications, while we are interested in the use of big data for development.

On the other hand, it is said that the future of big data is all about predictions. Time and again we learn that the sheer size of data is no substitute for relevant data. Lutz Finger, in "Our Future: Free Will vs. Predictions with Data", contrasted an example from the big data era with a prediction from the ancient past:

But often it is not the amount of data that matters to create a good prediction. For example, the Incas predicted the best time to plant crops. Their dataset might have been as little as 3560 data points (= 10 years) – nothing in our big data world. 500 years later we have companies like Google that measure a lot about our online behavior. But despite all this data, predictions are not necessarily easy. For example, New York Times bestselling business author Carol Roth once complained in her blog that Google infers that she is a male over age 65, when in fact she is a woman decades younger.

Why is this? Because not all of the data Google has aggregated is really helpful for the specific prediction they try to make. 

Back to our theme: traditionally, data for development comes from the research community and official statistics, and comprises experimental data, observational data, survey data and administrative records. These are the little data I'm thinking about. I may simply say that little data is the kind of data we had before big data came around; most people probably became aware of big data only after 2011 or so.


So, before the "big buzz about big data" there was little data, and it has long been recognized as the basis for evidence-based policymaking and monitoring in all countries, especially developing countries. In the area of little data, the PARIS21 consortium is a partnership of policymakers, analysts, and statisticians from all countries of the world, focusing on promoting high-quality statistics, making these data meaningful, and designing sound policies. It was established in November 1999 in response to the UN Economic and Social Council resolution on the goals of the UN Conference on Development. A significant current project of PARIS21 is Informing a Data Revolution (IDR), funded by a grant from the Bill and Melinda Gates Foundation. PARIS21 asked "Are developing countries ready for the data revolution?"

Are we ready for the data revolution? In the old days we would jokingly answer: "it's good; spicy hot, though". Now, I remember my days as a youngster fascinated by the little whirlwinds we call lay-bway. You can't guess where one is going, and it is this that makes them so fascinating. If one brushes past you, the leaves, dust and sand floating around in it sting your eyes. I remember one of our writers of the old generation, Ze-ya, imaginatively called it the one-legged little wind, a phrase we would have expected from a writer like Dagon-taya and not from him.

But what's this data revolution anyway? In their report "A World that Counts: Mobilising the Data Revolution for Sustainable Development" of November 2014 (http://www.undatarevolution.org/wp-content/uploads/2014/11/A-World-That-Counts.pdf), the Independent Expert Advisory Group gives the rationale:

As the world embarks on an ambitious project to meet new Sustainable Development Goals  (SDGs), there is an urgent need to mobilise the data revolution for all people and the whole planet in order to monitor progress, hold governments accountable and foster sustainable development. More diverse, integrated, timely and trustworthy information can lead to better decision-making and real-time citizen feedback. (Executive summary, p. 3)

And it defines the data revolution this way:

The data revolution is:
         An explosion in the volume of data, the speed with which data are produced, the number of producers of data, the dissemination of data, and the range of things on which there is data, coming from new technologies such as mobile phones and the “internet of things”, and from other sources, such as qualitative data, citizen-generated data and perceptions data;
         A growing demand for data from all parts of society.

After all, it reads like what you see in any writing about big data these days. Maybe I could summarize it for dummies: (i) Let there be big data, and (ii) Witness the surge in demand for data.

Then they link the data revolution with the sustainable development goals. There were three bullets, but it seems to me that the first is the essential one.

The data revolution for sustainable development is:
         The integration of these new data with traditional data to produce high-quality information that is more detailed, timely and relevant for many purposes and users, especially to foster and monitor sustainable development;

So now, (iii) Let's arrange a marriage of the little data with the most eligible big data. 
I am glad that this is what I had arrived at vaguely (or more plainly, through guesswork), though I am not sure that it is not a marriage of convenience. But how do you actually get the little data married to the big data (I guess they may just have been working on match-making), and specifically for the stewardship of sustainable development?

The executive summary indicates how the data revolution for sustainable development could be used: (i) directly, by making it possible to "monitor progress", and (ii) in a complementary way, through "... hold(ing) governments accountable ... (and getting) real-time citizen feedback." Here the second part could also be seen as a revolution for equality between the data rich and the data poor:

... the data revolution can be a revolution for equality. More, and more open, data can help ensure that knowledge is shared, creating a world of informed and empowered citizens, capable of holding decision-makers accountable for their actions. (p. 8)

But where's this eye-stinging part? It seems that nations with a lot of catching up to do could find coping with the data revolution a bit spicy-hot. In particular, governments with creaking national data infrastructures will have to face quite formidable tasks like these:

National statistical offices, the traditional guardians of public data for the public good, will remain central to the whole of government efforts to harness the data revolution for sustainable development. To fill this role, however, they will need to change, and more quickly than in the past, and continue to adapt, abandoning expensive and cumbersome production processes, incorporating new data sources, including administrative data from other government departments, and focusing on providing data that is human and machine-readable, compatible with geospatial information systems and available quickly enough to ensure that the data cycle matches the decision cycle.

Anyway, when you open your windows and this sudden gust of lay-bway hits your face and stings your eyes, you need not panic. Think of it as ventilation a bit stronger than usual.

Things that need to be done have to be done somehow, and as usual the UN post-2015 development agenda does not come without a package to assist: in this case, a partnership to catalyze global solidarity for sustainable development. Also, you could look for technical assistance from projects like Informing a Data Revolution (IDR) and others.

We are glad to know that Myanmar already has good relations with PARIS21. It is one of eleven countries from Southeast Asia, South Asia, and North Asia that successfully completed the first National Strategy for the Development of Statistics (NSDS) Training Course in the Asian Region, organized in December 2014 by PARIS21 in collaboration with the Statistical Institute for Asia and the Pacific (SIAP).

PARIS21 announced on its website an opportunity for the voice of developing countries to be heard in the debate on the data revolution, which we should at least be aware of:

In the months leading up to September 2015 there will be a comprehensive process to involve as many people as possible in discussions about the data revolution, what it should do, who should be involved and how it should be put into action. It is essential that the voice of developing countries is heard in this debate and that the discussion is not hijacked by special interests or those with the deepest pockets.
