Thursday, July 13, 2017

Playing with microdata


A little over a month ago I happened to read the news about the Myanmar Demographic and Health Survey (DHS), and by the first week of June I had been able to download the microdata. It wasn't without frustration and struggle, though. Perhaps an exaggeration of my capabilities and my plan for analysis helped. Then, trying to fulfill the promise I had made to myself to produce something not so dumb out of microdata downloads, I chose the easy way out and compared population pyramids from the DHS and the most recent population results. Such was the theme of my last post.

How about trying my hand at a more refined graphing of complex data?

Looking around, I found The World Bank Group Country Survey FY 2014 and promptly downloaded the microdata from the World Bank Microdata portal. Unlike the microdata for the DHS, you don't need to make a formal request for it. All you need to do is accept the “Terms and conditions” for the microdata by clicking the “Accept” button on that page, choose the data file format, and start downloading.

The Myanmar Country Opinion Survey is part of the Country Opinion Survey Program series of the World Bank Group. It was designed to achieve the following objectives (Myanmar: The World Bank Group Country Survey FY 2014, Report of Findings, November 2014):

“Assist the World Bank Group in gaining a better understanding of how stakeholders in Myanmar perceive the Bank Group;
Obtain systematic feedback from stakeholders in Myanmar regarding:
    Their views regarding the general environment in Myanmar;
    Their overall attitudes toward the World Bank Group in Myanmar;
    Overall impressions of the World Bank Group’s operations, knowledge work and activities, and communication and information sharing in Myanmar;
    Perceptions of the World Bank Group’s future role in Myanmar.
Use data to help inform Myanmar country team’s strategy.”

The methodology and scope of the survey were described as:

“Between June and August 2014, 662 stakeholders of the World Bank Group in Myanmar were invited to provide their opinions on the WBG’s work in the country by participating in a country opinion survey. Participants were drawn from the office of the President, Prime Minister; office of a minister; office of a parliamentarian; ministries/ministerial departments; consultants/contractors working on WBG-supported projects/programs; PMUs overseeing implementation of a project; local government officials; bilateral and multilateral agencies; private sector organizations; private foundations; the financial sector/private banks; NGOs; community based organizations; the media; independent government institutions; trade unions; faith-based groups; academia/research institutes/think tanks; judiciary branch; and other organizations. A total of 173 stakeholders participated in the survey (26% response rate).

Respondents received and returned questionnaires through the courier service. Respondents were asked about: general issues facing Myanmar; their overall attitudes toward the WBG; the WBG’s importance and results; the WBG’s knowledge work and activities; the WBG’s future role in Myanmar; and the WBG’s communication and information sharing.”

Quickly going over the contents, I was most interested in knowing what the stakeholders considered

'the top three most important development priorities, which areas the government should focus on, which areas would contribute most to reducing poverty and generating economic growth in Myanmar, and how “shared prosperity” would be best achieved'.

Chapter-IV and the General Issues sections of Appendix-A and Appendix-B report findings directly relevant to my interest. I wanted to understand how the stakeholders view the development priorities, and the similarities and differences among individuals and groups. For that, I couldn't imagine an effective way to summarize the three responses given by each respondent in tables of data so that patterns across the three responses and across individuals or groups would stand out. In our case the respondents were to pick three development priorities out of a list of 35, and I guess the most appropriate way to see the patterns in the data is to draw a parallel coordinates plot. As human beings we can't see more than three dimensions, but with parallel coordinates we can lay out many dimensions side by side on a flat, two-dimensional plot!

The following is the famous Fisher's Iris data plotted this way. You can find it under the entry “Parallel Coordinates” in Wikipedia.

In parallel coordinates, the idea basically is to have as many y-axes as the number of dimensions and connect the points on these axes for each individual, case or observation.
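As a rough illustration of that idea (a Python sketch with made-up numbers, not the R code behind the plots in this post), each observation becomes one polyline whose points sit at x = axis number. Rescaling every axis to the range 0 to 1 is a common convention that I am assuming here; the function name `parallel_coordinates` is hypothetical.

```python
def parallel_coordinates(rows):
    """Turn each observation into a polyline of (axis_index, scaled_value) points."""
    n_axes = len(rows[0])
    # Per-axis minimum and maximum, used to rescale every axis to [0, 1].
    lows = [min(r[j] for r in rows) for j in range(n_axes)]
    highs = [max(r[j] for r in rows) for j in range(n_axes)]
    polylines = []
    for r in rows:
        line = []
        for j in range(n_axes):
            span = highs[j] - lows[j]
            # A constant axis has no spread; put its points in the middle.
            y = 0.5 if span == 0 else (r[j] - lows[j]) / span
            line.append((j, y))
        polylines.append(line)
    return polylines


# Three made-up observations with four measurements each (not real Iris rows).
observations = [
    [5.0, 3.0, 1.0, 0.2],
    [6.0, 2.0, 4.0, 1.2],
    [7.0, 4.0, 6.0, 2.2],
]
lines = parallel_coordinates(observations)
```

Drawing each polyline with any line-plotting tool, one vertical axis per measurement, gives the parallel coordinates plot.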

In the following plot I used the downloaded World Bank Country Survey data (Reference ID: MMR_2014_WBCS_v01_M; Country: Myanmar; Producer: Public Opinion Research Group - The World Bank Group) to create the parallel coordinates plot for the three Development Priorities picked by each stakeholder. Each line in the plot shows the response of one stakeholder and is colored to show the stakeholder group to which he or she belongs. I used R to process the data, and the graphs were plotted with the ggplot2 package.

The first plot is for all the nine stakeholder groups. One thing clearly seen from this plot is that one respondent went over the three-response limit and gave a fourth. While the rest of the respondents gave three responses, one gave only one response and another gave only two.
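A check like this can also be done before plotting. The Python sketch below is only an illustration: it assumes each of the items a2_1 to a2_35 is stored as a 0/1 indicator per respondent, which may not match the actual coding of the microdata, and the respondent IDs and picks are made up.

```python
from collections import Counter

N_ITEMS = 35  # development priority items a2_1 ... a2_35


def picked_items(indicators):
    """Return the 1-based item numbers a respondent ticked (assumed 0/1 coding)."""
    return [i + 1 for i, flag in enumerate(indicators) if flag == 1]


def make_row(picks):
    """Build a hypothetical 35-column indicator row from a set of picked items."""
    return [1 if i + 1 in picks else 0 for i in range(N_ITEMS)]


# Made-up respondents: one with three picks, and ones with one, four and two.
respondents = {
    "r1": make_row({3, 10, 21}),
    "r2": make_row({5}),
    "r3": make_row({1, 2, 8, 30}),
    "r4": make_row({7, 12}),
}

# Number of priorities each respondent picked, and how often each count occurs.
counts = {rid: len(picked_items(row)) for rid, row in respondents.items()}
tally = Counter(counts.values())
```

Any respondent whose count is not three is one of the exceptions visible in the plot.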


As we can see, it is not easy to tell the stakeholder groups apart in the above plot because there are too many of them and the colors become hard to distinguish. To overcome this we could try making a particular group stand out by plotting it against the rest of the groups. The following is the plot of the Media group vs. the rest made that way.
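The recoding behind such a "one group vs. the rest" plot can be sketched as follows. This is a Python illustration, not the R code used for the plots; the function names are hypothetical, and apart from "Media" the group labels are placeholders rather than the survey's actual group names.

```python
def highlight(groups, target):
    """Collapse every group except the target into a single 'Other' label."""
    return [g if g == target else "Other" for g in groups]


def draw_order(labels, target):
    """Indices ordered so 'Other' lines are drawn first and the target on top."""
    return sorted(range(len(labels)), key=lambda i: labels[i] == target)


# Placeholder group labels for five respondents; only "Media" is from the post.
groups = ["Media", "Group A", "Group B", "Media", "Group A"]
labels = highlight(groups, "Media")
order = draw_order(labels, "Media")
```

With only two labels left, two colors are enough, and drawing the highlighted group last keeps its lines from being hidden under the others.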


So far it looks as if the message from the plot is clear. You can see each respondent's pattern of response as well as the collective pattern for the group. From such plots it is relatively easy to judge whether one group is more homogeneous than another in its responses, or to get an idea of the pattern of response for the most popular development priorities, and so on. But I have the uneasy feeling that opinion surveys may suffer from the problem of people being unwilling to speak their minds. That was found to be true, for example, even in the case of exit polls, which, I felt, typically have the least sensitive questions. Another problem is the low response rates.

The World Bank invited 662 stakeholders to participate in their opinion survey. It was a pity that only 173 (26%) responded. Of those, 3 stakeholders didn't give any answer to the Development Priority question (items a2_1 to a2_35), and out of the remaining 170 stakeholders, 22 didn't answer the question on which stakeholder group they belonged to. So my plots covered just 148 stakeholders (22%) out of 662.
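The arithmetic can be checked in a few lines. This Python snippet uses only the counts quoted above; the variable names are my own.

```python
invited = 662              # stakeholders invited to the survey
responded = 173            # stakeholders who returned the questionnaire
no_priority_answer = 3     # gave no answer for items a2_1 to a2_35
no_group_answer = 22       # did not say which stakeholder group they belong to

with_priorities = responded - no_priority_answer   # 170
plotted = with_priorities - no_group_answer        # 148

# Percentages of the 662 invited stakeholders, rounded to whole numbers.
response_rate = round(100 * responded / invited)   # 26
plotted_rate = round(100 * plotted / invited)      # 22
```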

At the beginning I felt rather uneasy about the low response rate (22% or 26%) of this survey. It means users may have to look beyond the survey data to make up their minds about whether the data reflect the opinions of the respective stakeholder groups. Anyway, I am pretty sure that parallel coordinates plots are highly suitable for analyzing high-dimensional data. Also, my primary intention was not to interpret or make sense of the analysis results. It was just a modest ambition of sharing my do-it-yourself experience. As this sharing would have my “easier done than said” twist, I went on happily doing my parallel coordinates plots.


All the plots for the World Bank Opinion Survey shown in this post were based on the microdata mentioned earlier. The microdata included the codes for the nine stakeholder groups used in the analyses for the survey report. The numbers of respondents who answered both the question on which stakeholder group they belonged to and the question on development priorities were:

The plots would have been more readable if the Development Priority items were in full text instead of their codes. However, that would take too much space and would leave the actual plot area too small. So we will have to refer to the list below:

I find ggplot2 not easy to learn, but it works great. I don't know much about R graphics, or about ggplot2 in particular for that matter. But I guess ggplot2 produces the prettiest complex graphics from your data.

I had zero experience with ggplot2 before I started working on these plots. I worked through trial and error with the help of various tutorials and questions and answers from Cross Validated and Stack Overflow, among others. My errors were much more than my trials, as one cartoon character said. But I've finally made it.


