A little over a month ago I happened to
read the news about the Myanmar Demographic and Health Survey (DHS),
and by the first week of June I had managed to download the
microdata. It wasn't without frustration and struggle, though.
Perhaps exaggerating my capabilities and my plan for analysis
helped. Then, trying to keep the promise I had made to myself to
produce something not so dumb out of microdata downloads, I took
the easy way out and compared population pyramids from the DHS and the
most recent population results. Such was the theme of my last post.
How about trying my hand at a more
refined graphing of complex data?
Looking around, I found The World
Bank Group Country Survey FY 2014 and promptly downloaded the
microdata from the World Bank Microdata portal.
Unlike the microdata for the DHS, you don't need to make a formal
request for it. All you need to do is to accept the “Terms and
conditions” for the microdata by clicking the “Accept” button
on that page, choose the data file format, and start downloading.
The Myanmar Country Opinion Survey is part of the Country Opinion Survey
Program series of the World Bank Group. It was designed to achieve
the following objectives (Myanmar:
The World Bank Group Country Survey FY 2014, Report of Findings,
November 2014):
“Assist the World Bank Group in gaining a better understanding of how stakeholders in Myanmar perceive the Bank Group;
Obtain systematic feedback from stakeholders in Myanmar regarding:
    Their views regarding the general environment in Myanmar;
    Their overall attitudes toward the World Bank Group in Myanmar;
    Overall impressions of the World Bank Group’s operations, knowledge work and activities, and communication and information sharing in Myanmar;
    Perceptions of the World Bank Group’s future role in Myanmar.
Use data to help inform Myanmar country team’s strategy.”
The methodology and scope of the survey
were described as:
“Between
June and August 2014, 662 stakeholders of the World Bank Group in
Myanmar were invited to provide their opinions on the WBG’s work in
the country by participating in a country opinion survey.
Participants were drawn from the office of the President, Prime
Minister; office of a minister; office of a parliamentarian;
ministries/ministerial departments; consultants/contractors working
on WBG-supported projects/programs; PMUs overseeing implementation of
a project; local government officials; bilateral and multilateral
agencies; private sector organizations; private foundations; the
financial sector/private banks; NGOs; community based organizations;
the media; independent government institutions; trade unions;
faith-based groups; academia/research institutes/think tanks;
judiciary branch; and other organizations. A total of 173
stakeholders participated in the survey (26% response rate).
Respondents
received and returned questionnaires through the courier service.
Respondents were asked about: general issues facing Myanmar; their
overall attitudes toward the WBG; the WBG’s importance and results;
the WBG’s knowledge work and activities; the WBG’s future role in
Myanmar; and the WBG’s communication and information sharing.”
Quickly going over the contents I was
most interested in knowing what the stakeholders considered
'the top three
most important development priorities, which areas the government
should focus on, which areas would contribute most to reducing
poverty and generating economic growth in Myanmar, and how “shared
prosperity” would be best achieved'.
Chapter-IV and the General Issues sections
of Appendix-A and Appendix-B report findings directly relevant to
my interest. I wanted to understand how the stakeholders
view the development priorities, and the similarities and differences
among individuals and groups. I couldn't
imagine an effective way to summarize the three responses given by
each respondent in tables of data so that they
would bring out patterns across the three responses and across
individuals and groups. In our case the respondents were to pick three
development priorities out of a list of 35, and I guess the most
appropriate way to see the patterns in the data is to draw a parallel
coordinates plot. As human beings
we can't see more than
three dimensions, but a parallel coordinates plot lets us look at many
dimensions at once by laying them all out side by side on a
two-dimensional page!
The following is
the famous Fisher's Iris data plotted this way. You can find it under
the entry “Parallel Coordinates” in Wikipedia.
In parallel
coordinates, the idea basically is to have as many y-axes as the
number of dimensions and connect the points on these axes for each
individual, case or observation.
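To make the idea concrete, here is a minimal sketch of the geometry behind such a plot. The plots in this post were made in R with ggplot2; this sketch is in plain Python only to show the principle, and the function name `to_polylines` and the sample measurements are made up for illustration (they are not the real Iris values). Each observation becomes one polyline, with one vertex per axis, and each axis is rescaled to the range 0 to 1 so that variables with different units can share a page.

```python
def to_polylines(rows, columns):
    """Return one polyline per row as a list of (axis_index, scaled_value)."""
    # Per-axis minimum and maximum, so every axis is scaled independently.
    lo = {c: min(r[c] for r in rows) for c in columns}
    hi = {c: max(r[c] for r in rows) for c in columns}
    lines = []
    for r in rows:
        points = []
        for axis_index, c in enumerate(columns):
            span = hi[c] - lo[c]
            # A constant column has no spread; put it in the middle of the axis.
            y = (r[c] - lo[c]) / span if span else 0.5
            points.append((axis_index, y))
        lines.append(points)
    return lines


# Made-up measurements, for illustration only.
rows = [
    {"sepal_length": 5.0, "sepal_width": 3.5, "petal_length": 1.5},
    {"sepal_length": 7.0, "sepal_width": 3.0, "petal_length": 4.5},
    {"sepal_length": 6.0, "sepal_width": 2.5, "petal_length": 6.0},
]
columns = ["sepal_length", "sepal_width", "petal_length"]
polylines = to_polylines(rows, columns)
```

Drawing the plot then amounts to drawing one vertical axis at each integer x position and joining each observation's points with straight line segments.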
In the
following plot I used the downloaded World Bank Country Survey data
(Reference
ID: MMR_2014_WBCS_v01_M; Country: Myanmar; Producer:Public Opinion
Research Group - The World Bank Group) to create
the parallel coordinates plot for the three Development Priorities
picked by each stakeholder. Each line in the plot shows the responses
of one stakeholder and is colored according to the stakeholder group
to which he or she belongs. I used R to process the
data and plotted the graphs with the ggplot2 package.
The first plot is for all nine
stakeholder groups. One thing clearly seen from this plot is that one
respondent went over the three-response limit and gave a fourth
one. While the rest of the respondents gave three responses, one gave
only one response and another gave only two.
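A check like this can be done before any plotting. The sketch below is in Python rather than the R I actually used, and it rests on an assumption: that each of the items a2_1 to a2_35 is stored as its own column holding 1 when the respondent picked that priority. I have not described the real coding of the microdata here, so treat the coding, the respondent ids and the sample records as hypothetical.

```python
from collections import Counter

# Column names for the 35 Development Priority items.
ITEMS = [f"a2_{i}" for i in range(1, 36)]


def picked_items(record):
    """Return the item numbers a respondent picked, in ascending order.

    Assumption: a value of 1 means "picked"; anything else means "not picked".
    """
    return [i for i, name in enumerate(ITEMS, start=1) if record.get(name) == 1]


# Hypothetical respondents, for illustration only.
respondents = [
    {"id": "r1", "a2_3": 1, "a2_10": 1, "a2_22": 1},
    {"id": "r2", "a2_3": 1, "a2_7": 1, "a2_10": 1, "a2_35": 1},
    {"id": "r3", "a2_10": 1},
    {"id": "r4", "a2_1": 1, "a2_22": 1},
]

picks = {r["id"]: picked_items(r) for r in respondents}

# How many respondents gave one, two, three or four responses?
picks_per_person = Counter(len(p) for p in picks.values())
```

Note that with indicator columns like these, only the set of picks can be recovered, not the order in which a respondent chose them, so the first, second and third axes of the plot would hold the picks in ascending item order.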
As we can see, it is not easy to
distinguish the stakeholder groups in the above plot because
there are too many of them and the colors become hard to
tell apart. To overcome this we could try making a particular group
stand out by plotting it against the rest of the groups. The
following is the plot of the Media group vs. the rest made
that way.
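The recoding behind a "one group vs. the rest" plot is simple. Here is a small sketch in Python (not the R code I used), where the function name `highlight_group`, the label "Rest" and the sample records are all hypothetical; only the group name Media comes from the survey. Besides collapsing the groups into two labels, it puts the highlighted group last so that its lines would be drawn on top of the others.

```python
def highlight_group(records, target):
    """Label each record as the target group or "Rest", target group last."""
    labelled = [
        dict(r, highlight=(target if r["group"] == target else "Rest"))
        for r in records
    ]
    # sorted() is stable, and False sorts before True, so "Rest" records
    # come first and the highlighted group is drawn last (on top).
    return sorted(labelled, key=lambda r: r["highlight"] == target)


# Hypothetical records; group names other than "Media" are made up.
records = [
    {"id": "r1", "group": "Media"},
    {"id": "r2", "group": "Group A"},
    {"id": "r3", "group": "Media"},
    {"id": "r4", "group": "Group B"},
]
ordered = highlight_group(records, "Media")
```

With only two labels left, two colors are enough: a muted one for the rest and a strong one for the highlighted group.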
So far it looks as if the message from
the plot is clear. You can see each respondent's pattern of response
as well as the collective pattern for the group. From such plots it
is relatively easy to judge whether one group is more homogeneous than
another in its responses, or to get an idea of the pattern of response
for the most popular development priorities, and so on. But I have the
uneasy feeling that opinion surveys may suffer from people being
unwilling to speak their minds. That was found to be true, for
example, even in the case of exit polls which, I feel, typically ask
the least sensitive questions. Another problem is low response
rates.
The World Bank invited 662 stakeholders
to participate in their opinion survey. It was a pity that only 173
(26%) responded. Moreover, 3 stakeholders didn't give any answer to the
Development Priority question (items a2_1 to a2_35), and out of the
remaining 170 stakeholders, 22 didn't answer the question on which
stakeholder group they belong to. So my plots covered just 148
stakeholders (22%) out of 662.
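The arithmetic is easy to check with the counts quoted in this post. A quick sketch in Python (the variable names are mine): 173 out of 662 works out to about 26%, which matches the response rate given in the survey report, and the 148 stakeholders in my plots are about 22% of those invited.

```python
invited = 662             # stakeholders invited to the survey
responded = 173           # stakeholders who returned a questionnaire
no_priority_answer = 3    # no answer to the Development Priority question
no_group_answer = 22      # no answer to the stakeholder group question

# Respondents usable for the plots: both questions answered.
plotted = responded - no_priority_answer - no_group_answer

response_rate = round(100 * responded / invited)   # percent of invited
plotted_rate = round(100 * plotted / invited)      # percent of invited

print(plotted, response_rate, plotted_rate)
```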
At the beginning I felt rather uneasy
about the low response percentage (22 or 26%) of this survey. That
means users may have to look beyond the survey data to decide
whether the data reflect the opinions of the respective stakeholder
groups. Anyway, I am pretty sure that parallel coordinate plots are
highly suitable for analyzing high-dimensional data. Also, my
primary intention was not to interpret or make sense of
the analysis results. It was just the modest ambition of sharing my
do-it-yourself experience. As this sharing would have my “easier
done than said” twist, I went on happily doing my
parallel coordinate plots.
All the plots for the World Bank
Opinion Survey shown in this post were based on the microdata
mentioned earlier. The microdata included the codes for nine
stakeholder groups used in the analyses for the survey report. The
numbers of respondents who answered both the question on which
stakeholder group they belong to and the question on development
priorities were:
The plots would have been more readable if the Development Priority items were in full text instead of their codes. However that would take too much space and would leave the actual plot area too small. So we will have to refer to the list below:
I find ggplot2 not
easy to learn, but it works great. I don't know much about R graphics,
or about ggplot2 in particular for that matter. But I guess ggplot2
produces the prettiest complex graphics from your data.
I had zero experience with ggplot2
before I started working on these plots. I worked through trial and
error with the help of various tutorials and questions and answers from
Cross Validated and Stack Overflow, among others. My errors were
much more than my trials, as one cartoon character said. But I've
finally made it.