There is a saying, "can't see the forest (or wood) for the trees", which is meant to
warn us that focusing too much on details can make us miss the whole
picture, the overall impression, or the key point. What about turning that upside down
and talking about individual trees by just looking at the forest?
Years ago, I learned of a community development initiative by
one UN agency in Myanmar, which engaged a local consulting firm to select a
number of the poorest communities in some selected areas of the country. As far as I
know, the contractor devised a scoring scheme for Village-Tracts, which are
the lowest areal units in Myanmar's administrative system. A Village-Tract
normally contains a number of villages (hamlets), and a village is usually treated as a community
for the purpose of community initiatives in Myanmar. Quite by accident I came
to know at least one concrete instance of the problem of using Village-Tracts
(collections of communities) to identify the poorest communities for targeting community
development assistance.
At that time, I was working on a terminal assessment for a
community forestry initiative assisted by an INGO. I visited an independent project
located in central Myanmar that was doing research in organic farming and
providing training on that subject. The researcher in residence told me that he
had been approached by the UN agency mentioned above to administer the
distribution of aid to the poorest communities which the agency had
identified. The researcher was an expatriate who had been living and working
on site for ten years. He knew the area well, and he wasn't impressed with the "poorest communities" that had
been identified. So he insisted that, if they were to distribute the aid, he and his team
would discard this list and determine the poorest communities afresh on their own.
Happily, he was allowed to go ahead with his
improved targeting exercise and the distribution of aid.
Lesson: this is a clear case of the ecological fallacy.
The ecological fallacy is a logical error
of interpretation that involves
deriving conclusions about the nature of individuals solely on an analysis of
group data. (It's Fallacy Friday:
Ecological Fallacy, February 27, 2015, PHI KAPPA Literary Society)
An ecological
fallacy (or ecological inference fallacy) is a logical fallacy in the
interpretation of statistical data where inferences about the nature of individuals are deduced from inference
for the group to which those individuals belong. (Wikipedia)
The ecological fallacy refers to the incorrect assumption
that the relationships between variables observed at the aggregated, or
ecological-level, are the same at the individual-level. (Ecological Fallacy: Concepts, Causes and Solutions,
Wei et al., University of Manitoba, January 10, 2010)
Ecological fallacy is a fallacy in research, wherein you draw an inference about a group, and incorrectly attribute that inference to any individual in that group. (Explanation of Ecological Fallacy in Research with Examples, Neha B Deshpande in Buzzle, March 9, 2015). Illustration below:
Pollet et al. cite the case through which the ecological
fallacy became well known (http://www.willem.maartenfrankenhuis.nl/wp-content/uploads/2013/06/Pollet-et-al.-2014-human-nature1.pdf):
The
term “ecological fallacy” became well known after William Robinson (1950) used U.S. census data
to test hypotheses related to immigration and literacy ... hypothesis: those
who are literate are more likely to migrate, and therefore proportions of
immigrants within a state will be positively related to literacy rates in those
states. (He) found evidence for such a positive relationship between the
average literacy of U.S. states and the proportion of immigrants living in
those states. However, at the individual level, immigrants were less likely to
be literate than native individuals ... The positive state-level relationship
between proportion of immigrants and literacy rates might have arisen because
immigrants tended to settle in states with higher literacy levels, perhaps
because these states afforded better economic opportunities or were otherwise
more tolerant of immigrants. Thus, literacy levels are higher in some states
despite, rather than because of, lower literacy among immigrants. The
state-level literacy statistics at the aggregate level did not accurately
reflect the literacy of immigrants (and, indeed, portrayed a pattern that was
opposite to the individual-level pattern). In sum, then, the ecological fallacy
is committed when group-level relationships are assumed to reflect
individual-level relationships. The fallacy can occur when group aggregates are
incorrectly assumed to be representative of individuals within those groups, or
when macrolevel relationships are governed by processes that are unrelated to
those hypothesized to operate at the individual level. In the Robinson (1950) study on literacy,
for example, the scores at state level were assumed to represent literacy of
immigrants and nonimmigrants equally, whereas at the individual level,
immigrants were less likely to be literate than non-immigrants.
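To make this reversal concrete, here is a minimal Python sketch using three hypothetical states. The counts are invented purely for illustration and are not Robinson's census figures; the `pearson` helper and all variable names are my own assumptions. The point is only to show that a positive state-level correlation can coexist with immigrants being less literate than natives at the individual level.

```python
# Hypothetical counts, invented for illustration only (NOT Robinson's data).
# Each state: (immigrants, literate immigrants, natives, literate natives)
states = {
    "A": (50, 35, 950, 760),
    "B": (150, 120, 850, 765),
    "C": (300, 261, 700, 679),
}

def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Aggregate (state) level: share of immigrants vs. overall literacy rate.
immigrant_share = []
state_literacy = []
for imm, imm_lit, nat, nat_lit in states.values():
    population = imm + nat
    immigrant_share.append(imm / population)
    state_literacy.append((imm_lit + nat_lit) / population)

state_level_r = pearson(immigrant_share, state_literacy)

# Individual level: pool everyone and compare the two groups directly.
imm_total = sum(s[0] for s in states.values())
imm_literate = sum(s[1] for s in states.values())
nat_total = sum(s[2] for s in states.values())
nat_literate = sum(s[3] for s in states.values())

immigrant_literacy = imm_literate / imm_total
native_literacy = nat_literate / nat_total

print(f"State-level correlation: {state_level_r:.2f}")
print(f"Immigrant literacy rate: {immigrant_literacy:.3f}")
print(f"Native literacy rate:    {native_literacy:.3f}")
```

In this toy data the states with more immigrants are also the more literate ones, so the state-level correlation is strongly positive, yet within every state (and overall) immigrants are less literate than natives.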
In our example of the targeting exercise for the poorest villages, think
about a particular village-tract that is excluded because it is not poor enough
in terms of the village-tract-level poverty score. Suppose this village-tract contains
one village that, if the poverty score were taken at the village level, would be poorer than any of the villages in the
village-tracts included in the target. Here we can easily see how a village could
be wrongly excluded while another could be wrongly included in its place because
of the ecological fallacy arising from scoring village-tracts instead of villages.
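The same point can be sketched in a few lines of Python. Everything here is hypothetical: the tract and village names, the poverty scores (higher means poorer), and the assumption that a tract's score is simply the mean of its villages' scores. I do not know how the contractor's actual scoring scheme worked, so this is only an illustration of the mechanism.

```python
from statistics import mean

# Hypothetical poverty scores per village (higher = poorer), invented
# for illustration. Tract 2 hides one very poor village (V2c).
tracts = {
    "Tract 1": {"V1a": 70, "V1b": 72, "V1c": 68},
    "Tract 2": {"V2a": 30, "V2b": 35, "V2c": 95},
    "Tract 3": {"V3a": 40, "V3b": 45, "V3c": 50},
}

# Assumed aggregation: tract score = mean of its village scores.
tract_score = {t: mean(v.values()) for t, v in tracts.items()}

# Targeting by tract: pick the poorest tract, then aid all its villages.
n_tracts = 1
selected_tracts = sorted(tract_score, key=tract_score.get, reverse=True)[:n_tracts]
targeted_by_tract = {v for t in selected_tracts for v in tracts[t]}

# Targeting by village: pick the same number of villages directly.
all_villages = {v: s for vs in tracts.values() for v, s in vs.items()}
ranked = sorted(all_villages, key=all_villages.get, reverse=True)
targeted_by_village = set(ranked[: len(targeted_by_tract)])

wrongly_excluded = targeted_by_village - targeted_by_tract
wrongly_included = targeted_by_tract - targeted_by_village

print("Selected tracts: ", selected_tracts)
print("Wrongly excluded:", wrongly_excluded)
print("Wrongly included:", wrongly_included)
```

With these made-up numbers, the poorest village of all sits in a tract that looks moderately well off on average, so tract-level targeting misses it and aids a less poor village instead.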
The Ecological Fallacy entry in Wikipedia contains an interesting example from the
2004 election for governor of Washington, USA:
The ecological fallacy was discussed in a court challenge to
the Washington gubernatorial
election, 2004 in which a number of illegal voters were
identified, after the election; their votes were unknown, because the vote was
by secret
ballot. The challengers
argued that illegal votes cast in the election would have followed the voting
patterns of the precincts in which they had been cast, and thus adjustments
should be made accordingly. An
expert witness said this approach was like trying to figure out Ichiro
Suzuki's batting
average by looking at the batting average of the entire Seattle
Mariners team, since the illegal votes were cast by an unrepresentative sample of
each precinct's voters, and might be as different from the average voter in the
precinct as Ichiro was from the rest of his team. The judge determined that the
challengers' argument was an ecological fallacy and rejected it.
In my previous post "Big data: problems of correlation,
bias, and machine learning" I showed the graph of the statistically
significant correlation between chocolate consumption per capita and the number of
Nobel laureates in a country to illustrate the fact that "correlation does
not imply causation". The information for it came from a paper published
in the New England Journal of Medicine in 2012. It claimed that chocolate
consumption could enhance cognitive function. The basis for this conclusion was
that the number of Nobel Prize laureates in each country was strongly
correlated with the per capita consumption of chocolate in that country. However,
in a recent article by Velickovic in American Scientist (What Everyone Should Know about Statistical Correlation: A common
analytical error hinders biomedical research and misleads the public,
January-February 2015 - http://www.americanscientist.org/issues/pub/2015/1/what-everyone-should-know-about-statistical-correlation/99999),
the author and his commentators were unsure whether the authors of the New
England Journal of Medicine article had made a blunder or were writing
tongue-in-cheek.
The interesting point, however, is that apart from being
an educational illustration that correlation
does not imply causation, the analysis in the New England Journal of
Medicine article could be seen as an example of the ecological fallacy. Velickovic
writes:
... the authors fell into an
ecological fallacy, when a conclusion about individuals is reached based on
group-level data. In this case, the authors calculated the correlation
coefficient at the aggregate level (the country), but then erroneously used
that value to reach a conclusion about the individual level (eating chocolate
enhances cognitive function). Accurate data at the individual level were
completely unknown: No one had collected data on how much chocolate the Nobel
laureates consumed, or even if they consumed any at all. I was not the only one
to notice this error. Many other scientists wrote about this case of erroneous
analysis. Chemist Ashutosh Jogalekar wrote a thorough critique on his Scientific
American blog The Curious Wavefunction,
and Beatrice A. Golomb of University of California, San Diego, even tested this
hypothesis with a team of coauthors, pointing out that there is no link.