Wednesday, March 18, 2015

Forest to Trees, the Ecological Fallacy


There is a saying, "can't see the forest (or wood) for the trees," which is meant to warn us that focusing too much on details can make us miss the whole picture, the overall impression, or the key point. What about turning that upside down and talking about individual trees by looking only at the forest?


Years ago, I came to know of a community development initiative by a UN agency in Myanmar, which engaged a local consulting firm to select a number of the poorest communities in some selected areas of Myanmar. As far as I know, the contractor devised a scoring scheme for Village-Tracts, which are the lowest areal units in Myanmar's administrative system. A Village-Tract normally contains a number of villages (hamlets), and a village is usually treated as a community for the purpose of community initiatives in Myanmar. Quite by accident, I learned of at least one concrete instance of the problem of using Village-Tracts (collections of communities) to identify the poorest communities for targeting community development assistance.

At that time, I was working on a terminal assessment for a community forestry initiative assisted by an INGO. I visited an independent project in central Myanmar that was doing research in organic farming and providing training on the subject. The researcher in residence told me that he had been approached by the UN agency mentioned above to administer the distribution of aid to the poorest communities it had identified. The researcher was an expatriate who had been living and working on site for ten years. He knew the area well, and he wasn't impressed with the "poorest communities" that had been identified. So he insisted that, if he and his team were to distribute the aid, they would discard this list and identify the poorest communities afresh on their own. Happily, he was allowed to proceed with his improved targeting exercise and the distribution of aid.

Lesson: this is a clear case of the ecological fallacy.

The ecological fallacy is a logical error of interpretation that involves deriving conclusions about the nature of individuals solely on an analysis of group data. (It's Fallacy Friday: Ecological Fallacy, February 27, 2015, PHI KAPPA Literary Society)

An ecological fallacy (or ecological inference fallacy) is a logical fallacy in the interpretation of statistical data where inferences about the nature of individuals are deduced from inference for the group to which those individuals belong. (Wikipedia)

The ecological fallacy refers to the incorrect assumption that the relationships between variables observed at the aggregated, or ecological-level, are the same at the individual-level. (Ecological Fallacy: Concepts, Causes and Solutions, Wei et al., University of Manitoba, January 10, 2010)

Ecological fallacy is a fallacy in research, wherein you draw an inference about a group, and incorrectly attribute that inference to any individual in that group. (Explanation of Ecological Fallacy in Research with Examples, Neha B Deshpande in Buzzle, March 9, 2015). Illustration below:


Pollet et al. cite the case that made the ecological fallacy well known (http://www.willem.maartenfrankenhuis.nl/wp-content/uploads/2013/06/Pollet-et-al.-2014-human-nature1.pdf):

The term ecological fallacy became well known after William Robinson (1950) used U.S. census data to test hypotheses related to immigration and literacy ... hypothesis: those who are literate are more likely to migrate, and therefore proportions of immigrants within a state will be positively related to literacy rates in those states. (He) found evidence for such a positive relationship between the average literacy of U.S. states and the proportion of immigrants living in those states. However, at the individual level, immigrants were less likely to be literate than native individuals ... The positive state-level relationship between proportion of immigrants and literacy rates might have arisen because immigrants tended to settle in states with higher literacy levels, perhaps because these states afforded better economic opportunities or were otherwise more tolerant of immigrants. Thus, literacy levels are higher in some states despite, rather than because of, lower literacy among immigrants. The state-level literacy statistics at the aggregate level did not accurately reflect the literacy of immigrants (and, indeed, portrayed a pattern that was opposite to the individual-level pattern). In sum, then, the ecological fallacy is committed when group-level relationships are assumed to reflect individual-level relationships. The fallacy can occur when group aggregates are incorrectly assumed to be representative of individuals within those groups, or when macrolevel relationships are governed by processes that are unrelated to those hypothesized to operate at the individual level. In the Robinson (1950) study on literacy, for example, the scores at state level were assumed to represent literacy of immigrants and nonimmigrants equally, whereas at the individual level, immigrants were less likely to be literate than non-immigrants.

In our example of targeting the poorest villages, think about a particular village-tract that is excluded because it is not poor enough according to its village-tract-level poverty score. Suppose this village-tract contains one village that, if scored at the village level, would be poorer than any village in the village-tracts included in the target. Here we can easily see how a village could be wrongly excluded while another is wrongly included in its place, because of the ecological fallacy that arises from scoring village-tracts instead of villages.
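The targeting problem can be sketched the same way. The village names and poverty scores below are hypothetical (higher means poorer), and a village-tract's score is assumed to be the simple mean of its villages' scores; the contractor's actual scoring scheme is not known here.

```python
from statistics import mean

# Hypothetical poverty scores (higher = poorer), grouped by village-tract.
TRACTS = {
    "Tract 1": {"V1a": 60, "V1b": 62, "V1c": 58},
    "Tract 2": {"V2a": 55, "V2b": 57},
    "Tract 3": {"V3a": 90, "V3b": 20, "V3c": 25},
}

def target_by_tract(tracts, n_tracts):
    """Pick the n poorest tracts by (assumed) mean score; return all their villages."""
    ranked = sorted(tracts, key=lambda t: mean(tracts[t].values()), reverse=True)
    return {v for t in ranked[:n_tracts] for v in tracts[t]}

def target_by_village(tracts, n_villages):
    """Pick the n poorest villages directly, ignoring tract boundaries."""
    scores = {v: s for villages in tracts.values() for v, s in villages.items()}
    return set(sorted(scores, key=scores.get, reverse=True)[:n_villages])

by_tract = target_by_tract(TRACTS, 2)
by_village = target_by_village(TRACTS, len(by_tract))  # same budget of villages
print("wrongly excluded:", sorted(by_village - by_tract))  # ['V3a']
print("wrongly included:", sorted(by_tract - by_village))  # ['V2a']
```

Selecting the two poorest tracts covers five villages. Choosing five villages directly swaps out V2a for V3a, the poorest village of all, whose tract average is diluted by its two better-off neighbours.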

The Ecological Fallacy entry in Wikipedia contains an interesting example concerning the election for governor of Washington State, USA:

The ecological fallacy was discussed in a court challenge to the Washington gubernatorial election, 2004 in which a number of illegal voters were identified, after the election; their votes were unknown, because the vote was by secret ballot. The challengers argued that illegal votes cast in the election would have followed the voting patterns of the precincts in which they had been cast, and thus adjustments should be made accordingly. An expert witness said this approach was like trying to figure out Ichiro Suzuki's batting average by looking at the batting average of the entire Seattle Mariners team, since the illegal votes were cast by an unrepresentative sample of each precinct's voters, and might be as different from the average voter in the precinct as Ichiro was from the rest of his team. The judge determined that the challengers' argument was an ecological fallacy and rejected it.
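The challengers' adjustment amounts to allocating each precinct's illegal votes in proportion to that precinct's overall result. The sketch below shows that arithmetic with hypothetical precincts and, for contrast, an equally hypothetical "true" split that no one could observe, since the ballots were secret. None of these figures come from the actual case.

```python
def proportional_estimate(precincts):
    """Challengers' assumption: illegal votes split like each precinct's overall vote."""
    return sum(share * n_illegal for share, n_illegal in precincts.values())

# Hypothetical precincts: (candidate A's share of the precinct vote, illegal votes cast)
PRECINCTS = {"P1": (0.70, 10), "P2": (0.40, 20), "P3": (0.55, 30)}

# Hypothetical, unobservable "truth": the illegal voters were an unrepresentative
# group who backed candidate A only 20% of the time in every precinct.
ACTUAL_SHARE_AMONG_ILLEGAL = 0.20

estimate = proportional_estimate(PRECINCTS)
actual = sum(ACTUAL_SHARE_AMONG_ILLEGAL * n for _, n in PRECINCTS.values())
print(f"estimated illegal votes for A: {estimate:.1f}")  # 31.5
print(f"'actual' illegal votes for A:  {actual:.1f}")    # 12.0
```

If the illegal voters differed from the average voter in their precincts, as the expert witness suggested, the proportional estimate can be far off, just as the team's batting average says little about Ichiro's.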

In my previous post, "Big data: problems of correlation, bias, and machine learning," I showed the graph of the statistically significant correlation between per capita chocolate consumption and the number of Nobel laureates in a country to illustrate the fact that "correlation does not imply causation". The information for it came from a paper published in the New England Journal of Medicine in 2012. It claimed that chocolate consumption could enhance cognitive function. The basis for this conclusion was that the number of Nobel Prize laureates in each country was strongly correlated with the per capita consumption of chocolate in that country. However, in a recent article by Velickovic in American Scientist (What Everyone Should Know about Statistical Correlation: A common analytical error hinders biomedical research and misleads the public, January-February 2015 - http://www.americanscientist.org/issues/pub/2015/1/what-everyone-should-know-about-statistical-correlation/99999), the author and his commenters were unsure whether the authors of the New England Journal of Medicine article had made a blunder or were writing tongue-in-cheek.

The interesting point, however, is that apart from being a useful illustration that correlation does not imply causation, the analysis in the New England Journal of Medicine article can also be seen as an example of the ecological fallacy. Velickovic writes:

... the authors fell into an ecological fallacy, when a conclusion about individuals is reached based on group-level data. In this case, the authors calculated the correlation coefficient at the aggregate level (the country), but then erroneously used that value to reach a conclusion about the individual level (eating chocolate enhances cognitive function). Accurate data at the individual level were completely unknown: No one had collected data on how much chocolate the Nobel laureates consumed, or even if they consumed any at all. I was not the only one to notice this error. Many other scientists wrote about this case of erroneous analysis. Chemist Ashutosh Jogalekar wrote a thorough critique on his Scientific American blog The Curious Wavefunction, and Beatrice A. Golomb of University of California, San Diego, even tested this hypothesis with a team of coauthors, pointing out that there is no link.