The ecological correlation fallacy - part 2
Two things happened that I didn’t expect after I finished writing the first blog post about the ecological correlation fallacy.
First, that my blog post would show up on the first page of Google search for keywords like “ecological correlation” and “ecological correlation fallacy”.
Second, that getting fooled by the ecological correlation fallacy is prevalent not only among journalists and social scientists but also among economists with tens of thousands of Twitter followers, including Nobel laureates.
Let me show you two examples. Here is a tweet from Paul Krugman, a Nobel laureate in economics, referencing an article in The Washington Post titled “How Donald Trump appeals to men secretly insecure about their manhood”.
The map shows the popularity of Google searches in 2016 for certain keywords, that is, search topics the authors of the article believed might be common among men with what they labeled "fragile masculinity". The darker the color on the map, the more prevalent the searches were.
The article, and consequently Krugman’s tweet, suggests that important correlations exist between "fragile masculinity" and voting for Trump, as shown in the 2016 electoral map below.
Here is another one: a tweet from Justin Wolfers, a New York Times contributor and a professor of economics, among other things.
Let’s illustrate by example how the scatterplot referenced by Justin Wolfers can lead to the wrong inferences. Take California, for instance. According to the US Census Bureau, its 2016 population was estimated at 39,209,127. Of that, in the 2016 presidential general election, 4,483,814 people voted for Trump, 9,754,079 voted for other candidates and 24,971,234 didn’t vote. According to the chart referenced by Wolfers, 52% of California’s population had passports in 2010. If we assume that the same rate holds for 2016, the number of passport holders is around 20,388,746. You can see these numbers represented graphically in the charts below.
Source: How Charts Lie by Alberto Cairo—a dataviz book with many examples about ecological correlations.
Source: Population figures from the US Census Bureau. Voter counts from www.uselectionatlas.org.
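The California figures quoted in this post can be checked with a few lines of Python. This is only a sketch of the arithmetic. The variable names are mine, and the 52% passport rate is the 2010 figure carried over to 2016, which is an assumption.

```python
# Figures quoted in this post for California, 2016.
population = 39_209_127
trump_votes = 4_483_814
other_votes = 9_754_079
non_voters = 24_971_234

# Assumption: the 2010 passport rate (52%) still holds in 2016.
passport_rate = 0.52

# The three groups should add up to the whole population.
groups_total = trump_votes + other_votes + non_voters
print(groups_total == population)  # True

# Estimated number of passport holders.
passport_holders = round(population * passport_rate)
print(f"{passport_holders:,}")  # 20,388,746

# Each group as a share of the total population.
for label, count in [("Trump voters", trump_votes),
                     ("Other voters", other_votes),
                     ("Did not vote", non_voters)]:
    print(f"{label}: {count / population:.1%}")
```

Note that all of these are totals for the state as a whole. Nothing here tells us which individuals hold the passports.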
“When you look at correlation, when you hear correlation, you’ve got to be suspicious.”
Nassim Nicholas Taleb
Lebanese-American philosopher
Chart: Scenario 1
What makes the above two associations different from other correlation studies is that they examine associations between two characteristics of the population at the group and not the individual level. That is, the unit of analysis is not an individual person but a group of people. We call this an ecological correlation.
Have a look at the different outcomes that are possible if we consider individuals, rather than groups, as the unit of analysis. At one extreme, which supports Justin Wolfers’s point of view, none of the Trump voters hold a passport (Scenario 1). At another extreme, none of the “other voters” hold a passport (Scenario 3). At a third extreme, none of the passport holders voted at all (Scenario 4). Or it could be that all voters, both Trump and non-Trump supporters, hold passports (Scenario 2). Note that in all scenarios the percentage of passport holders doesn't change (it’s 52% of the population), and yet the way passport holders voted is different in each scenario.
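To see how little the state totals pin down, here is a rough sketch that computes the smallest and largest possible overlap between two groups when all we know is the size of each group and the size of the population. The helper `overlap_bounds` is a hypothetical name of mine, and the passport-holder count relies on the assumed 52% rate.

```python
def overlap_bounds(group_a, group_b, population):
    """Smallest and largest number of people who could be in both groups,
    given only the group sizes and the population size."""
    lowest = max(0, group_a + group_b - population)
    highest = min(group_a, group_b)
    return lowest, highest


# California figures quoted in this post.
population = 39_209_127
trump_votes = 4_483_814
other_votes = 9_754_079
non_voters = 24_971_234
passport_holders = 20_388_746  # assumes the 52% rate holds for 2016

low, high = overlap_bounds(trump_votes, passport_holders, population)
print(f"Trump voters with passports: {low:,} to {high:,}")
print(f"As a share of Trump voters: {low / trump_votes:.0%} to {high / trump_votes:.0%}")

low_nv, high_nv = overlap_bounds(non_voters, passport_holders, population)
print(f"Non-voters with passports: {low_nv:,} to {high_nv:,}")
```

With these totals, the share of Trump voters who hold a passport can be anywhere from 0% to 100%. That is why Scenario 1 and Scenario 2 are both consistent with the same aggregate data.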
The truth probably lies somewhere in between these extreme scenarios. The most likely is Scenario 5: a mix of Trump voters, other voters and those who didn't vote. In which proportions? We will never know the answer to this question unless we conduct a survey among a representative sample and ask participants two questions:
1) Did you vote for Trump, Yes or No?
2) Do you hold a passport, Yes or No? (Or more precisely, have you traveled abroad, Yes or No?)
Anything other than that is an opinion and complete speculation.
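If such a survey existed, the individual-level answers could be tabulated along these lines. This is a minimal sketch: the function names are hypothetical, and the responses are made up purely to show the mechanics. They are not real survey data and say nothing about how anyone actually voted.

```python
from collections import Counter


def cross_tabulate(responses):
    """Count respondents in each (voted_trump, has_passport) cell."""
    return Counter(responses)


def passport_rate(table, voted_trump):
    """Share of passport holders among respondents with the given vote answer."""
    holders = table[(voted_trump, True)]
    total = holders + table[(voted_trump, False)]
    return holders / total if total else None


# Made-up answers as (voted_trump, has_passport) pairs. NOT real data.
responses = [
    (True, True), (True, False), (True, True),
    (False, True), (False, True), (False, False),
]

table = cross_tabulate(responses)
print("Passport rate among Trump voters:", passport_rate(table, True))
print("Passport rate among everyone else:", passport_rate(table, False))
```

The point of the design is that each row is one person answering both questions, so the two rates are compared at the individual level rather than inferred from state totals.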
The answer is that you cannot automatically jump to these conclusions. For example, we simply don’t know whether the people who voted for Trump within a certain state are the same people who do not hold passports and vice versa. Don’t get me wrong, these inferences may be correct but are only weakly supported by the aggregate data. More investigation is required. Without knowledge about the individuals, Paul Krugman and Justin Wolfers may be committing what is called the ecological correlation fallacy.
In it, he references a scatterplot showing a negative correlation between the share of the vote that went to Trump and the share of people who hold passports in each state. That is, the more passport holders a state has, the less support its voters gave Trump. What the tweet suggests is that Trump supporters are more likely to have never been outside of the US.
You see a positive correlation between the share of the vote that went to Trump and the prevalence of “fragile masculinity” at the population level. Does that automatically imply that this association carries over to individuals? If counties with more support for Trump have a higher prevalence of men with “fragile masculinity”, must men with “fragile masculinity” be more likely to vote for Trump? In other words, do men who are secretly insecure about their manhood tend to be the same people who voted for Trump and vice versa?
Or you see a negative correlation between the share of the vote that went to Trump and the share of the population with passports at the state level. Does that automatically imply that this association carries over to individuals? If states with more support for Trump have lower rates of passport holders, must people without passports be more likely to vote for Trump? In other words, do people without passports tend to be the same people who voted for Trump and vice versa?
Source: Passport rates from The Expeditioner.
Note: I couldn't verify the percentage of people with passports. The U.S. Department of State doesn't report the number of valid passports in circulation by state.
Charts: Scenario 2, Scenario 3, Scenario 4 and Scenario 5