Simpson's Paradox: How Trends Reverse When Data Groups Are Combined

Edited by Alex Surfaced·Statistics·2 min read

Simpson's Paradox, illustrated here by researchers at Stanford University, is a statistical phenomenon in which a trend that appears within several groups of data disappears or reverses when the groups are combined. The best-known example is the study of fall 1973 graduate admissions at UC Berkeley, which found a higher overall admission rate for men (44% vs. 35%); broken down by department, however, women were admitted at a higher or similar rate in most departments. The reversal arises when a confounding variable is unevenly distributed across the groups being aggregated. At Berkeley that variable was the department itself: women applied disproportionately to departments that admitted a small share of all applicants, which pulled down their combined rate. The counterintuitive implication is that a seemingly clear overall trend can be misleading unless the data are also examined by subgroup. The case was analyzed by Peter Bickel, Eugene Hammel, and J. William O'Connell in 'Sex Bias in Graduate Admissions: Data from Berkeley' (Science, 1975).
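The reversal is easy to reproduce with a small worked example. The sketch below uses two hypothetical departments with invented applicant counts (these are not the Berkeley figures) and compares per-department admission rates with the combined rate. The department names, counts, and helper functions are all assumptions made for illustration.

```python
# Hypothetical (applicants, admitted) counts, invented for illustration only.
# These are NOT the figures from the Berkeley study.
data = {
    "Dept A (high admit rate)": {"men": (800, 480), "women": (100, 65)},
    "Dept B (low admit rate)": {"men": (200, 40), "women": (900, 225)},
}


def rate(applicants, admitted):
    """Admission rate as a fraction between 0 and 1."""
    return admitted / applicants


def overall(group):
    """Pool applicants and admits across all departments, then take the rate."""
    applicants = sum(dept[group][0] for dept in data.values())
    admitted = sum(dept[group][1] for dept in data.values())
    return rate(applicants, admitted)


# Disaggregated view: one rate per department and group.
per_dept = {
    name: {group: rate(*counts) for group, counts in groups.items()}
    for name, groups in data.items()
}

# Aggregated view: one rate per group across all departments.
totals = {group: overall(group) for group in ("men", "women")}

for name, rates in per_dept.items():
    print(f"{name}: men {rates['men']:.0%}, women {rates['women']:.0%}")
print(f"Overall: men {totals['men']:.0%}, women {totals['women']:.0%}")
```

With these made-up counts, women have the higher rate in each department (65% vs. 60%, and 25% vs. 20%), yet men have the higher combined rate (52% vs. 29%), because most women in the example applied to the department with the low admission rate.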

Source: jstor.org

Surfaced take

Why It’s Fascinating

Simpson's Paradox surprises even experts because it shows how easily aggregated data can hide the pattern that matters, leading to flawed conclusions in everything from medicine to social policy. It overturns the intuitive assumption that a trend observed in every subset must also hold for the combined whole. Over the next 5-10 years, understanding the paradox will be crucial for building fairer AI systems, for example by checking that a model with strong overall performance metrics does not perform worse for particular groups. It is like judging a sports team only by its overall win-loss record, without noticing that its results differ sharply depending on the opponent. Policymakers and data scientists stand to benefit most, since they routinely draw conclusions from aggregated data. How many societal biases are perpetuated simply because we fail to disaggregate our data?
