
Spurious Regression

William Storage           14 Jun 2012
Visiting Scholar, UC Berkeley Center for Science, Technology, Medicine & Society


I’ve been looking into the range of usage of the term “Design Thinking” (see previous post on this subject) on the web along with its rate of appearance in publications. According to Google, the term first appeared in print in 1973, occurring occasionally until 1988. Over the next five years its usage increased ten-fold, then calmed down a bit. It peaked again in 2003 and has declined a bit since then.

Rate of appearance of “Design Thinking” in publications
since 1970 (bottom horizontal is zero) per Google.

More interesting than term publication rates was the Google data on search requests. I happened upon a strong correlation between Google searches for “Design Thinking” and both “Bible verse” and “scriptures.” That is, the rate of Google searches for Design Thinking rises and falls in sync with searches for Bible verses.

A scatter plot of search activity for Design Thinking and Bible verse from 2005 to present shows an uncanny correlation:

US web search activity for Design Thinking and Bible verse (r=0.9648). Source: Google Correlate

From this, we might conclude that Design Thinking is a religion or that holism is central to both Christianity and Design Thinking. Or that studying Design Thinking causes interest in scriptures or vice versa. While at least one of these four possibilities is in fact true (Christianity and Design Thinking both rely on holism), we would be very wrong to think that the relationship between search behavior on these terms is causal.

A closer look at the Design Thinking – Bible verse data, this time as a line plot, over a few years is telling. Searches for both terms hit a yearly minimum the last week of December and another local minimum near mid-July. It would seem that time of year has something to do with searching on both terms.

Google Correlate relative rates of searches on Design Thinking
and Bible verse, July 2009 to July 2011 (r=0.964)

If two sets of data, A and B, correlate, there are four possibilities to explain the correlation:

1. A causes B
2. B causes A
3. C causes both A and B
4. The correlation is merely coincidental
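
Possibility 3 is easy to reproduce with made-up numbers. The sketch below is not the Google Correlate data: it simulates a hypothetical seasonal factor C (a simple yearly cosine, my assumption) and two weekly series A and B that each depend on C plus their own independent noise. A and B never influence each other, yet their Pearson correlation comes out high. The names, coefficients and noise level are all illustrative.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate_common_cause(weeks=312, noise=0.2, seed=0):
    """Return (a, b, c): two series driven only by a shared seasonal factor c.

    All values are synthetic; c is an assumed 52-week cycle.
    """
    rng = random.Random(seed)
    c = [math.cos(2 * math.pi * t / 52) for t in range(weeks)]
    # A and B depend on C and on independent noise, never on each other.
    a = [2.0 * ct + rng.gauss(0, noise) for ct in c]
    b = [1.5 * ct + rng.gauss(0, noise) for ct in c]
    return a, b, c


a, b, c = simulate_common_cause()
print("r(A, B) =", round(pearson_r(a, b), 3))
```

A high r here says nothing about A causing B or B causing A; by construction, the only link between them is C.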

Item 3, known as the hidden-variable problem or the fallacy of ignoring a common cause, is standard fare for politics and TV news (imagine what Fox News or NPR might do with the Design Thinking – Bible verse correlation). But in statistics, spurious correlations are bad news.

Spurious regression is the term for the scenario above. In this linear regression model, A was regressed on B. But there is some unknown C, probably having to do with seasonal interest or disinterest due to time availability or more pressing topics of interest. Searches on Broncos and Tebow, for example, have negative correlations with Design Thinking and Bible verse.
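
To illustrate what regressing A on B hides, here is a minimal sketch on synthetic data (again, not the actual search data). It assumes, unrealistically, that the common cause C can be observed, and models it as a yearly cosine. Regressing A on B directly gives a high R-squared. Regressing each series on C first and then comparing the leftovers (the residuals) gives an R-squared near zero, which is what we would expect if C were doing all the work. All function names and parameters are hypothetical.

```python
import math
import random


def ols_fit(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def residuals(x, y):
    """What is left of y after removing its linear dependence on x."""
    slope, intercept = ols_fit(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def r_squared(x, y):
    """Fraction of the variance in y explained by a linear fit on x."""
    mean_y = sum(y) / len(y)
    ss_res = sum(e ** 2 for e in residuals(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Synthetic data: C is an assumed 52-week seasonal cycle; A and B each
# depend on C plus independent noise, and not on each other.
rng = random.Random(1)
season = [math.cos(2 * math.pi * t / 52) for t in range(312)]
series_a = [2.0 * s + rng.gauss(0, 0.2) for s in season]
series_b = [1.5 * s + rng.gauss(0, 0.2) for s in season]

# Naive model: A regressed on B, ignoring C.
r2_naive = r_squared(series_b, series_a)

# Controlled model: strip C out of both series, then regress the residuals.
res_a = residuals(season, series_a)
res_b = residuals(season, series_b)
r2_controlled = r_squared(res_b, res_a)

print("R^2, A on B:", round(r2_naive, 3))
print("R^2 after removing C:", round(r2_controlled, 3))
```

With real search data C is not handed to us, so a proxy such as week of year would have to stand in for it, and the adjustment would only be as good as that proxy.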

 

