Spurious Regression

William Storage           14 Jun 2012
Visiting Scholar, UC Berkeley Center for Science, Technology, Medicine & Society


I’ve been looking into how the term “Design Thinking” is used on the web (see previous post on this subject), along with its rate of appearance in publications. According to Google, the term first appeared in print in 1973 and occurred only occasionally until 1988. Over the next five years its usage increased ten-fold, then calmed down a bit. It peaked again in 2003 and has declined a bit since then.

Figure: Rate of appearance of “Design Thinking” in publications since 1970 (bottom horizontal is zero), per Google.

More interesting than the publication rates was the Google data on search requests. I happened upon a strong correlation between Google searches for “Design Thinking” and searches for both “Bible verse” and “scriptures.” That is, the rate of Google searches for Design Thinking rises and falls in sync with searches for Bible verses.

A scatter plot of search activity for Design Thinking and Bible verse from 2005 to present shows an uncanny correlation:

Figure: US web search activity for Design Thinking and Bible verse (r=0.9648). Source: Google Correlate

From this, we might conclude that Design Thinking is a religion or that holism is central to both Christianity and Design Thinking. Or that studying Design Thinking causes interest in scriptures, or vice versa. While at least one of these four possibilities is in fact true (Christianity and Design Thinking both rely on holism), we would be very wrong to think that the relationship between search behavior on these terms is causal.

A closer look at the Design Thinking – Bible verse data, this time as a line plot over a few years, is telling. Searches for both terms hit a yearly minimum in the last week of December and another local minimum near mid-July. It would seem that time of year has something to do with searching on both terms.

Figure: Google Correlate relative rates of searches on Design Thinking and Bible verse, July 2009-July 2011 (r=0.964)

If two sets of data, A and B, correlate, there are four possibilities to explain the correlation:

1. A causes B
2. B causes A
3. C causes both A and B
4. The correlation is merely coincidental

Item 3, known as the hidden variable, or as ignoring a common cause, is standard fare for politics and TV news (imagine what Fox News or NPR might do with the Design Thinking – Bible verse correlation). But in statistics, spurious correlations are bad news.

Spurious regression is the term for the scenario above. In this linear regression model, A was regressed on B. But there is some unknown C driving both, probably having to do with seasonal interest or disinterest due to time availability or more pressing topics of interest. Searches on Broncos and Tebow, for example, have negative correlations with Design Thinking and Bible verse.
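To make the common-cause case concrete, here is a minimal sketch in Python. It uses synthetic data, not the Google Correlate series: the seasonal driver `c`, the weights 2.0 and 1.5, and the noise level are all assumptions chosen for illustration. Two series that share nothing but the seasonal driver correlate strongly, and the correlation largely disappears once the driver is regressed out of each.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def residuals(y, x):
    """What remains of y after a least-squares straight-line fit on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]

random.seed(0)           # fixed seed so the run is repeatable
weeks = range(260)       # five years of weekly data (synthetic)

# Hypothetical hidden variable C: a seasonal cycle with two dips per year,
# one at week 0 (late December) and one about 26 weeks later (midsummer).
c = [-math.cos(4 * math.pi * w / 52) for w in weeks]

# A and B each depend on C plus their own independent noise.
# Neither series influences the other.
a = [2.0 * ci + random.gauss(0, 0.3) for ci in c]
b = [1.5 * ci + random.gauss(0, 0.3) for ci in c]

r_ab = pearson(a, b)                                   # raw correlation
r_partial = pearson(residuals(a, c), residuals(b, c))  # after removing C

print(f"raw correlation of A and B:       {r_ab:.3f}")
print(f"correlation after regressing out C: {r_partial:.3f}")
```

In real search data C is not observed directly, so a proxy such as week-of-year averages would have to stand in for it; that substitution is a further assumption, not something the Google data provides.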

 



2 thoughts on “Spurious Regression”

  1. or could it just be possible that the search term ‘design thinking’ has been taken out of context- perhaps misconstrued or misunderstood as a definition for ‘creationism’- the thought and plans of the design of our universe (design thinking of a God). Furthermore, the term design thinking as designers know it wasn’t coined until the 80s…

    that is, though, if you choose to twist/read into this data. personally i think this is taken way out of context and any popular search term sitting next to ‘scriptures’ or ‘bible verse’ would appear as though there is some kind of correlation
