
Intuitive Probabilities

Meet Vic. Vic enjoys a form of music that features heavily distorted guitars, slow growling vocals, atonality, frequent tempo changes, and what is called “blast beat” drumming in the music business. His favorite death metal bands are Slayer, Leviticus, Dark Tranquility, Arch Enemy, Behemoth, Kreator, Venom, and Necrophagist.

Vic has strong views on theology and cosmology. Which is more likely?

  1. Vic is a Christian
  2. Vic is a Satanist

While teaching courses on probabilistic risk analysis over the years, I’ve found that very intelligent engineers, much more experienced than I, often find probability extremely unintuitive, especially when very large (or very small) numbers are involved. Other aspects of probability and statistics are unintuitive for other interesting reasons. More on those later.

The matter of Vic’s belief system involves several possible biases and unintuitive aspects of statistics. Vic is almost certainly a Christian. Any other conclusion would involve the so-called base-rate fallacy, where the secondary, specific facts (affinity for death metal) somehow obscure the primary, base-rate relative frequency of Christians versus Satanists.

The Vatican claims over one billion Catholics, and most US Christians are not Catholic. Even with papal exaggeration, we can guess that there are well over a billion Christians on earth. I know hundreds if not thousands of them. I don’t know any Satanists personally, and don’t know of any public figures who are (there is conflicting evidence on Marilyn Manson). A quick Google search suggests a range of numbers of Satanists in the world, the largest of which is under 100,000. Further, I don’t ever remember seeing a single Satanist meeting facility, even in San Francisco. A web search also reveals a good number of conspicuously Christian death metal bands, including Leviticus, named above.

Without getting into the details of Bayes’ theorem, it is probably obvious that the relative frequency of Christians to Satanists governs the outcome. And judging Vic by his appearance is likely to be very unreliable.
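For readers who do want a peek at the arithmetic, here is a minimal sketch of the base-rate calculation. The population counts are the rough figures mentioned above; the two likelihoods (how common death metal fandom is within each group) are purely assumed for illustration and are deliberately skewed 500 to 1 in favor of the Satanist hypothesis.

```python
# Base-rate sketch: posterior weight = population size x likelihood of the evidence.
christians = 1_000_000_000  # "well over a billion" in the text; used here as a floor
satanists = 100_000         # the largest estimate mentioned in the text

# ASSUMED likelihoods (not from the text): the chance that a member of each
# group is a death metal fan. These are illustrative guesses only.
p_metal_given_christian = 0.001
p_metal_given_satanist = 0.5


def posterior_christian(n_c, n_s, p_e_c, p_e_s):
    """P(Christian | evidence), considering only the two hypotheses."""
    weight_c = n_c * p_e_c  # expected number of Christian death metal fans
    weight_s = n_s * p_e_s  # expected number of Satanist death metal fans
    return weight_c / (weight_c + weight_s)


p = posterior_christian(
    christians, satanists, p_metal_given_christian, p_metal_given_satanist
)
print(f"P(Christian | death metal fan) = {p:.3f}")
```

With these assumed inputs the prior odds are 10,000 to 1 in favor of Christian, the evidence counts 500 to 1 against, and the posterior odds are still 20 to 1 in favor, or roughly 95 percent. Different guesses for the likelihoods move the number, but it takes an extreme skew to overcome the base rate.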

South Park Community Presbyterian Church
Fairplay, Colorado

Spurious Regression

William Storage           14 Jun 2012
Visiting Scholar, UC Berkeley Center for Science, Technology, Medicine & Society

I’ve been looking into the range of usage of the term “Design Thinking” (see previous post on this subject) on the web along with its rate of appearance in publications. According to Google, the term first appeared in print in 1973 and occurred only occasionally until 1988. Over the next five years its usage increased ten-fold, then calmed down a bit. It peaked again in 2003 and has declined a bit since then.

Rate of appearance of “Design Thinking” in publications
since 1970 (bottom horizontal is zero) per Google.

More interesting than term publication rates was the Google data on search requests. I happened upon a strong correlation between Google searches for “Design Thinking” and both “Bible verse” and “scriptures.” That is, the rate of Google searches for Design Thinking rises and falls in sync with searches for Bible verses.

A scatter plot of search activity for Design Thinking and Bible verse from 2005 to present shows an uncanny correlation:

US web search activity for Design Thinking and Bible verse (r=0.9648) Source: Google Correlate

From this, we might conclude that Design Thinking is a religion or that holism is central to both Christianity and Design Thinking. Or that studying Design Thinking causes interest in scriptures or vice versa. While at least one of these four possibilities is in fact true (Christianity and Design Thinking both rely on holism), we would be very wrong to think that the relationship between search behavior on these terms is causal.

A closer look at the Design Thinking – Bible verse data, this time as a line plot, over a few years is telling. Searches for both terms hit a yearly minimum the last week of December and another local minimum near mid-July. It would seem that time of year has something to do with searching on both terms.

Google Correlate relative rates of searches on Design Thinking
and Bible verse, July 2009-July 2011 (r=0.964)

If two sets of data, A and B, correlate, there are four possibilities to explain the correlation:

1. A causes B
2. B causes A
3. C causes both A and B
4. The correlation is merely coincidental

Item 3, known as the hidden variable or ignoring a common cause, is standard fare for politics and TV news (imagine what Fox News or NPR might do with the Design Thinking – Bible verse correlation). But in statistics, spurious correlations are bad news.

Spurious regression is the term for the scenario above. In this linear regression model, A was regressed on B. But there is some unknown C probably having to do with seasonal interest/disinterest due to time availability or more pressing topics of interest. Searches on Broncos and Tebow, for example, have negative correlations with Design Thinking and Bible verse.
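To see how a common cause alone can produce a correlation like this, here is a small simulation. It is a sketch under invented assumptions, not a reconstruction of the Google Correlate data: the seasonal curve `season`, the scale factors, and the noise levels are all hypothetical. Two series A and B each depend only on a shared seasonal factor C plus independent noise, and neither influences the other.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def residuals(y, x):
    """Residuals of a least-squares fit of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return [b - my - slope * (a - mx) for a, b in zip(x, y)]


def season(week):
    """HYPOTHETICAL seasonal factor C: dips at year end and in mid-July."""
    w = week % 52
    dip_december = math.exp(-((min(w, 52 - w) / 2.0) ** 2))
    dip_july = 0.6 * math.exp(-(((w - 28) / 3.0) ** 2))
    return 1.0 - dip_december - dip_july


rng = random.Random(0)  # fixed seed so the run is repeatable
weeks = range(260)      # five simulated years of weekly data

c = [season(w) for w in weeks]
# A and B depend on C and on independent noise, never on each other.
a = [2.0 * ci + rng.gauss(0, 0.05) for ci in c]
b = [0.7 * ci + rng.gauss(0, 0.02) for ci in c]

r_ab = pearson(a, b)
# Partial correlation: correlate what is left of A and B after removing C.
r_partial = pearson(residuals(a, c), residuals(b, c))

print(f"r(A, B)               = {r_ab:.3f}")
print(f"r(A, B | C removed)   = {r_partial:.3f}")
```

In a run like this, the raw correlation between A and B should come out close to 1, while the correlation that remains after the seasonal factor is regressed out of both series should be near zero. With real search data C is not observed directly, which is exactly what makes the regression of A on B misleading.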