According to the LA police, Nicole Brown Simpson was shouting “He’s going to kill me!” when they responded to her domestic violence call. Her husband had hit her hard enough on Jan 1, 1989 that she required hospitalization. There had been eight previous domestic-violence calls to the Simpson house.

“That punching, pushing and slapping is a prelude to murder.” This is what prosecutor Scott Gordon said about the relationship between domestic violence and the murder of Nicole. Alan Dershowitz would hear nothing of it. He argued that the probability of domestic violence leading to murder was very remote, and he was right. But is that probability the relevant one in the OJ Simpson case?

Lawyers, like most people, often have trouble with probability. A particularly famous case, addressed in *Handbook of Ethics in Quantitative Methodology* by Panter and Sterba, is the 1964 People v Collins. Juanita Brooks walked with a cane as she pulled a wicker shopping basket behind her. Someone sneaked up from behind, knocked her down, and stole her purse from the basket. Juanita saw only the back of her attacker, a woman with “dark blonde” hair.

A neighbor heard screaming and saw a woman hop into a yellow car driven by an African American man with a beard and mustache. He didn’t catch the make of the car.

The LA police arrested Malcolm Collins, a black man with a mustache, who said he’d recently shaved his beard, and his white wife Janet, known to wear her blonde hair in a ponytail. Collins drove a yellow Lincoln. Neither the victim nor her neighbor could positively identify either Janet or Malcolm.

Lacking any other witnesses, the prosecution produced a state-college math instructor. Listing probability values for attributes such as man with beard, man with mustache, African American man with Caucasian woman, yellow car, etc., he explained that the joint probability of the combination of traits was 1 in 12 million (8.3E-8). Collins was convicted.

After reversal on appeal, J. Sullivan wrote an opinion showing better knowledge of math. Noting that the math teacher gave no source for his frequency values for beards and yellow cars, he also showed how the witness had badly violated the rules for calculating intersections. Specifically, the probability of A *and* B equals the product of their individual probabilities — i.e. P(A *and* B) = P(A) * P(B) — only if A and B are independent.
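
To see how much the independence assumption can matter, here is a minimal sketch. The frequencies are hypothetical, picked only for illustration; they are not the values presented at trial.

```python
# Hypothetical frequencies, chosen only to illustrate the rule.
# They are NOT the figures used at the Collins trial.
p_mustache = 0.25              # P(man has a mustache)
p_beard = 0.10                 # P(man has a beard)
p_mustache_given_beard = 0.90  # assumed: most bearded men also have a mustache

# Naive product, valid only if the two traits are independent.
naive_joint = p_mustache * p_beard

# General multiplication rule: P(A and B) = P(A | B) * P(B).
true_joint = p_mustache_given_beard * p_beard

print(f"naive product: {naive_joint:.3f}")
print(f"actual joint:  {true_joint:.3f}")
print(f"naive product understates the joint by a factor of {true_joint / naive_joint:.1f}")
```

With correlated traits the naive product is too small, which makes a match look rarer than it really is.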

In 1964 beard frequency was high. Black, beard and mustache were strongly correlated. We also know criminals sometimes wear fake beards. Interracial marriages were uncommon in 1964; interracial unmarried couples were more common. We know the Collins couple was married, but we don’t know if the perpetrators of the crime were married. In fact, we don’t know that the perps were an interracial couple; we know only that the woman had dark blonde hair.

Such fine points knock that math teacher’s combined probability down an order of magnitude or so. Still, 1 in a million is compelling, no?

No. One in a million is the probability that an individual man, selected at random near San Pedro, CA, would have the attributes: bearded, black, blonde wife with ponytail, yellow car, etc. Millions of people lived near San Pedro in 1964. In a population of 3 million, roughly three people can be expected to match. That changes the probability that Mr. Collins was the perp from the near certainty implied by the stated 1-in-12-million figure to about 1 in 3. That’s quite a difference. As the CA Supreme Court noted, “this seems as indefensible as arguing for the conviction of X on the ground that a witness saw either X or X’s twin commit the crime.”
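
The arithmetic behind that 1-in-3 figure can be sketched as follows. It is a simplification: it treats the expected number of matches as the actual number and assumes, absent other evidence, that every matching person is equally likely to be the culprit.

```python
# Figures from the discussion above: a 1-in-a-million match probability
# and a population of about 3 million.
match_probability = 1e-6
population = 3_000_000

# Expected number of people who fit the description.
expected_matches = match_probability * population

# Simplifying assumption: each matching person is equally likely
# to be the culprit, so the defendant's share is 1 / (number of matches).
p_defendant_is_culprit = 1 / expected_matches

print(f"expected matches: {expected_matches:.1f}")
print(f"P(defendant is the culprit | he matches): {p_defendant_is_culprit:.2f}")
```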

The term “prosecutor’s fallacy” appears often in discussion of both the OJ Simpson and Collins cases. While prosecutors commit many miscarriages of mathematics, different groups each seem to restrict the term to one of two specific errors. One is the error we see in Collins: mistaking the probability of a random match for the probability that a defendant is not guilty.

Another is a misunderstanding of conditional probability: the belief that P(A|B) = P(B|A), where “P(A|B)” means the probability of A given B. This can generally be understood to mean the probability of A given that B has already occurred (though technically speaking, chronological sequence is not required), with no requirement of independence between A and B.

This particular fallacy actually seems rather rare in court, unlike mistaking random-match probability for probability of guilt (a staple of DNA cases). But the conditional-probability error is rampant in medicine.

The relevant mathematical rule here is Bayes’ Theorem:

P(A|B) = P(B|A) * P(A) / P(B)

In medicine, this is handy when we seek to know the probability that the patient has the disease, given a positive test result. If we label the hypothesis (patient has disease) as H and the test data as D, the useful form of Bayes’ Theorem is

P(H|D) = P(D|H) * P(H) / P(D) where P(D) is the total probability of positive results, i.e.,

P(D) = P(D|H) * P(H) + P(D | not H) * P(not H)

In medicine, values of the terms of this last equation are usually known. In a 1982 survey later made famous by Daniel Kahneman, David Eddy asked physicians to estimate the probability that a patient had the disease given the following information:

- rate of disease in population: 1%
- false negative rate for test: 20%
- false positive rate: 10%

In Eddy’s survey, most physicians’ estimates for the probability of the disease in a patient were off by a factor of ten – ten times too high. Making this error in medical practice costs patients a lot of unnecessary worry.

Details (where “H” means *Hypothesis*, i.e. the disease; and “D” means *Data*, i.e. positive test):

- P(H) is the disease frequency = 0.01 [ observed frequency in population ]
- P( not H) is 1 – P(H) = 0.99
- P(D|H) = 80% = 0.8 [ i.e., false negative rate = 20% ]
- P(D | not H) = 10% = 0.1 [ false positive rate ]

Substituting:

P(D) = .8 * .01 + .99 * .1 = 0.107 [ total probability of a positive test ]

P(H|D) = .8 * .01 / .107 = .075 ≈ 8% [ probability that patient has disease, given the test result ]
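
For anyone who wants to check the substitution, here is a small sketch of the same calculation. The function name `posterior` and its parameter names are my own labels, not standard terminology from Eddy’s paper.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """Return P(H|D): probability of disease given a positive test."""
    # Total probability of a positive result, P(D).
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    # Bayes' Theorem: P(H|D) = P(D|H) * P(H) / P(D).
    return sensitivity * prior / p_positive


# Values from the example: 1% prevalence, 20% false negatives
# (so sensitivity = 0.8), 10% false positives.
p = posterior(prior=0.01, sensitivity=0.8, false_positive_rate=0.1)
print(f"P(H|D) = {p:.3f}")
```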

A frequency tree might be useful for visualizing:
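
One way to sketch such a tree is to count patients rather than multiply probabilities. The cohort of 1,000 patients below is a hypothetical number chosen so the counts come out whole.

```python
cohort = 1000  # hypothetical number of patients tested

diseased = round(cohort * 0.01)              # 1% prevalence
healthy = cohort - diseased
true_positives = round(diseased * 0.8)       # 80% of the diseased test positive
false_negatives = diseased - true_positives
false_positives = round(healthy * 0.1)       # 10% of the healthy test positive
true_negatives = healthy - false_positives

tree = f"""\
{cohort} patients
├── {diseased} have the disease
│   ├── {true_positives} test positive
│   └── {false_negatives} test negative
└── {healthy} do not have the disease
    ├── {false_positives} test positive
    └── {true_negatives} test negative"""
print(tree)

# Of everyone who tests positive, what share actually has the disease?
all_positives = true_positives + false_positives
share = true_positives / all_positives
print(f"{true_positives} of {all_positives} positives have the disease ({share:.1%})")
```

Counting this way, the few true positives are plainly swamped by the false positives from the much larger healthy group.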

For this hypothetical case, most physicians’ estimates were around 75%, just below the true-positive rate. We shouldn’t expect MDs to calculate the 8% in their heads, but they should know that for low-frequency diseases, the probability of disease given a positive test is much lower than the probability of a positive test given the disease (8% vs. 80% in the above example).

Eddy speculated that the physicians’ errors result from some combination of misunderstood conditional probability (equating P(A|B) with P(B|A)) and what Kahneman called *base-rate neglect*. That is, the value we want to calculate is highly sensitive to the base rate of the disease in a population. We shouldn’t expect the “priors” (prior probabilities based on personal beliefs) of MDs to always equal the base rate, but base rate must play a role.

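Viewed algebraically, the equation for P(H|D) takes the form P = x / (x + y), where x = P(D|H) * P(H) is the share of true positives and y = P(D | not H) * P(not H) is the share of false positives. The value x / (x + y) swings wildly as x varies against y: when the disease is rare, x is small relative to y and P is low even for a fairly accurate test.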
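
A rough sketch of that sensitivity, holding the test’s error rates at the values from the example and varying only the base rate. The base rates other than 1% are hypothetical, chosen to show the swing.

```python
def posterior(prior, sensitivity=0.8, false_positive_rate=0.1):
    """P(H|D) written in the form x / (x + y)."""
    x = sensitivity * prior                # true positives
    y = false_positive_rate * (1 - prior)  # false positives
    return x / (x + y)


# Same test, different (hypothetical) base rates of disease.
for prior in (0.001, 0.01, 0.1, 0.5):
    print(f"base rate {prior:.1%} -> P(H|D) = {posterior(prior):.1%}")
```

With the same test, the answer runs from under 1% at a base rate of 0.1% to almost 90% at a base rate of 50%.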

As I mentioned above, I see many more examples of conditional-probability confusion in medicine than in law. Your experience may differ. I’d enjoy hearing about such court cases if you’re aware of any.

A case often cited as an example is that of OJ Simpson. Both prosecutors and defenders abused arithmetic, but I don’t think their errors/misrepresentations fit conditional-probability confusion per se. One that comes close was when Alan Dershowitz argued that slapping is in fact not a prelude to murder. Only one in 2500 women subjected to spousal abuse ends up being murdered by her abuser, the defense said. This implied a probability of Simpson’s guilt from the numbers alone of 0.04%.

Of course the relevant number is not that one, but the probability that a woman who was battered by her husband and was also murdered was murdered by her husband. So, technically speaking, the error is not confusing P(A|B) with P(B|A) but instead confusing P(A|B) with P(A|B and C).

P(husband killed wife | husband battered wife) = 0.04% vs.

P(husband killed wife | husband battered wife AND wife was killed)

In his coverage of Simpson, Gerd Gigerenzer assigns a probability of about 90% to the latter, noting that, since other evidence existed, the 90% figure should not be interpreted to be the probability of Simpson’s guilt. Gigerenzer concludes that, despite Dershowitz’s claim, battery is a strong predictor of guilt when a wife is murdered.
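
A figure near 90% can be reproduced with a back-of-envelope sketch. The 1-in-2500 rate is the defense’s number from above. The rate of murders by someone other than the abuser (5 per 100,000) is an assumption of mine for illustration, as I recall it from Gigerenzer’s account; it is not given in this post and should be checked against his text.

```python
battered_women = 100_000  # hypothetical reference class

# Defense's figure: 1 in 2,500 battered women is killed by her abuser.
killed_by_abuser = battered_women / 2500

# ASSUMPTION (not from this post): about 5 in 100,000 battered women
# are murdered by someone other than the abuser.
killed_by_other = 5

# Among battered women who were murdered, the share killed by the abuser.
p_abuser_given_murdered = killed_by_abuser / (killed_by_abuser + killed_by_other)
print(f"P(abuser is the killer | battered and murdered) = {p_abuser_given_murdered:.0%}")
```

Conditioning on the murder shrinks the reference class from all battered women to the few who were killed, and that is what moves the answer from 0.04% to roughly nine in ten.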

In *The Drunkard’s Walk*, Leonard Mlodinow suggests Dershowitz knew he was pulling a fast one. He quotes *The Best Defense*, where Dershowitz says:

“The courtroom oath—‘to tell the truth, the whole truth and nothing but the truth’—is applicable only to witnesses”

How this reconciles with California Rule of Professional Conduct 5-200 is beyond me. It certainly seems to “mislead the jury by an artifice.” As did the prosecutor’s statements in People v Collins when he told the jury that his “new *math* approach” to law was superior to the “hackneyed, stereotyped [and] trite” concept of beyond-reasonable-doubt. This prosecutor admitted that “on some *rare* occasion … an innocent person may be convicted.” But he said without taking that risk, “life would be intolerable because … there would be immunity for the Collinses, for people who chose [to] push old ladies down and take their money and be immune because how could we ever be sure they are the ones who did it?”

The prosecutor’s mastery of fundamentals of probability and logic brings to mind a question posed by John Allen Paulos in his *Innumeracy: Mathematical Illiteracy and Its Consequences*. Calling out Dershowitz on his “astonishingly irrelevant” numerical contribution, Paulos charitably added that inability to deal with chance “plagues far too many otherwise knowledgeable citizens.”

I wonder if, in any other aspect of professional life, customers tolerate incompetence in their suppliers because the customers themselves lack the relevant skill. We have zero tolerance for lawyers’ and doctors’ lack of knowledge of legal codes and scalpel technique. Do we perhaps extend society’s perverse pride in being bad at math to those whose work also depends on math knowledge?

My sister, a math teacher, on hearing another educated guest at a party say he never really learned math, replied that she understood perfectly. With a straight face she told him that she had never really learned to read. He looked at her like she was a cluck. The prosecution rests.