# Fallacies for Physicians and Prosecutors

According to the LA police, Nicole Brown Simpson was shouting “He’s going to kill me!” when they responded to her domestic violence call. Her husband had hit her hard enough on Jan 1, 1989 that she required hospitalization. There had been eight previous domestic-violence calls to the Simpson house.

“That punching, pushing and slapping is a prelude to murder.” This is what prosecutor Scott Gordon said about the relationship between domestic violence and the murder of Nicole. Alan Dershowitz would hear nothing of it. He argued that the probability of domestic violence leading to murder was very remote, and he was right. But is that probability the relevant one in the OJ Simpson case?

Lawyers, like most people, often have trouble with probability. A particularly famous case, addressed in Handbook of Ethics in Quantitative Methodology by Panter and Sterba, is the 1964 People v Collins. Juanita Brooks walked with a cane as she pulled a wicker shopping basket behind her. Someone sneaked up from behind, knocked her down, and stole her purse from the basket. Juanita saw only the back of her attacker, a woman with “dark blonde” hair.

A neighbor heard screaming and saw a woman hop into a yellow car driven by an African American man with a beard and mustache. He didn’t catch the make of the car.

The LA police arrested Malcolm Collins, a black man with a mustache, who said he’d recently shaved his beard, and his white wife Janet, known to wear her blonde hair in a ponytail. Collins drove a yellow Lincoln. Neither the victim nor her neighbor could positively identify either Janet or Malcolm.

Lacking any other witnesses, the prosecution produced a state-college math instructor. Listing probability values for attributes such as man with beard, man with mustache, African American man with Caucasian woman, yellow car, etc., he explained that the joint probability of the combination of traits was 1 in 12 million (8.3E-8). Collins was convicted.

After reversal on appeal, J. Sullivan wrote an opinion showing better knowledge of math. Noting that the math teacher gave no source for his frequency values for beards and yellow cars, he also showed how the witness had badly violated the rules for calculating intersections. Specifically, the probability of A and B equals the product of their individual probabilities — i.e. P(A and B) = P(A) * P(B) — only if A and B are independent.
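To see why the product rule needs independence, here is a minimal sketch with made-up counts. The 1,000-man population and every count in it are hypothetical, chosen only to illustrate correlated traits; none of them come from the Collins record.

```python
# Hypothetical counts for 1,000 men (illustrative only, not from the case).
population = 1000
with_beard = 100       # men with a beard
with_mustache = 200    # men with a mustache
with_both = 90         # men with both: most bearded men also wear a mustache

p_beard = with_beard / population
p_mustache = with_mustache / population

# Naive product rule, valid only if the two traits are independent.
p_naive = p_beard * p_mustache

# Actual joint probability, read directly from the counts.
p_joint = with_both / population

# Chain rule, which holds with or without independence:
# P(A and B) = P(A) * P(B | A)
p_mustache_given_beard = with_both / with_beard
p_chain = p_beard * p_mustache_given_beard

print(f"naive product P(A)*P(B):   {p_naive:.3f}")
print(f"true joint P(A and B):     {p_joint:.3f}")
print(f"chain rule P(A)*P(B|A):    {p_chain:.3f}")
```

With these assumed counts the naive product understates the joint probability several times over, which is the direction of error the court objected to.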

In 1964 beard frequency was high. Black, beard and mustache were strongly correlated. We also know criminals sometimes wear fake beards. Interracial marriages were uncommon in 1964; interracial unmarried couples were more common. We know the Collins couple was married, but we don’t know if the perpetrators of the crime were married. In fact, we don’t know that the perps were an interracial couple; we know only that the woman had dark blonde hair.

Such fine points knock that math teacher’s combined probability down an order of magnitude or so. Still, 1 in a million is compelling, no?

No. One in a million sets the probability that an individual man, selected at random near San Pedro, CA, would have the attributes: bearded, black, blonde wife with ponytail, yellow car, etc. Millions of people lived near San Pedro in 1964. In a population of 3 million, roughly three people can be expected to be matches. The jury was invited to read the 1-in-12-million figure as the chance that Mr. Collins was innocent; with about three matching men in the area and no other evidence, the probability that he was the perp is only about 1 in 3. That’s quite a difference. As the CA Supreme Court noted, “this seems as indefensible as arguing for the conviction of X on the ground that a witness saw either X or X’s twin commit the crime.”
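The arithmetic behind that paragraph can be sketched in a few lines. This is a rough illustration, not the court’s own calculation: it assumes the true perpetrator is among the matching men, that each match is equally likely to be the perpetrator, and that no other evidence is in play. The function names are mine.

```python
def expected_matches(population: int, match_probability: float) -> float:
    """Expected number of people in the population who fit the description."""
    return population * match_probability


def chance_suspect_is_perp(population: int, match_probability: float) -> float:
    """Chance that one particular matching person is the perpetrator.

    Assumes the perpetrator is one of the matches and that every match
    is equally likely to be the perpetrator (no other evidence).
    """
    matches = expected_matches(population, match_probability)
    # At least one match exists (the perpetrator), so never divide by less than 1.
    return 1.0 / max(matches, 1.0)


POPULATION = 3_000_000      # figure used in the text
MATCH_PROBABILITY = 1e-6    # the "1 in a million" after corrections

print(f"expected matches: {expected_matches(POPULATION, MATCH_PROBABILITY):.1f}")
print(f"chance a given match is the perp: "
      f"{chance_suspect_is_perp(POPULATION, MATCH_PROBABILITY):.2f}")
```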

The term “prosecutor’s fallacy” appears often in discussion of both the OJ Simpson and Collins cases. While prosecutors commit many miscarriages of mathematics, different groups each seem to restrict the term to one of two specific errors. One is the case we see in Collins – mistaking the probability of a random match for the probability that a defendant is not guilty.

Another is a misunderstanding of conditional probability. I.e., it is believing that P(A|B) = P(B|A), where “P(A|B)” means the probability of A given B. This can generally be understood to mean the probability of A given that B has already occurred (though technically speaking, chronological sequence is not required), with no requirement of independence between A and B.

This particular fallacy actually seems rather rare in court, unlike mistaking random-match probability for probability of guilt (a staple of DNA cases). But the conditional-probability error is rampant in medicine.

The relevant mathematical rule here is Bayes’ Theorem:

P(A|B) = P(B|A) * P(A) / P(B)

In medicine, this is handy when we seek to know the probability that the patient has the disease, given a positive test result. If we label the hypothesis (patient has disease) as H and the test data as D, the useful form of Bayes’ Theorem is

P(H|D) = P(D|H) * P(H) / P(D)    where P(D) is the total probability of positive results, i.e.,

P(D) = P(D|H) * P(H) + P(D | not H) * P(not H)

In medicine, values of the terms of this last equation are usually known. In a 1982 survey later made famous by Daniel Kahneman, David Eddy asked physicians to estimate the probability that a patient had the disease given the following information:

• rate of disease in population: 1%
• false negative rate for test: 20%
• false positive rate: 10%

In Eddy’s survey, most physicians’ estimates for the probability of the disease in a patient were off by a factor of ten – ten times too high. Making this error in medical practice costs patients a lot of unnecessary worry.

Details (where “H” means Hypothesis, i.e. the disease; and “D” means Data, i.e. positive test):

• P(H) is the disease frequency = 0.01   [ observed frequency in population ]
• P( not H) is 1 – P(H) = 0.99
• P(D|H) = 80% = 0.8   [ i.e., false negative rate = 20% ]
• P(D | not H) = 10% = 0.1   [ false positive rate ]

Substituting:

P(D) = .8 * .01 + .99 * .1 = 0.107   [ total probability of a positive test ]

P(H|D) = .8 * .01 / .107 = .075 ≈ 8%   [ probability that patient has disease, given the test result ]

A frequency tree might be useful for visualizing:
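One way to sketch such a tree is in natural frequencies. The minimal example below assumes a cohort of 10,000 patients (the cohort size and the function name `frequency_tree` are illustrative choices, not part of Eddy’s survey) and applies the three rates listed above.

```python
def frequency_tree(cohort, prevalence, sensitivity, false_positive_rate):
    """Split a cohort into the four branches of a diagnostic frequency tree."""
    sick = cohort * prevalence
    healthy = cohort - sick
    true_pos = sick * sensitivity
    false_neg = sick - true_pos
    false_pos = healthy * false_positive_rate
    true_neg = healthy - false_pos
    return {
        "sick": sick, "healthy": healthy,
        "true_pos": true_pos, "false_neg": false_neg,
        "false_pos": false_pos, "true_neg": true_neg,
    }


# Rates from the text: 1% prevalence, 20% false negatives, 10% false positives.
tree = frequency_tree(cohort=10_000, prevalence=0.01,
                      sensitivity=0.80, false_positive_rate=0.10)

print("10000 patients")
print(f"├── {tree['sick']:.0f} have the disease")
print(f"│   ├── {tree['true_pos']:.0f} test positive")
print(f"│   └── {tree['false_neg']:.0f} test negative")
print(f"└── {tree['healthy']:.0f} do not have the disease")
print(f"    ├── {tree['false_pos']:.0f} test positive")
print(f"    └── {tree['true_neg']:.0f} test negative")

# Of everyone who tests positive, what fraction is actually sick?
all_positive = tree["true_pos"] + tree["false_pos"]
p_disease_given_positive = tree["true_pos"] / all_positive
print(f"P(disease | positive) = {p_disease_given_positive:.3f}")
```

Read this way, the answer is just the sick positives divided by all positives, which matches the substitution above.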

For this hypothetical case, most physicians’ estimates were around 75%, just below the true-positive rate. We shouldn’t expect MDs to calculate the 8% in their heads, but they should know that for low-frequency diseases, the probability of disease given a positive test is much lower than the probability of a positive test given the disease (8% vs. 80% in the above example).

Eddy speculated that the physicians’ errors result from some combination of misunderstood conditional probability (equating P(A|B) with P(B|A)) and what Kahneman called base-rate neglect. I.e., the value we want to calculate is highly sensitive to the base rate of the disease in a population. We shouldn’t expect the “priors” (probabilities based on personal beliefs) of MDs to always equal the base rate, but base rate must play a role.

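Viewed algebraically, the equation for P(H|D) takes the form P = x / (x + y), where x = P(D|H) * P(H) and y = P(D | not H) * P(not H). The value of x / (x + y) swings wildly as x varies against y: when the disease is rare, x is small relative to y, and P stays low even for a sensitive test.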
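The sensitivity to base rate is easy to see numerically. The sketch below holds the test characteristics from Eddy’s example fixed and varies only the prior; the particular prior values tried, and the function name `posterior`, are my own illustrative choices.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """P(H|D) by Bayes' theorem, written in the x / (x + y) form."""
    x = sensitivity * prior                   # P(D|H) * P(H)
    y = false_positive_rate * (1 - prior)     # P(D|not H) * P(not H)
    return x / (x + y)


SENSITIVITY = 0.80           # 1 - false negative rate, from the text
FALSE_POSITIVE_RATE = 0.10   # from the text

# Hypothetical base rates, from very rare to a coin flip.
for prior in (0.001, 0.01, 0.10, 0.50):
    p = posterior(prior, SENSITIVITY, FALSE_POSITIVE_RATE)
    print(f"base rate {prior:6.1%}  ->  P(disease | positive) = {p:6.1%}")
```

The same test result moves from under 1% to nearly 90% as the base rate rises, which is the point about base-rate neglect.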

As I mentioned above, I see many more examples of conditional-probability confusion in medicine than in law. Your experience may differ. I’d enjoy hearing about such court cases if you’re aware of any.

A case often cited as an example is that of OJ Simpson. Both prosecutors and defenders abused arithmetic, but I don’t think their errors/misrepresentations fit conditional-probability confusion per se. One that comes close was when Alan Dershowitz argued that slapping is in fact not a prelude to murder. Only one in 2500 women subjected to spousal abuse ends up being murdered, the defense said. This implied a probability of Simpson’s guilt from the numbers alone of 0.04%.

Of course the relevant number is not that one, but the probability that a woman who was battered by her husband and was also murdered was murdered by her husband. So, technically speaking, the error is not confusing P(A|B) with P(B|A) but instead confusing P(A|B) with P(A|B and C).

P(husband killed wife | husband battered wife) = 0.04%   vs.
P(husband killed wife | husband battered wife AND wife was killed)

In his coverage of Simpson, Gerd Gigerenzer assigns a probability of about 90% to the latter, noting that, since other evidence existed, the 90% figure should not be interpreted to be the probability of Simpson’s guilt. Gigerenzer concludes that, despite Dershowitz’s claim, battery is a strong predictor of guilt when a wife is murdered.
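The gap between the two conditional probabilities can be sketched with counts. In the example below, the 1-in-2500 figure is the defense’s number from above, read here as murders by the abusive partner. The figure of 5 battered women per 100,000 murdered by someone else is an assumption, chosen for illustration so that the result lands near the roughly 90% quoted from Gigerenzer; it is not a number taken from this post.

```python
def p_partner_given_battered_and_murdered(cohort, killed_by_partner_rate,
                                          killed_by_other_rate):
    """P(husband killed wife | husband battered wife AND wife was killed)."""
    by_partner = cohort * killed_by_partner_rate
    by_other = cohort * killed_by_other_rate
    return by_partner / (by_partner + by_other)


COHORT = 100_000                    # battered women (illustrative cohort size)
P_KILLED_BY_PARTNER = 1 / 2500      # the defense's figure, from the text
P_KILLED_BY_OTHER = 5 / 100_000     # ASSUMED for illustration only

result = p_partner_given_battered_and_murdered(
    COHORT, P_KILLED_BY_PARTNER, P_KILLED_BY_OTHER)

print(f"P(killed by partner | battered)              = {P_KILLED_BY_PARTNER:.2%}")
print(f"P(killed by partner | battered and murdered) = {result:.0%}")
```

Under these assumptions, conditioning on the murder having happened moves the number from a small fraction of a percent to most of the probability mass.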

In The Drunkard’s Walk, Leonard Mlodinow suggests Dershowitz knew he was pulling a fast one. He quotes The Best Defense, where Dershowitz says:

“The courtroom oath—‘to tell the truth, the whole truth and nothing but the truth’—is applicable only to witnesses”

How this reconciles with California Rule of Professional Conduct 5-200 is beyond me. It certainly seems to “mislead the jury by an artifice.” As did the prosecutor’s statements in People v Collins when he told the jury that his “new math approach” to law was superior to the “hackneyed, stereotyped [and] trite” concept of beyond-reasonable-doubt. This prosecutor admitted that “on some rare occasion … an innocent person may be convicted.” But he said without taking that risk, “life would be intolerable because … there would be immunity for the Collinses, for people who chose [to] push old ladies down and take their money and be immune because how could we ever be sure they are the ones who did it?”

The prosecutor’s mastery of fundamentals of probability and logic brings to mind a question posed by John Allen Paulos in his Innumeracy: Mathematical Illiteracy and Its Consequences. Calling Dershowitz on his “astonishingly irrelevant” numerical contribution, Paulos charitably added that inability to deal with chance “plagues far too many otherwise knowledgeable citizens.”

I wonder if, in any other aspect of professional life, customers tolerate incompetence in their suppliers because the customers themselves lack the relevant skill. We have zero tolerance for lawyers’ and doctors’ lack of knowledge of legal codes and scalpel technique. Do we perhaps extend society’s perverse pride in being bad at math to those whose work also depends on math knowledge?

My sister, a math teacher, on hearing another educated guest at a party say he never really learned math, replied that she understood perfectly. With a straight face she told him that she had never really learned to read. He looked at her like she was a cluck. The prosecution rests.

# My Trouble with Bayes

In past consulting work I’ve wrestled with subjective probability values derived from expert opinion. Subjective probability is an interpretation of probability based on a degree of belief (i.e., hypothetical willingness to bet on a position) as opposed to a value derived from measured frequencies of occurrences (related posts: Belief in Probability, More Philosophy for Engineers). Subjective probability is of interest when failure data is sparse or nonexistent, as was the data on catastrophic loss of a space shuttle due to seal failure. Bayesianism is one form of inductive logic aimed at refining subjective beliefs based on Bayes Theorem and the idea of rational coherence of beliefs. A NASA handbook explains Bayesian inference as the process of obtaining a conclusion based on evidence: “Information about a hypothesis beyond the observable empirical data about that hypothesis is included in the inference.” Easier said than done, for reasons listed below.

Bayes Theorem itself is uncontroversial. It is a mathematical expression relating the probability of A given that B is true to the probability of B given that A is true and the individual probabilities of A and B:

P(A|B) = P(B|A) x P(A) / P(B)

If we’re trying to confirm a hypothesis (H) based on evidence (E), we can substitute H and E for A and B:

P(H|E) = P(E|H) x P(H) / P(E)

To be rationally coherent, you’re not allowed to believe the probability of heads to be 0.6 while believing the probability of tails to be 0.5; the sum of chances of all possible outcomes must equal exactly one. Further, for Bayesians, the logical coherence just mentioned (i.e., avoidance of Dutch book arguments) must hold across time (diachronic coherence) such that once new evidence E on a hypothesis H is found, your new believed probability for H should equal your prior conditional probability for H given E.
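The heads-and-tails example can be made concrete as a Dutch book. The sketch below is a simplified illustration under stated assumptions: the agent treats each belief as a fair price for a bet paying one unit if that outcome occurs, exactly one outcome occurs, and a bookie may take either side of every bet. The function name is hypothetical.

```python
def dutch_book_loss(beliefs, stake=1.0):
    """Guaranteed loss for an agent whose beliefs over exclusive,
    exhaustive outcomes do not sum to one.

    If the beliefs sum to more than one, the bookie sells the agent a bet
    on every outcome: the agent pays sum(beliefs) * stake and collects
    exactly one stake. If they sum to less than one, the bookie buys every
    bet from the agent instead. Either way the agent loses the difference.
    """
    total_price = sum(beliefs.values()) * stake
    payout = stake  # exactly one outcome occurs and pays out
    return abs(total_price - payout)


incoherent = {"heads": 0.6, "tails": 0.5}
coherent = {"heads": 0.5, "tails": 0.5}

print(f"incoherent beliefs, sure loss per unit stake: "
      f"{dutch_book_loss(incoherent):.2f}")
print(f"coherent beliefs, sure loss per unit stake:   "
      f"{dutch_book_loss(coherent):.2f}")
```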

Plenty of good sources explain Bayesian epistemology and practice far better than I could do here. Bayesianism is controversial in science and engineering circles, for some good reasons. Bayesianism’s critics refer to it as a religion. This is unfair. Bayesianism is, however, like most religions, a belief system. My concern for this post is the problems with Bayesianism that I personally encounter in risk analyses. Adherents might rightly claim that problems I encounter with Bayes stem from poor implementation rather than from flaws in the underlying program. Good horse, bad jockey? Perhaps.

1. Subjectively objective
Bayesianism is an interesting mix of subjectivity and objectivity. It imposes no constraints on the subject of belief and very few constraints on the prior probability values. Hypothesis confirmation, for a Bayesian, is inherently quantitative, but initial hypothesis probabilities and the evaluation of evidence are purely subjective. For Bayesians, evidence E confirms or disconfirms hypothesis H only after we establish how probable H was in the first place. That is, we start with a prior probability for H. After the evidence, confirmation has occurred if the probability of H given E is higher than the prior probability of H, i.e., P(H|E) > P(H). Conversely, E disconfirms H when P(H|E) < P(H). These equations and their math leave business executives impressed with the rigor of objective calculation while directing their attention away from the subjectivity of both the hypothesis and its initial prior.

2. Rational formulation of the prior
Problem 2 follows from the above. Paranoid, crackpot hypotheses can still maintain perfect probabilistic coherence. Excluding crackpots, rational thinkers – more accurately, those with whom we agree – still may have an extremely difficult time distilling their beliefs, observations and observed facts of the world into a prior.

3. Conditionalization and old evidence
This is on everyone’s short list of problems with Bayes. In the simplest interpretation of Bayes, old evidence has zero confirming power. If evidence E was on the books long ago and it suddenly comes to light that H entails E, no change in the probability of H follows. This seems odd – to most outsiders anyway. This problem gives rise to the game where we are expected to pretend we never knew about E and then judge how surprising (confirming) E would have been to H had we not known about it. As with the general matter of maintaining logical coherence required for the Bayesian program, it is extremely difficult to detach your knowledge of E from the rest of your knowing about the world. In engineering problem solving, discovering that H implies E is very common.

4. Equating increased probability with hypothesis confirmation
My having once met Hillary Clinton arguably increases the probability that I may someday be her running mate; but few would agree that it is confirming evidence that I will do so. See Hempel’s raven paradox.

5. Stubborn stains in the priors
Bayesians, often citing success in the business of establishing and adjusting insurance premiums, report that the initial subjectivity (discussed in 1, above) fades away as evidence accumulates. They call this washing-out of priors. The frequentist might respond that with sufficient evidence your belief becomes irrelevant. With historical data (i.e., abundant evidence) they can calculate P of an unwanted event in a frequentist way: P = 1 – e^(–RT), or roughly P ≈ RT for small products of exposure time T and failure rate R (exponential distribution). When our ability to find new evidence is limited, i.e., for modeling unprecedented failures, the prior does not get washed out.
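The small-product approximation mentioned above can be checked numerically. In this sketch the failure rates and exposure times are hypothetical values picked only to show where the approximation holds and where it breaks down.

```python
import math


def failure_probability(rate, time):
    """Exact P = 1 - exp(-R*T) for a constant failure rate R over exposure T.

    math.expm1 computes exp(x) - 1 accurately for small x, which avoids
    round-off when R*T is tiny.
    """
    return -math.expm1(-rate * time)


def failure_probability_approx(rate, time):
    """Linear approximation P ~ R*T, reasonable only when R*T is small."""
    return rate * time


# Hypothetical (rate per hour, exposure hours) pairs.
cases = [(1e-6, 10.0), (1e-5, 10.0), (1e-3, 100.0), (1e-2, 100.0)]

for rate, time in cases:
    exact = failure_probability(rate, time)
    approx = failure_probability_approx(rate, time)
    print(f"R*T = {rate * time:8.1e}   exact = {exact:.6e}   approx = {approx:.6e}")
```

For products of R and T around 0.1 or larger, the linear form visibly overstates the probability, and at R*T = 1 it returns certainty while the exact value is about 0.63.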

6. The catch-all hypothesis
The denominator of Bayes Theorem, P(E), in practice, must be calculated as the sum of two terms: the probability of the evidence given the hypothesis, weighted by the probability of the hypothesis, plus the probability of the evidence given not the hypothesis, weighted by the probability of not the hypothesis:

P(E) = [P(E|H) x P(H)] + [P(E|~H) x P(~H)]

But ~H (“not H”) is not itself a valid hypothesis. It is a family of hypotheses likely containing what Donald Rumsfeld famously called unknown unknowns. Thus calculating the denominator P(E) forces you to pretend you’ve considered all contributors to ~H. So Bayesians can be lured into a state of false choice. The famous example of such a false choice in the history of science is Newton’s particle theory of light vs. Huygens’ wave theory of light. Hint: they are both wrong.
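How much the catch-all matters can be sketched by splitting ~H into explicit alternatives. All priors and likelihoods below are hypothetical, and `posterior_h` is an illustrative helper, not anything from a particular Bayesian toolkit. The point is only that the posterior for H shifts when an overlooked alternative that also predicts E is carved out of ~H.

```python
def posterior_h(prior_h, likelihood_h, alternatives):
    """P(H|E) when ~H is broken into explicit alternative hypotheses.

    alternatives: list of (prior, likelihood) pairs that together make
    up ~H, so their priors must sum to 1 - prior_h.
    """
    alt_prior_total = sum(prior for prior, _ in alternatives)
    if abs(prior_h + alt_prior_total - 1.0) > 1e-9:
        raise ValueError("priors of H and its alternatives must sum to 1")
    p_e = prior_h * likelihood_h + sum(p * l for p, l in alternatives)
    return prior_h * likelihood_h / p_e


# ~H treated as one lump that predicts E poorly (hypothetical numbers).
lumped = posterior_h(0.5, 0.9, [(0.5, 0.1)])

# Same total prior on ~H, but one overlooked rival predicts E as well as H does.
split = posterior_h(0.5, 0.9, [(0.4, 0.1), (0.1, 0.9)])

print(f"P(H|E) with ~H lumped together:       {lumped:.3f}")
print(f"P(H|E) with an overlooked rival in ~H: {split:.3f}")
```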

7. Deference to the loudmouth
This problem is related to no. 1 above, but has a much more corporate, organizational component. It can’t be blamed on Bayesianism but nevertheless plagues Bayesian implementations within teams. In the group formulation of any subjective probability, normal corporate dynamics govern the outcome. The most senior or deepest-voiced actor in the room drives all assignments of subjective probability. Social influence rules and the wisdom of the crowd succumbs to a consensus building exercise, precisely where consensus is unwanted. Seidenfeld, Kadane and Schervish begin “On the Shared Preferences of Two Bayesian Decision Makers” with the scholarly observation that an outstanding challenge for Bayesian decision theory is to extend its norms of rationality from individuals to groups. Their paper might have been illustrated with the famous photo of the exploding Challenger space shuttle. Bayesianism’s tolerance of subjective probabilities combined with organizational dynamics and the shyness of engineers can be a recipe for disaster of the Challenger sort.

All opinions welcome.

# Galileo, Cantor and the Countably Infinite

I recently found my high school algebra book from the classic Dolciani series. In Chapter 1’s exercises, I stumbled upon this innocent question: Determine whether there exists a one-to-one correspondence between the two sets {natural numbers} and {even natural numbers}. At the end of chapter 1 is a short biography of Georg Cantor (d. 1918), crediting him with inventing set theory, an approach toward dealing with the concept of infinity.

I’m going out on a limb here. I’m not a mathematician. I understand that Cantor is generally accepted as being right about infinity and countable sets in the math world; but I think his work on one-to-one correspondence and the countability of infinite sets is flawed.

First, let’s get back to my high school algebra problem. The answer given is that yes, a one-to-one correspondence does exist between natural numbers and even numbers, and thus they have the same number of elements. The evidence is that the sets can be paired as shown below:

1 <—> 2
2 <—> 4
3 <—> 6

n <—> 2n

This seems a valid demonstration of one-to-one correspondence. In most of math – where deduction rules – a single case of confirming evidence is assumed to exclude all possibility of disconfirming evidence. But this infinity business is not math of that sort. It employs math and takes the general form of mathematical analysis; but some sleight of hand is surely at work. Cantor, in my view, indulged in something rather close to math, but also having a foot in philosophy, and perhaps several more feet (possibly of an infinite number of them) in language and psychology. One might call it multidisciplinary. Behold.

I can with equal validity show the two sets (natural numbers and even numbers) not to have a one-to-one correspondence but a two-to-one correspondence. I do this with the following pairing. Set 1 on the left is the natural numbers. Set 2 on the right is the even numbers:

1      unpaired
2 <—> 2
3      unpaired
4 <—> 4
5      unpaired

2n -1      unpaired
2n <—> 2n

By removing all the unpaired (odd) elements from set 1, I pair each remaining member of set 1 with its counterpart in set 2. It seems arguable that if a one-to-one correspondence exists between part of set one and all of set two, the two whole sets cannot support a one-to-one correspondence. By inspection, the set of even numbers is included within the set of natural numbers and obviously not coextensive with it. Therefore Cantor’s argument, based solely on correspondence, works only by promoting one fact – pairing of terms – while ignoring an equally obvious fact, the matter of inclusion. Against my argument Cantor seems to dismiss the obvious difficulty by making a sort of mystery-of-faith argument – his concept of infinity entails that a set and a proper subset of it can be the same size.
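Both pairings described so far can be written out for a finite prefix of the natural numbers. The sketch below is only an illustration of the two patterns; the function names are mine, and a finite prefix cannot by itself settle any claim about the infinite sets.

```python
def doubling_pairs(limit):
    """The textbook pairing n <-> 2n for n = 1..limit."""
    return [(n, 2 * n) for n in range(1, limit + 1)]


def inclusion_pairs(limit):
    """The second pairing: each even n <-> itself, each odd n left unpaired.

    An unpaired natural number is represented by None on the right.
    """
    return [(n, n if n % 2 == 0 else None) for n in range(1, limit + 1)]


print("n <-> 2n pairing:")
for left, right in doubling_pairs(5):
    print(f"  {left} <-> {right}")

print("inclusion pairing:")
for left, right in inclusion_pairs(5):
    shown = f"<-> {right}" if right is not None else "unpaired"
    print(f"  {left} {shown}")

# Under the first pairing nothing on the left is left over; under the
# second, every odd number on the left is.
unpaired = [n for n, partner in inclusion_pairs(10) if partner is None]
print(f"left unpaired by inclusion pairing (up to 10): {unpaired}")
```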

Let’s dig a bit deeper. First, Cantor’s usage of the one-to-one concept (often called bijection) is heavy handed. It requires that such correspondence be established by starting with sets having their members placed in increasing order. Then it requires the first members of each set to be paired with one another, and so on. There is nothing particularly natural about this way of doing things; Cantor devised it to suit his needs. It got him into enough logical difficulty that he had to devise the concepts of cardinality and ordinality, with problematic definitions. Gottlob Frege and Bertrand Russell had to patch up his definitions. The notion of equipollent sets fell out of this work, along with complications addressed by mental heavy lifters like von Neumann and Tarski, which are out of scope here. Finally, it seems to me that Cantor implies – but fails to state outright – that the existence of a simultaneous two-to-one correspondence (i.e., group each 2n–1 and 2n in set 1 with each 2n in set 2 to get a two-to-one correspondence between the two sets) does no damage to the claim that one-to-one correspondence between the two sets makes them equal in size. In other words, Cantor helped himself to an unnaturally restrictive interpretation (i.e., a matter of language) of one-to-one correspondence – one that favored his agenda. Cantor slips a broader meaning of equality on us than the strict numerical equality that math grew up with. Further, his usage of the term – and concept of – “size” requires a special definition.

Cantor’s rule set for the pairing of terms and his special definitions are perfectly valid axioms for a mathematical system, but there is nothing within mathematics that justifies these axioms. Believing that the consequences of a system or theory justify its postulates is exactly the same as believing that the usefulness of Euclidean geometry justifies Euclid’s fifth postulate. Euclid knew this wasn’t so, and Proclus tells us Euclid wasn’t alone in that view.

Galileo, who, like Cantor, hurled some heavy-handed arguments when he was in a jam, seems to have had a more grounded sense of the infinite than Cantor. For Galileo, the concrete concept of equality, even when dressed up in fancy clothes like equipollence, does not reconcile with the abstract concept of infinity. Galileo thought concepts like similarity, countability, size and equality just don’t apply to the infinite. By the time of Leibniz and Newton, infinity had earned a place in math, but as something that could be only approached, not reached, equaled, measured or compared.

Cantor’s model of infinity may be interesting and useful, but it is a shame that it’s taught and reported as fact, e.g., “infinity comes in infinitely many different sizes – a fact discovered by Georg Cantor” (Science News, Jan 8, 2008).

The under-celebrated WVO Quine comes to mind as bearing on this topic. Quine argued that the distinction between analytic and synthetic statements was  false, and that no claim should be immune to empirical falsification. Armed with that idea, I’ll offer that Cantor’s math is subject to scientific examination. Since confirming evidence is always weaker than disconfirming evidence (i.e., Popperian falsifiability) I’d argue the demonstration of inequality of the sets of natural and even numbers (inclusion of one within the other) trumps the demonstration of equal size by correspondence.

Mathematicians who state the equal-size concept as a fact discovered by Cantor have overstepped the boundaries of their discipline. Galileo regarded the natural-even set problem as a true paradox. I agree. Did Cantor resolve this paradox, or did he merely conceal it with language?

# Belief in Probability – Part 2

Last time I started with my friend Willie’s bold claim that he doesn’t believe in probability; then I gave a short history of probability. I observed that defining probability is a controversial matter, split between objective and subjective interpretations. About the only thing these interpretations agree on is that probability values range from zero to one, where P = 1 means certainty. When you learn probability and statistics in school, you are getting the frequentist interpretation, which is considered objective. Frequentism relies on directly equating observed frequencies with probabilities. In this model, the probability of an event exactly equals the limit of the relative frequency of that outcome in an infinitely large number of trials.
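The limit-of-relative-frequency idea can be illustrated with simulated dice. This is a minimal sketch using Python’s pseudo-random generator as a stand-in for a fair die; the seed and the trial counts are arbitrary choices, and a finite simulation can only suggest the limit, not reach it.

```python
import random


def relative_frequency_of_six(trials, seed=0):
    """Fraction of simulated fair-die rolls that come up six."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    sixes = sum(1 for _ in range(trials) if rng.randint(1, 6) == 6)
    return sixes / trials


print(f"theoretical probability: {1 / 6:.4f}")
for trials in (100, 10_000, 200_000):
    freq = relative_frequency_of_six(trials)
    print(f"{trials:>8} rolls: relative frequency of a six = {freq:.4f}")
```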

The problem with this interpretation in practice – in medicine, engineering, and gambling machines – isn’t merely the impossibility of an infinite number of trials. A few million trials might be enough. Running trials works for dice but not for earthquakes and space shuttles. It also has problems with things like cancer, where plenty of frequency data exists. Frequentism requires placing an individual specimen into a relevant population or reference class. Doing this is easy for dice, harder for humans. A study says that as a white male of my age I face a 7% probability of having a stroke in the next 10 years. That’s based on my membership in the reference class of white males. If I restrict that set to white men who don’t smoke, it drops to 4%. If I account for good systolic blood pressure, no family history of atrial fibrillation or ventricular hypertrophy, it drops another percent or so.

Ultimately, if I limit my population to a set of one (just me) and apply the belief that every effect has a cause (i.e., some real-world chunk of blockage causes an artery to rupture), I can conclude that my probability of having a stroke can only be one of two values – zero or one.

Frequentism, as seen by its opponents, too closely ties probabilities to observed frequencies. They note that the limit-of-relative-frequency concept relies on induction, which might mean it’s not so objective after all. Further, those frequencies are unknowable in many real-world cases. Still further, finding an individual’s correct reference class is messy and subjective. Finally, no frequency data exists for earthquakes that haven’t happened yet. Every one is unique. All that seems to do some real damage to frequentism’s utility score.

The subjective interpretations of probability offer fixes to some of frequentism’s problems. The most common subjective interpretation is Bayesianism, which itself comes in several flavors. All subjective interpretations see probability as a degree of belief in a specific outcome, as held by a rational person. Think of it as a fair bet with odds. The odds you’re willing to accept for a bet on your race horse exactly equal your degree of belief in that horse’s ability to win. If your filly were in the same race an infinite number of times, you’d expect to break even, based on those odds, whether you bet on her or against her.

Subjective interpretations rely on logical coherence and belief. The core of Bayesianism, for example, is that beliefs must 1) originate with a numerical probability estimate, 2) adhere to the rules of probability calculation, and 3) follow an exact rule for updating belief estimates based on new evidence. The second rule deals with the common core of probability math used in all interpretations. These include things like how to add and multiply probabilities and Bayes theorem, not to be confused with Bayesianism the belief system. Bayes theorem is an uncontroversial equation relating the probability of A given B to the probability of B given A and the individual probabilities of A and B. The third rule of Bayesianism is similarly computational, addressing how belief is updated after new evidence. The details aren’t needed here. Note that while Bayesianism is generally considered subjective, it is still computationally exacting.

The obvious problem with all subjective interpretations, particularly as applied to engineering problems, is that they rely, at least initially, on expert opinion. Life and death rides on the choice of experts and the value of their opinions. As Richard Feynman noted in his minority report on the Challenger, official rank plays too large a part in the choice of experts, and the higher (and less technical) the rank, the more optimistic the probability estimates.

The engineering risk analysis technique most consistent with the frequentist (objective) interpretation of probability is fault tree analysis. Other risk analysis techniques, some embodied in mature software products, are based on Bayesian (subjective) philosophy.
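As a rough illustration of the arithmetic behind a fault tree, the sketch below combines basic-event probabilities through AND and OR gates. It assumes the basic events are independent, and the component names and probabilities are hypothetical; real fault tree tools handle much more (repeated events, minimal cut sets, exposure times).

```python
from math import prod


def and_gate(probabilities):
    """Output event occurs only if all independent input events occur."""
    return prod(probabilities)


def or_gate(probabilities):
    """Output event occurs if at least one independent input event occurs."""
    return 1.0 - prod(1.0 - p for p in probabilities)


# Hypothetical per-flight probabilities of basic events.
P_PUMP_A_FAILS = 1e-3
P_PUMP_B_FAILS = 1e-3      # redundant with pump A
P_POWER_BUS_FAILS = 1e-5   # single point of failure feeding both pumps

# Top event: loss of hydraulic pressure.
# It happens if both pumps fail, OR if the shared power bus fails.
p_both_pumps = and_gate([P_PUMP_A_FAILS, P_PUMP_B_FAILS])
p_top = or_gate([p_both_pumps, P_POWER_BUS_FAILS])

print(f"P(both pumps fail) = {p_both_pumps:.3e}")
print(f"P(top event)       = {p_top:.3e}")
```

In this made-up tree the single power bus, not the redundant pumps, dominates the top-event probability.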

When Willie said he didn’t believe in probability, he may have meant several things. I’ll try to track him down and ask him; but I doubt the incident stuck in his mind as it did mine. If he meant that he doesn’t believe that probability is useful in system design, he had a rational belief – but one with which I strongly disagree. I doubt he meant that though.

Willie was likely leaning toward the ties between probability and redundancy in system design. Probability is the calculus by which redundancy is allocated to systems. Willie may think that redundancy doesn’t yield the expected increase in safety because having more equipment means more things that can fail. This argument fails to face that, ideally speaking, a redundant path does double the chance of having a component failure, but squares the probability of system failure. That’s a good thing, since squaring a number less than one makes it smaller. In other words, the benefit in reducing the chance of system failure vastly exceeds the deficit of having more components to repair. If that was his point, I disagree in principle, but accept that redundancy doesn’t eliminate the need for component design excellence.
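The double-versus-square trade can be put in numbers. The sketch below assumes two identical, fully independent paths and a hypothetical per-flight component failure probability; the function name is illustrative.

```python
def redundant_pair(p_component):
    """Compare maintenance burden and system risk for two independent paths.

    Returns (probability at least one component fails,
             probability the system fails, i.e. both components fail).
    """
    p_any_component_failure = 1.0 - (1.0 - p_component) ** 2  # roughly 2p
    p_system_failure = p_component ** 2
    return p_any_component_failure, p_system_failure


P_COMPONENT = 1e-3  # hypothetical failure probability of one path

p_any, p_system = redundant_pair(P_COMPONENT)

print(f"single path:    P(system failure)        = {P_COMPONENT:.1e}")
print(f"redundant pair: P(some component fails)  = {p_any:.3e}")
print(f"redundant pair: P(system failure)        = {p_system:.1e}")
```

With these assumed numbers, the chance of having something to repair roughly doubles while the chance of losing the function drops by a factor of a thousand.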

He may also think system designers can be overly confident of the exponential increase in modeled probability of system reliability that stems from redundancy. That increase in reliability is only valid if the redundancy creates no common-cause or cascading failures, and no truly latent (undetected for unknown time intervals) failures of  redundant paths that aren’t currently operating. If that’s his point, then we agree completely. This is an area where pairing the experience and design expertise of someone like Willie with rigorous risk analysis using fault trees yields great systems.
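The effect of common-cause failures on that squared probability can be sketched with the beta-factor model, a common simplification in reliability work. This is my choice of model for illustration, not something the discussion above prescribes, and the values of p and beta are hypothetical. The model treats a fraction beta of each component’s failure probability as a shared cause that takes out both paths at once.

```python
def redundant_pair_with_common_cause(p_component, beta):
    """System failure probability for two paths under a beta-factor model.

    beta is the assumed fraction of component failures due to a shared
    cause. The independent and common-cause contributions are treated as
    independent of each other.
    """
    p_common = beta * p_component                 # fails both paths together
    p_independent = (1.0 - beta) * p_component    # fails one path on its own
    p_both_independent = p_independent ** 2
    # System fails if the common cause strikes OR both paths fail independently.
    return 1.0 - (1.0 - p_common) * (1.0 - p_both_independent)


P_COMPONENT = 1e-3   # hypothetical
BETA = 0.10          # hypothetical: 10% of failures share a cause

ideal = P_COMPONENT ** 2
with_common_cause = redundant_pair_with_common_cause(P_COMPONENT, BETA)

print(f"ideal redundancy (no common cause): {ideal:.3e}")
print(f"with beta = {BETA:.2f}:                   {with_common_cause:.3e}")
print(f"ratio: {with_common_cause / ideal:.0f}x")
```

Even a modest assumed beta wipes out most of the modeled benefit of squaring, which is why the independence caveat carries so much weight.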

Unlike Willie, Challenger-era NASA gave no official statement on its belief in probability. Feynman’s report points to NASA’s use of numeric probabilities for specific component failure modes. The Rogers Commission report says that NASA management talked about degrees of probability. From this we might guess that NASA believed in probability and its use in measuring risk. On the other hand, the Rogers Commission report also gives examples of NASA’s disbelief in probability’s usefulness. For example, the report’s Technical Management section states that, “NASA has rejected the use of probability on the basis that such techniques are insufficient to assure that adequate safety margins can be applied to protect the lives of the crew.”

Regardless of NASA’s beliefs about probability, it’s clear that NASA didn’t use fault tree analysis for the space shuttle program prior to the Challenger disaster. Nor did it use Bayesian inference methods, any hybrid probability model, or any consideration of probability beyond opinions about failures of  critical items. Feynman was livid about this. A Bayesian (subjective, but computational) approach would have at least forced NASA to make its subjective judgments explicit and would have produced a rational model of its beliefs. Post-Challenger Bayesian analyses, including one by NASA, varied widely, but all indicated unacceptable risk. NASA has since adopted risk management approaches more consistent with those used in commercial aircraft design.

An obvious question arises when you think about using a frequentist model on nearly one-of-a-kind vehicles. How accurate can any frequency data be for something as infrequent as a shuttle flight? Accurate enough, in my view. If you see the shuttle as monolithic and indivisible, the data is too sparse; but not if you view it as a system of components, most of which, like o-ring seals, have close analogs in common use, having known failure rates.

The FAA mandated probabilistic risk analyses of the frequentist variety (effectively mandating fault trees) in 1968. Since then flying has become safe, by any measure. In no other endeavor has mankind made such an inherently dangerous activity so safe. Aviation safety progressed through many innovations, redundant systems being high on the list. Probability is the means by which you allocate redundancy. You can’t get great aircraft systems without designers like Willie. Nor can you get them without probability.

# Belief in Probability – Part 1

Years ago in a meeting on design of a complex, redundant system for a commercial jet, I referred to probabilities of various component failures. In front of this group of seasoned engineers, a highly respected, senior member of the team interjected, “I don’t believe in probability.”

His proclamation stopped me cold. My first thought was what kind of a backward brute would say something like that, especially in the context of aircraft design. But Willie was no brute. In fact he is a legend in electro-hydro-mechanical system design circles; and he deserves that status. For decades, millions of fearless fliers have touched down on the runway, unaware that Willie’s expertise played a large part in their safe arrival. So what can we make of Willie’s stated disbelief in probability?

Friends and I have been discussing risk science a lot lately – diverse aspects of it including the Challenger disaster, pharmaceutical manufacture in China, and black swans in financial markets. Risk science relies on several different understandings of risk, which in turn rely on the concept of probability. So before getting to risk, I’m going to jot down some thoughts on probability. These thoughts involve no computation or equations, but they do shed some light on Willie’s mindset. First a bit of background.

Oddly, the meaning of the word probability involves philosophy much more than it does math, so Willie’s use of belief might be justified. People mean very different things when they say probability. The chance of rolling a 7 is conceptually very different from the chance of an earthquake in Missouri this year. Probability is hard to define accurately. A look at its history shows why.

Mathematical theories of probability only first appeared in the late 17th century. This is puzzling, since gambling had existed for thousands of years. Gambling was enough of a problem in the ancient world that the Egyptian pharaohs, Roman emperors and Achaemenid satraps outlawed it. Such legislation had little effect on the urge to deal the cards or roll the dice. Enforcement was sporadic and halfhearted. Yet gamblers failed to develop probability theories. Historian Ian Hacking  (The Emergence of Probability) observes, “Someone with only the most modest knowledge of probability mathematics could have won himself the whole of Gaul in a week.”

Why so much interest with so little understanding? In European and middle eastern history, it seems that neither Platonism (determinism derived from ideal forms) nor the Judeo/Christian/Islamic traditions (determinism through God’s will) had much sympathy for knowledge of chance. Chance was something to which knowledge could not apply. Chance meant uncertainty, and uncertainty was the absence of knowledge. Knowledge of chance didn’t seem to make sense.

The term probability is tied to the modern understanding of evidence. In medieval times, and well into the renaissance, probability literally referred to the level of authority –  typically tied to the nobility –  of a witness in a court case. A probable opinion was one given by a reputable witness. So a testimony could be highly probable but very incorrect, even false.

Through empiricism, central to the scientific method, the notion of diagnosis (inference of a condition from key indicators) emerged in the 17th century. Diagnosis allowed nature to be the reputable authority, rather than a person of status. For example, the symptom of skin-spots could testify, with various degrees of probability, that measles had caused it. This goes back to the notion of induction and inference from the best explanation of evidence, which I discussed in a post on The Multidisciplinarian blog. Pascal, Fermat and Huygens brought probability into the respectable world of science.

But outside of science, probability and statistics still remained second class citizens right up to the 20th century. You used these tools when you didn’t have an exact set of accurate facts. Recognition of the predictive value of probability and statistics finally emerged when governments realized that death records had uses beyond preserving history, and when insurance companies figured out how to price premiums competitively.

Also around the turn of  the 20th century, it became clear that in many realms – thermodynamics and quantum mechanics for example – probability would take center stage against determinism. Scientists began to see that some – perhaps most – aspects of reality were fundamentally probabilistic in nature, not deterministic. This was a tough pill for many to swallow, even Albert Einstein. Einstein famously argued with Niels Bohr, saying, “God does not play dice.” Einstein believed that some hidden variable would eventually emerge to explain why one of two identical atoms would decay while the other did not. A century later Bohr is still winning that argument.

What we mean when we say probability today may seem uncontroversial – until you stake lives on it. Then it gets weird, and definitions become important. Defining probability is a wickedly contentious matter, because wildly conflicting conceptions of probability exist.  They can be roughly divided into the objective and subjective interpretations. In the next post I’ll focus on the frequentist interpretation, which is objective, and the subjectivist interpretations as a group. I’ll look at the impact of accepting – or believing in – each of these on the design of things like airliners and space shuttles from the perspectives of my pal Willie, Richard Feynman, and NASA. Then I’ll defend my own views on when and where to hold various beliefs about probability.

Autobrake diagram courtesy of Biggles Software.

# Is Fault Tree Analysis Deductive?

An odd myth persists in systems engineering and risk analysis circles. Fault tree analysis (FTA), and sometimes fault trees themselves, are said to be deductive. FMEAs are called inductive. How can this be?

By fault trees I mean Boolean logic modeling of unwanted system states by logical decomposition of equipment fault states into combinations of failure states of more basic components. You can read more on fault tree analysis and its deductive nature at Wikipedia. By FMEA (Failure Mode & Effects Analysis) I mean recording all the things that can go wrong with the components of a system. Writers who find fault trees deductive also find FMEAs, their complement, to be inductive. I’ll argue here that building fault trees is not a deductive process, and that there is possible harm in saying so. Secondarily, I’ll offer that while FMEA creation involves inductive reasoning, the point carries little weight, since the rest of engineering is inductive reasoning too.

Word meanings can vary with context; but use of the term deductive is consistent across math, science, law, and philosophy. Deduction is the process of drawing a logically certain conclusion about a particular instance from a rule or premise about the general. Assuming all men are mortal, if Socrates is a man, then he is mortal. This is true regardless of the meaning of the word mortal. Its truth is certain, even if Socrates never existed, and even if you take mortal to mean living forever.

Example from a software development website:

FMECA is an inductive analysis of system failure, starting with the presumed failure of a component and analyzing its effect on system stability: “What will happen if valve A sticks open?” In contrast, FTA is a deductive analysis, starting with potential or actual failures and deducing what might have caused them: “What could cause a deadlock in the application?”

The well-intended writer says we deduce the causes of the effects in question. Deduction is not up to that task. When we infer causes from observed effects, we are using induction, not deduction.

How did the odd claims that fault trees and FTAs are deductive arise? It might trace to William Vesely, NASA’s original fault tree proponent. Vesely sometimes used the term deductive in his introductions to fault trees. If he meant that the process of reducing fault trees into cut sets (sets of basic events or initiators) is deductive, he was obviously correct. But calculation isn’t the critical aspect of fault trees; constructing them is where the effort and need for diligence lie. Fault tree software does the math. If Vesely saw the critical process of constructing fault trees and supplying them with numerical data (often arduous, regardless of software) as deductive – which I doubt – he was certainly wrong.
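The part of fault tree work that really is deductive, reducing a finished tree to its minimal cut sets, is mechanical enough to sketch in a few lines. The tree encoding and the event names below (`pump_A`, `pump_B`, `power`) are hypothetical illustrations, and this brute-force expansion is not how production fault tree software handles large trees.

```python
from itertools import product

def cut_sets(node):
    """Expand a fault tree node into cut sets (sets of basic events).
    A node is a basic-event name (str) or a tuple: ("AND" | "OR", [children])."""
    if isinstance(node, str):
        return {frozenset([node])}
    gate, children = node
    child_sets = [cut_sets(child) for child in children]
    if gate == "OR":
        # Any child's cut set alone causes the gate's output event.
        return set().union(*child_sets)
    if gate == "AND":
        # One cut set from every child must occur together.
        return {frozenset().union(*combo) for combo in product(*child_sets)}
    raise ValueError(f"unknown gate: {gate}")

def minimal(sets):
    """Discard any cut set that strictly contains another cut set."""
    return {s for s in sets if not any(other < s for other in sets)}

# Hypothetical tree: both channels must be lost, and each channel is lost
# if its own pump fails OR the shared power source fails.
tree = ("AND", [("OR", ["pump_A", "power"]),
                ("OR", ["pump_B", "power"])])

for cs in sorted(minimal(cut_sets(tree)), key=len):
    print(sorted(cs))
```

Given the tree, the reduction follows with logical certainty. The shared `power` event surfaces as a single-event cut set only because someone thought to put it in both branches, and that modeling step is where the judgment lies.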

Inductive reasoning, as used in science, logic and philosophy, means inferring general rules or laws from observations of particular instances. The special use of the term math induction actually refers to deduction, as mathematicians are well aware. Math induction is deductive reasoning with a confusing title. Induction in science and engineering stems from our need to predict future events. We form theories about how things will behave in the future based on observations of how similar things behaved in the past. As I discussed regarding Bacon vs. Descartes, science is forced into the realm of induction because deduction never makes contact with the physical world – it lives in the mind.

Inductive reasoning is exactly what goes on when you construct a fault tree. You are making inferences about future conditions based on modeling and historical data – a purely inductive process. The fact that you use math to solve fault trees does not make fault trees any more deductive than the presence of math in lab experiments makes empirical science deductive.

Does this matter?

It’s easy enough to fix this technical point in descriptions of fault tree analysis. We should do so, if merely to avoid confusing students. But more importantly, quantitative risk analysis – including FTA – has its enemies. They range from several top consultancies selling subjective, risk-score matrix methodologies dressed up in fancy clothes (see Tony Cox’s SIRA presentation on this topic) to some of NASA’s top management – those flogged by Richard Feynman in his minority report on the Challenger disaster. The various criticisms of fault tree analysis say it is too analytical and correlates poorly with the real world. Sound familiar? It echoes a feud between the heirs of Bacon (induction) and the heirs of Descartes (deduction). Some of fault trees’ foes find them overly deductive. They then imply that errors found in past quantitative analyses impugn objectivity itself, preferring subjective analyses based on expert opinion. This curious conclusion would not follow, even if fault tree analyses were deductive, which they are not.

——————————————

Science is the belief in the ignorance of experts. – Richard Feynman


# Intuitive Probabilities – Conjunction Malfunction

In a recent post I wrote about Vic, who might not look like a Christian, but probably is one. The Vic example reminded me of a famous study of unintuitive probabilities done in 1983. Amos Tversky and Daniel Kahneman surveyed students at the University of British Columbia using something similar to my Vic puzzle:

Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations.
Which is more probable?

A.    Linda is a bank teller.
B.    Linda is a bank teller and is active in the feminist movement.

About 90% of students said (B) was more probable. Mathematicians point out that, without needing to know anything about Linda, (A) has to be more probable than (B). Thinking otherwise is the conjunction fallacy. It’s simple arithmetic. The probability of a conjunction, P(A&B), cannot exceed the probabilities of its constituents, P(A) and P(B), because the extension (possibility set) of the conjunction is included in the extension of its constituents. In a coin toss, the probability of heads has to exceed the probability of heads AND that it will rain today.

Putting numbers to Linda, one might guess there’s a 1% probability that Linda, based on the description given, is a bank teller, but a 99% probability that she’s a feminist. Even so, 1% is still a bigger number (probability) than 1% AND 99%, which, treating the two as independent, means 1% times 99% – which is a tad less than 1%.
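The arithmetic is easy to check, and the conclusion doesn’t depend on the particular guesses. In the sketch below, the 1% and 99% figures are the guesses from the text, and `p_b_given_a` is my own label for the chance that Linda is a feminist given that she is a bank teller.

```python
def conjunction(p_a, p_b_given_a):
    """P(A and B) = P(A) * P(B | A)."""
    return p_a * p_b_given_a

p_teller = 0.01                  # guessed P(Linda is a bank teller)
p_feminist_given_teller = 0.99   # guessed P(feminist | bank teller)

p_both = conjunction(p_teller, p_feminist_given_teller)
print("P(teller)             :", p_teller)
print("P(teller AND feminist):", p_both)

# However strongly the description suggests feminism, the conjunction can
# never beat the single event, because P(B | A) can never exceed 1.
for p_b_given_a in (0.0, 0.25, 0.5, 0.99, 1.0):
    assert conjunction(p_teller, p_b_given_a) <= p_teller
```

The two probabilities tie only when every bank teller fitting Linda’s description is certain to be a feminist.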

So why does it seem like (B) is more likely? Lots of psychological and semantic reasons have been proposed. For example, in normal communications, we usually obey some unspoken principle of relevance; a sane person would not mention Linda’s marital status, political views and values if they were irrelevant to the question at hand – which somehow seems to have something to do with Linda’s profession. Further, humans learn pattern recognition and apply heuristics. It may be a fair bit of inductive reasoning based on past evidence that women active in the feminist movement are more likely than those who are not to major in philosophy, be single, and be concerned with discrimination. This may be a reasonable inference, or it may just prove you’re a sexist pig for even thinking such a thing. I attended a lecture at UC Berkeley where I was told that any statement by men that connects attributes (physical, ideological or otherwise) to any group (except white men) constituted sexism, racism or some otherism. This made me wonder how feminists are able to recognize other feminists.

In any case, there are reasons that students would not give the mathematically correct answer about Linda beyond the possibility that they are mathematically illiterate. Tversky and Kahneman tried various wordings of the problem, pretty much getting the same results. At some point they came up with this statement of the problem that seems to drive home the point that they were seeking a mathematical interpretation of the problem:

Argument 1: Linda is more likely to be a bank teller than she is to be a feminist bank teller, because every feminist bank teller is a bank teller, but some bank tellers are not feminists, and Linda could be one of them.

Argument 2: Linda is more likely to be a feminist bank teller than she is likely to be a bank teller, because she resembles an active feminist more than she resembles a bank teller.

In this case 65% of students chose the resemblance argument (2), despite its internal logical flaw. Note that argument 1, the extension argument, explains why the conjunction fallacy is a fallacy, and that argument 2 doesn’t really make much sense.

Whatever the reason we tend to botch such probability challenges, there are cases in engineering that are surprisingly analogous to the Linda problem. For example, when building a fault tree (see fig. 1), your heuristics can make you miss event dependencies and common causes between related failures. If an aircraft hydraulic brake system accumulator fails by exploding instead of by leaking, and in doing so severs a hydraulic line, an “AND” relationship disappears so that what appeared to be P(A&B) becomes simply P(A). Such logic errors can make calculations of probability of catastrophe off by factors of thousands or millions. This is bad, when lives are at stake. Fortunately, engineers apply great skill and discipline to modeling this sort of thing. We who fly owe our lives to good engineers. Linda probably does too.
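A back-of-envelope sketch shows how a missed common cause produces errors of that size. All three probabilities below are invented for illustration. `p_explosion` stands for the accumulator failure mode that also severs the hydraulic line, and it is treated as independent of the other two failure modes.

```python
def and_gate_independent(p_a, p_b):
    """What the analyst believes: both items must fail, independently."""
    return p_a * p_b

def and_gate_with_common_cause(p_a, p_b, p_common):
    """What is actually true: one common-cause event defeats both items,
    otherwise both must still fail independently."""
    return p_common + (1 - p_common) * p_a * p_b

p_accumulator_leak = 1e-4   # hypothetical per-flight probability
p_line_failure = 1e-4       # hypothetical per-flight probability
p_explosion = 1e-5          # hypothetical: accumulator bursts and severs the line

believed = and_gate_independent(p_accumulator_leak, p_line_failure)
actual = and_gate_with_common_cause(p_accumulator_leak, p_line_failure, p_explosion)

print("believed P(A&B)    :", believed)
print("actual probability :", actual)
print("optimism factor    :", actual / believed)
```

Even with a common cause ten times rarer than either individual failure, the tree that misses it is optimistic by about three orders of magnitude under these assumed numbers.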

– – –

Fig. 1. Segment of a fault tree for loss of braking in a hypothetical 8-wheeled aircraft using FTA software I authored in 1997. This fault tree addresses only a single Class IV hazard in aircraft braking – uncontrolled departure from the end of the runway due to loss of braking during a rejected takeoff. It calculates the probability of this “top event” as being more remote than the one-per-billion flight hours probability limit specified by the guidelines of FAA Advisory Circular 25.1309-1A, 14CFR/CS 25.1309, and SAE ARP4754. This fault tree, when simplified by standard techniques, results in about 200,000 unique cut sets – combinations of basic events leading to the catastrophic condition.

– – –

Uncertainty is an unavoidable aspect of the human condition. – Opening sentence of “Extensional Versus Intuitive Reasoning” by Tversky and Kahneman, Oct. 1983 Psychological Review.