On the Use and Abuse of FMEAs

– William Storage, VP LiveSky, Inc.; Visiting Scholar, UC Berkeley History of Science

Analyzing about 80 deaths associated with the drug heparin in 2009, the FDA found that oversulfated chondroitin sulfate, a toxic compound, had been intentionally substituted for a legitimate ingredient for economic reasons. That is, an unscrupulous supplier sold a counterfeit drug material costing 1% as much as the real thing, and it killed people.

This wasn’t unprecedented. Something similar happened with gentamicin in the late 1980s, with cefaclor in 1996, and again with diethylene glycol sold as glycerin in 2006.

Adulteration and toxic excipients are obvious failure modes of supply chains and operations for drug manufacturers. Presumably, the firms buying the adulterated raw material had conducted failure mode and effects analyses (FMEAs) at several levels. An early-stage FMEA would have identified the failure mode and assessed its effects, thereby triggering the creation of controls to prevent the failure. So what went wrong?

The FDA’s reports on the heparin incident did not make public any analyses done by the drug makers. But based on the “best practices” specified by standards bodies, consulting firms, and many risk managers, we can make a good guess: their risk assessments were likely misguided, poorly executed, and impotent.

Promoters of FMEAs – and of risk analysis in general – regularly cite aerospace as the basis for their products and initiatives and as the authority on how to do things in matters of risk, as any conference attendee on the topic can attest. Commercial aviation – as opposed to aerospace in general – should indeed be the exemplar of risk management. In no other endeavor has mankind made such an inherently dangerous activity so safe as commercial jet flight.

While promoters of risk management of all sorts extol aviation, they tend to stray far from its methods, mindset, and values. This is certainly the case with the FMEA, a tool poorly understood, misapplied, poorly executed, and then blamed for failing to prevent catastrophe.

In the case of heparin, a properly performed FMEA exercise would certainly have identified the failure mode. But FMEA wasn’t really the right tool for identifying that hazard in the first place. A functional hazard analysis (FHA) or business impact analysis (BIA) would have highlighted chemical contamination – leading to death of patients, supply disruption, and reputation damage – as a top hazard in minutes. I know this for a fact, because I use drug manufacture as an example when teaching classes on FHA. Day-one students identify that hazard without being coached.

FHAs can be done very early in the conceptual phase of a project or system design. They need no implementation details. They’re typically short and sweet, yielding concerns to address with high priority as a plan is taking form. Early writers on the topic explicitly identified FMEA as the directional opposite of FHA, the former being “bottom-up,” the latter “top-down.” NASA’s response to the USGS on the suitability of FMEAs for their needs, for example, stressed this point. FMEAs rely on at least preliminary implementation details to be useful. And they produce a lot of essential but lower-value content (essential because FMEAs help confirm which failure modes can be de-prioritized), little of which helps at the time of design or process conception.

So a common failure mode of risk management is using FMEAs for purposes other than those for which they were designed. More generally, equating FMEA with risk analysis and risk management is a failure mode of management.

Assuming we stop misusing FMEAs, we then face the hurdle of doing them well. This is a challenge, as the quality of training, guidance, and facilitation of FMEAs has degraded markedly over the past twenty years. FMEAs, as promoted by the PMI, ISO 31000, and APM PRAM, to name a few, bear little resemblance to those in aviation. I know this from three decades of risk work in diverse industries, half of it in aerospace. You can see the differences by studying sample FMEAs on the web. I’ll give some specifics.

The inventors of the FMEA themselves acknowledged that FMEAs would need to be tailored for different domains. This was spelled out in the first version of MIL-P-1629 in 1949. But math, psychology, behavioral economics, and philosophy can all point out major flaws in the approach to FMEAs as commonly taught in most fields today. That is, the excuse that nuances of a specific industry turn bad analysis into good will not fly. The same laws of physics and economics apply to all industries.

I’m not sure how FMEAs went so far astray. Some blame the explosion of enterprise risk management (ERM) suppliers in the 1990s. ERM, partly rooted in the sound discipline of actuarial science, unfortunately took on many aspects of the management fads of the period. It was up-sold by consultancies to their existing corporate clients, who assumed those consultancies actually had a background in risk science, which they did not. Studies a decade later by Protiviti and the EIU failed to show any impact on profit or other benefit of ERM initiatives, beyond positive self-assessments by executives of the firms.

But bad FMEAs predated the ERM era. Adopted by the automotive industry in the 1970s, FMEAs seem to have been used to justify optimistic warranty claims estimates for accounting purposes. Few suspect that automotive engineers conspired to misrepresent reliability, but their rosy FMEAs indirectly supported bullish board presentations in struggling auto firms wracked by double-digit claims rates. While Toyota was implementing statistical process control to precisely predict the warranty cost of adverse tolerance accumulation, Detroit was pretending that multiplying ordinal scales of probability, severity, and detectability was mathematically or scientifically valid.
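To make that contrast concrete, here is a minimal sketch of how a statistical tolerance-accumulation analysis yields a numeric out-of-spec probability that feeds directly into a warranty cost forecast. The dimensions, spec limits, production volume, and claim cost below are illustrative assumptions, not data from any automaker.

```python
import math

# Hypothetical example: a clearance formed by stacking four machined parts.
# Each dimension is assumed normally distributed: (mean, standard deviation) in mm.
dimensions = [(25.0, 0.05), (12.5, 0.04), (12.5, 0.04), (50.0, 0.06)]

# Assumed assembly requirement: the stack must stay within 100.0 +/- 0.30 mm.
nominal, tolerance = 100.0, 0.30

# For independent normal dimensions, means add and variances add.
stack_mean = sum(mu for mu, _ in dimensions)
stack_sd = math.sqrt(sum(sd ** 2 for _, sd in dimensions))

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Probability that a given assembly falls outside the spec limits.
z_lo = (nominal - tolerance - stack_mean) / stack_sd
z_hi = (nominal + tolerance - stack_mean) / stack_sd
p_out_of_spec = norm_cdf(z_lo) + (1.0 - norm_cdf(z_hi))

# A numeric probability supports a numeric warranty forecast.
units_per_year = 200_000   # assumed production volume
cost_per_claim = 350.0     # assumed average warranty claim cost, dollars
expected_claims = units_per_year * p_out_of_spec
print(f"P(out of spec)       = {p_out_of_spec:.5f}")
print(f"Expected claims/year = {expected_claims:.0f}")
print(f"Expected cost/year   = ${expected_claims * cost_per_claim:,.0f}")
```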

Citing inability to quantify failure rates of basic components and assemblies (an odd claim given the abundance of warranty and repair data), auto firms began to assign scores or ranks to failure modes rather than giving probability values between zero and one. This first appears in automotive conference proceedings around 1971. Even lacking hard failure rates – if in fact they did lack them – reliability workers could have estimated numeric probability values from subjective experience or derived them from the reliability handbooks then available. Instead they assigned ranks or scores on a 1-to-10 scale.
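For a sense of what the numeric alternative looks like, here is a minimal sketch (with made-up numbers) of estimating a failure probability from fleet exposure data using the constant-failure-rate model found in the reliability handbooks of that era. The observed counts, hours, and mission time are assumptions for illustration.

```python
import math

# Assumed fleet data for one failure mode (illustrative numbers only).
observed_failures = 42          # failures attributed to this mode
fleet_operating_hours = 3.0e6   # total accumulated exposure, in hours

# Constant-failure-rate (exponential) model, as in reliability handbooks.
failure_rate = observed_failures / fleet_operating_hours   # failures per hour

# Probability of at least one failure over a stated mission or exposure time.
mission_hours = 1000.0
p_failure = 1.0 - math.exp(-failure_rate * mission_hours)

print(f"lambda = {failure_rate:.2e} per hour")
print(f"P(failure in {mission_hours:.0f} h) = {p_failure:.4f}")
```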

In principle there is no difference between guessing a probability of 0.001 (a numerical probability value) and guessing a value of “1” on a 1-to-10 scale (either an ordinal number or a probability value mapped to a limited-range score); but in practice there is a big difference. I see this while doing risk assessments for clients. One difference is that those assigning probability scores in facilitated FMEA sessions usually use grossly different mental mappings to get from labels such as “extremely likely” or “moderately unlikely” to numerical probabilities. A physicist takes “likely,” for a failure mode, to mean more than once per million; a drug trial manager interprets it to mean more than 5%. Neither is wrong; the terms have different meanings in different domains. But if those two specialists aren’t alert to the issue, then when they jointly call a failure likely, there will be an illusion of communication and agreement where none exists.
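A toy illustration of that illusion of agreement follows; the specific label-to-probability mappings are assumptions for the sake of the example, not survey data.

```python
# Hypothetical mental mappings from a qualitative label to a probability.
physicist = {"likely": 1e-6, "unlikely": 1e-9}        # per-operation probabilities
trial_manager = {"likely": 0.05, "unlikely": 0.001}   # per-trial probabilities

label = "likely"  # both specialists sign off on the same word
ratio = trial_manager[label] / physicist[label]
print(f"Both call the failure mode '{label}', but the probabilities they have "
      f"in mind differ by a factor of {ratio:,.0f}.")
```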

Further, FMEA participants don’t agree – and often don’t know they don’t agree – on the mapping of numerical probability values into 1-10 scores. Unless, of course, they use an explicit mapping table to translate probabilities into probability scores. But if you have such a table, why use scores at all?

There’s a reason, and it’s a poor one. Probability scores (or, sometimes worse, ranks) between 1 and 10 are needed to generate the Risk Priority Numbers (RPNs) alluded to above, made popular by the American automotive industry. You won’t find RPN or anything like it in aviation FMEAs – no arithmetic product of any measures of probability, severity and/or detectability. Probability values for given failure modes in specific operational modes are, however, calculated on the basis of observed failure frequency distributions and exposure rates. RPN attempts to move in this direction, but fails miserably.

RPNs are defined as the arithmetic product of a probability score, a severity score, and a detection score (more precisely, a score for the inverse of detectability). The explicit thinking here is that risks can be prioritized on the basis of the product of three numbers, each ranging from 1 to 10.

The implicit – but critical, and never addressed by users of RPN – thinking here is that all engineers, businesses, regulators and consumers are risk-neutral. Risk neutrality, as conceived in portfolio choice theory, would in this context mean that everyone would be indifferent between two risks of the same RPN, even ones comprising very different probability and severity values. That is, an RPN formed from the scores {2,8,4} would dictate the same risk response as failure modes with scores {8,4,2} and {4,4,4}, since the RPN values (products of the scores) are equal. In the real world this is never true. Often it is very far from true. Most people and businesses are not risk-neutral; they’re risk-averse. That changes things. As a trivial example, banks might have valid reasons for caring more about a single $100M loss than one hundred $1M losses.
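A short sketch of both points: the equal-RPN triples above, and the bank example evaluated with a deliberately simple risk-averse (exponential) utility. The dollar amounts, probabilities, and risk-aversion coefficient are my illustrative assumptions.

```python
import math

# 1. Equal RPNs from very different (probability, severity, detection) scores.
for p_score, s_score, d_score in [(2, 8, 4), (8, 4, 2), (4, 4, 4)]:
    print(f"scores {p_score},{s_score},{d_score} -> RPN = {p_score * s_score * d_score}")

# 2. Equal expected losses, very different risks for a risk-averse bank.
# Exposure A: one chance of a single $100M loss, probability 0.01.
# Exposure B: one hundred independent chances of a $1M loss, each probability 0.01.
p, big_loss, small_loss, n = 0.01, 100e6, 1e6, 100
expected_A = p * big_loss        # $1M
expected_B = n * p * small_loss  # $1M -- identical expected losses

# Exponential (constant absolute risk aversion) certainty equivalent of a loss:
# CE = (1/a) * ln(E[exp(a * loss)]), with assumed risk-aversion coefficient a.
a = 1e-7  # per dollar
ce_A = math.log((1 - p) + p * math.exp(a * big_loss)) / a
ce_B = n * math.log((1 - p) + p * math.exp(a * small_loss)) / a  # independent losses

print(f"Expected loss:        A = ${expected_A:,.0f}, B = ${expected_B:,.0f}")
print(f"Certainty equivalent: A = ${ce_A:,.0f}, B = ${ce_B:,.0f}")
```

Under these assumptions, the certainty equivalent of exposure A comes out roughly fifty times that of exposure B, even though the expected losses are identical. That is the sense in which risk aversion changes things.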

Beyond the implicit assumption of risk-neutrality, RPN has other problems. As mentioned above, there are cognitive and group-dynamics issues when FMEA teams attempt to model probabilities as ranks or scores. Similar difficulties arise with scoring the cost of a loss, i.e., the severity component of RPN. Again there is the question: if you know the cost of a failure (in dollars, lives lost, or patients not cured), why convert a valid measurement into a subjective score (granting, for the sake of argument, that risk-neutrality is justified)? Again the answer is to feed that score into the RPN calculation.

Still more problematic is the detectability value used in RPNs. In a non-trivial system or process, detectability and probability are not independent variables. And there is vagueness around the meaning of detectability. Is it the means by which you know the failure mode has happened, after the fact? Or is it an indication that the failure is about to happen, such that something can be observed in time to prevent the failure? If the former, detection is irrelevant to the risk of failure; if the latter, detection should be operationalized in the model of the system. That is, if a monitor (e.g., a brake-fluid-low indicator) is in a system, the monitor is a component with its own failure modes and exposure times, which affect its probability of failure. This is how aviation risk analysis models such things.
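Here is a minimal sketch of what operationalizing detection looks like: the monitor is modeled as a component with its own failure rate and its own (typically longer, latent) exposure interval, and the quantity of interest is the probability of the primary failure occurring while the monitor is already failed. All rates and exposure times below are assumed for illustration.

```python
import math

def p_fail(rate_per_hour, exposure_hours):
    """Probability of failure during an exposure interval (exponential model)."""
    return 1.0 - math.exp(-rate_per_hour * exposure_hours)

# Assumed values for illustration only.
lambda_primary = 1e-5           # primary function failure rate, per hour
lambda_monitor = 1e-4           # monitor (e.g., fluid-low sensor) failure rate, per hour
operating_exposure = 2.0        # exposure of the primary function, hours
monitor_check_interval = 500.0  # monitor failures stay latent until the next check, hours

# Hazard of interest: the primary fails while the monitor has already failed silently.
p_primary = p_fail(lambda_primary, operating_exposure)
p_monitor_latent = p_fail(lambda_monitor, monitor_check_interval)
p_undetected_failure = p_primary * p_monitor_latent

print(f"P(primary failure)           = {p_primary:.2e}")
print(f"P(monitor failed, latent)    = {p_monitor_latent:.2e}")
print(f"P(failure with no detection) = {p_undetected_failure:.2e}")
```

Note that the result is driven by the monitor's failure rate and its latent exposure interval, not by a 1-to-10 guess at detectability.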

A simple summary of the problems with scoring, ranking and RPN is that adding ambiguity to a calculation does not eliminate uncertainty about its parameters; it merely adds errors and reduces precision.

Another wrong turn has been the notion that a primary function of FMEAs is to establish the cause of failures. Aviation found FMEAs to be ineffective for this purpose long ago. Reasoning backward from observation to cause (a posteriori reasoning) is tricky business and is often beyond the reach of facilitated FMEA sessions. This was one reason why supplier FMEAs came to be. In defense of the “Cause” column on FMEA templates used in the automotive world, in relatively simple systems and components causes are often entailed in failure modes (leakage caused by corrosion as opposed to leakage caused by stress fracture). In such cases cause may not be out of reach. But in the general case, and more so in complex cases such as manufacturing processes or military operations, seeking causes in FMEAs encourages leaping to regrettable conclusions. I’ll dig deeper into the problem of causes in FMEAs in a future post.

I’ve identified several major differences between the approach to FMEAs used in aviation and the approaches of those who claim to use methods based on aerospace. In addition to the reasons given above for siding with aviation on FMEA method, I’ll also note that we know the approach to risk used in aviation has reduced risk – by a factor of roughly a thousand, judging from fatal accident rates since aviation risk methods were developed. I don’t think any other industry or domain can show similar success.

A partial summary of failure modes of common FMEA processes includes the following, based on the above discussion:

  • Confusing FMEA with Hazard Analysis
  • Equating FMEA with risk assessment
  • Viewing the FMEA as a quality-control (QC) function
  • Insufficient rigor in establishing probability and severity values
  • Unwarranted (and implicit) assumption of risk-neutrality
  • Unsound quantification of risk (RPN)
  • Confusion about the role of detection
  • Using the FMEA as a root-cause analysis

The corrective actions for most of these should be obvious: steer clear of RPN, operationalize detection methods, and use numeric (non-ordinal) probability and cost values (even if estimated) instead of masking ignorance and uncertainty with ranking and scoring. I’ll add more in a future post.

 – – –

Text and photos © 2016 by William Storage. All rights reserved.
