# Fault Tree Construction Basics

Last week I introduced fault trees, giving a hint at what they’re good for, and showing some fault-tree diagram basics. Today I’ll focus on the mechanics of building one.

The image below, showing two different diagrams for the exact same logical fault tree, serves as a quick review. The top event, “A,” has two 2nd-level events, “B” and “C,” having two and three child events respectively. An OR gate is associated with the top event and event B, while an AND gate is associated with event C. Events 1 through 5 are basic events, which are initiators. They are literally “bottom events” though we don’t usually use that term.
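For readers who think better in code than in diagrams, here is a minimal sketch of the same logical tree. The class names `Basic` and `Gate` and the function `occurs` are my own illustrative choices, not taken from any fault tree package; the sketch only evaluates whether the top event occurs for a given set of failed basic events.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Basic:
    """A basic (initiator) event, identified by name."""
    name: str


@dataclass
class Gate:
    """An intermediate or top event together with the gate beneath it."""
    name: str
    kind: str  # "AND" or "OR"
    children: List[Union["Gate", Basic]] = field(default_factory=list)


def occurs(node: Union[Gate, Basic], state: Dict[str, bool]) -> bool:
    """Return True if the event occurs, given which basic events have occurred."""
    if isinstance(node, Basic):
        return state.get(node.name, False)
    results = [occurs(child, state) for child in node.children]
    return all(results) if node.kind == "AND" else any(results)


# The tree described above: A = B OR C, B = 1 OR 2, C = 3 AND 4 AND 5.
B = Gate("B", "OR", [Basic("1"), Basic("2")])
C = Gate("C", "AND", [Basic("3"), Basic("4"), Basic("5")])
A = Gate("A", "OR", [B, C])

print(occurs(A, {"2": True}))                           # B occurs, so A occurs
print(occurs(A, {"3": True, "4": True}))                # C needs all three
print(occurs(A, {"3": True, "4": True, "5": True}))     # C occurs, so A occurs
```

Note that each gate is stored with the event above it, matching the "gate sticks to the event above" convention of the bottom rendering.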

The difference between the top and bottom renderings below is just a matter of formatting. The bottom rendering looks much nicer in diagrams when we replace “A,” “B,” “2,” etc. with descriptive text. Here I’ll use whichever convention fits best on the page. It’s important to realize that in the bottom style, the logic gate goes with (sticks to) the event above it.

Last time I mentioned that building a tree is a top-down process in the sense that you start with the top event of the tree. Since fault trees can have only one top event but a large number of bottom events (initiators), the analogy with living trees is weak. An organizational chart might be a closer analogy, but even that isn’t accurate, since fault trees can contain the same initiator events in multiple places. I’ll show how this can happen later.

We typically get the top event of a tree from a hazard assessment where an unwanted outcome has been deemed critical – something like an aircraft hitting the end of a runway at speeds over 50 mph due to a brake system fault.

From the top event, we identify its high-level contributors. In the case of the aircraft brakes example, the redundancy designed into the system may make descriptions of these conditions a bit wordy. For example, consider a dual-hydraulic-brake design with two systems, each feeding half the brakes, with eight brakes total. In that system, one equipment state causing the top event (there are several) would be complete loss of hydraulic system number 1 plus sufficient additional failures to render braking inadequate. Those additional failures could include, for example, the complete failure of hydraulic system number 2 OR mechanical failures of one or more of the system 2 brakes OR loss of the command signal to the system 2 brakes, and a few others.

That is an example of one of the possible causes (i.e., one of the 2nd level events) of faults that could cause the loss of braking specified in the top event. There may be five or ten others, any of which could produce the hazardous state. The word “any” in the previous sentence tells you the relationship between the top event and this collection of second-level contributors. It is an OR relationship, since any one of them would be sufficient to cause the hazard. An OR gate is therefore tied directly to the top event.

For the first 2nd-level intermediate event of the aircraft brake system example above, we would carefully come up with a name for this fault; we can refine it later. Something like “Loss of hydraulic system #1 plus additional failures” would be sufficient. Note that a good name for this event gives a clue about the logic gate associated with it. The word “plus” suggests that the gate for this event will be an AND gate.

A more accurate description of the event – perhaps too wordy – would be “Loss of hydraulic system #1 plus any of several different additional failures or combinations of failures.” This tells us that this event, as we’re modeling it, will have two children, which we can label:

1. Loss of system 1 hydraulic power to brakes
2. Additional failures leading to loss of braking (meaning loss of braking sufficient to result in the top event when combined with loss of system 1 hydraulic power to brakes)

Without knowing anything else about the system and its operation at this point (details we would get from system schematics and operating manuals), we can’t really specify the gate associated with the first of the two events listed above. If hydraulic system #1 is itself redundant, it might be an AND gate; otherwise, an OR. We can infer that event #2 above has an OR gate beneath it, since several combinations of additional failures might be sufficient to render the whole system ineffective.

Here’s a diagram for what we’ve modeled so far on the brake system failure. This diagram, of the bottom style in the above image, also includes small tags below each event description (above its gate) containing event IDs used by fault tree software. Ignore these for now.
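The same partial model can also be written as a text outline. The sketch below is only an illustration: the `Event` class and `outline` function are hypothetical names of mine, the event wording is condensed from the descriptions above, and the other second-level contributors are left out because we haven't modeled them yet. An event whose gate we can't yet specify is marked as not yet developed.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Event:
    description: str
    gate: Optional[str] = None  # "AND", "OR", or None if not yet decided
    children: List["Event"] = field(default_factory=list)


def outline(event: Event, depth: int = 0) -> List[str]:
    """Render the tree as indented lines, with each gate shown on its own event."""
    tag = f" [{event.gate}]" if event.gate else " [not yet developed]"
    lines = ["  " * depth + event.description + tag]
    for child in event.children:
        lines.extend(outline(child, depth + 1))
    return lines


top = Event(
    "Aircraft hits end of runway at over 50 mph due to brake system fault",
    "OR",
    [
        Event(
            "Loss of hydraulic system #1 plus additional failures",
            "AND",
            [
                Event("Loss of system 1 hydraulic power to brakes"),
                Event(
                    "Additional failures leading to loss of braking",
                    "OR",
                    [
                        Event("Complete failure of hydraulic system 2"),
                        Event("Mechanical failure of one or more system 2 brakes"),
                        Event("Loss of command signal to system 2 brakes"),
                    ],
                ),
            ],
        ),
        # Other second-level contributors would be added here as siblings.
    ],
)

lines = outline(top)
print("\n".join(lines))
```

Writing the tree this way makes the AND under the "plus" event and the OR under the "any of several" event easy to check against the event names.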

To help understand the style of thinking involved in modeling systems, whether physical, like a brake system, or procedural, like complex surgery, compare what we’ve done so far for aircraft brakes with the brake system of your car. For modeling purposes, let’s ignore the parking brake for now.

Your car’s brake system also contains some redundancy, but it’s limited. If it’s a simple hydraulic brake system (most cars are more complex), it has two cylinders, one powering the front two brakes and one powering the rear. Both of these have to be in a failed state for you to be without brakes for hydraulic reasons.

Notice I said “have to be in a failed state” and not “have to fail.” The likelihood of both failing independently during a trip is much lower than the probability that one failed some time ago without detection and one failed during the trip. Fault trees deal with both these cases, the latter involving a latent failure and a monitor with an indicator to report the otherwise latent failure. Of course, the monitor or the indicator might be in a failed state, again without your knowing it. Redundancy and monitors complicate things. We’ll get into that later.
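A rough numerical sketch shows why the latent case dominates. Every number here is made up for illustration: the failure rate, the trip length, and the period during which a failed cylinder could go unnoticed are assumptions, not data. The sketch also assumes a constant failure rate and independent failures, and it looks at one specific cylinder being the latent one.

```python
import math


def prob_failed(rate_per_hour: float, exposure_hours: float) -> float:
    """Probability of at least one failure in the exposure time (constant rate)."""
    return 1.0 - math.exp(-rate_per_hour * exposure_hours)


# All values below are hypothetical, chosen only to illustrate the comparison.
RATE = 1e-5            # failures per hour for one brake cylinder (assumed)
TRIP_HOURS = 1.0       # length of one trip (assumed)
LATENT_HOURS = 1000.0  # time a failed cylinder could go undetected (assumed)

p_trip = prob_failed(RATE, TRIP_HOURS)
p_latent = prob_failed(RATE, LATENT_HOURS)

# Case 1: both cylinders fail independently during the same trip.
both_during_trip = p_trip * p_trip

# Case 2: one cylinder already failed, undetected, and the other fails on the trip.
one_latent_one_trip = p_latent * p_trip

ratio = one_latent_one_trip / both_during_trip
print(f"both during trip:      {both_during_trip:.3e}")
print(f"one latent, one trip:  {one_latent_one_trip:.3e}")
print(f"ratio:                 {ratio:.0f}")
```

With these assumed values the latent case comes out hundreds of times more likely than the double failure during a single trip. The exact ratio depends entirely on the exposure times chosen.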

Since both of your car’s brake cylinders must be failed for you to be without hydraulic power for braking, you might think that a fault tree for total loss of braking in your car would start with a top event having an AND gate. But your brake pedal probably isn’t redundant. If it falls off, you’re without brakes. Historical data on a large population of cars shows this event to have a very low probability. But we’ll include it for thoroughness. If we diagram the model we’ve developed for car brakes so far, we have this:

One thing immediately apparent in these two barely-begun fault trees, one for aircraft brakes and one for car brakes, is that car brakes, as modeled, have a single-point failure leading to the top event, albeit an improbable one. The FAA guidance for design of aircraft systems specifies that no single failure, regardless of probability, should have catastrophic consequences. If we imagine a similar requirement for car design, we would have to add a brake subsystem with an independent actuator, like the separate hand- or foot-controlled parking brake actuator in most cars. Have you tested yours lately?
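Single-point failures can be found mechanically by listing the minimal combinations of basic events that cause the top event (usually called minimal cut sets) and picking out those with only one member. The sketch below does this for the car brake model as developed so far. The tuple representation and the function names are my own illustrative choices, and the brute-force expansion is fine for a toy tree but is not how production fault tree software handles large models.

```python
from itertools import product

# A node is either a basic event (a string) or a (gate, children) tuple.
car_brakes = (
    "OR",
    [
        "Brake pedal falls off",
        ("AND", ["Front brake cylinder failed", "Rear brake cylinder failed"]),
    ],
)


def minimize(sets):
    """Drop any set that strictly contains another set in the collection."""
    unique = set(sets)
    return [s for s in unique if not any(other < s for other in unique)]


def cut_sets(node):
    """Return the minimal cut sets of a tree built from AND and OR gates."""
    if isinstance(node, str):
        return [frozenset([node])]
    kind, children = node
    child_sets = [cut_sets(child) for child in children]
    if kind == "OR":
        # Any child's cut set is enough to cause the event.
        combined = [s for sets in child_sets for s in sets]
    else:
        # AND: take one cut set from every child and merge them.
        combined = [frozenset().union(*combo) for combo in product(*child_sets)]
    return minimize(combined)


all_sets = sorted((sorted(s) for s in cut_sets(car_brakes)), key=len)
single_points = [s[0] for s in all_sets if len(s) == 1]

print(all_sets)
print(single_points)
```

For this tree the brake pedal event is the only single-point failure; the two cylinders appear only together, as one two-member cut set.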

Next time we’ll explore the bottom events of fault trees, the initiator events, and the topic of exposure times and monitors. Working with these involves examination of failure probabilities and the time during which you’re exposed to failures, some of which – as with your parking brake – may be very long, dramatically increasing the probability that a latent failure has occurred, resulting in a loss of perceived redundancy.

– – –

In the San Francisco Bay area?

If you live near San Francisco, consider joining our newly formed Risk Management meetup group.

Risk management has evolved separately in various industries. This group aims to cross-pollinate, compare and contrast the methods and concepts of diverse areas of risk including enterprise risk (ERM), project risk, safety, product reliability, aerospace and nuclear, financial and credit risk, market, data and reputation risk.

This meetup seeks to build community among risk professionals – internal auditors and practitioners, external consultants, job seekers, and students – by providing forums and events that showcase current trends, case studies, and best practices in our profession with a focus on practical application and advancing the state of the art.

https://www.meetup.com/San-Francisco-Risk-Managers/