Hubbard and Seiersen on Cyber Risk

William Storage – Jan 3, 2017
VP, LiveSky, Inc.
Visiting Scholar, UC Berkeley Center for Science, Technology, Medicine & Society

Our intuitions about risk and probability are usually poor, despite our resolute belief that we judge risk well. In their new book, How to Measure Anything in Cybersecurity Risk, Hubbard and Seiersen challenge their industry's dogged reliance on ineffective methods to assess and manage cybersecurity risk – despite a lack of evidence that these methods have any value at all. Indeed, as Tony Cox and others have noted, we have evidence that in some cases they are worse than useless; they do harm. In Hubbard and Seiersen's view, cybersecurity has strayed from established, sound engineering risk methods and adopted the worst risk practices of project management and ERM. Poorly designed risk programs, combined with increasing exposure to cyber threats, set the stage for losses that could make the JP Morgan and Target attacks look tiny.

Ten years ago I bought Hubbard's How to Measure Anything: Finding the Value of Intangibles in Business and found his pragmatic approach refreshing. The new book continues along that path, highly accessible and free of business jargon. An empiricist, Hubbard subscribes to Rear Admiral Meyer's policy of "build a little, test a little, learn a lot." Hubbard's "Rule of Five" is a great example of an underappreciated practical tool. Sample five of anything out of any population. There is a 93.75% chance [1 − 2 × (1/2)^5] that the median of the entire population lies between the maximum and minimum values of your sample. This assumes a continuous probability distribution but requires no other knowledge about the distribution. Cool stuff.
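The Rule of Five is easy to check by simulation. Here is a minimal sketch (the lognormal population and the trial count are arbitrary choices of mine, not from the book) confirming that the population median falls within the range of a five-item random sample about 93.75% of the time:

```python
import numpy as np

# Hubbard's "Rule of Five": for any continuous distribution, the chance that
# the population median lies between the min and max of a random 5-item sample
# is 1 - 2*(1/2)^5 = 0.9375. The lognormal below is an arbitrary example.
rng = np.random.default_rng(0)
true_median = np.exp(10)                     # median of lognormal(mean=10, sigma=2)

trials = 200_000
samples = rng.lognormal(mean=10, sigma=2, size=(trials, 5))
hits = (samples.min(axis=1) <= true_median) & (true_median <= samples.max(axis=1))

print(f"empirical: {hits.mean():.4f}   theoretical: {1 - 2 * 0.5**5:.4f}")
```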

The authors fight nobly against ordinal scoring of risks, heat maps, RPN, and bad justification of soft analyses. Noting that ISO 31000 says the risk map is “strongly applicable for risk identification,” the authors argue that ISO gives zero evidence for strong applicability and that much evidence to the contrary is available. They give clear examples of Tony Cox’s point that risk matrices are ambiguity amplifiers.

They also touch on some of the psychological issues of scoring. They cite Cox's work showing that arbitrarily reversing impact and probability scales (i.e., making "1" stand for high rather than low probability) in study groups changed the outcome of risk matrix exercises. That is, workers repeatably reach different conclusions about risk as a consequence of the scoring scheme used. I think I can even top that. I've seen cases in project management work where RPN was calculated using 10 = high for probability, 10 = high for severity, and 10 = high for detectability. See the problem here? If RPN is used to generate a ranked list of risks, severity (undesirable) and detectability (presumably desirable) both increase the value of RPN. Yes, I'm serious; and I have presenter's slides from a respected risk consultancy to prove it. Users of that RPN-centric framework are so removed from the fundamentals of risk analysis that they failed to notice it was producing nonsense results.

Cybersecurity Risk is billed as the first in a series. It serves as a great introduction for those unfamiliar with risk quantification. I strongly recommend it to anyone needing to ramp up on cyber risk analysis, and to those who need to unlearn the rubbish spread by PMI and some of the standards bodies.

I have a few quibbles. One deals with epistemology and semantics, several with methodology and approach. On the former, consider this statement from Chapter 2: “The definition of measurement itself is widely misunderstood. If one understands what ‘measurement’ actually means, a lot more things become measurable.”

How can a definition, in the lexicographical sense, be misunderstood? I don't think it can. The authors can claim to have a more useful definition, and can argue that its common use would better serve our needs. But words mean what their users intend them to mean; and it's tough to argue that most people use a term incorrectly.

What Hubbard and Seiersen mean by measurement is different from what most people mean by it; and appealing to a “correct” definition will not persuade opponents that quantified estimates by humans have no important differences from measurements of the physical world performed with instruments.

I’m in full agreement that quantification of degrees of belief is useful and desirable, but we can’t sweep the distinction between quantified opinion and measurement by instruments under the carpet. We could even argue that it is the authors, not most people, who misunderstand the definition of measurement – or at least that they misuse the term – if they believe that measurement of human opinions about a thing is equal to the measurement of physical attributes of the thing.

On this topic, they write, “This conception of measurement might be new to many readers, but there are strong mathematical foundations – as well as practical reasons – for looking at measurement this way. A measurement is, ultimately, just information.” Indeed there are practical reasons for quantifying opinions about facts of the world. But there are no mathematical foundations for looking at measurement this way. There are mathematical means of ensuring that statements about human beliefs are collectively coherent and that quantified beliefs are rationally updated according to new evidence. That is, we can avoid Dutch Book arguments using the axioms of probability (see Frank Ramsey on “Truth and Probability,” 1926), and we can ensure that updates of beliefs with new evidence are coherent across time via Bayesian inference rules. But mathematics is completely silent on the authors’ interpretation of measurement. Strictly speaking, there are no mathematical reasons for doing anything outside of mathematics, and coherence should be no endorsement of an interpretation of measurement or a belief system.

On their claim that measurements are ultimately information, I think most information experts would disagree. Measurement is not information but data. It becomes information when it is structured and used in a context where it can guide action. This is not a trivial distinction. Crackpot rigor and data fetish plague information technology right now; and spurious correlation is the root of much evil.

Hubbard and Seiersen write that “the method described so far requires the subjective evaluation of quantitative probabilities.” This seems misleading. More accurately, the method requires rational evaluation of quantified subjective probabilities. Is this nit-picking? Perhaps, but I think accuracy about where subjectivity lies is important; and it is the probabilities that are subjective. I agree that quantified expert opinions are valuable and sometimes the only data on which to base risk assessments. But why pretend that people have no grounds for arguing otherwise?

A similar issue involves statistical significance. The authors argue that most people are wrong about significance, and that all data is significant. They set up a straw man by not allowing that common usage admits two meanings of data “significance.” One indicates noteworthiness, and another is the common but arguable technical meaning of having a p-value less than 0.05. Most competent professionals understand the difference, and are not arguing that a p-value of 0.06 is insignificant in all ways.

The authors are also a bit sloppy with the concept of proof. For example, they use the phrase “scientifically proven.” They refer to empirical observations being proven true (p. 150), and discuss drugs proven not to be a placebo. If we’re to argue against soft methods claimed to be “proven” (as the authors do), we need to be disciplined about the concept of proof. Proof is in the realm of math; nothing is scientifically proven. I believe in the methods promoted by this book, and I have supporting evidence; but no, they aren’t scientifically proven to be better.

They also call Monte Carlo a “proven method.” Monte Carlo is proven only in a trivial sense, not in a sense that means it gives good answers. Opponents would say it is proven to give most users enough rope to hang themselves. Monte Carlo simulations are just tools – like explosives, chainsaws and log splitters, perhaps. The book gives the impression that Monte Carlo is self-justifying: all you have to do, it seems, is plug in the right distribution and the right mean value and you're safe.
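To illustrate the point with an example of my own (all numbers hypothetical): the two loss models below have the same mean annual loss and differ only in the assumed distribution, yet they disagree by orders of magnitude about the chance of a large loss. Monte Carlo faithfully propagates whatever assumptions it is fed.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
mean_loss = 1e6                          # same mean annual loss for both models

# Model 1: normally distributed losses (hypothetical parameters), floored at zero.
normal_losses = rng.normal(loc=mean_loss, scale=5e5, size=n).clip(min=0)

# Model 2: lognormal losses; sigma chosen arbitrarily, mu set so the mean matches
# Model 1 (the mean of a lognormal is exp(mu + sigma^2 / 2)).
sigma = 1.0
mu = np.log(mean_loss) - sigma**2 / 2
lognormal_losses = rng.lognormal(mean=mu, sigma=sigma, size=n)

for name, losses in (("normal", normal_losses), ("lognormal", lognormal_losses)):
    print(f"{name:>9}: mean = {losses.mean():.2e}, P(loss > $5M) = {(losses > 5e6).mean():.1e}")
```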

By “quantitative methods” the authors seem to mean primarily quantification of expert opinion with possible Bayesian updates and Monte Carlo methods. I too believe in Bayesian inference, but as with their use of “measurement,” this use of the phrase “quantitative methods” seems aimed at blurring an important distinction. It does the cause of Bayesianism no good to dodge objections to subjectivist interpretations of probability by lumping subjective probability together with all other quantitative methods. Finite-element stress modeling is also a quantitative method, but it is far less controversial than subjective probability. Quantified measurements by instruments and quantified degrees of belief are fundamentally different.

In criticizing soft methods with imprecise scales, the authors use, as an example, how ridiculous it would be for an engineer to say the mass of a component on an airplane is “medium.” But the engineer has access to a means of calculating or measuring the mass of that component that is fundamentally different from measuring the engineer’s estimate of its mass. Yes, error exists in measurement by instruments, and such measurements rely on many assumptions; but measurement of mass is still different in kind from quantification of estimates about mass. I’d be just as worried about an engineer’s quantified estimate of what the weight of a wing should be (“about 250 tons”) as I would about her guessing it to be “medium.”

There are well-known failures of Bayesian belief networks; and there are some non-trivial objections to Bayesianism. For example, Bayesianism puts absolutely no constraints on initial hypothesis probabilities; it is indifferent to the subjectivity of both the hypothesis and its initial prior. Likewise, there's the problem of old evidence having zero confirming power in a Bayesian model, the equating of increased probability with hypothesis confirmation, the catch-all hypothesis problem, and, particularly, the potentially large number of iterations needed to wash out priors. These make real mischief in some applications of Bayesianism.

One of my concerns with this book's methodology stems from its failure to differentiate risks from hazards. As an alternative to risk matrices, the authors propose a five-step process that starts with "define a list of risks" (p. 37). This jumps to potentially dangerous conclusions. The starting point should be a list of hazards, not risks. Those hazards then serve as the basis for hypothesized risks, once impact levels, whether arbitrary or particularly meaningful to the exposed business, are assigned and a probability is sought for each. For example, for the hazard of sabotage, cyber risk analysts might want to separately examine sabotage incidents having economic impacts of $100K and $10M, or some other values, which, presumably, would not be equally likely. Confusing hazards with risks is something these authors should want to carefully avoid, especially since it is a fundamental flaw of frameworks that rely on risk registers and risk matrices.

A more serious flaw in the book's approach to cyber-risk analysis is, in my view, its limited and poorly applied use of the concept of decomposition. While the book sometimes refers to parametric variation analysis as decomposition, its use of decomposition is mainly focused on the type of judgmental decomposition investigated by MacGregor and Armstrong in the 1980s and 1990s. That is, given a target value that is difficult to estimate, one breaks the problem down into bits that are easier to estimate. MacGregor and Armstrong noted from the start of their work that translating this idea into practice is difficult, and that it is most valuable when the target quantity is "extreme," i.e., very difficult to estimate. Further, estimation errors for the components must not have strong positive correlations with one another. This is a difficult requirement to meet in cybersecurity.

Also, the authors fail to mention that MacGregor and Armstrong ultimately concluded, after twenty years of research, that judgmental decomposition had more limited value than they had expected early in their work. Hubbard and Seiersen report that MacGregor and Armstrong found that simple (low variable count) decompositions reduced error by "a factor of as much as 10 or even 100." I don't think that is an accurate statement of MacGregor and Armstrong's findings.

While they seem to overstate the value of judgmental decomposition, the authors miss the value of functional decomposition. Functional decomposition of many of the system states being modeled will certainly lead to better estimates. One way it does this is by making explicit all the Boolean logic (AND and OR relationships) in the interaction of components of the hazardous system state being modeled. The book never mentions fault trees or similar decomposition methods. It may be that the systems exposed to cyber risk cannot be meaningfully decomposed in a functional sense, but I doubt it. If that were true, it would be hard to see how the problem space could accommodate MacGregor and Armstrong's criterion that decomposed elements must be easier to estimate than global ones. For more on estimating low-level probabilities in multiple-level systems, see Section 4 of NASA/SP-2009-569, Bayesian Inference for NASA Probabilistic Risk and Reliability Analysis.

Like Hubbard's first book, this one includes an excellent set of training materials for calibration of subject matter experts in the style developed by Lichtenstein, Fischhoff, and Phillips in their work with the Office of Naval Research in the 1970s. On the question of whether calibrated estimators are successful in real-world predictions, Hubbard attributes degrees of success to how closely practitioners follow his recommendations for calibration training (p. 153). This is naive and overconfident.

Using exactly the same calibration workshop procedures, I have seen dramatically different results in different situations and industries. I suspect two additional factors govern the success of predictions by calibrated experts. One deals with relative degrees of ignorance. The overconfidence of experts can be tamed somewhat through calibration exercises using trivia quizzes. The range of possible values for the population of Sweden or the birth year of Isaac Newton is pretty well bounded for most people. The range of possible values for the probability of rare events – about which we are very ignorant – is not similarly bounded. One-per-million and one-per-trillion differ by a factor of a million, yet both register as simply "exceedingly rare" in the judgment of many subject-matter experts making real-world predictions. Calibration seems to have less impact on estimates whose values range over many orders of magnitude and include extremely large or extremely small numbers.

A more important limitation of expert calibration exists in situations where real-world predictions cannot practically be made free of social influence. That peer influence greatly damages the wisdom of crowds is well documented. The most senior or most charismatic member of a team can influence all assignments of subjective probability. The wisdom of the crowd then transforms into a consensus-building exercise, right where consensus is most destructive to good predictions. For more on this aspect of collective predictions, see Seidenfeld, Kadane, and Schervish's "On the Shared Preferences of Two Bayesian Decision Makers." They note that an outstanding challenge for Bayesian decision theory is to extend its norms of rationality from individuals to groups. This challenge is huge. Organizational dynamics in large corporations, combined with the empirically valid stereotype of shy engineers and scientists, can render expert calibration ineffective. Still, peer influence with calibration is better than peer influence alone.

A general complaint I have with Hubbard's work is that when he deals with matters of engineering, he seems not to clearly understand engineering risk assessment. When he talks about Fukushima he shows no familiarity with FHA, FMEA, or fault tree/PSA methods, and misses opportunities for functional decomposition as mentioned above. By framing risk analysis as a matter for the quants, he misses the point, shown well by history, that risk analysis must be tightly integrated with those who "own" the systems. I discovered in aerospace risk workshops I ran 25 years ago that it is far easier to teach risk analysis to aviation engineers than it is to teach aerospace engineering to data scientists; systems knowledge always trumps analytical methods.

Despite these complaints, I strongly recommend this book. Hubbard and Seiersen deserve praise for a valiant effort to dislodge some increasingly dangerous but established doctrine. Hopefully, it's not an exercise in trying to sell brains to dinosaurs.

The last chapter of Cybersecurity Risk is particularly noteworthy. In it the authors dress down standards bodies and the consultancies that peddle facile risk training. They suggest that standards bodies should themselves be subject to metrics and empirical testing. The point is strengthened by the realization that auditors ensure compliance with standards; they make no independent determination of efficacy. In that sense, standards built on established methods for which there is no evidence of added value are doubly harmful: through auditing, such standards prevent adoption of more effective methods, while the auditing process inadvertently endorses the ineffective methods embodied in the standards. The authors describe scenarios in cybersecurity that parallel what I've seen in pharma – managers managing to an audit rather than managing to actual risks.

Cybersecurity Risk’s last chapter also calls for sanity in rolling out cybersecurity risk management. They seek to position quantitative cybersecurity risk management (CSRM) as a C-level strategic practice, noting that risk management must be a program rather than a bag of tactical quantitative tools. They’re not trying to usurp the corporate investment decision process (as some ERM initiatives seem to be) but are arguing that the CSRM function should be the first gate for executive or board-level technology investment consideration, thereby eliminating the weak, qualitative risk-register stuff commonly heaped on decision makers. Their proposed structure also removes cybersecurity risk from the domain of CTOs and technology implementors, a status quo they liken to foxes guarding the hen house. A CSRM program should protect the firm from bad technology investments and optimize technology investments in relation to probable future losses. Operationally, a big part of this function is answering the question, “Are our investments working well together in addressing key risks?”

  – – – – – – – –

Are you in the San Francisco Bay area?

If so, consider joining the Risk Management meetup group.

Risk management has evolved separately in various industries. This group aims to cross-pollinate, compare and contrast the methods and concepts of diverse areas of risk including enterprise risk (ERM), project risk, safety, product reliability, aerospace and nuclear, financial and credit risk, market, data and reputation risk.

This meetup will build community among risk professionals – internal auditors and practitioners, external consultants, job seekers, and students – by providing forums and events that showcase current trends, case studies, and best practices in our profession with a focus on practical application and advancing the state of the art.

https://www.meetup.com/San-Francisco-Risk-Managers/

Boolean Logic and Cut Sets

Fault Tree Friday – post 5 (see also posts 1, 2, 3, 4)

In the first post on fault trees I used the term cut sets to refer to any combination of fault tree initiators that can produce the fault tree's top event. There may be many – sometimes very many – cut sets in the complete collection of cut sets for a tree. The probability of the top event is, in most cases, roughly the sum of the probabilities of the cut sets; and the probability of each cut set is, in most cases, the product of the probabilities of the initiators in that set. That is how things are most of the time. The exceptional cases, when they exist, are important.

Before dealing with the exceptional cases though, let’s look at the cut sets of some simple fault trees. In the second post of this series I showed two logically equivalent trees (repeated below) noting that, in real fault tree analysis, we use the lower rendering. The top one is useful for educational purposes, since it emphasizes gate logic. In this example, there are three cut sets:

  • Set 1: Event 1
  • Set 2: Event 2
  • Set 3: Events 3, 4, and 5, together

Fault tree

If the probability of each initiator event were 0.5 (an unlikely value for any real-world initiator, chosen here just to make a point), the probability of each cut set would be the product of the probabilities of the events in that set (two of the sets here contain only one event).

  • Set 1: P = 0.5
  • Set 2: P = 0.5
  • Set 3: P = 0.125

Earlier I said that the top event probability in most, but not all, fault trees roughly equals the sum of the probabilities of all the cut sets. In this case that sum would be 1.125. We know that can’t be right since a probability cannot exceed 1.0.

The problem in the example above stems from using a shortcut form of the calculation for the probability of the union of sets – in this case the union of all cut sets of a fault tree. The accurate form of the solution for the probability of a union of the three cut sets (1, 2, and 3) above would be:

P(set 1 ∪ set 2 ∪ set 3) = P(1) + P(2) + P(3) – P(1) * P(2) – P(1) * P(3) – P(2) * P(3) + P(1) * P(2) * P(3),

where P(1), P(2), and P(3) are the probabilities of cut sets 1, 2, and 3.

Or in set notation, with C1, C2, and C3 denoting the cut sets:

P(C1 ∪ C2 ∪ C3) = P(C1) + P(C2) + P(C3) − P(C1 ∩ C2) − P(C1 ∩ C3) − P(C2 ∩ C3) + P(C1 ∩ C2 ∩ C3)

The generalized form of the equation for the top event probability requires more math than we need to cover here. The Wikipedia entry on the inclusion-exclusion principle is a good reference for those needing details. A rough summary: we add the probabilities of joint occurrence of every odd-sized combination of cut sets and subtract those of every even-sized combination. The resulting equation for the top event probability of a tree modeling faults in a complex system can have billions of terms. No problem for modern computers.
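As a sanity check, here is a short sketch evaluating that formula for the example above (the 0.5 initiator probabilities are the illustrative values chosen earlier):

```python
# Cut sets of the example tree: {1}, {2}, {3, 4, 5}; every initiator has P = 0.5.
p1, p2, p3 = 0.5, 0.5, 0.5**3            # cut set probabilities

naive = p1 + p2 + p3                      # 1.125 -- exceeds 1, so clearly wrong
exact = (p1 + p2 + p3
         - p1*p2 - p1*p3 - p2*p3
         + p1*p2*p3)                      # inclusion-exclusion: 0.78125

print(naive, exact)
```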

Inclusion exclusion Venn diagram

In one sense the "solution" to a fault tree is simply a collection of combinations of initiator events. This means any fault tree can be reorganized into a tree of only three levels. In such a reorganized tree, the top event is associated with an OR gate, and all children of that OR gate are associated with AND gates. That is, any fault tree can be reduced to an OR of many ANDs. At least, that would be the case if no single failure were allowed, e.g., by design, to produce the top event. If single initiator events can lead to the top event, such fault trees reduce to an OR of ANDs (multiple-event cut sets) and single-event cut sets (allowing that a set can have one element). That is the case for the tree in the above example, which can be rearranged to look like this:

fault tree logic

Perhaps a better example of rearranging a fault tree into an OR of many ANDs is the tree below. Note that the AND gate in the rendering below the green line has six child events. The single black vertical line leading from the bottom of the gate joins six branches. Using this drawing technique prevents the clutter that would exist if we attempted to draw six separate parallel vertical lines into the bottom of the AND gate.

equivalent fault trees

The trees above and below the green line are logically equivalent. Presumably, the design of the system being modeled would lead an analyst to draw a tree of the top form in a structured top-down analysis. The bottom tree shows how we get cut sets, of which there are six for this tree:

  • 1, 3, 4
  • 1, 3, 5
  • 1, 3, 6
  • 2, 3, 4
  • 2, 3, 5
  • 2, 3, 6
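Reducing a tree to an OR of ANDs is just conversion to disjunctive normal form, so a symbolic-logic library can derive the same six cut sets mechanically. A minimal sketch (the event names are mine):

```python
from sympy import symbols
from sympy.logic.boolalg import And, Or, to_dnf

# Top event of the example tree: (1 OR 2) AND 3 AND (4 OR 5 OR 6).
E1, E2, E3, E4, E5, E6 = symbols("E1:7")
top = And(Or(E1, E2), E3, Or(E4, E5, E6))

# Disjunctive normal form: an OR of ANDs, i.e., the six cut sets
# (E1 & E3 & E4) | (E1 & E3 & E5) | ... | (E2 & E3 & E6).
print(to_dnf(top))
```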

We can now calculate the top event probability by applying inclusion-exclusion to these six cut sets:

P = P(1,3,4) + P(1,3,5) + P(1,3,6) + … – P(1,3,4 and 1,3,5) – P(1,3,4 and 1,3,6) – … + P(1,3,4 and 1,3,5 and 1,3,6) + …, etc.,

where the probability of a single cut set, such as P(1,3,4), equals P(1) * P(3) * P(4), because initiator events are required to be truly independent of each other for the fault tree to be valid. Note that the joint terms are not simple products of cut set probabilities when cut sets share initiators, as these do: the probability that cut sets (1,3,4) and (1,3,5) both occur is P(1) * P(3) * P(4) * P(5), with the shared events 1 and 3 counted only once. Fault tree software handles the tedious and error-prone job of calculating the top event probability and the probabilities of all intermediate events.
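Here is a minimal sketch of that bookkeeping (the initiator probabilities are placeholders I chose for illustration). Each joint term in the inclusion-exclusion sum is formed over the union of the initiators involved, so events shared between cut sets are never double-counted. The result matches the gate-by-gate calculation, which is exact here because each initiator appears only once in the tree.

```python
from itertools import combinations
from math import prod

def top_event_probability(cut_sets, p):
    """Exact probability that at least one cut set occurs, via inclusion-exclusion.
    Each joint term is the product over the UNION of initiators in the chosen
    cut sets, so initiators shared between cut sets are counted only once."""
    total = 0.0
    for k in range(1, len(cut_sets) + 1):
        sign = (-1) ** (k + 1)
        for combo in combinations(cut_sets, k):
            events = frozenset().union(*combo)
            total += sign * prod(p[e] for e in events)
    return total

# The six cut sets of the example tree: (1 OR 2) AND 3 AND (4 OR 5 OR 6).
cut_sets = [{1, 3, 4}, {1, 3, 5}, {1, 3, 6}, {2, 3, 4}, {2, 3, 5}, {2, 3, 6}]
p = {e: 0.01 for e in range(1, 7)}        # placeholder initiator probabilities

exact = top_event_probability(cut_sets, p)
structural = (1 - 0.99**2) * 0.01 * (1 - 0.99**3)   # (1 OR 2) AND 3 AND (4 OR 5 OR 6)
print(exact, structural)                  # both ~5.91e-06
```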

While the rules of fault tree construction require all initiator events to be truly independent of each other, nothing says the same initiator event cannot appear in multiple places in a tree. In fact, for redundant systems, this happens often.

We need to give this situation special attention; its ramifications for system design are important. Consider the simple fault tree on the left below, noting that the same basic event (event A) appears in two branches.

At first glance, one might conclude that the top event probability for this tree would be 0.1 * 0.1 = 0.01, since the top event is an AND gate, and AND nominally indicates multiplication. But the tree models a real-world phenomenon: if event A happens, it happens, regardless of our symbolic representation. Or, from an analytic standpoint, we can refer to the rules of Boolean algebra. A AND A is simply A (a rule known as idempotence of conjunction).

So the collection of cut sets for this simple tree contains only one cut set; and that cut set consists of a single event, A. Since the probability of A is 0.1, the probability of T, the top event, is also 0.1. Both common sense and Boolean algebra reach the same conclusion. Complex fault trees can vastly exceed the grasp of our common sense, and “A AND A” cases can be concealed in complex trees. Software that applies the rules of Boolean algebra saves the day.

Idempotency of disjunction (image on right, above) similarly leads us to conclude that replacing the above tree’s AND gate with an OR gate (fault tree on right, above) yields the same cut set collection – a single set having a single event, A.

In addition to idempotence effects, non-trivial real-world fault trees will likely have a cut set collection in which one cut set consists of a subset of the events of another cut set. This has the sometimes unintuitive consequence of eliminating the cut set with the larger number of events. Consider this tree:

fault tree disjunctive absorption

One view of its solution is that two cut sets exist:

1.)  A
2.)  A, B

But A OR (A AND B) equals simply A. That is, disjunctive absorption removes cut set 2 from the cut set collection. In a real-world fault tree, complexity may make cases of disjunctive absorption much less obvious; and they often point to areas of ineffective application of design redundancy.

fault tree conjunctive absorption

Conjunctive absorption has the same effect in the above tree. A preliminary account of its cut sets would be:

1.)  A, A
2.)  A, B

Idempotency reduces cut set 1 to A alone, and disjunctive absorption eliminates cut set 2. Thus, conjunctive absorption can be derived from disjunctive absorption plus idempotency. In short form, A AND (A OR B) equals simply A.

The simple fault trees above would never occur in the real world. But logically equivalent conditions do appear in real-world trees. They may not be obvious from inspecting the fault tree diagram; but they become apparent on viewing the cut set collection. A cut set collection from which all supersets have been eliminated (i.e., absorption has been applied) is often called a minimal cut set collection.
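A minimal sketch of that reduction (the cut sets below are hypothetical): treating each cut set as a set applies idempotence automatically, and dropping any cut set that contains another as a subset applies absorption.

```python
def minimal_cut_sets(cut_sets):
    """Reduce a cut set collection: idempotence (A AND A = A) falls out of using
    sets; absorption removes any cut set that has another cut set as a subset."""
    sets = [frozenset(cs) for cs in cut_sets]             # idempotence
    minimal = []
    for cs in sorted(set(sets), key=len):                 # shortest sets first
        if not any(m <= cs for m in minimal):             # absorption
            minimal.append(cs)
    return minimal

# Hypothetical raw cut sets from expanding a tree's Boolean expression.
raw = [["A", "A"], ["A", "B"], ["C", "D"], ["B", "C", "D"]]
print(minimal_cut_sets(raw))    # [frozenset({'A'}), frozenset({'C', 'D'})]
```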

Back in the dark ages when computers struggled with fault tree calculations, analysts would use a so-called gate-by-gate (i.e. bottom-up) method to calculate a top event probability by hand. The danger of doing this, if a tree conceals cases where idempotency and absorption are relevant, is immense. Given that real-world initiator probabilities are usually small numbers, a grossly wrong result can stem from effectively squaring – without being justified in doing so – an initiator probability. I mention this only because some textbooks published this century (e.g. one by the Center for Chemical Process Safety) still describe this manual approach – a risky way of dealing with risk.

A more important aspect of the concepts covered here is the problem stemming from a fault tree that does not model reality well. Consider, for example, a fault tree with 10,000 cut sets. Imagine the 1000th most probable cut set, based on minimal cut set analysis, has a probability of one per trillion (1E-12) and contains events A, B, and C.

Imagine further that events A and C each have probabilities of 1E-5. If A and C turn out to be actually the same real-world event – or they result from the same failure, or are in some other way causally correlated – that cut set then reduces to events A and B, having a cut set probability of 1E-7. This probability may be greater than all others in the collection, moving it to the top of an ordered list of cut sets, and possibly into the range of an unacceptably likely top event.

In other words, fault trees that miss common-mode failures are dangerous. Classic cases of this include:

  • Redundant check-valves all installed backwards after maintenance
  • Uncontained engine failure drains all three aircraft hydraulic systems
  • Building fire destroys RAID array and on-premise backup drives
  • Earthquake knocks out electrical power, tsunami destroys backup generators

Environmental factors like flood, fire, lightning, temperature, epidemic, terrorism, sabotage, war, electromagnetic pulse, collision, corrosion, and collusion must enter almost any risk analysis. They can do so through functional hazard analysis (FHA), failure mode effects analysis (FMEA), zonal safety analysis (ZSA), and other routes. Inspection of fault trees to challenge the independence of initiator events requires subject matter expertise. This is yet another reason that fault trees belong to system and process design as much as they belong to post-design reliability analysis.

 –  –  –  –


San Francisco Pension Fund Risk

San Francisco’s Retirement Board has seven members, three elected and four appointed. They manage a Defined Benefit fund for the San Francisco Employees’ Retirement System (SFERS) on behalf of its 50,000 members.

The board sets policy for the fund, allocating the split between stocks, bonds, property, private equity and hedge funds. The board also oversees a $2.8B Deferred Compensation plan and selects investment options.

Two years ago the CIO recommended that $3B (15%) of assets should be in hedge funds. Police and fire unions strongly supported the 15% position. However, opposition by beneficiaries caused the hedge fund investment to be capped at $1B.

At the time the CIO was pushing for 15% in hedge funds, their high risk, high fees, and poor liquidity were in the news, as was the problem of hedge fund transparency. It seems memories fade fast in financial circles. The 2007 collapse of Bear Stearns' "High-Grade" hedge funds helped force the 2008 rescue of that highly successful investment banking firm, which serves as a reminder of how the "straight rule" of induction lures Hume's chicken into thinking the farmer has the chicken's longevity in mind.

It may be that SFERS lacks even the sense of Hume's chicken, which might have been skeptical had the farmer previously shown signs of interests not aligned with those of the chicken. SFERS' misadventure in the FX Concepts "currency overlay" hedge fund in 2013 counts as such evidence.

In 2014 CalPERS (the California Public Employees' Retirement System) decided to move $4B out of hedge funds. Because of the illiquidity of those funds, CalPERS still has positions in them.

In June 2016, the Retirement Board voted 4-1 to put $500M in a customized fund of hedge-funds program. Herb Meiberger, a commissioner of SFERS, asked for the names and funding amounts of the fund managers. Executive Director Jay Huish told Meiberger that info was confidential and that fund managers would be selected behind closed doors without disclosure or public involvement. More evidence.

Meiberger clearly takes a risk-based approach to governance, and stands out from a majority of "what could possibly go wrong" board members. As psychologists have noted, humans are naturally ill-suited for rational assessment of risk. Meiberger is running for membership on the SFERS board again in January 2017. We need more officials who have a clue about risk management.

 

The Magic of Redundancy

Fault Tree Friday – week 4 (see also posts 1, 2, 3)

In post 1 of this series I stressed that fault trees are useful in the preliminary design of systems, when alternative designs are being compared (design trade studies). I argued that fault trees are the only rational means of allocating redundancy in complex systems. In that post I used the example of a crude brake system for a car. It consists of a brake pedal, a brake valve with two cylinders each supplying pressure to two brakes, and the hydraulic lines and brake hardware (drums, calipers, etc.). I'll use that simplified design (it has no reservoir or other essentials yet) again here.

car brakes

We'll assume that pressure sent to either the front wheels alone or the rear wheels alone delivers enough braking force that the car stops normally. Note that this also means that, with brakes of this design, the driver would not know that only two of the four brakes were operational.

For the sake of simplicity we'll model the brake system as having only two fault states: front brakes unable to brake, and rear brakes similarly incapacitated. While we would normally not model the collection of all failure modes resulting in loss of braking capability at the front brakes as a single basic event (initiator), we'll do so here to emphasize some aspects of redundant system design.

If we imagine a driving time of one hour and assign a failure rate of one per thousand hours (1E-3/hr) to each of the composite front and rear brake system failures, our fault tree might look like this:

fault tree

If we modeled the total loss of braking using the above fault tree, we’d be dead wrong. Since the system, as designed here, is fully redundant (in terms of braking power), many failures of either front or rear brakes could go unnoticed almost indefinitely. To make a point, let’s say the maintenance events that would eventually detect this condition occur every two years.

To correct the fault tree (leaving the system design alone for now), we'd start with something like what appears below. Note that with the specified failure rates, the probability of either the front or rear brakes being in a failed state at the beginning of a drive is roughly equal to one, using the standard exponential-distribution calculation of failure probability from a failure rate and an exposure time (here, 1E-3/hr over roughly 17,500 hours). This means that, as modeled, the redundancy is essentially useless. We call a failure that goes unnoticed for a long time period a latent failure.

fault tree

That fault tree isn’t quite correct either. It models the rear brakes as having failed first – a latent failure that silently awaits failure of the front brakes. But in an equally likely scenario, the front brakes could fail first. That means either of two similar scenarios could occur. In one (as shown above) the loss of braking in the rear brakes goes unnoticed for a long time period, combining with a suddenly apparent failure of the front brakes. In the other scenario, the front brake failure is latent. A corrected version appears below. Note that the effect of this correction is to double the top event probability, making our design look even worse and making redundancy seem not so magic.

The fault tree tells us that our brake system design isn’t very good. We need to reduce the probability of the latent failures, thereby getting some value from the redundancy. We can do this by adding a pressure sensor to detect loss of pressure to front or rear brakes, along with an indicator on the dashboard to report that this failure was detected. To keep this example simple, assume we can use the same sensor for both systems.

The astute designer will likely see where this is heading. Failure of the pressure sensor to detect low pressure is now a latent failure. And so is failure of the indicator in the dashboard. Again for simplicity, we'll model these as a single unit; failure of that unit to tell the driver that either the front or rear brakes are incapacitated is a latent failure that could go undetected for years. The resulting fault tree would look like this:

fault tree

Although its failure is itself latent, monitoring equipment is usually more reliable than the things it monitors. I've shown that in this example by using a failure rate of one per 100,000 hours (1E-5/hr) for the monitor equipment. With an exposure time of two years, the probability that the monitor is in a failed state during a drive is 0.16. Consequently, adding all the monitoring equipment only changed our top event probability from roughly 2E-3 to 3.2E-4. It's better, but not impressively so. And we added cost to the car and more things that can fail.
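The arithmetic behind those numbers, as a minimal sketch (the rates and exposure times are the illustrative values used above, and the multiplication follows the simplified tree, not a full sequence-dependent model):

```python
from math import exp

def p_fail(rate_per_hr, hours):
    """Probability of failure within an exposure time, exponential model."""
    return 1 - exp(-rate_per_hr * hours)

drive = 1                      # hours per drive
latent = 2 * 8760              # two years between maintenance checks, in hours

p_brake_drive = p_fail(1e-3, drive)     # one brake set fails during the drive
p_brake_latent = p_fail(1e-3, latent)   # ~1.0: other brake set already failed, latently
p_mon_latent = p_fail(1e-5, latent)     # ~0.16: monitor already failed, latently

unmonitored = 2 * p_brake_latent * p_brake_drive                  # ~2e-3
monitored = 2 * p_mon_latent * p_brake_latent * p_brake_drive     # ~3.2e-4
monitored_2mo = 2 * p_fail(1e-5, latent / 12) * p_brake_latent * p_brake_drive  # ~2.9e-5

print(unmonitored, monitored, monitored_2mo)
```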

It’s important to realize that, at the bottom of the above fault tree segment, the exposure time to loss of braking in both rear brakes cannot be shorter than the exposure time to failure of its monitor. That is almost always the case for monitored components in any redundant design.

We could reduce the top event's likelihood by shortening the system's maintenance intervals. Checking the monitoring system every two months, instead of every two years, would get us roughly another factor of twelve.

But such an inspection need not verify that each element of the monitoring subsystem is separately functional. We need only verify that the subsystem, as a whole, is or is not capable of reporting a failure; for now, if it has failed, we aren't really concerned with what, in particular, went wrong. We can test the monitor by inducing it to check for pressure when none exists. If the monitor reports the condition as a pressure failure, it is good.

We don’t need an auto mechanic for this. The driver could run the test on startup. We’ll design the monitor to test for no pressure as the car starts, when no brakes are applied. We call on the driver to verify that the warning indicator illuminates.

Two failure modes of the monitored brake system – which now includes an operator – should now be apparent. An unlikely one is that the indicator somehow illuminates ("fails high") during the test sequence, indicating no pressure at startup, but later fails to illuminate when a real loss of pressure occurs.

A far more likely fault state is operator error. This condition exists when the operator fails to notice that the warning indicator did not illuminate during the startup test sequence. This fault does not involve exposure time at all; the operator either remembers or forgets. For an untrained operator, the error probability in such a situation may be 100%. A trained operator will make this error somewhere between one and ten percent of the time, especially when distracted. Two operators who monitor each other will do better. Two operators with a checklist will do better still.

With the startup-check procedures in place, our system no longer has any latent failures. It has high-probability error states, but they have the advantage of only contributing to the top event in conjunction with another failure. The resulting fault tree makes us feel better about driving:

FTA

While the system we’ve modeled here is not representative of modern brake systems (and my example has other shortcuts that would be deemed foul in a real analysis), this example shows how fault trees can be used in preliminary system design to make better system-design choices. It really only begins to make that point. In a quad-redundant flight control system, such analysis has far more impact. This sort of modeling can also reveal weak spots in driverless cars, chemical batch processes, uranium refinement, complex surgery procedures, cyber-security, synthetic biology, and operations where human checks and balances (a form of redundancy) are important.

In the above fault trees, two similar branches appear, one each for the front and rear brakes. The real-world components and their failures represented in each branch are physically distinct. In the final version, immediately above, indicator failure and operator error occur in both branches. Unlike the other initiator events, these events logically belong to both branches, but each represents the same real-world event. The operator's failure to verify the function of the pressure warning indicator, for example, if it occurred, would occur in both branches simultaneously. The event IDs (e.g., "OE1" above) remind us that this is the case. Appearance of the same event in multiple branches of a tree can profoundly impact the top event probability in a way that isn't obvious unless you're familiar with Boolean algebra. We'll go there next time.

 

Risk Culture

Risk culture has been a hot topic of late. For example, it's common to hear claims that culture is the most undervalued aspect of risk, or that it is the element most critical for the Board's management of risks. If that seems a stretch, consider our recent credit crunch, and see the film, The Big Short. The importance of culture in corporate risk may be the one thing on which we all agree – all but a few die-hard quants.

Despite agreement on the importance of risk culture, the topic gets rather thin coverage in many frameworks. What then, might an ideal risk culture be?

On most accounts, risk culture involves the values, norms, beliefs, ethics and attitudes about risk shared by a group. Most writings on the topic also include the claim that senior management must be the driver of change to an effective risk culture. It’s a plausible claim, since there are few alternative sources. Regulatory bodies don’t seem to have that effect on employees, and organic growth of optimal risk culture seems unlikely.

Two fields I have experience in – aviation and pharmaceuticals – immediately come to mind. In aviation, risk is deeply embedded at nearly all levels of organizations. Oddly, the aviation industry started out with an affable relationship with its regulator; that relationship has cooled slightly in recent decades, but it is still far from contentious. In pharmaceuticals, risk culture is poorly developed, and relationships with the FDA are often adversarial.

This dichotomy likely stems more from accidental environmental factors than from any inherent differences in dispositions or competencies between the fields. Commercial aviation was lucky enough to emerge at a time when the FAA was so resource-strapped that it was forced into a tight partnership with aircraft builders – a situation from which we all benefited greatly. The early FDA had a much broader scope, and was regulating a vastly larger number of suppliers (food, drugs, tobacco, etc.) who were much less virtuous. The FDA’s short leash had the unwanted side-effect of fostering a culture where risk management is equated with regulatory compliance. Attempts to move beyond that state (e.g., in ICH Q8, 9, and 10) have been slow to progress.

Lessons from the comparison between the two fields? To start, risk culture is real. Safety risk in passenger flight has fallen by a factor of a thousand or more, in a risk culture that extends from subcontractors to pilots to controllers. Technological advances cannot claim all the credit for this. Aviation workers are proud of their work. The motivation for doing the right thing is intrinsic, and the goals of workers align reasonably well with those of management and regulators.

Second, no external agent (agency) can supply your firm with risk-avoidance. A regulator might protect society from a firm’s evils and errors, but it won’t protect the firm from itself. The FDA only cares about a pharma firm’s bottom line to the extent that it seeks to prevent drug-availability crises.

The uncommonly beneficial state of risk culture in commercial aviation, which was not imposed, but grew organically, could be taken as an argument that kick-starting something similar in a random firm will be impossible. It need not be. But it will require a different tool kit than what’s in the standard ERM bag, because we’re now squarely in the realm of Change Management.

Michael Beer and John Kotter are my two favorite Change Management writers (Beer hates the term). They disagree on quite a lot; but they agree that any time the CEO needs to push a cultural change downstream, he first has to be seen as walking the walk. That is, there must be a vision; and management must embody it. The vision need not be mystical, Beer points out.

Further, employees must believe that top and middle management are committed to the vision, and that management isn't shallow or deceiving itself with hogwash about yet another strategic initiative.

Kotter and Beer, along with Bert Spector and Russell Eisenstat, all agree that under-communicating the vision – in this case, the risk culture objective – is a leading cause of failed transformation efforts. Frequent communications, using every possible channel, over a long period, are essential. The purpose is not to coerce workers into compliance. It is to demonstrate the relevance of the vision and to train by example. Kotter notes that even with several communications per week, if management behavior is antithetical to the vision, cynicism spreads fast, and no one believes the communications.

Drawing on the aviation example, I think we might strengthen the Change Management experts' points for the specific area of risk culture by observing that clear goals, purpose, autonomy, continuous feedback, and a sense of control greatly add to the development of inner standards and pride of work. These intrinsic motivators apply at all levels, from factory workers to the CFO. Worker engagement leads to trust; and trust promotes acceptance of shared values, norms, beliefs, and ethics, which is what definitions of risk culture rightly tell us should be our goal.

 – – – – –


Fault Trees – View from the Bottom

Fault Tree Friday #3. See also #1 #2

Last week I showed how to begin building a fault tree from the top down, explained a tree's structure, and looked at intermediate events and their associated logic gates. Now we can briefly cover initiators, the events at the bottom of the tree.

In most fault trees, most initiators get their probability values from the failure rate of a component for the specific failure mode in question, and from the time window during which the mission or operation is exposed to that failure. For example, resistors fail open more often than they fail short, and the consequences of those two failure modes are often very different.

Some missions are exposed to certain failures a fixed number of times regardless of the mission's length. Aircraft are exposed to aborted takeoffs commanded by the control tower at some historical rate; this has nothing to do with the duration of a flight. Initial Public Offerings occur once in a firm's lifetime, and occasionally fail miserably at a low historical rate, often with no chance of recovery. If modeling these events, you might infer their probabilities from the known cases in the total population.

Most hardware failures and many human errors are modeled as occurring at a fixed rate over the duration of a mission or the life of a project. This assumes, in the case of mechanical or electrical equipment, that infant-mortality cases have been removed from the population by a burn-in process, as is often applied to 100% of semiconductors in critical applications. Likewise, wear-out failures are prevented in critical applications by maintenance, non-destructive testing, and replacement of finite-life structural components.

Between the extremes of infant mortality and wear-out – during the “useful-life” period – most equipment fails at a roughly constant rate. During that normal-life period, the probability of failure during a mission (e.g., a flight, reactor-time in a chemical batch process, or the time between scheduled maintenance in power-generation) is a simple function of the exposure time and a historical failure rate.

The model for this is commonly called the exponential failure distribution. The probability P of a failure in time interval T for a component having a failure rate R (where R equals the reciprocal of the mean time between failures, MTBF) is given by:

P = 1 − e^(−R × T)

where “e” is Euler’s Number, the number having a natural log of 1.

As a historical note, when the product R * T is small, you can approximate P as P = R * T and leave your slide rule in the desk. Otherwise, fault tree software will let you supply values for R and T and will calculate P for each initiator event where you didn't assign P directly. As another historical note, events for which P is supplied directly are sometimes called "undeveloped events" and those given R and T values are often called "basic events." "Undeveloped" partly stems from an old practice of breaking trees into chunks to ease computation, the top of one tree supplying the probability of an "undeveloped" (developed elsewhere) event in another. Try to avoid this; it risks grave computational errors, for reasons we'll cover later. I'll call any event that doesn't have kids an initiator.
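A minimal sketch of that calculation (the failure rate and exposure time are arbitrary example values): the exact exponential form and the rare-event approximation P ≈ R × T agree closely when R × T is small.

```python
from math import exp

def p_failure(rate_per_hr, exposure_hr):
    """Exponential model: probability of at least one failure during the exposure."""
    return 1 - exp(-rate_per_hr * exposure_hr)

R = 1e-6      # failures per hour (example value)
T = 10.0      # hours of exposure (example value)

exact = p_failure(R, T)
approx = R * T                  # valid when R * T << 1
print(exact, approx)            # ~9.99995e-06 vs 1e-05
```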

fault tree

You might be wondering where failure rates come from. Good question. Sources include GIDEP, IEEE 500, Backblaze hard drive data, USAF Rome Laboratory, MIL-HDBK-217F, and RIAC. And those who write procurement specs should require vendors to supply detailed data of this sort.

Above I used the example of different failure rates for the different failure modes of a resistor. Since the effort of building fault trees is usually justified only for catastrophic fault states (hazards), you're unlikely to see a resistor failure appear as an initiator. The top-down development of a tree need only descend to a point dictated by logic and by the availability of historical failure-rate data. So initiators might specify a failure rate for electronics at the "box" (component) level, or perhaps the circuit-board level when a box contains redundant boards, as would be the case for an auto-land controller in my aircraft braking example.

The Human Factor

For fault-tree purposes, human errors are modeled as faults. Error is generally modeled as the probability of a mistake or omission per relevant action. This typically enters fault trees as events involving maintenance errors and primary operator errors – like pilots for aircraft and chemists for batch processes. Fault trees point out the needs for operator redundancy and for monitors.

Monitors might take the form of more humans, i.e., inspectors or copilots. Or monitors might take the form of machines that watch the output of human operators or machines that check the output of other machines.

One thing history has taught us – in general, humans make poor monitors, particularly when called upon to ensure that a machine is working properly. Picture Homer Simpson, eyes glazed over, while the needle is in the red.

Bored humans do poor work (reminder: blog post on human capital risk), and scared humans are even worse. This is particularly relevant for critical operations in degraded systems where skill is required – think fighter pilots and deep divers.

While diodes, motors, pumps, valves, and surprisingly many other things fail at a rate of about one per million hours (1E-6/hr) – and RAM is better still – humans err remarkably often. For example:

  • Omission of a step in a batch process by a skilled operator:   0.003 to 0.005
  • Arithmetic error in single simple calculation:   0.03
  • Human monitor (inspector) doesn’t catch error:  0.03
  • Omission of step during stressful emergency procedure:   0.1

Commercial aircraft operation uses redundant pilots and redundancy in critical procedures to deal with errors of omission. Knowledge of error criticality does little to prevent critical errors. For example, failure to deploy flaps for takeoff – about as critical an error as can be imagined – has resulted in several crashes, including Delta 1141 in 1988, despite redundancy and checklists. Non-human monitors of the humans' configuration of control surfaces are a better approach.

Fault trees are great for helping us allocate redundancy in system design. To get this right, we need to take a close look at the failure rates and exposure times supplied to initiator events in redundant designs. I intended to cover that today, but this post is already pushing the limits. I'll get to it next time.

In the meantime, consider two other aspects of the interplay among redundancy, failure rates, exposure time, and probability. First, imagine two components in parallel, each having P = .01; if either alone suffices, the pair fails with P = .01 × .01 = .0001. This arrangement may cost much less than a single component having P = .0001. Then again, two components in parallel will weigh more than twice the weight of one, and will take up at least twice the space.

Second, consider the fault-tree ramifications of choosing the two-in-parallel design in the above example. Are there any possible common-mode (or common-cause) failures of both the components? What happens if one explodes and takes out the other? What happens if the repair crew installs both of them backwards after scheduled maintenance?

 

ISO 31000 and Those Who Don’t Know History

William Storage – Dec 8, 2016
VP, LiveSky, Inc.
Visiting Scholar, UC Berkeley History of Science

Risk: “the effect of uncertainty on objectives.”
ISO 31000 risk definition

ISO 31000, along with other frameworks, uses a definition of risk that is not merely incompatible with the common business and historical usage; it is highly destructive to its own goals. A comment on a recent LinkedIn post about “positive risk” asked fellow risk managers, “can we grow up?” I share the frustration. ERM must step into the real world, meeting business on its own terms – literally.

The problem with an offbeat definition of risk isn’t just a matter of terminology. The bad definition is at the heart of several derivative concepts, which ultimately lead to contradictions and confusion. That confusion is not lost on CEOs and boards of directors. Proponents claim that these audiences welcome ERM and that they align strategies accordingly, e.g. COSO 2009: “boards and management teams are embracing the concept of ERM”. But dig into this recent Deloitte survey, like many before it, and you’ll see that the self-congratulatory self-assessment projected onto boards comes with some less optimistic hard data. For example, Deloitte’s data actually shows that just over half of even financial-service boards get updates on top risks, and less than half of those get such updates more than once a year.

I’ve recently had the chance to speak about risk management with a few Fortune-500 CEOs (telecom, insurance and healthcare) and a number of their board members. Unsurprisingly, these folk tend to be learned – some downright expert in science and math. Many were aware of ERM’s quirky use of “risk” and related terms central to science, and did not need prompting to express dismay. All five healthcare execs I spoke with told me their boards have no contact with ERM output.

A retired CEO told me she suspected that ERM’s “positive risk” concept is a turf grab – a way for risk managers to inject themselves into strategic decisions. Of course, risk managers have good evidence that risk should move upstream in the decision process. But idiosyncratic language and muddled reinterpretations of core analytical concepts are unlikely to persuade educated executives. If you think otherwise, try searching the web for praise of an ERM framework by a board of directors or top executive.

To understand why the definition of risk is one of several big things that ISO 31000 and some of its brethren must change, a historical perspective on risk and on the roots of ERM’s conception of it may help.

Risk started with probability theory, which, oddly, did not emerge until the 16th century. Before that, despite widespread gambling, humans, possibly for religious reasons, could not imagine any way to predict the future. As historian Ian Hacking (The Emergence of Probability) wrote, “someone with only the most modest knowledge of probability mathematics could have won himself the whole of Gaul in a week.”

Then Gerolamo Cardano realized that, whether or not through the will of God, rolling two dice resulted in more sevens than twos. Pascal and Fermat later devised a means of calculating probability based on a known problem space. Soon after, John Graunt realized he could predict future death rates based on historical data. With help from Huygens and Bernoulli, statistical inference was born.

While annuities and mutual-aid societies existed in ancient Rome, modern insurance had to wait for Graunt’s concepts to spread. Only then could probability and statistical inference (as these terms are used where italicized above) become a rational basis for setting premiums, as shown by Edmond Halley, who discovered other regularities in the natural world.

“Insurance Against Risk”

Risk insurance was soon widespread. Risk‘s Latin root means danger, and that’s how the term was used in insurance. The 1828 American Dictionary of the English Language says risk signifies a degree of hazard or danger, and explains that “the premiums of insurance are calculated upon the risk.” In insurance, science, medicine, and engineering, risk is a combination of the likelihood and severity of a hazard (potential loss); and that is how the term is used everywhere outside of ERM and some Project Management imitators.

For example, in Google’s data, the top 25 two-word collocations starting with “risk” all associate risk with cost or loss:

[Figure: top two-word collocations beginning with “risk” in Google’s bigram data]

Further, in Google’s data, “positive risk” and similar expressions do not occur in the first 10,000 bigrams ending in “risk,” despite the popularity of that concept in blog posts and on LinkedIn.
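For readers unfamiliar with collocation counts, here is a minimal sketch of the idea, runnable against any plain-text corpus you have on hand (the filename is a hypothetical placeholder, and this is not the format of Google’s published n-gram files):

    # Tally two-word collocations that begin or end with "risk" in a text sample.
    # "corpus.txt" is a hypothetical placeholder for any large plain-text file.

    import re
    from collections import Counter

    with open("corpus.txt", encoding="utf-8") as f:
        words = re.findall(r"[a-z']+", f.read().lower())

    starting = Counter()   # bigrams of the form "risk ___"
    ending   = Counter()   # bigrams of the form "___ risk"
    for w1, w2 in zip(words, words[1:]):
        if w1 == "risk":
            starting[w2] += 1
        if w2 == "risk":
            ending[w1] += 1

    print(starting.most_common(25))   # compare with the collocations shown above
    print(ending.most_common(25))     # is "positive" anywhere near the top?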

Defining risk as the effect of uncertainty on objectives causes many problems. One is that the definition leaves the nature and context of the uncertainty unspecified; another is that it omits any mention of loss. The rationale for this omission is that the consequences associated with a risk can enhance the achievement of objectives.

This rationale confuses risk-reward calculus with the concept of risk alone. Despite the standard’s claim to be neutral about risk (not the same thing as risk-neutrality), nearly all usage in ISO 31000 is in terms of risk being tolerated, retained or transferred, shared, reduced, controlled, mitigated, or avoided.

Uncertainty

To understand risk as the effect of uncertainty on objectives, we must know what is meant by uncertainty. Again, this isn’t just an exercise in philosophy of language. Uncertainty has been a problem term since Frank Knight (Risk, Uncertainty and Profit, 1921) chose to redefine it (misuse it, according to Frank Ramsey and others of Knight’s contemporaries) in two ways – incompatible with each other and with the standard use in math and science. We see echoes of Knight’s work in risk frameworks.

Knight’s concept of uncertainty relevant to this discussion is the one in which he equates risk with “measurable uncertainty”:

“To preserve the distinction which has been drawn in the last chapter between the measurable uncertainty and an unmeasurable one we may use the term “risk” to designate the former and the term “uncertainty” for the latter.” 

Knight’s critics (as we can infer Ramsey, Kolmogorov, von Mises, and de Finetti were) might point out that Knight has constructed a self-referential definition; but a charitable reading of Knight is that risk equals measurable uncertainty and that his “uncertainty” equals ignorance, in the non-pejorative sense, i.e., “unknown unknowns.”

Even in the charitable interpretation, Knight’s usage makes dialog with math and science nearly impossible, since in those realms we call the measure of uncertainty probability (whether of the frequentist or subjectivist variety). That is, it is not merely Knight’s language that is at odds with math and science; it is his world view and ontology.

Effects of Uncertainty

If the uncertainty in ISO 31000’s definition of risk is the Knightian variety, i.e., ignorance, then uncertainty describes an agent’s state of mind. The immediate effect of that uncertainty is necessarily a reflection on his/her/its ignorance, if there is an effect at all (a person unaware of his uncertainty would not be uncertain). Given that the only possible first effect of awareness of a state of ignorance is cognitive or emotional, defining risk as the effect of uncertainty (the sort of Knightian uncertainty described above) is unworkable. Risk is certainly not an emotional response or a mental state of reflection, yet that is what a literal reading of ISO 31000 would require, assuming Knightian uncertainty.

If instead of Knight’s understanding of uncertainty we use the math/science meaning of the term, things are only slightly better. If uncertainty involves a known problem space (as opposed to ignorance), its effect in any situation is on our decisions. We might deliberate on what to do about quantified uncertainty (and therefore quantified risk). If we follow a subjectivist interpretation of probability, we might choose to gather more information with which to refine our estimated probabilities (modify our uncertainty by updating our priors). But in neither of these cases, where uncertainty is not ignorance, would we call what we’re doing about uncertainty (the effect it has on us) “risk.” Here, uncertainty is a component of risk; but risk is not the effect of uncertainty on objectives.

An obvious remedy is to abandon arcane conceptions of risk and accept that a few centuries of evolution in rational thought have given us a decent alternative. Risk is a combination of the likelihood of an unwanted occurrence and its severity. This holds however we choose to measure or estimate likelihood, and regardless of how we measure severity. It does not require that we multiply likelihood by severity, and it allows that taking risks might have benefits. Further, it accommodates the role of risk analysis in decision making, i.e., in pursuing “objectives.” I think this is where ISO 31000 was heading, but it went off course, leaving much confusion in its wake. It’s time for a correction.
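A minimal sketch of that alternative, with names and numbers that are purely illustrative (nothing below comes from ISO 31000 or any other standard):

    # Risk as a combination of the likelihood and severity of an unwanted occurrence.
    # "Combination" need not mean multiplication; expected loss is just one possible
    # aggregation. Names and numbers are illustrative only.

    from dataclasses import dataclass

    @dataclass
    class Risk:
        name: str
        probability: float   # per year, however estimated
        severity: float      # loss in dollars, however measured

        def expected_loss(self) -> float:
            return self.probability * self.severity

    risks = [
        Risk("customer-data breach", 0.02, 5_000_000),
        Risk("warehouse fire", 0.001, 20_000_000),
    ]

    # Keep the (probability, severity) pair intact and apply whatever decision
    # rule the objectives call for -- here, a simple expected-loss ranking.
    for r in sorted(risks, key=lambda r: r.expected_loss(), reverse=True):
        print(f"{r.name}: p={r.probability}, loss=${r.severity:,.0f}, "
              f"expected=${r.expected_loss():,.0f}")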

– – –



 

Are you in the San Francisco Bay area?

If so, consider joining the Risk Management meetup group.

Risk management has evolved separately in various industries. This group aims to cross-pollinate, compare, and contrast the methods and concepts of diverse areas of risk, including enterprise risk (ERM), project risk, safety, product reliability, aerospace and nuclear, financial and credit risk, and market, data, and reputation risk.

This meetup will build community among risk professionals – internal auditors and practitioners, external consultants, job seekers, and students – by providing forums and events that showcase current trends, case studies, and best practices in our profession with a focus on practical application and advancing the state of the art.

https://www.meetup.com/San-Francisco-Risk-Managers/