
*The Modulus of Resilience for Critical Subsystems DOI: http://dx.doi.org/10.5772/intechopen.93783*

resources to study, adapt, and mitigate these high impact, low probability events before they unexpectedly fracture the established subsystems we rely on. The avoidance of fracture is central to the application of the modulus of resilience in critical subsystems. The chapter will review the differences between reliability and resiliency as well as the importance of distinguishing between the concepts. Additionally, ideals related to resilience are identified and expressed in a concise operational definition. The research utilized the progression shown in **Figure 1** for the investigation.

*Operations Management - Emerging Trend in the Digital Era*

Borrowing concepts from materials science allows for an isomorphic application where analogous structures are leveraged to represent high impact, low probability (HILP) event scenarios. In this chapter, the isomorphic application is presented to provide a method of quantifying resiliency or its absence based on the intended aim of the subsystem. This concept is consistent with select portions of previous literature, but divergent in others. Following a review of previous research, a gap analysis was completed to identify opportunities for new considerations in quantifying resiliency. Lastly, an example in applying the modulus of resilience for critical subsystems is provided to demonstrate the computational process.

**Figure 1.** *Research phases.*

**2. The increasing case for resiliency**

Reliability and resiliency are sometimes discussed in a similar context with respect to subsystem performance; however, they differ conceptually in both the events they measure and the characteristics they quantify. The measures that define reliability provide insight into the context of the metrics' use. Many of the most common reliability metrics use mean-based calculations from recurring failures over time. These metrics include mean time between failures (MTBF), mean time to failure (MTTF), and mean time to repair (MTTR), and they require successive failures in order to quantify subsystem performance. MTBF is used in reliability to describe a repairable subsystem; its reciprocal, the failure rate, is commonly expressed as the number of failures per million hours. MTTR is the mean time needed to repair a failed subsystem. MTTF measures reliability for a subsystem that cannot be repaired: it is the mean time expected until the first failure of the subsystem. MTTF is a statistical value and represents the mean over a long period of time and a large number of operations. The reliability metrics can effectively represent common cause events that produce recurring failures; however, these calculations are less applicable to low probability special cause events. A special cause is something special, not part of the system of common causes. It is detected by a point that falls outside the control limits [1]. Often, subsystems have an allowable level of tolerance to minor disruption, which prevents sustained impairment in accomplishing the aim of the subsystem. Plotting the number of events by type against the percent of subsystem output disrupted graphically displays the relationship between common cause and special cause events. The allocation of events is closely represented by a Pareto distribution (**Figure 2**).
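The mean-based metrics described above can be sketched in a few lines of Python. This is a minimal illustration rather than the chapter's method: the `Outage` record, the function `mean_based_metrics`, and the sample hours are hypothetical, and MTBF is taken here as the mean uptime between a restoration and the next failure. The sketch shows why these metrics need repeated failures: with a single event there is no interval to average.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Outage:
    """One failure of a repairable subsystem (hypothetical record layout)."""
    failed_at: float    # hours since the start of observation
    restored_at: float  # hours since the start of observation


def mean_based_metrics(outages: list[Outage]) -> dict[str, Optional[float]]:
    """Return MTTR and MTBF in hours; a metric is None when it cannot be computed."""
    ordered = sorted(outages, key=lambda o: o.failed_at)
    # MTTR: mean repair duration; needs at least one failure.
    mttr = mean(o.restored_at - o.failed_at for o in ordered) if ordered else None
    # MTBF: mean uptime from one restoration to the next failure;
    # needs at least two successive failures.
    uptimes = [nxt.failed_at - prev.restored_at
               for prev, nxt in zip(ordered, ordered[1:])]
    mtbf = mean(uptimes) if uptimes else None
    return {"mttr": mttr, "mtbf": mtbf}


# Hypothetical recurring (common cause) failures: both metrics are defined.
recurring = [Outage(100.0, 104.0), Outage(300.0, 302.0), Outage(700.0, 706.0)]
print(mean_based_metrics(recurring))

# A single rare event: MTTR exists, but there is no interval for MTBF.
single = [Outage(5000.0, 5400.0)]
print(mean_based_metrics(single))
```

For the single rare event the function returns `None` for MTBF, which mirrors the point that mean-based metrics are poorly suited to low probability events.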


**Figure 2.** *Representative plot of event type distribution.*
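The distinction between common cause and special cause events can be illustrated with a simple control-limit check. The sketch below assumes three-sigma limits estimated from a baseline period of routine operation; the source does not specify how its control limits are computed, and the function names and disruption percentages are hypothetical.

```python
from statistics import mean, pstdev


def control_limits(baseline: list[float], sigmas: float = 3.0) -> tuple[float, float]:
    """Lower and upper control limits from baseline (common cause) observations."""
    center = mean(baseline)
    spread = pstdev(baseline)
    return center - sigmas * spread, center + sigmas * spread


def special_cause_points(observations: list[float],
                         limits: tuple[float, float]) -> list[int]:
    """Indices of observations that fall outside the control limits."""
    lower, upper = limits
    return [i for i, x in enumerate(observations) if x < lower or x > upper]


# Hypothetical percent of subsystem output disrupted per period.
baseline = [1.0, 2.0, 1.5, 2.5, 1.0, 2.0, 1.5, 2.5]
limits = control_limits(baseline)

new_periods = [1.5, 2.0, 40.0, 1.0]
print(special_cause_points(new_periods, limits))  # → [2]
```

Only the period with 40 percent of output disrupted falls outside the limits, so it is the one flagged as a candidate special cause event.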

Resiliency events reside at the tail of the distribution as rare events resulting from extraordinary scenarios. Such events have been produced by multiple failures within a single subsystem, as discussed by Charles Perrow in *Normal Accidents*. His work examined failures in highly complex operating environments. Increasing interdependence results in an interconnected ecosystem where a failure in a single subsystem can create failures in multiple subsystems. When interactive complexity is joined with tight coupling, the risk of a system accident increases considerably. Interconnectedness and complexity among contemporary subsystems are increasing at a rapid pace as technologies develop faster than assessments can be made regarding their risks. As we move away from individual events and account for the larger system, we find the "eco-system accident," an interaction of systems that were thought to be independent but are not because of the larger ecology [2]. As systems grow in size and in the number of diverse functions they serve, and are built to function in ever more hostile environments, increasing their ties to other subsystems, they experience more and more incomprehensible or unexpected interactions [2]. Common mode failures, first included in analytical models in 1967, can contribute to unexpected actions from complex systems. In addition to common mode failures, proximity and indirect information sources are two further indications of interconnectedness. Ultimately, the probability of a subsystem being subjected to significant disruption depends on the cumulative probability of both internal and external risks. Inevitably, the probability of significant disruption will increase as interdependence increases. While increases in events causing significant disruption are expected, their count is not expected to be large enough to support mean-based reliability metrics. Therefore, resiliency-based metrics are needed which match the periodicity and scale of high impact, low probability events.
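The dependence of disruption probability on cumulative internal and external risk can be made concrete with a small calculation. The sketch assumes the individual risks are statistically independent, which the source does not state and which tightly coupled subsystems may well violate; the function name and the probabilities are hypothetical. Under that assumption, the probability of at least one significant disruption is one minus the product of the individual survival probabilities, a quantity that can only grow as interdependence adds risk sources.

```python
from math import prod


def disruption_probability(risks: list[float]) -> float:
    """P(at least one significant disruption), assuming independent risks."""
    if any(not 0.0 <= p <= 1.0 for p in risks):
        raise ValueError("each risk must be a probability in [0, 1]")
    return 1.0 - prod(1.0 - p for p in risks)


# Hypothetical annual probabilities of a significant disruption.
internal = [0.01, 0.02]           # risks arising inside the subsystem
external = [0.005, 0.01, 0.015]   # risks inherited from linked subsystems

isolated = disruption_probability(internal)
coupled = disruption_probability(internal + external)
print(f"isolated: {isolated:.4f}, coupled: {coupled:.4f}")
```

Even in the coupled case the probability remains small in absolute terms, which is consistent with the argument that such events stay too infrequent for mean-based metrics.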
