**Abstract**

Accelerating digitization of critical infrastructures is increasing interconnection and interdependence among high-reliability subsystems. The resulting dependencies create new challenges in preventing underinvestment in high impact, low probability (HILP) events, which can have disastrous consequences for society's critical subsystems. These more impactful events highlight the differences between reliability and resiliency, with the latter applicable to black swans. A number of approaches for quantifying resiliency have been proposed; however, a review of the literature identified conceptual gaps when they are applied to empirical event data. This chapter provides a scenario-agnostic method to quantify resiliency by applying concepts from materials science in a generalized form. This new formulation resulted from a mapping of constructs used in tensile testing to characteristics of protracted subsystem disruptions. Based on the mapping and gap analysis, a resiliency index calculation was developed and applied using examples based on empirical data from high impact events.

**Keywords:** resiliency, critical infrastructures, high impact, low probability (HILP), reliability, digital systems

### **1. Introduction**

Digitization is occurring in many industries and in many different forms; however, regardless of the application, a common set of enablers is employed. As digital transformation proliferates, decision makers will need to distinguish between reliability and resiliency in the planning, design, and operation of these subsystems. Tightly coupled common hardware and software platforms potentially increase the breadth of accidental failures as well as the impact of intentional sabotage. Beyond end-use applications, these digital subsystems all rely on electricity to function. Hardware, software, and electricity form the foundation upon which digitalization rests. The increased interdependence and interconnection can lead to common failure modes among previously isolated subsystems, resulting in an increased probability of high impact events. Interconnection results in the establishment of a singular system with all other structures existing as subsystems. Evaluation of subsystems will need to include internally and externally initiated disruptive events. Highly impactful events, sometimes termed black swans, can not only disrupt subsystems but also fundamentally change their structure. Impactful as they are, their rarity can make these events prone to underinvestment due to heuristics and biases, most prominently the availability heuristic. A quantifiable metric can aid in our ability to appropriately allocate resources to study, adapt to, and mitigate these high impact, low probability events before they unexpectedly fracture the established subsystems we rely on. The avoidance of fracture is central to the application of the modulus of resilience in critical subsystems. The chapter will review the differences between reliability and resiliency as well as the importance of distinguishing between the concepts. Additionally, ideals related to resilience are identified and expressed in a concise operational definition. The research utilized the progression shown in **Figure 1** for the investigation.

Borrowing concepts from materials science allows for an isomorphic application where analogous structures are leveraged to represent HILP event scenarios. In this chapter, the isomorphic application is presented to provide a method of quantifying resiliency, or its absence, based on the intended aim of the subsystem. This concept is consistent with some portions of the previous literature but diverges from others. Following a review of previous research, a gap analysis was completed to identify opportunities for new considerations in quantifying resiliency. Lastly, an example of applying the modulus of resilience to critical subsystems is provided to demonstrate the computational process.
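For orientation, the materials-science quantity being borrowed can be stated compactly. The expression below is the standard textbook definition of the modulus of resilience for a linearly elastic material, not the resiliency index developed in this chapter, whose generalized form may differ. The symbols are the conventional ones and are assumed here: $\sigma$ is stress, $\varepsilon$ is strain, $\sigma_y$ and $\varepsilon_y$ are the stress and strain at yield, and $E$ is Young's modulus.

```latex
% Strain energy per unit volume absorbed up to the yield point,
% assuming linear elasticity (\sigma = E \varepsilon, so \varepsilon_y = \sigma_y / E).
U_r \;=\; \int_{0}^{\varepsilon_y} \sigma \, d\varepsilon
    \;=\; \int_{0}^{\varepsilon_y} E \varepsilon \, d\varepsilon
    \;=\; \frac{E \varepsilon_y^{2}}{2}
    \;=\; \frac{\sigma_y^{2}}{2E}
```

In tensile-testing terms, $U_r$ is the area under the stress-strain curve up to yield, that is, the energy a specimen can absorb and still return to its original state.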

Resiliency events reside at the tail of the distribution as rare events resulting from extraordinary scenarios. Such events have been produced by multiple failures within a single subsystem, as discussed in the book *Normal Accidents* by Charles Perrow, which examined failures in highly complex operating environments. Increasing interdependence results in an interconnected ecosystem where a failure in a single subsystem can create failures in multiple subsystems. When interactive complexity is joined with tight coupling, the risk of a system accident is considerably increased. Interconnectedness and complexity among contemporary subsystems are increasing at a rapid pace as technologies develop faster than their risks can be assessed. As we move away from individual events and account for the larger system, we find the "eco-system accident," an interaction of systems that were thought to be independent but are not because of the larger ecology [2]. As systems grow in size and in the number of diverse functions they serve, and are built to function in ever more hostile environments, increasing their ties to other subsystems, they experience more and more incomprehensible or unexpected interactions [2]. Common mode failures, first included in analytical models in 1967, can contribute to unexpected actions from complex systems. In addition to common mode failures, proximity and indirect information sources are two further indications of interconnectedness. Ultimately, the probability of a subsystem being subjected to significant disruption depends on the cumulative probability of both internal and external risks. Inevitably, the probability of significant disruption will increase as interdependence increases. While events causing significant disruption are expected to become more frequent, their count is not expected to be large enough for the application of mean-based reliability metrics.
Therefore, resiliency-based metrics are needed which match the periodicity and scale of high impact, low probability events.
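The claim that disruption probability grows with interdependence can be illustrated with a minimal sketch. It assumes, purely for illustration, that the individual risks are statistically independent, so the chance of at least one occurring is one minus the product of their complements. The function name `disruption_probability` and all probability values are hypothetical and are not taken from the chapter or its data; common mode failures, which violate the independence assumption, would need a different model.

```python
from math import prod


def disruption_probability(risk_probs):
    """Probability that at least one of several independent risks occurs.

    Assumes independence between risks (an illustrative simplification).
    """
    if any(not 0.0 <= p <= 1.0 for p in risk_probs):
        raise ValueError("probabilities must lie in [0, 1]")
    return 1.0 - prod(1.0 - p for p in risk_probs)


# Hypothetical annual probabilities, chosen only for illustration.
internal = [0.01]                 # risk originating inside the subsystem
external_coupled = [0.01, 0.02]   # risks inherited from two connected subsystems

isolated = disruption_probability(internal)
coupled = disruption_probability(internal + external_coupled)

print(f"isolated subsystem: {isolated:.4f}")  # 0.0100
print(f"coupled subsystem:  {coupled:.4f}")   # 0.0395
```

Each added dependency contributes another factor below one to the product, so under these assumptions the cumulative probability can only rise as connections are added, while still remaining far too small to yield the event counts that mean-based reliability metrics require.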

**Figure 2.**

*Representative plot of event type distribution.*

*The Modulus of Resilience for Critical Subsystems DOI: http://dx.doi.org/10.5772/intechopen.93783*

### **3. Quantifying high impact, low probability events**

HILP events require a subsystem to bounce back to normalcy following a major disruption. The goal is to regain pre-disruption levels of output as quickly as possible; however, recovery time is not the only metric of importance, since the shape of the recovery curve is also significant. Resiliency aids in defining a disaster response paradigm that differs from previous approaches, such as resistance and sustainability, by emphasizing return to normal. Nonetheless, the literature frequently uses the concept of resilience to imply the ability to recover or bounce back to normalcy after a disaster occurs [3]. Review of scholarly work related to the resiliency concept identified three main ideals: no assumption that disaster prevention is always possible,
