
a political arena. Moreover, of course, improved communication depends in large measure on better modeling—taken one model at a time or together as an informative ensemble.

These challenges bring to mind several strategies that can be productive in improving the workings of the models and in supporting more confidence in their results. Before they are discussed, however, it is useful to organize thoughts around more practical issues that can productively be considered when framing a complete research plan from creation to dissemination:

1. Models should be designed to produce results that are calibrated in terms of the welfare metrics that decision-makers and/or the public are using to compare possible futures against society's implicit levels of *tolerable risk*.

2. Modelers should expend some significant effort using their models to answer "What if?" questions that are actually being asked by decision-makers and members of the public (presumably in reference to tolerable risk). What would happen if we did nothing? Or if we did that? Or something else? What variables are most important in determining trends or variability in the answers to these questions? *These are challenges that call for organized counterfactual explorations* to consolidate insights from studies like [21–23].

3. Even more specifically, modelers can find profit in organizing themselves to examine systematically why different models can produce different results. Are the reasons structural, a matter of different assumptions, reflections of different sensitivities to exogenous drivers, and so on? Organizing and participating in *carefully designed model comparison experiments*, conducted as part of routine model development using representations of uncertainty against tolerable risk levels, can build capacity to communicate with some transparency and intuition why the results of a model or an ensemble of models are what they are and why they should be taken seriously.

4. Time scale matters in these questions, and so modelers should expect to be asked about "When?" as well as "How?" and "What?". Do calibrations of risk manifest themselves over the short or the long term? Immediately, or with a lag? When should decision-makers plan to act, and what metrics should be monitored to best inform evaluations of the efficacy of their decisions? It follows that *answers to counterfactual and model comparison questions can be very time sensitive*.

5. Reporting on *value of information (VOI)* calculations can often support conclusions about which variables are most important in driving the results into the future. This can be important information when it comes to framing plans for the next iteration of the modeling.

These thoughts are clearly interwoven, but the following subsections provide some annotated descriptions of the italicized concepts and how they support the connections.

#### **3.1 Thresholds of tolerable risk**

Limits of tolerable risk reflect the level of "risk deemed acceptable by society in order that some particular benefit or functionality can be obtained, but in the knowledge that the risk has been evaluated and is being managed" (https://www.encyclopedia.com). Starting with its first report, the New York City Panel on Climate Change [31] employed this notion to frame both its evaluation and management of climate change risks to public and private infrastructure. NPCC communicated the concept to planners and decision-makers by pointing out, for example, that building codes imposed across the City did not try to guarantee that a building would never fall down. Instead, they were designed to produce an environment in which the likelihood of the building's falling down was below some X% threshold; that is, risk above X% was not "tolerable." As climate change, a pandemic, or any other outside stressor pushes a particular risk profile closer and closer to similarly defined thresholds of social tolerability, it is reasonable to expect that investment in risk-reducing adaptations can quickly become a critical part of an iterative response strategy over time.

**Figure 1** portrays one way by which current and future risk can be evaluated. A smaller version was created to support adaptation considerations, in the face of climate change, for public and private investment in New York City infrastructure. The idea was to locate infrastructure on the matrix under the current climate—the beginning of the arrow indicates that location. Planners could then envision how that location would move as future trajectories of change evolved: upward-curving lines, perhaps, that generally move up and to the right at an increasing rate, but drawn as straight lines in **Figure 1** for illustrative simplicity. Green boxes identify low-risk combinations of likelihood and consequence; they are benign and need not be of much worry to the people who manage the facility or to the people who benefit from the services that it provides. Yellow and orange boxes identify moderate and significant risk combinations, respectively; both lie below society's perception of the limit of tolerable risk. Yellow boxes suggest moderate concern, but the orange boxes capture combinations that fall just short of the threshold of tolerability—the boundary between the orange and red boxes.

The arrow in **Figure 1** shows how analysts could, by anticipating a dynamic scenario of climate change, alert decision-makers (as they moved along the arrow into the orange region) about the shrinking distance to intolerable red combinations for which some reactive or preventive action would be required. Assume for illustration that it takes 4 units of time to reach the tip of the arrow and that time is linear with the box dimensions. In the iterative response program, passing from the green region to the yellow takes one unit of time and puts the risk on somebody's radar screen. Passing from yellow to orange in another 1.25 units of time triggers earnest planning and preparation for adaptive response. Finally, passing into the red region during the final 1.75 units of time identifies the anticipated time for action, which would certainly include the implementation of outcome-monitoring initiatives.

#### **Figure 1.**

*Risk matrix representation. The formal conceptualization of risk as the product of likelihood and consequence can be portrayed by a two dimensional matrix. Here, subjective calibrations of likelihood and consequence are qualitatively depicted by seven different categories from "Virtually Impossible" to "Virtually Certain" (probabilities close but not equal to zero and close but not equal to one, respectively). Source: Box 4.6 in [32].*
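The matrix logic can be made concrete with a small sketch. The seven ordinal categories and the zone boundaries below are illustrative assumptions chosen only to mimic a Figure 1-style matrix; they are not the NPCC calibration.

```python
# Illustrative sketch (not the NPCC calibration): classify a likelihood x
# consequence pair into the colored zones of a Figure 1-style risk matrix.
# The seven ordinal categories and the zone boundaries are assumptions.

LIKELIHOOD = ["virtually impossible", "very unlikely", "unlikely",
              "about as likely as not", "likely", "very likely",
              "virtually certain"]                       # index 0..6
CONSEQUENCE = ["negligible", "minor", "moderate", "serious",
               "severe", "major", "catastrophic"]        # index 0..6

def risk_zone(likelihood_idx: int, consequence_idx: int) -> str:
    """Map matrix coordinates to a zone; risk grows with the index sum."""
    score = likelihood_idx + consequence_idx             # 0 .. 12
    if score <= 4:
        return "green"    # low risk: benign combinations
    if score <= 7:
        return "yellow"   # moderate risk: worth watching
    if score <= 9:
        return "orange"   # significant risk: just below tolerability
    return "red"          # intolerable risk: action required

if __name__ == "__main__":
    # A facility judged "very likely" to suffer "severe" consequences.
    print(LIKELIHOOD[5], "+", CONSEQUENCE[4], "->", risk_zone(5, 4))  # orange
```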

**Figure 1** also suggests how this conceptual device can be used to insert uncertainty about the future into the depiction and the iterative story. The upper dotted line represents a hypothetical 95th percentile scenario that portends larger consequences with growing likelihood. It starts at the same location as the arrow, but it gets to the red region in just 2.4 units of time and spends the remaining 1.6 units plunging farther into the red area. The lower dotted line represents the 5th percentile trajectory; it is shorter because it tracks below the median, depicting cases where consequences increase more slowly along climate change scenarios that also proceed at a more leisurely pace. It does not even reach the orange level of risk over 4 units of time. Together, these two pathways bound 90% of possible futures drawn from Monte Carlo simulations of a single model or an ensemble of parallel modeling efforts that are all anchored at current conditions. Decision-makers would expect to accelerate preparation and implementation at the point where the upper boundary of the inner 90% projection region (or any other higher or lower likelihood range determined by social norms) crosses the orange-red boundary, as a hedge against a high-consequence but lower-likelihood risk tail. The reported results could, if this analysis were completed, include a distribution of projected response-action trigger times rather than a single-valued best guess.
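A minimal Monte Carlo sketch of the trigger-time idea, under entirely assumed trajectory dynamics and an assumed orange-red boundary, could look like this:

```python
# Minimal Monte Carlo sketch of the trigger-time idea in the text: simulate
# many plausible risk trajectories on a 0-to-1 "risk score" axis and record
# when each one first crosses an assumed orange-red (intolerable) boundary.
# The trajectory model and all parameter values are illustrative assumptions.
import random

random.seed(1)

INTOLERABLE = 0.75          # assumed orange-red boundary on the risk score
HORIZON = 40                # years simulated
N_RUNS = 10_000

def trigger_time() -> int | None:
    """Return the first year the simulated risk score crosses the boundary."""
    risk = 0.30                                  # assumed current risk score
    trend = random.uniform(0.005, 0.02)          # uncertain upward drift/yr
    for year in range(1, HORIZON + 1):
        risk += trend + random.gauss(0.0, 0.01)  # drift plus interannual noise
        if risk >= INTOLERABLE:
            return year
    return None                                  # never crossed in horizon

times = [t for t in (trigger_time() for _ in range(N_RUNS)) if t is not None]
times.sort()
print(f"crossed within horizon in {len(times) / N_RUNS:.0%} of runs")
print("5th / 50th / 95th percentile trigger year:",
      times[int(0.05 * len(times))],
      times[len(times) // 2],
      times[int(0.95 * len(times))])
```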

Achieving broad acceptance for any tolerable risk threshold is a huge task for many reasons, of course. For one, risk tolerance varies widely across societies and individuals (the locations of their institutional or personal risk thresholds). For another, the real challenge for governors confronting a pandemic or extreme climate change might be navigating between different, but perhaps strongly contradictory or competing risk management plans. It is possible, though. New York State, for example, relied on science to frame its economic strategies in terms of avoiding futures that would overwhelm its hospital system during a second wave of the virus after what had been a successful first response. It supplemented White House [33] "gating criteria" with two forward-looking thresholds: (1) hold the transmission rate of the virus below 1.0 and (2) keep vacancies of hospital beds and ICU beds across the state above 30% of total bed capacity [13]. These are two tolerable risk thresholds to which results from integrated epidemiological-economic models can certainly speak if they are properly designed.
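A hypothetical screening of model projections against those two thresholds might be as simple as the following sketch; the weekly records and field names are invented for illustration.

```python
# Sketch of screening model projections against the two New York State
# thresholds quoted in the text (transmission rate below 1.0; at least 30%
# of hospital and ICU beds vacant). The records and field names are invented.

def within_tolerable_risk(projection: dict) -> bool:
    """True when a projected week satisfies both forward-looking thresholds."""
    r_t_ok = projection["transmission_rate"] < 1.0
    bed_ok = projection["hospital_bed_vacancy"] >= 0.30
    icu_ok = projection["icu_bed_vacancy"] >= 0.30
    return r_t_ok and bed_ok and icu_ok

projected_weeks = [
    {"week": 1, "transmission_rate": 0.9, "hospital_bed_vacancy": 0.41, "icu_bed_vacancy": 0.37},
    {"week": 2, "transmission_rate": 1.1, "hospital_bed_vacancy": 0.35, "icu_bed_vacancy": 0.33},
    {"week": 3, "transmission_rate": 1.3, "hospital_bed_vacancy": 0.27, "icu_bed_vacancy": 0.24},
]

for week in projected_weeks:
    status = "tolerable" if within_tolerable_risk(week) else "breaches a threshold"
    print(f"week {week['week']}: {status}")
```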

#### **3.2 Counterfactual exercises**

What can be learned when public health and climate change researchers confront the ubiquitous "What if?" questions of science? Recall that Section 2 reported on three COVID-19 counterfactual studies that were of extreme interest to decision-makers and the public at large: "What if we had started sooner?" and "What if we had not shut down the economy?" [21–23]. The results were striking, but plausible. More importantly, all three studies were also direct applications of one of the most fundamental research strategies in all of science. Counterfactual explorations, in fact, represent an approach to rigorous scientific inquiry that defines a research question, a trial group to test an answer, and a control group to provide a basis for comparison—that is, the scientific method applied to scenarios with policy interventions and scenarios without.
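That logic can be illustrated with a deliberately stylized sketch: the same toy SIR model is run once as a control (no intervention) and twice as trials with an assumed cut in the contact rate on different start dates. Neither the structure nor any parameter below comes from the studies in [21–23]; they are assumptions chosen only to show the trial-versus-control comparison.

```python
# Stylized counterfactual sketch: run a simple SIR model as a "control" with
# no intervention and as "trials" in which the contact rate is cut on an
# assumed start date. All parameter values are illustrative assumptions.

def sir_peak_infected(beta: float, gamma: float, intervention_day: int | None,
                      reduced_beta: float, days: int = 365) -> float:
    s, i, r = 0.999, 0.001, 0.0            # population shares
    peak = i
    for day in range(days):
        b = reduced_beta if intervention_day is not None and day >= intervention_day else beta
        new_infections = b * s * i
        recoveries = gamma * i
        s -= new_infections
        i += new_infections - recoveries
        r += recoveries
        peak = max(peak, i)
    return peak

control     = sir_peak_infected(beta=0.20, gamma=0.10, intervention_day=None, reduced_beta=0.20)
trial_early = sir_peak_infected(beta=0.20, gamma=0.10, intervention_day=30,   reduced_beta=0.11)
trial_late  = sir_peak_infected(beta=0.20, gamma=0.10, intervention_day=60,   reduced_beta=0.11)

print(f"peak infected share, no intervention:  {control:.1%}")
print(f"peak infected share, acting on day 30: {trial_early:.1%}")
print(f"peak infected share, acting on day 60: {trial_late:.1%}")
```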

Similar examples are abundant across the world of climate science as well. The Summary for Policymakers of IPCC [17], for example, contains an iconic result from a comparison of two extreme assumptions. Figure SPM.4 is replicated here in **Figure 2**; it depicts a result that changed the way the entire world thought about global warming and our confidence in the proposition that it was primarily the product of human activity. The various panels of the figure compare the actual historical global mean surface temperature record (starting in 1910) with distributions of estimated global mean trajectories produced by an ensemble of climate models that either include (trial group) or exclude (control group) historically observed carbon emissions and associated forcings. The actual temperature pathway tracks inside only the distributions that include carbon forcings. Moreover, the inner 90-percentile regions of the two distributions around the mean estimates bifurcate around 1980 (earlier for some continents and later for Australia); that is, beyond those bifurcation dates, the likelihood that both distributions are products of a static climate is virtually nil. Actual temperature tracking therefore combines with the bifurcations to confirm, with very high confidence in 2007, that carbon emissions are a primary cause of observed long-term warming globally and across six of seven continents.

#### **Figure 2.**

*Comparison of observed continental- and global-scale changes in surface temperature with results simulated by climate models using either natural or both natural and anthropogenic forcings. Decadal averages of observations are shown for the period 1906–2005 (black line) plotted against the center of the decade and relative to the corresponding average for the period 1901–1950. Lines are dashed where spatial coverage is less than 50%. Blue-shaded bands show the 5–95% range for 19 simulations from five climate models using only the natural forcings due to solar activity and volcanoes. Red-shaded bands show the 5–95% range for 58 simulations from 14 climate models using both natural and anthropogenic forcings. Source: Figure SPM.4 in [17].*
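A toy version of that trial-versus-control check, with synthetic stand-ins for the observed record and the two ensembles, can show how the coverage comparison works:

```python
# Sketch of the comparison behind Figure 2: check, year by year, whether an
# "observed" temperature anomaly falls inside the 5-95% range of an ensemble
# run with all forcings versus one with natural forcings only. All series
# below are synthetic stand-ins, not the IPCC data.
import random

random.seed(7)
YEARS = list(range(1950, 2006))

def ensemble_range(trend_per_year: float, n_members: int = 20) -> dict:
    """Return per-year (low, high) bounds for a toy ensemble."""
    bounds = {}
    for k, year in enumerate(YEARS):
        members = sorted(trend_per_year * k + random.gauss(0.0, 0.12)
                         for _ in range(n_members))
        bounds[year] = (members[1], members[-2])   # crude 5th/95th for 20 members
    return bounds

natural_only = ensemble_range(trend_per_year=0.000)   # control: no anthropogenic forcing
all_forcings = ensemble_range(trend_per_year=0.015)   # trial: includes anthropogenic forcing
observed = {year: 0.015 * k + random.gauss(0.0, 0.10) for k, year in enumerate(YEARS)}

def coverage(bounds: dict) -> float:
    hits = sum(1 for y in YEARS if bounds[y][0] <= observed[y] <= bounds[y][1])
    return hits / len(YEARS)

print(f"observations inside natural-only range: {coverage(natural_only):.0%}")
print(f"observations inside all-forcings range: {coverage(all_forcings):.0%}")
```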


**Figure 3** shows results from a more recent counterfactual approach that confronts a "try this versus try that" comparison from [20]. The three panels show the results of a modeling exercise designed to produce distributions of economic cost (or benefit) from climate change in four 20-year climate eras running from 2020 to 2100 for 4 different mitigation (temperature target) futures and 7 geographical regions that cover the contiguous 48 states of the continental US [34]; distributions of transient regional temperature changes were drawn from [35]. Panel A shows estimates for labor costs (in terms of lost annual wages per capita) for two different emissions scenarios—one is a "business as usual (BAU)" scenario, and the other keeps global mean surface temperature (GMST) increases below 2°C through 2100 along the median trajectory. Bifurcations of the inner 90% ranges occur by mid-century, and losses along BAU are uniformly much higher. Panel B replicates Panel A for another two emissions scenarios—one limits the median GMST increase to 1.5°C and the other to 3°C. Again, statistically significant bifurcations occur by mid-century, and losses are higher with warmer temperatures [34].

#### **Figure 3.**

*Selected results from regional transient sectoral damage trajectories. Regional damage trajectories (median, 5th and 95th percentiles and the inner quartile range) are displayed across the contiguous 48 states for the 4 benchmark climate eras along all four of the emissions-driven GMT scenarios. Source: [34] Panel A: Labor damages: Annual lost wages per capita by region and year for the "2.0" and "BAU" scenarios. Panel B: Labor damages: Annual lost wages per capita by region and year for the "1.5" and "3.0" scenarios. Panel C: "1.5" versus "2.0" damages by sector in 2090: 5th to 95th percentile range of total costs (in millions of dollars) across the contiguous 48 states by sector for the "1.5" degree (in orange) and "2.0" degree scenarios (in blue). Overlapping ranges are shown in gray. Median estimates for each sector and scenarios are shown as dots in the color representing the scenario. Note the differences in scales between the two panels due to large variation in magnitude across sectors. Three sectors (aeroallergens, harmful algal blooms, and municipal and industrial water supply) are not shown, as the magnitude of damages is negligible compared to other sectors.*


Loss differences for labor and 15 other sectors were a critical topic of concern when IPCC received an invitation from the members of the United Nations Framework Convention on Climate Change to provide a report "on the impacts of global warming of 1.5°C above pre-industrial levels." The IPCC accepted the invitation in April of 2016 when it decided to prepare a "Special Report on the impacts of global warming of 1.5°C above pre-industrial levels and related global greenhouse gas emission pathways, in the context of strengthening the global response to the threat of climate change, sustainable development, and efforts to eradicate poverty" [36]. The headline messages included: "Climate-related risks for natural and human systems are higher for global warming of 1.5°C than at present, but lower than at 2°C (*high confidence*). These risks depend on the magnitude and rate of warming, geographic location, levels of development and vulnerability, and on the choices and implementation of adaptation and mitigation options (*high confidence*). The avoided climate change impacts on sustainable development, eradication of poverty and reducing inequalities would be greater if global warming were limited to 1.5°C rather than 2°C, if mitigation and adaptation synergies are maximized while trade-offs are minimized (*high confidence*)."

When it came to economic damages, though, Yohe [35] suggests that the value of hitting a warming target of 1.5°C instead of 2°C might not be as impressive as it is for natural systems and other social systems that are already stressed by confounding factors. Panel C of **Figure 3** makes this point for 16 sectors that were subjected to the same regional analysis as described above. All of the 2°C distributions overlap the 1.5°C distributions in 2090, so no bifurcations can be observed. Any conclusion of higher economic cost for the 2°C target must therefore be offered with at most *medium confidence*, and on the basis of a single study at that.

These examples show that decision-makers and the public should be happy to see their decisions and perceptions informed by counterfactual experiments designed to identify *when* differences in the risk profiles of alternative responses become statistically significant in terms of their net social benefit. Plotting the foundational distributions over time for alternative response options allows these experiments to estimate when, in the future, the risk portraits of various policy options can be expected to become statistically different with, say, *very high confidence* because the 5th to 95th percentile distribution cones bifurcate. That is valuable information, no doubt, for designing and implementing an iterative risk management response for a particular decision-making structure (like avoiding (in)tolerable risk).
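A sketch of that bifurcation test, using synthetic damage trajectories in arbitrary units, might proceed as follows; the scenario parameters are assumptions, not values from [34].

```python
# Sketch of the bifurcation test described above: find the first year in
# which the 5th-95th percentile "cones" of two policy scenarios no longer
# overlap, so that their risk portraits differ with (roughly) very high
# confidence. The damage trajectories are synthetic assumptions.
import random

random.seed(3)
YEARS = list(range(2020, 2101, 10))

def damage_cone(growth: float, spread: float, n_runs: int = 1000) -> dict:
    """Per-year (5th, 95th) percentile damages for a toy Monte Carlo scenario."""
    cone = {}
    for k, year in enumerate(YEARS):
        draws = sorted(growth * k + random.gauss(0.0, spread) for _ in range(n_runs))
        cone[year] = (draws[int(0.05 * n_runs)], draws[int(0.95 * n_runs)])
    return cone

business_as_usual = damage_cone(growth=12.0, spread=10.0)   # higher damages
strong_mitigation = damage_cone(growth=5.0, spread=10.0)    # lower damages

def bifurcation_year(high: dict, low: dict) -> int | None:
    """First year the low scenario's 95th percentile falls below the high scenario's 5th."""
    for year in YEARS:
        if low[year][1] < high[year][0]:
            return year
    return None

print("cones bifurcate in:", bifurcation_year(business_as_usual, strong_mitigation))
```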

#### **3.3 Model comparisons**

In the climate arena, large groups of willing modelers sometimes all agree to run their models with the same distributions of the same sets of driving variables to explore their models' respective sensitivities or to compare response policies' performances across a spectrum of projected futures [37–39]. Sometimes the participants also run contrasting idiosyncratic "modelers' choice" scenarios; and some even run full Monte Carlo analyses across relevant sources of uncertainty. When that happens, scientists can learn something about themselves as well as their topics of interest.

The early EMF-12 experiment, for example, displayed a curious result that persists over time—the variances of the output distributions for the modelers' choice runs were significantly smaller than the output variances for the "common inputs" runs. It would seem that integrated assessment teams tended to be uncomfortable if their results were outliers in comparison with competing teams—a cautionary bias for decision contexts in which ensemble distributions may be too narrow even though thick tails could be catastrophic [37].
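The EMF-12-style diagnostic amounts to comparing the across-team spread of results under the two protocols; the sketch below uses invented placeholder outputs to show the comparison.

```python
# Sketch of the diagnostic suggested by the EMF-12 result: compare the spread
# of outputs across teams when every team uses common inputs with the spread
# when each team uses its own preferred ("modelers' choice") inputs.
# The output values below are invented placeholders.
from statistics import mean, stdev

# One projected output (say, damages in 2050) reported by six hypothetical teams.
common_inputs_runs   = [2.1, 3.4, 1.6, 4.0, 2.8, 3.1]
modelers_choice_runs = [2.6, 2.9, 2.7, 3.0, 2.8, 2.9]

for label, runs in [("common inputs", common_inputs_runs),
                    ("modelers' choice", modelers_choice_runs)]:
    print(f"{label:18s} mean={mean(runs):.2f}  across-team std dev={stdev(runs):.2f}")

# A much smaller spread in the modelers' choice runs is the pattern the text
# flags: teams may herd toward one another, narrowing the reported ensemble.
```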

The Coupled Model Intercomparison Project (CMIP) was established by the Program for Climate Model Diagnosis and Intercomparison (PCMDI) at Lawrence Livermore National Laboratory. PCMDI's mission since 1989 has been to develop methods to rigorously diagnose and evaluate climate models from around the world, because the causes and character of divergent modeling results should be uncovered before those results are trusted by decision-makers. Over time, it has "inspired a fundamental cultural shift in the climate research community: there is now an expectation that everyone should have timely and unimpeded access to output from standardized climate model simulations. This has enabled widespread scientific analysis and scrutiny of the models and, judging by the large number of resulting scientific publications, has accelerated our understanding of climate and climate change" [38].

CMIP itself began in 1995 with the support and encouragement of the World Climate Research Program (WCRP). Its first set of common experiments compared model responses to an "idealized" forcing in which carbon dioxide increases by 1% per year. Subsequent experiments expanded the continuing idealized-forcing work to include parallel investigations of historical forcings and comparisons with the observed records of climate variables like global mean surface temperature [39]. CMIP5 and CMIP6, for example, have explored why ensemble results do not track observations perfectly. The reason is uncertainty. Model results reflect uncertainties, of course; but temperature observations are also imprecise. They are not records from a global set of thermometers; they are, instead, the products of model interpretations of remotely sensed data. Understanding why and how the differences occur is especially important because those differences are the ammunition for attacks by science skeptics and politicians with an anti-climate-change perspective [40, 41].
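Reading the idealized experiment as a compounded 1% per year increase in atmospheric carbon dioxide (an assumption about the shorthand used above, with an illustrative 280 ppm baseline), the arithmetic behind such a forcing pathway is short:

```python
# Small arithmetic sketch of an idealized forcing pathway: a 1% per year
# compounded increase in atmospheric CO2. The 280 ppm starting value is only
# an illustrative pre-industrial-style baseline, not a CMIP specification.
import math

baseline_ppm = 280.0
growth = 1.01                                  # 1% per year, compounded

years_to_double = math.log(2) / math.log(growth)
print(f"doubling takes about {years_to_double:.0f} years")   # roughly 70

for year in (0, 35, 70, 105, 140):
    print(f"year {year:3d}: {baseline_ppm * growth ** year:6.1f} ppm")
```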

CMIP also sponsors coordinated experiments like the "water hosing" experiments designed to explore the sensitivity of the strength of the North Atlantic overturning (thermohaline) circulation to changes in upper-ocean salinity. There, climate is held constant except for a simulated influx of fresh glacial meltwater from Greenland. CMIP has also, since its inception, focused a great deal of attention on making model intercomparison data available to a wider scientific community than the modelers themselves. Here, a global coordination effort for scientific collaboration is acknowledging that communication to other research communities, decision-makers at all levels, and private citizens is an important part of its job description; some authors (e.g., [40]) have even included second, plain-language abstracts in their published papers.

Contrasting that approach with public health model comparisons, which face dramatically different time scales and therefore dramatically different client needs, adds diversity to the sources of new knowledge about the models and their relative skills. Shea et al. [42], motivated by the aggressive responses of many modeling groups to "forecast disease trajectory, assess interventions, and improve understanding of the pathogen," expressed concern that their disparate projections might "hinder intervention planning and response by policy-makers." These authors recognized that models do differ widely for a variety of good and not so good reasons. They also noted that relying on one model for authority might cause valuable "insights and information from other models" to be overlooked, thereby "limiting the opportunity for decision-makers to account for risk and uncertainty and resulting in more lives lost." As a result, they advocated a more systematic approach that would use expert elicitation methods to inform a CMIP-style model comparison architecture within which decision-theoretic frameworks would provide rigorous access to calibration techniques. While certainly an addition to a long tradition of sometimes sporadic model comparison in public health, NSF [43] notes that this proposal would be the first time modelers would be allowed into the structure itself to see why their models disagree.
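One simple way an elicitation-informed comparison could combine disparate projections is a weighted mixture of the individual model distributions (a linear opinion pool). The sketch below is only an illustration of that idea, with invented model distributions and weights; it is not the decision-theoretic framework Shea et al. propose.

```python
# Sketch of combining disparate model projections with elicited weights via a
# linear opinion pool. The three model distributions and the weights are
# invented for illustration.
import random

random.seed(11)

# Each hypothetical model reports its projection as (mean, std dev) for one metric.
models = {
    "model_A": (120.0, 15.0),
    "model_B": (150.0, 30.0),
    "model_C": (100.0, 10.0),
}
elicited_weights = {"model_A": 0.5, "model_B": 0.2, "model_C": 0.3}

def pooled_draws(n: int = 10_000) -> list[float]:
    """Sample the mixture: pick a model by its weight, then draw from it."""
    names = list(models)
    weights = [elicited_weights[name] for name in names]
    draws = []
    for _ in range(n):
        name = random.choices(names, weights=weights)[0]
        mu, sigma = models[name]
        draws.append(random.gauss(mu, sigma))
    return sorted(draws)

draws = pooled_draws()
print("pooled 5th / 50th / 95th percentiles:",
      round(draws[500], 1), round(draws[5000], 1), round(draws[9500], 1))
```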

#### **3.4 Time scale matters**


Taken together, the models in an ensemble designed to inform COVID-19 response decisions were capable, simultaneously and independently, of producing estimates at many time scales and different geographic resolutions—daily, monthly, and a few years into the future for a city or town, a state, a region, or the country as a whole. So, too, are ensembles of climate models. Informed by new data and/or new understanding of the processes that drive component parts of their models, some modelers in either context can publish new sets of estimates for new combinations and permutations of scale and location at the same time. Done often enough over the course of a month or two for pandemics and 5 or 10 years for climate, those modelers could synthesize collections of time series of short-term estimates for any number of important output variables.

Daily recalculations may be excessive for most pandemic models, but surely, on a regular basis, decision-makers, analysts, and media commentators are anxious to compare the ensemble distributions of these results against the actual historical data. These plots produce insight into the relative near-term skills of the models across different geographic scales. Modelers would surely be interested as well, as the large participation in the CMIP exercises shows, because they will continue to try to improve their work and make it more valuable and accessible to the decision-makers who use it and the correspondents who interpret it. The point, as described in the mission of the PCMDI, is to generate early confidence in modelers' abilities to project what will likely happen given what just happened and to communicate what that means clearly to the populations that care.
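A near-term skill check of that kind can be very simple; in the sketch below, invented observations are compared with the ensemble band and median that had been projected for the same dates.

```python
# Sketch of a simple near-term skill check: compare recent observations with
# the ensemble band and median that had been projected for those same dates.
# The numbers are invented placeholders.
from statistics import mean

# For each recent reporting date: (observed value, ensemble 5th, median, 95th).
records = [
    (410.0, 350.0, 420.0, 510.0),
    (455.0, 380.0, 460.0, 560.0),
    (530.0, 400.0, 495.0, 600.0),
    (610.0, 420.0, 525.0, 640.0),
    (700.0, 440.0, 555.0, 680.0),
]

inside = [low <= obs <= high for obs, low, _, high in records]
abs_err = [abs(obs - med) for obs, _, med, _ in records]

print(f"coverage of the 5-95% band: {sum(inside)}/{len(records)} recent dates")
print(f"mean absolute error of the ensemble median: {mean(abs_err):.1f}")
```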

In light of this responsibility, climate scientists and epidemiologists have found it useful to conduct short-run skill tests of their models because they anticipate the need to understand and portray future changes that may happen very quickly. For example, it may become imperative at some point in the future to cope with sudden downstream impacts along an otherwise gradual scenario of change—an impact caused, for example, by crossing some unexpected critical threshold at some unknown date. In other cases, testing near-term skill may be important to reassure clients of the quality of model results so that the implications of new and significantly different information can be processed quickly by decision-makers and the public.

#### **3.5 The value of information**

In an era of increasingly tight budgets, it is imperative that funders in both the public and private sectors understand the value of investments in different types of information distributed across and within the germane research areas. Climate change research is a case in point; billions of dollars are being spent to improve the knowledge base for future decision-making. A study by the National Research Council [44] called for decision tools to assist in estimating "the value of new information which can help decision makers plan research programs and determine
