**3. The theoretical mechanistic/statistical inductive (TM/SI) framework**

The TM/SI framework borrows from a strong body of earlier work (e.g. [4, 14, 15, 22, 23]) and was outlined in Section 2 of JR2017 to provide support for reasoning about climate where the scientific debate had been muddied by competing claims from outside the science community.

The approach follows Haig [15], and employs the concept of severe testing [3] and, in keeping with it, error-statistical methods [4, 23]. It requires a carefully reasoned matching of scientific hypotheses about the physical world with statistical hypotheses about the observed data.
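Severity can be given a concrete, computable form. As an illustrative aside (a textbook-style one-sided Normal test with the numbers chosen here for illustration, not a calculation from this chapter), the post-data severity with which an observed mean warrants the claim *mu > mu1* is the probability that a result less accordant with the claim would have occurred were the claim false:

```python
from math import erf, sqrt

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def severity(xbar, mu1, sigma, n):
    """Severity with which observed mean `xbar` (known sigma, sample size n)
    warrants the claim mu > mu1: P(Xbar <= xbar; mu = mu1)."""
    return norm_cdf((xbar - mu1) / (sigma / sqrt(n)))

# With xbar = 0.4, sigma = 1, n = 100: the claim mu > 0.2 passes with
# severity ~0.977, while the stronger claim mu > 0.35 only reaches ~0.69.
sev_weak = severity(0.4, 0.2, 1.0, 100)
sev_strong = severity(0.4, 0.35, 1.0, 100)
```

The same data thus pass different claims with different degrees of severity, which is the error-statistical point: a claim is warranted only to the extent that it has survived a test that would probably have detected it were it false.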

The theoretical-mechanistic part consists of the physical aspects, components, relationships and measurable quantities.

The statistical-inductive part consists of the process of drawing conclusions about specified hypotheses concerning the system, given the physical model, real-world data and statistical tests.

The goal is to construct a chain of reasoning that ties physical hypotheses, *H1 .. Hn*, to statistical hypotheses, *h1 .. hn*. That is, features of the world map to defined outcomes of statistical tests (preferably one to one). One-to-one mapping meets a requirement of severe testing. Misspecification testing assists this mapping.
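As a hedged sketch of such a mapping (synthetic data and thresholds of our own choosing, not the chapter's procedure): a physical hypothesis "the series shifted abruptly" can be tied to the statistical hypothesis "a two-segment mean model fits the series better than a linear trend", with the test outcome defined before the data are inspected:

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic series: flat, then an abrupt 1.5-unit shift at t = 50.
t = np.arange(100)
y = np.where(t < 50, 0.0, 1.5) + rng.normal(0.0, 0.3, 100)

# h_trend: residual sum of squares under an ordinary least-squares line.
slope, intercept = np.polyfit(t, y, 1)
sse_trend = np.sum((y - (slope * t + intercept)) ** 2)

# h_step: residual sum of squares under a two-segment mean model,
# with the break point chosen by grid search.
def step_sse(y, k):
    return np.sum((y[:k] - y[:k].mean()) ** 2) + np.sum((y[k:] - y[k:].mean()) ** 2)

k_best = min(range(5, 95), key=lambda k: step_sse(y, k))
sse_step = step_sse(y, k_best)

print(k_best, sse_step < sse_trend)  # break near t = 50; step model fits better
```

Here the physical claim and the statistical outcome correspond one to one: the step-world hypothesis predicts that the segmented model wins and that the recovered break sits near the true shift.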

#### **3.1 The TM/SI structure**

Suppes [14] suggested that science employs a hierarchy of models that ranges from experimental experience to theory, claiming that theoretical models, high on the hierarchy, are not compared directly with empirical data, which are low on the hierarchy. Rather, he said, they are compared with models of the data, which sit higher than the data on the hierarchy. Following on, Haig describes an egalitarian framework in which three different types of models are interconnected and serve to structure error-statistical inquiry [15].

He describes: Primary models, which break a research question into a set of local hypotheses; Experimental models, which "structure the particular models" and link Primary models to Data models; and Data models, which in turn generate and model raw data, and check that the data satisfy the assumptions of the experimental models. Although Haig does not fully explain how experimental models "structure the particular models", it seems implicit that they map hypotheses to model components and processes. He leaves to his data models the role of checking that data meet the assumptions of experimental models.

To summarize, the TM/SI was constructed with physically grounded work in mind, and adapts Haig's approach. The physical entities and their relationships, about which we propose hypotheses guided by Physical models, are the Primary models. These link to the Statistical models, which support reasoning within an inductive framework, via Data models, which include Sampling procedures. Sampling procedures guide the accumulation of the data on which we reason. All data sampling procedures and statistical tests are framed against *ruling assumptions*. Violation of the ruling assumptions weakens statistical inference.
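The cost of violating a ruling assumption can be made concrete. In this hedged sketch (a generic simulation, not an analysis from this chapter), a naive two-sample t-test carries the ruling assumption of independent observations; feeding it autocorrelated AR(1) data with no real change in mean inflates its false-positive rate well above the nominal 5%:

```python
import numpy as np

rng = np.random.default_rng(0)

def ar1(n, phi, rng):
    """AR(1) series with no change in mean (null world, but autocorrelated)."""
    x = np.zeros(n)
    for i in range(1, n):
        x[i] = phi * x[i - 1] + rng.normal()
    return x

def naive_t_reject(x, z_crit=1.96):
    """Two-sample t-test of first half vs second half, assuming i.i.d. data."""
    a, b = x[: len(x) // 2], x[len(x) // 2 :]
    se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
    return abs(a.mean() - b.mean()) / se > z_crit  # ~5% nominal at large n

trials = 1000
iid_rate = np.mean([naive_t_reject(rng.normal(size=200)) for _ in range(trials)])
ar_rate = np.mean([naive_t_reject(ar1(200, 0.7, rng)) for _ in range(trials)])
print(iid_rate, ar_rate)  # roughly 0.05 vs several times that
```

When the i.i.d. ruling assumption holds, the test rejects at about its nominal rate; under serial dependence the same test "detects" spurious mean changes many times too often, which is exactly why misspecification testing must precede inference.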


*Severe Testing and Characterization of Change Points in Climate Time Series*

*DOI: http://dx.doi.org/10.5772/intechopen.98364*

Physical model: Concerns the system of physical entities and their interactions. Entities have measurable properties, which are accessed through Sampling procedures.

Statistical model: A mapping between a sampled set of observations and a set of parameterized probability distributions. This is informed by an error model – the theoretical behavior and characteristic distribution of sampling error, generally assumed to be random. If properly specified, the statistical model(s) license(s) valid statistical inductions about hypotheses, generally via statistical model selection from a specified statistical family.

Sampling procedures: Data models which cover the collection of measurements. Measurements are made, treated (e.g. homogenized), and output to become the sample data input to statistical models. The choice of sampling model (random sampling, averaging) influences subsequent induction, since sampling error subsumes both random processes and statistical misspecification.

Severe Testing requires that these issues be accounted for so that, to the extent possible, when features are present they are detected, and when they are not present they are not erroneously identified.

#### **3.2 Applying the TM/SI to climate**

#### *3.2.1 Physical model: surface temperatures*

To guide investigation we proposed in JR2017 (a) a physical model *M1* – a world in which average surface temperatures closely track forced warming, and natural variability is independent and reflective of the indices of variability – and, by contrast, (b) a physical model *M2*, which mirrors *M1* but in which there is interaction between forcing and natural variability. The *M2* world requires that Earth's surface temperature is additionally reflective of, and tracks, the internal physical states of variability modes, which may change abruptly, thus imprinting step-like shifts into the temperature records. These shifts mark state changes in the climate system, and represent the major response of the climate system at decadal time scales to gradually increasing greenhouse forcing. Earth's surface temperature is sampled, but it is understood that this also reflects the overall state of heat transport in the fluid layers.

#### *3.2.2 Sampling considerations*

Observed climate data are derived over time using evolving and fallible instrumentation. This dictates the use of a wide variety of strategies to enable inter-comparisons. In our analyses we are concerned with annual or monthly averages which, in the case of gridded data, have been further averaged and re-interpolated spatially. We must consider the effects of these procedures.

Averaging implicitly assumes a signal/noise model in which mean noise converges on zero at all time points, enhancing a signal that is assumed to be represented equally in all samples. An influence travelling in space and time will, when averaged, appear as some form of non-stationarity in the time series of the mean.

Temperature records increase in spatial density over time; they are records of opportunity. Conditions over land and ocean differ. To enable inter-comparison with models, the records are re-interpolated onto regular grids. They are also homogenized prior to gridding to deal with instrumentation changes [24].

Both the *M1* and *M2* worlds have time-varying temperature records, but because forced change in *M1* propagates rapidly, averaging does not induce troublesome artefacts. This is not the case for *M2*. A step-like change occurring serially across regions may give rise to trends and auto-regression, and/or may obscure more regional signals.

#### *3.2.3 Statistical model(s)*

Different statistical models are involved in the detection of changes, and in the assessment of the relative merits of *M1* and *M2*. It is important that the probabilities from statistical *feature detection* not also be used for *model selection*.

In break-point analysis the family of segmented linear regression models is used. The choice of specific parameters from within a specified family is termed model selection, and in our work would include the serial selection of specific change-points. The MSBV differs from other approaches in that it does not terminate the search for change-points (feature detection) on the basis of an all-of-model information criterion such as AIC (a model selection criterion), but usually earlier, when no segment can be sub-divided.

In our work, the detection of such steps, supported by evidence from M-S testing that they are not artefacts, constitutes support for *M2*, and thus support for *H2*.

**4. Misspecification testing**

Chapter 2 of [25] defines experimental error as all extraneous variation outside experimental treatments, and states "Neither the presence of experimental errors or their causes need concern the investigator, provided his [sic] results are sufficiently accurate to permit definite conclusions to be reached". This definition still dominates statistical climatology. Climate data are not generally experimental, but often a feature of interest in climate data is investigated by treating natural variability as extraneous variation. Experimental design requires that statistical models are properly specified; however, the complex systems being observed may align to many different statistical models and have multiple features of interest, leading to the possibility of misspecification.

Mayo and Spanos [2] (MS2004) introduce a methodology for testing misspecifications in statistical models (M-S testing). Taking this as a point of departure, we propose that a full understanding of the assumptions of statistical models allows one to probe data for features even when available tests are misspecified.

Mayo and Spanos differentiate between model specification and model selection. An adequate model specification licenses primary statistical inference, and with it statistical model selection from the specified family. Serial feature detection in any time series is a form of model selection from a family of related models, reliant on model specification. It must be noted that in our work a series of tests is performed, a single detection test and multiple probative tests, but as each is against an independent null this does not involve a multiple-testing issue, instead increasing the overall power of the testing regime.
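An MSBV-style stopping rule (terminate when no segment can be sub-divided, rather than on a whole-of-model criterion) can be sketched generically. The following is not the MSBV itself but a minimal binary-segmentation analogue, with synthetic data and a threshold of our own choosing: each segment is scanned for its best candidate change point, the segment is split while the candidate passes a significance test, and the search stops when no segment yields a significant split:

```python
import numpy as np

def best_split(x, min_len=10):
    """Best candidate change point in a segment: the index maximizing the
    two-sample t-statistic between the left and right parts."""
    best_k, best_t = None, 0.0
    for k in range(min_len, len(x) - min_len):
        a, b = x[:k], x[k:]
        se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
        t = abs(a.mean() - b.mean()) / se
        if t > best_t:
            best_k, best_t = k, t
    return best_k, best_t

def segment(x, offset=0, t_crit=5.0, min_len=10):
    """Recursively split until no segment can be sub-divided."""
    if len(x) < 2 * min_len:
        return []
    k, t = best_split(x, min_len)
    if k is None or t < t_crit:
        return []  # stop: no significant candidate in this segment
    return (segment(x[:k], offset, t_crit, min_len) + [offset + k]
            + segment(x[k:], offset + k, t_crit, min_len))

rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(m, 1.0, 100) for m in (0.0, 3.0, 6.0)])
cps = segment(y)
print(cps)  # detected change points, near 100 and 200
```

Note that the probabilities used here serve feature detection only; in keeping with Section 3.2.3, any subsequent assessment of competing models would rest on separate probative tests, not on these detection statistics.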
