### **4.1 Summary of MS2004**

MS2004 says, "A full methodology of M-S testing, as we see it, would tell us how to specify and validate statistical models, and how to proceed when statistical assumptions are violated."

Statistical model specification (goals and assumptions) is different from statistical model selection (from an assumed family of models, with a heuristic such as AIC). We consider a statistical model *M*, selected from a family of models.

MS2004 considers firstly the primary questions of statistical inference (whether the assumptions needed to reliably model the data are met), and secondly questions of whether there are influential gaps between the variables in a statistical model and the primary questions. Primary questions are addressed within the selected statistical model *M*. Formally, the hypothesis *H*<sub>M</sub> is:

*H*<sub>M</sub>: data **z** supports the probabilistic assumptions of statistical model *M*. (1)

Secondary questions are essentially meta-questions conducted outside the model *M*: they address the suitability of the test given the data, require auxiliary models, and put *M*'s assumptions to the test. Formally, this would test

*H*<sub>0</sub>: the assumption(s) of statistical model *M* hold for data **z**,

against all possible assumptions by which *H*<sub>0</sub> could fail (*H*<sub>1</sub>, … , *H*<sub>n</sub>). (2)

It is critically important to recognise that this use of multiple tests does not constitute multiple testing of *H*<sub>M</sub>. It augments and increases (not diminishes) the confidence in the conclusions.

They present a case study of an empirical relationship between the USA population (*y*<sub>i</sub>) and a secret variable (*x*<sub>i</sub>). They commence with a proposed explanatory model, with *R*<sup>2</sup> = 0.995 and a *p*-value nearly zero. Here, *û*<sub>i</sub> represents the estimated error process.

$$\mathbf{M0}: y_i = 167115 + 1907\,x_i + \hat{u}_i \tag{3}$$

Concerning *H*<sub>M</sub>: an assumption of the regression is that the errors *u*<sub>i</sub> are normally distributed, independent and identically distributed (NIID). Is this met? A runs test suggests not, and a parametric Durbin-Watson test suggests autocorrelation.

*Severe Testing and Characterization of Change Points in Climate Time Series DOI: http://dx.doi.org/10.5772/intechopen.98364*

$$\mathbf{M1}: y_i = \beta_0 + \beta_1 x_i + u_i, \quad u_i = \rho u_{i-1} + \varepsilon_i \tag{4}$$

However, an alternative AR(2) model is then shown to explain more variance *without* the *β*<sub>1</sub>*x*<sub>i</sub> term.

$$\mathbf{M2}: y_i = \beta_0 + \beta_1 y_{i-1} + \beta_2 y_{i-2} + \hat{u}_i \tag{5}$$

Probing the model M0 shows it to be misspecified due to an irrelevant variable: the secret variable *x*<sub>i</sub> is the number of shoes owned by Spanos's grandmother!

### **4.2 Application to climate data**

#### *4.2.1 Abrupt changes in previous literature*

In some papers step-like changes are introduced *en passant*, on the way to revealing or locating in time various phenomena, for instance the delineation of the Pacific Decadal Oscillation [26–28], or the reduction in south-western Western Australian rainfall [29]. In the last decade an astonishing number of papers addressed the so-called hiatus, many purporting to show that it never happened [30], or was simply routine variability [31, 32], or a methodological/statistical error [33], or suggesting that natural variability, internal variability and extrinsic factors combined with forced warming [34]. However others, one way or another, simply incorporate it as fact [35, 36].

From these and other papers and some personal communication, the objections/challenges to the existence of abrupt changes (including but not limited to the so-called hiatus) appear to be


Not all of these concern statistical M-S. Objection 1, physical implausibility of discontinuities in surface temperatures [37], can only result from an underlying assumption of a physical model where heat is dispersed rapidly and uniformly. Objection 7 is regrettable but not uncommon and not further considered.

#### *4.3.1 Detection test*

Complex climate time-series data is almost certainly misspecified for *any* change-point detection test; thus the goal is adequate applicability to questions of interest. In testing for multiple change-points, many methods, including the MSBV, examine data only between presumptive lower and upper bounding points and restart estimation of the distribution parameters. The assumptions of the basic detection methods used must be considered.


Issues potentially arising include false detection, timing errors, and false negatives. Timing error includes misplacement and imprecision. The MSBV incorporates a resampling strategy [38] which reduces imprecision. False positives (deterministic and non-deterministic) can be uncovered by post-detection assessments, but false negatives introduce downstream non-stationarities that interfere with detection of later change-points. Combining tests with differing assumptions and different nulls probes for both non-deterministic and deterministic causes of false results, including sub-detection threshold events.

Analysis of covariance (ANCOVA) is used post-detection of a change-point by the MSBV (which does not consider trend) to ensure that the presence of the change-point provides explanatory power in an unconstrained disjoint linear statistical model which allows trend. It does not attempt to locate an alternative change-point; the Zivot-Andrews test (see below), however, does this in passing. ANOVA tests for change of trend and change of level are obtained in passing, but in R2019 final *p*-values for change-points are obtained only from ANCOVA.
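The general idea of this post-detection check can be sketched as a nested-model *F*-test: a single line fitted across the segment pair is compared against an unconstrained disjoint linear model with a separate intercept and slope per segment. The example below (Python/statsmodels, simulated data with a known change-point) is an illustration of the principle, not the MSBV/R2019 implementation:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(1)
n, cp = 120, 60
t = np.arange(n, dtype=float)

# Series with a level shift of +2.0 at a known (presumptive) change-point
y = 0.01 * t + 2.0 * (t >= cp) + rng.normal(scale=0.5, size=n)
df = pd.DataFrame({"y": y, "t": t, "seg": (t >= cp).astype(int)})

# Restricted: one line for the whole span.  Full: disjoint line per segment.
m_restricted = smf.ols("y ~ t", data=df).fit()
m_full = smf.ols("y ~ t * C(seg)", data=df).fit()

# Nested-model F-test: does the change-point add explanatory power?
tab = anova_lm(m_restricted, m_full)
p_change = float(tab["Pr(>F)"].iloc[1])
print(f"F-test p-value for the change-point: {p_change:.3g}")
```

A small *p*-value indicates that the disjoint model, and hence the change-point, adds explanatory power beyond a single trend line.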

#### *4.3.2 Tests for heteroskedasticity for segmentation of data with change-points*

The full set of change-points in an entire sequence is tested here by the studentized Breusch-Pagan test (hereafter SBP test) for homoskedasticity of the residuals of the disjoint multi-segment model (JR2017 utilised the equivalent White's test [39]). An adequate model explanation of a time series, under the assumption of i.i.d. errors, should have featureless residuals. This test has a null of homoskedasticity, rejected in favour of heteroskedasticity at low *p*-values.

#### *4.3.3 Tests for stationarity in a segment*

Our detection test, the subsequent probability assignments by ANCOVA or ANOVA, and the further misspecification testing all assume serial independence, either in the null or the contrast hypothesis.

In these tests the segment containing a provisional change-point is tested for features that may deceive tests for shifts and trends. The MY test, ANCOVA, and, where used, ANOVA tests have ruling assumptions of serial independence. The MSBV and other multiple-break tests assume some form of censorship between provisional data segments (determination of change-points within provisional bounds includes only the data within the bounds); but tests of the overall model assume homogeneity of error, and thus of variance (e.g. the Akaike Information Criterion, AIC). The SBP also assumes this. All of the above tests are formalised as null hypothesis statistical tests (NHST), and as such they are each subject to their own ruling assumptions. The ruling assumptions are incorporated in the interpretation of the tests.

Autocorrelation in climate time-series is variously treated; some propose its estimation and removal [40], some warn against this idea [41]. Some treat it as a short term process and a cause of deception in change-point analyses [42], others have treated it as a persistent signal [43]. In climate signals, autocorrelation often
