**2. Correlation and coherence**

Correlation and coherence are techniques for comparing time-series (more generally, waveforms, signals). In brief, correlation compares shape, i.e. envelope, and is a timedomain technique; coherence compares composition, i.e. frequency (harmonic) content, and is a frequency-domain technique. Neither technique directly compares scale, i.e. neither directly detects magnitude anomalies.

Correlation is a relatively familiar and straightforward technique, widely used in various forms to compare datasets in general. However, correlation can be misleading, particularly if used in isolation as sole means of comparison. For example, consider a pair of identical time-series, such as two equal-frequency sinusoids in the simplest case. If the two sinusoids are exactly in-phase, then their correlation coefficient will be 1 (maximum positive correlation) because both time-series are always changing in the same sense (positively or negatively) at the same time. Conversely, if they are exactly, i.e. a half-cycle, out-of-phase

Identification of Simultaneous Similar Anomalies in Paired Time-Series 129

For further information on and fuller descriptions of Fourier transforms and their properties

Covariance and correlation are measures of similarity of shape, correlation is normalised covariance and so is independent of scale (e.g. Cramer, 1946; Mood *et al.* 1974). Generalising from the simple case outlined above, the correlation coefficient between two datasets (or samples) will be +1 or –1 if they have the same shape, in-phase or out-of-phase/signinverted respectively. The magnitude of the correlation coefficient will reduce according to

More formally, in summary, for the comparison of two *N*-sample time-series, { },{ } *i i x y* , e.g. to ascertain if they contain similar features possibly occurring with a lag between them,

 ( ) ( ) *xy*

The straightforward unlagged covariance and (cross-) correlation of two time-series are (5)

 1

*xy xy*

*n*

*N*

1

*N xy xy n xn y n*

*x y k*

(0) ( )( )

(0) (0) *xy*

For the comparison of a time-series against lagged versions of itself, e.g. to find the period(s)

 

*xx xx*

 

*xx n x nk x*

( ) ( )( )

 () () ( ) (0) *xx xx xx*

 

*x y*

 

*kx x* , (9)

*k k R k* , (10)

*xy n x nk y*

*xy*

( ) ( )( ) , for

*n*

*N*

1

( ) *xy k* , and lagged cross-correlation, ( ) *R k xy* , i.e.:

*k x y kN* , (5)

*R k* . (6)

*R R* . (8)

*x y* , (7)

( ) *xx k* , and (lagged) autocorrelation

01 1

, for ( ) , ,..., ; = ( ) ; elements

*x Xe X f X X X X Xf N* . (4)

 

<sup>2</sup> <sup>1</sup>

*<sup>i</sup> <sup>N</sup> nk N*

differences in shape and, as noted above, mismatch in phase.

, *i i x y* are the *i*th members of time-series { },{ } *i i x y* respectively

*<sup>y</sup>* are the standard deviations of { },{ } *i i x y* respectively

*<sup>y</sup>* are the mean values of { },{ } *i i x y* respectively

of any cyclic features, there is (lagged) autocovariance,

*n*

**2.2 Correlation analysis** 

there is lagged covariance,

and (6) evaluated at zero-lag, i.e.:

Where, in equations (5) – (8):

*x* ,

 *<sup>x</sup>* ,

( ) *R k xx* , i.e.:

0

*k n N nn*

see, for example, Riley, 1974; Gabel & Roberts, 1986; Proakis & Manolakis, 2006.

then their correlation coefficient will be –1 (maximum negative correlation) because one time-series is always changing in the opposite sense to the other. Depending on their phase difference, their correlation coefficient will be between or equal to these two limiting values. However, if they are a quarter-cycle out of phase, their correlation coefficient will be zero. This is because in consecutive quarter-periods the time-series are alternately changing in the same or opposite sense, yielding alternate quarter-periods of positive and negative correlation, equal in magnitude, which sum to zero over a complete period. Thus, whilst a zero, or small, correlation coefficient can indicate a real lack of similarity between two timeseries, it can be misinterpreted if there is no information with regard to their frequency content (and phase relationship).

This leads to the less familiar coherence technique. Coherence measures common frequency content, with or without phase information depending on the exact definition. Strictly, the coherence coefficient (i.e. "coherence") measures similarity of frequency content only, irrespective of relative phase, but the overall coherence analysis readily yields a phasedifference for each frequency considered for the coherence coefficient. In the hypothetical case of paired equal-frequency sinusoids considered above, the coherence would be maximal, i.e. coherence coefficient of 1, in all cases irrespective of the phase difference, although the coherence-derived phase-difference would vary exactly as the actual phasedifference. If there is no common frequency content, the coherence is zero.

Correlation is a time-domain technique, coherence is a frequency-domain technique: they are related via their Fourier transforms. Before considering correlation and coherence, therefore, a brief review of Fourier transforms – discrete Fourier transforms – will be given.

#### **2.1 Fourier transform and Discrete Fourier Transform (DFT)**

For a continuous function of time, *x t*( ) , of infinite duration its continuous Fourier transform is:

$$X(f) = \int\_{-\infty}^{\infty} x(t)e^{-2\pi i\theta}dt\,,\tag{1}$$

and it is a frequency-domain representation of it (its spectrum). The inverse transform is:

$$\mathbf{x}(t) = \bigcap\_{-\infty}^{\infty} X(f)e^{2\pi i ft} df \,\,. \tag{2}$$

In practice, time-series are neither continuous nor infinite, and so the Discrete Fourier Transform (DFT) is used, generally via a Fast Fourier Transform (FFT) algorithm. Subject to constraints arising from the finite and discrete nature of the time-domain function *x t*( ) , its DFT, *X f* ( ) , is a frequency-domain representation of it comprising a spectrum of discrete frequencies on a finite frequency interval. Thus, the forward DFT, *xt X f* () ( ) , is:

$$X\_n = \frac{1}{N} \sum\_{k=0}^{N-1} x\_k e^{-\left(\frac{2\pi i}{N}\right)^{pk}}, \text{ for } \mathbf{x}(t) = x\_0, x\_1, \dots, x\_{N-1}; \ x\_k = \mathbf{x}(t\_k) : N \text{ samples} \tag{3}$$

The inverse DFT (IDFT), *X f xt* ( ) () , is:

$$\mathbf{x}\_{k} = \sum\_{n=0}^{N-1} \mathbf{X}\_{n} \mathbf{c}^{\left(\frac{2\pi i}{N}\right)n\mathbf{k}}, \text{for } \mathbf{X}(f) = \mathbf{X}\_{0}, \mathbf{X}\_{1}, \dots, \mathbf{X}\_{N-1}; \mathbf{X}\_{n} = \mathbf{X}(f\_{n}):N \text{ elements} \tag{4}$$

For further information on and fuller descriptions of Fourier transforms and their properties see, for example, Riley, 1974; Gabel & Roberts, 1986; Proakis & Manolakis, 2006.

#### **2.2 Correlation analysis**

128 Earthquake Research and Analysis – Statistical Studies, Observations and Planning

then their correlation coefficient will be –1 (maximum negative correlation) because one time-series is always changing in the opposite sense to the other. Depending on their phase difference, their correlation coefficient will be between or equal to these two limiting values. However, if they are a quarter-cycle out of phase, their correlation coefficient will be zero. This is because in consecutive quarter-periods the time-series are alternately changing in the same or opposite sense, yielding alternate quarter-periods of positive and negative correlation, equal in magnitude, which sum to zero over a complete period. Thus, whilst a zero, or small, correlation coefficient can indicate a real lack of similarity between two timeseries, it can be misinterpreted if there is no information with regard to their frequency

This leads to the less familiar coherence technique. Coherence measures common frequency content, with or without phase information depending on the exact definition. Strictly, the coherence coefficient (i.e. "coherence") measures similarity of frequency content only, irrespective of relative phase, but the overall coherence analysis readily yields a phasedifference for each frequency considered for the coherence coefficient. In the hypothetical case of paired equal-frequency sinusoids considered above, the coherence would be maximal, i.e. coherence coefficient of 1, in all cases irrespective of the phase difference, although the coherence-derived phase-difference would vary exactly as the actual phase-

Correlation is a time-domain technique, coherence is a frequency-domain technique: they are related via their Fourier transforms. Before considering correlation and coherence, therefore, a brief review of Fourier transforms – discrete Fourier transforms – will be given.

For a continuous function of time, *x t*( ) , of infinite duration its continuous Fourier transform is:

and it is a frequency-domain representation of it (its spectrum). The inverse transform is:

frequencies on a finite frequency interval. Thus, the forward DFT, *xt X f* () ( ) , is:

*n k N N kk*

 

<sup>2</sup> <sup>1</sup>

*<sup>i</sup> <sup>N</sup> nk N*

*k*

The inverse DFT (IDFT), *X f xt* ( ) () , is:

0

1

In practice, time-series are neither continuous nor infinite, and so the Discrete Fourier Transform (DFT) is used, generally via a Fast Fourier Transform (FFT) algorithm. Subject to constraints arising from the finite and discrete nature of the time-domain function *x t*( ) , its DFT, *X f* ( ) , is a frequency-domain representation of it comprising a spectrum of discrete

01 1

*X xe xt x x x x xt N* , (3)

, for ( ) , ,..., ; = ( ) ; samples

<sup>2</sup> ( ) () *ift X f x t e dt* , (1)

<sup>2</sup> () ( ) *ift x t X f e df* . (2)

difference. If there is no common frequency content, the coherence is zero.

**2.1 Fourier transform and Discrete Fourier Transform (DFT)** 

content (and phase relationship).

Covariance and correlation are measures of similarity of shape, correlation is normalised covariance and so is independent of scale (e.g. Cramer, 1946; Mood *et al.* 1974). Generalising from the simple case outlined above, the correlation coefficient between two datasets (or samples) will be +1 or –1 if they have the same shape, in-phase or out-of-phase/signinverted respectively. The magnitude of the correlation coefficient will reduce according to differences in shape and, as noted above, mismatch in phase.

More formally, in summary, for the comparison of two *N*-sample time-series, { },{ } *i i x y* , e.g. to ascertain if they contain similar features possibly occurring with a lag between them, there is lagged covariance, ( ) *xy k* , and lagged cross-correlation, ( ) *R k xy* , i.e.:

$$\sigma\_{xy}(k) = \sum\_{n=1}^{N} (\mathbf{x}\_n - \boldsymbol{\mu}\_x)(\mathbf{y}\_{n-k} - \boldsymbol{\mu}\_y) \text{ , for } k \le N \text{ ,}\tag{5}$$

$$R\_{xy}(k) = \frac{\sigma\_{xy}(k)}{\sigma\_x \sigma\_y} \,. \tag{6}$$

The straightforward unlagged covariance and (cross-) correlation of two time-series are (5) and (6) evaluated at zero-lag, i.e.:

$$
\sigma\_{xy} = \sigma\_{xy}(0) = \sum\_{n=1}^{N} (\chi\_n - \mu\_x)(y\_n - \mu\_y) \, \prime \tag{7}
$$

$$R\_{xy} = R\_{xy}(0) = \frac{\sigma\_{xy}(0)}{\sigma\_x \sigma\_y} \,. \tag{8}$$

Where, in equations (5) – (8):

, *i i x y* are the *i*th members of time-series { },{ } *i i x y* respectively

*x* ,*<sup>y</sup>* are the mean values of { },{ } *i i x y* respectively

 *<sup>x</sup>* ,*<sup>y</sup>* are the standard deviations of { },{ } *i i x y* respectively

For the comparison of a time-series against lagged versions of itself, e.g. to find the period(s) of any cyclic features, there is (lagged) autocovariance, ( ) *xx k* , and (lagged) autocorrelation ( ) *R k xx* , i.e.:

$$\sigma\_{\text{xx}}(k) = \sum\_{n=1}^{N} (\mathbf{x}\_n - \boldsymbol{\mu}\_{\text{x}}) (\mathbf{x}\_{n-k} - \boldsymbol{\mu}\_{\text{x}}) \tag{9}$$

$$R\_{\rm xx}(k) = \frac{\sigma\_{\rm xx}(k)}{\sigma\_{\rm xx}(0)} = \frac{\sigma\_{\rm xx}(k)}{\sigma\_{\rm xx}} \,\,\,\,\tag{10}$$

Identification of Simultaneous Similar Anomalies in Paired Time-Series 131

The coherence (coherence-coefficient), *Cxy* , is the cross-spectral density (complex)

( ) ( ) () () *xy*

Im( ( )) tan( ( )) Re( ( ))

The coherence, *Cxy* , is real and varies between zero, i.e. no common components of the signals (time-series) at frequency *k*, and 1, i.e. equal proportional components of the signals

This section has outlined the mathematics of correlation and coherence and also summarised their strengths and weaknesses. Both techniques have their strengths and weaknesses but complement each other. Correlation compares shape (envelope), effectively in-phase periodic features, but gives no information regarding frequency content and so can be misleading with regard to out-of-phase features. Conversely, coherence compares periodic structure (frequency composition and phase difference) and so informs with regard to similarity of composition and structure, accounting for phase difference, but not with regard to overall shape. Thus, used in combination, a much more complete comparison can

The techniques outlined above have been investigated, developed and adapted using primarily one paired time-series dataset: this is a radon dataset comprising two hourlysampled time-series spanning 5.5 months from late June to mid December 2002. This period also included the ML=5 Dudley (UK) earthquake of 23 September (22 September GMT), which was widely felt by people in Northampton (and elsewhere in the English Midlands), and the Manchester (UK) earthquake swarm of 21-29 October, which wasn't felt in Northampton but was widely felt in southern parts of NW England and northern parts of the English Midlands. Such events are unusual for the UK and, the Dudley earthquake in

In effect, this is an investigation of three time-series, one of earthquake incidence and two of hourly-sampled radon concentrations, for common radon responses to earthquake stimuli. The hypothesis for this investigation is that a big disturbance, such as an earthquake, occurring at a relatively large distance compared to the detector separation could be expected to produce simultaneous similar radon anomalies (Crockett *et al.*, 2006a). Conversely, any anomalies which are neither simultaneous nor similar are more likely to arise from stimuli local to individual monitoring locations. This hypothesis is appropriate under the circumstances for the radon time-series considered herein, as described below.

particular, were the stimulus for the original investigation (Crockett *et al.*, 2006a).

*xx yy*

*G k*

2

*xy*

*xy G k*

*G kG k* , (13)

*G k* . (14)

*xy*

*k*

normalised by the product of the individual spectral densities (real), i.e.:

*xy*

*C k*

**2.3.2 Coherence (magnitude-squared coherence)** 

and the phase difference, *xy* , is given by:

be obtained than by using either in isolation.

**3. Case-study: July – November 2002** 

(time-series) at frequency *k*.

**2.4 Summary** 

which are (5) and (6) with both time-series the same, and clearly both have their maximum values (i.e. autocorrelation is +1) at zero lag.

#### **2.2.1 Rolling/sliding correlation**

Whilst lagged cross-correlation reveals any lag between two time-series and crosscorrelation of whole time-series gives an overall measure of similarity, neither reveals any time-dependence of similarity during the time-series, i.e. sections where the time-series correlate to greater or lesser extents. One means of achieving this is to window the paired time-series, starting at the beginning, cross-correlate across the window, roll/slide the window forwards a number of samples and repeat the cross-correlation, repeating the roll/slide-and-correlate procedure until the end of the time-series is reached. This yields a time-series of correlation coefficients which reveals those sections of the paired time-series which are varying exactly in phase (R = 1), those which are varying exactly out-of-phase (R = –1) and all intermediate values. This can be repeated for different window-durations and, for example, the results presented as a contour plot to reveal the time-duration relationships of any periods of significant cross-correlation between the time-series, analogous to the more familiar spectrogram representation of time-frequency relationships in single timeseries, and to cross-coherence, as described below (e.g. Crockett *et al.*, 2006a).

#### **2.3 Coherence (cross-spectral) analysis**

Coherence (cross-coherence, magnitude-squared coherence) can be useful in that it measures the similarity of two signals, i.e. time-series in this context, in terms of their frequency composition (Brockwell & Davis, 2009; Penny, 2009; Proakis & Manolakis, 2006; Venables & Ripley, 2002). It is a normalised measure of power cross-spectral density and is a frequency-domain measure of correlation of the two signals (time-series).

#### **2.3.1 Power spectral density**

The concept of "signal power" is not itself useful in this context but the terminology is established and is retained herein for reasons of simplicity. In signals which transmit energy, power, it is significant and is determined by the square of the amplitude.

The power spectral density is obtained via the Discrete Fourier Transform and is the proportion of the total power content, i.e. square-of-magnitude, carried at given frequencies. As defined by the Wiener-Khintchine Theorem (Proakis & Manolakis, 2006), the power spectral density, *Gxx* , of a signal (time-series) is the Fourier transform of the autocovariance, i.e.:

$$G\_{\rm xx}(k) = \sum\_{n=0}^{N-1} \sigma\_{\rm xx}(n) e^{-i\frac{2\pi}{N}nk} \,. \tag{11}$$

Also, the power cross-spectral density, *Gxy* , of two signals (time-series) is the Fourier transform of their cross-covariance, i.e.:

$$G\_{xy}(k) = \sum\_{n=0}^{N-1} \sigma\_{xy}(n) e^{-i\frac{2\pi}{N}nk} \,. \tag{12}$$

#### **2.3.2 Coherence (magnitude-squared coherence)**

The coherence (coherence-coefficient), *Cxy* , is the cross-spectral density (complex) normalised by the product of the individual spectral densities (real), i.e.:

$$\mathbf{C}\_{xy}(k) = \frac{\left| \mathbf{G}\_{xy}(k) \right|^2}{\mathbf{G}\_{xx}(k)\mathbf{G}\_{yy}(k)},\tag{13}$$

and the phase difference, *xy* , is given by:

$$\tan(\Phi\_{xy}(k)) = \frac{\operatorname{Im}(G\_{xy}(k))}{\operatorname{Re}(G\_{xy}(k))}.\tag{14}$$

The coherence, *Cxy* , is real and varies between zero, i.e. no common components of the signals (time-series) at frequency *k*, and 1, i.e. equal proportional components of the signals (time-series) at frequency *k*.

#### **2.4 Summary**

130 Earthquake Research and Analysis – Statistical Studies, Observations and Planning

which are (5) and (6) with both time-series the same, and clearly both have their maximum

Whilst lagged cross-correlation reveals any lag between two time-series and crosscorrelation of whole time-series gives an overall measure of similarity, neither reveals any time-dependence of similarity during the time-series, i.e. sections where the time-series correlate to greater or lesser extents. One means of achieving this is to window the paired time-series, starting at the beginning, cross-correlate across the window, roll/slide the window forwards a number of samples and repeat the cross-correlation, repeating the roll/slide-and-correlate procedure until the end of the time-series is reached. This yields a time-series of correlation coefficients which reveals those sections of the paired time-series which are varying exactly in phase (R = 1), those which are varying exactly out-of-phase (R = –1) and all intermediate values. This can be repeated for different window-durations and, for example, the results presented as a contour plot to reveal the time-duration relationships of any periods of significant cross-correlation between the time-series, analogous to the more familiar spectrogram representation of time-frequency relationships in single time-

series, and to cross-coherence, as described below (e.g. Crockett *et al.*, 2006a).

frequency-domain measure of correlation of the two signals (time-series).

energy, power, it is significant and is determined by the square of the amplitude.

Coherence (cross-coherence, magnitude-squared coherence) can be useful in that it measures the similarity of two signals, i.e. time-series in this context, in terms of their frequency composition (Brockwell & Davis, 2009; Penny, 2009; Proakis & Manolakis, 2006; Venables & Ripley, 2002). It is a normalised measure of power cross-spectral density and is a

The concept of "signal power" is not itself useful in this context but the terminology is established and is retained herein for reasons of simplicity. In signals which transmit

The power spectral density is obtained via the Discrete Fourier Transform and is the proportion of the total power content, i.e. square-of-magnitude, carried at given frequencies. As defined by the Wiener-Khintchine Theorem (Proakis & Manolakis, 2006), the power spectral density, *Gxx* , of a signal (time-series) is the Fourier transform of the autocovariance,

Also, the power cross-spectral density, *Gxy* , of two signals (time-series) is the Fourier

 

*xy xy n*

0 () ()

2 1

*N i nk*

*<sup>N</sup> xx xx n*

 

0 () ()

2 1

*N i nk*

*N*

*G k ne* . (11)

*G k ne* . (12)

values (i.e. autocorrelation is +1) at zero lag.

**2.3 Coherence (cross-spectral) analysis** 

**2.3.1 Power spectral density** 

transform of their cross-covariance, i.e.:

i.e.:

**2.2.1 Rolling/sliding correlation** 

This section has outlined the mathematics of correlation and coherence and also summarised their strengths and weaknesses. Both techniques have their strengths and weaknesses but complement each other. Correlation compares shape (envelope), effectively in-phase periodic features, but gives no information regarding frequency content and so can be misleading with regard to out-of-phase features. Conversely, coherence compares periodic structure (frequency composition and phase difference) and so informs with regard to similarity of composition and structure, accounting for phase difference, but not with regard to overall shape. Thus, used in combination, a much more complete comparison can be obtained than by using either in isolation.

#### **3. Case-study: July – November 2002**

The techniques outlined above have been investigated, developed and adapted using primarily one paired time-series dataset: this is a radon dataset comprising two hourlysampled time-series spanning 5.5 months from late June to mid December 2002. This period also included the ML=5 Dudley (UK) earthquake of 23 September (22 September GMT), which was widely felt by people in Northampton (and elsewhere in the English Midlands), and the Manchester (UK) earthquake swarm of 21-29 October, which wasn't felt in Northampton but was widely felt in southern parts of NW England and northern parts of the English Midlands. Such events are unusual for the UK and, the Dudley earthquake in particular, were the stimulus for the original investigation (Crockett *et al.*, 2006a).

In effect, this is an investigation of three time-series, one of earthquake incidence and two of hourly-sampled radon concentrations, for common radon responses to earthquake stimuli. The hypothesis for this investigation is that a big disturbance, such as an earthquake, occurring at a relatively large distance compared to the detector separation could be expected to produce simultaneous similar radon anomalies (Crockett *et al.*, 2006a). Conversely, any anomalies which are neither simultaneous nor similar are more likely to arise from stimuli local to individual monitoring locations. This hypothesis is appropriate under the circumstances for the radon time-series considered herein, as described below.

Identification of Simultaneous Similar Anomalies in Paired Time-Series 133

Channel earthquake on 26 August and a North Sea earthquake on 22 November (not of

Fig. 2. Spectrograms of the radon time-series, central 20-week period. The amplitude is calibrated in dB, from maximum 0 dB (red) to –50 dB (blue-white). The vertical axes are

Fig. 3. Autocorrelograms of the radon time-series, central 20-week period (TS1a in red, TS1b

26/08/2002 23:41 50.048 -0.009 4.0 3.0 247 Eng. Channel 22/09/2002 23:53 52.520 -2.150 9.4 5.0 94 Dudley 23/09/2002 03:32 52.522 -2.136 9.3 3.2 93 Dudley 21/10/2002 07:45 53.475 -2.000 5.0 3.7 161 Manchester 21/10/2002 11:42 53.478 -2.219 5.0 4.3 169 Manchester 22/10/2002 12:28 53.473 -2.146 4.2 3.5 165 Manchester 23/10/2002 01:53 53.477 -2.157 5.0 3.3 166 Manchester 24/10/2002 08:24 53.485 -2.179 3.7 3.8 168 Manchester 29/10/2002 04:42 53.481 -2.198 5.0 3.1 168 Manchester 22/11/2002 01:40 52.921 2.430 10.0 3.4 237 North Sea

(km)

Mag. (ML) Dist. (km) Location

Lat. Lon. Depth

Table 1. Earthquakes ( M 3 <sup>L</sup> ) within 250km of Northampton, July – December 2002.

identified interest in previous stages of this research).

cycles per day.

in blue).

Date / Time (GMT)

However, under different circumstances, e.g. where the detector separation is greater in comparison to the distance from the stimulus, or where one detector location is significantly closer to the stimulus than the other, the analysis might have to modified appropriately to account for a time lag, e.g. moving one time-series relative to the other in the time domain as indicated by lagged cross-correlation. Also, where the individual time-series arise from different monitored substances or processes (e.g. radon and another soil-gas or rock property), pre-processing as indicated in section 1.3 (or otherwise) might be appropriate.

#### **3.1 The time-series**

The radon data were collected using Durridge RAD7s, operated 2.25 km apart in Northampton (Phillips *et al.*, 2004; Crockett *et al.*, 2006a,b). A central 20-week extract, 14 July – 30 November 2002, of the 5.5-month paired time-series is shown in Figure 1.

Fig. 1. The paired radon time-series, central 20-week period.

In summary, both time-series are lognormally distributed, with correlation coefficients to lognormal distributions of > 0.91, although TS1a recorded radon levels typically 30-40 times greater than TS1b (Crockett *et al.*, 2006a). Both time-series are characterised by weak, noisy, non-stationary 24 h cycles having short autocorrelation times, *ca.* 1-2 days, as shown in the spectrograms and autocorrelograms in Figures 2 and 3 respectively. There is also weak evidence of 7 day cycles, typical of anthropogenic influences, and in TS1b there is evidence of some longer-period variations, durations *ca.* 15 and 30 days, which are possibly attributable to lunar-tidal influences (Crockett *et al.*, 2006a, 2006b). With the exception of rainfall, behind which TS1a and TS1b lag by 14 and 10 days respectively, there are no observed meteorological dependencies (Crockett *et al.*, 2006a).

The spectrograms are for 6 day windows, stepped at 1 day intervals. This window duration gives the best balance between resolution in the time and frequency domains in light of the durations of the periods of high correlation and coherence discussed below.

The earthquake data for the monitoring period, for earthquakes occurring within 250 km of Northampton, are presented in Table 1 (BGS (http://www.bgs.ac.uk) 2003; USGS/ANSS (http://quake.geo.berkeley.edu/anss), 2011). The next nearest earthquakes during this period occurred at distances greater than 420 km, i.e. there is a distance interval of 170 km between the earthquakes listed and the next nearest earthquakes. As well as the Dudley and Manchester earthquakes, there were other earthquakes of interest, these being an English

However, under different circumstances, e.g. where the detector separation is greater in comparison to the distance from the stimulus, or where one detector location is significantly closer to the stimulus than the other, the analysis might have to modified appropriately to account for a time lag, e.g. moving one time-series relative to the other in the time domain as indicated by lagged cross-correlation. Also, where the individual time-series arise from different monitored substances or processes (e.g. radon and another soil-gas or rock property), pre-processing as indicated in section 1.3 (or otherwise) might be appropriate.

The radon data were collected using Durridge RAD7s, operated 2.25 km apart in Northampton (Phillips *et al.*, 2004; Crockett *et al.*, 2006a,b). A central 20-week extract, 14 July

In summary, both time-series are lognormally distributed, with correlation coefficients to lognormal distributions of > 0.91, although TS1a recorded radon levels typically 30-40 times greater than TS1b (Crockett *et al.*, 2006a). Both time-series are characterised by weak, noisy, non-stationary 24 h cycles having short autocorrelation times, *ca.* 1-2 days, as shown in the spectrograms and autocorrelograms in Figures 2 and 3 respectively. There is also weak evidence of 7 day cycles, typical of anthropogenic influences, and in TS1b there is evidence of some longer-period variations, durations *ca.* 15 and 30 days, which are possibly attributable to lunar-tidal influences (Crockett *et al.*, 2006a, 2006b). With the exception of rainfall, behind which TS1a and TS1b lag by 14 and 10 days respectively, there are no

The spectrograms are for 6 day windows, stepped at 1 day intervals. This window duration gives the best balance between resolution in the time and frequency domains in light of the

The earthquake data for the monitoring period, for earthquakes occurring within 250 km of Northampton, are presented in Table 1 (BGS (http://www.bgs.ac.uk) 2003; USGS/ANSS (http://quake.geo.berkeley.edu/anss), 2011). The next nearest earthquakes during this period occurred at distances greater than 420 km, i.e. there is a distance interval of 170 km between the earthquakes listed and the next nearest earthquakes. As well as the Dudley and Manchester earthquakes, there were other earthquakes of interest, these being an English

– 30 November 2002, of the 5.5-month paired time-series is shown in Figure 1.

Fig. 1. The paired radon time-series, central 20-week period.

observed meteorological dependencies (Crockett *et al.*, 2006a).

durations of the periods of high correlation and coherence discussed below.

**3.1 The time-series** 

Channel earthquake on 26 August and a North Sea earthquake on 22 November (not of identified interest in previous stages of this research).

Fig. 2. Spectrograms of the radon time-series, central 20-week period. The amplitude is calibrated in dB, from maximum 0 dB (red) to –50 dB (blue-white). The vertical axes are cycles per day.

Fig. 3. Autocorrelograms of the radon time-series, central 20-week period (TS1a in red, TS1b in blue).


Table 1. Earthquakes ( M 3 <sup>L</sup> ) within 250km of Northampton, July – December 2002.

Identification of Simultaneous Similar Anomalies in Paired Time-Series 135

duration used for the spectrograms and also to the durations of the two periods of high

In both coherence and phase-coherence, it is the 24 h cycles, as revealed by the spectrograms and autocorrelograms, which are the focus as these constitute the main frequency component in each time series and, thus, the main frequency at which the time-series could cohere. The figures also show coherence and phase-coherence for 12 h cycles, i.e. twice the frequency of the main cycles, because coherence at this frequency also reveals some information. There is no significant coherence information outside this frequency range.

Fig. 5. Coherence shown contour-plotted (upper plot, strong red and pale yellow areas indicate high and low coherence respectively) with time-series and earthquake incidence (lower plot, TS1a and TS1b in red and blue respectively, earthquake timings as vertical black

Fig. 6. Phase coherence shown contour-plotted (upper plot, strong red and pale yellow areas indicate positive and negative phase difference respectively, mid-orange indicates small phase difference) with time-series and earthquake incidence (lower plot, TS1a and TS1b in

red and blue respectively, earthquake timings as vertical black lines).

positive correlation.

lines).

#### **3.2 Correlation results**

The rolling/sliding windowed correlation, for windows of duration 1-10 days, is shown contour-plotted in the upper plot of Figure 4 (vertical axis is window duration), with the radon time-series and earthquake timings also shown in the lower plot. In this and subsequent figures the time-series are shown normalised to unit mean, to assist visual comparison. This is a simple scaling of amplitudes: in terms of the waveforms of the timeseries, the shape and phase are preserved unaltered by this normalisation.

Fig. 4. Rolling/sliding windowed cross-correlation shown contour-plotted (upper plot, strong red and pale yellow areas indicate high positive and negative correlation respectively) with time-series and earthquake incidence (lower plot, TS1a and TS1b in red and blue respectively, earthquake timings as vertical black lines).

Figure 4 (upper) shows two distinct periods of high positive correlation (i.e. red); around (i) 21-23 September, across window durations of up to 10 days, and (ii) 25-27 August, across window durations of up to 5-6 days. The correlation coefficient for the whole period is R 0.08 and the mean values of the correlation coefficient across all ten window durations is |R| 0.036. Therefore, the paired time-series typically do not correlate and periods of high correlation are anomalous: e.g. the maximum values of the correlation coefficient for 1-5 day windows occur less than 5% of the time and do so predominantly in these two periods. The more significant of these two periods, 21-23 September, corresponds temporally to the Dudley earthquake of 22 September. The other period, 25-27 August, corresponds temporally to the English Channel earthquake of 26 August. Each period lasts *ca.* 5-7 days and, in each, correlation peaks prior to the earthquakes by *ca.* 1 day. There is no similar period of high positive correlation at the time of the Manchester or North Sea earthquakes.

#### **3.3 Coherence results**

The coherence, for windows of duration 6 days stepped at 1 day intervals, is shown in Figure 5. The corresponding phase coherence is shown in Figure 6 (phase difference of TS1b relative to TS1a). In these (upper) figures, the coherence information is contour-plotted to show the variation in coherence (Fig. 5) and phase-difference (Fig. 6) at frequencies of 0-3 cycles per day (vertical axes). The 6 day window duration corresponds to the window

The rolling/sliding windowed correlation, for windows of duration 1-10 days, is shown contour-plotted in the upper plot of Figure 4 (vertical axis is window duration), with the radon time-series and earthquake timings also shown in the lower plot. In this and subsequent figures the time-series are shown normalised to unit mean, to assist visual comparison. This is a simple scaling of amplitudes: in terms of the waveforms of the time-

Fig. 4. Rolling/sliding windowed cross-correlation shown contour-plotted (upper plot, strong red and pale yellow areas indicate high positive and negative correlation

and blue respectively, earthquake timings as vertical black lines).

respectively) with time-series and earthquake incidence (lower plot, TS1a and TS1b in red

Figure 4 (upper) shows two distinct periods of high positive correlation (i.e. red); around (i) 21-23 September, across window durations of up to 10 days, and (ii) 25-27 August, across window durations of up to 5-6 days. The correlation coefficient for the whole period is R 0.08 and the mean values of the correlation coefficient across all ten window durations is |R| 0.036. Therefore, the paired time-series typically do not correlate and periods of high correlation are anomalous: e.g. the maximum values of the correlation coefficient for 1-5 day windows occur less than 5% of the time and do so predominantly in these two periods. The more significant of these two periods, 21-23 September, corresponds temporally to the Dudley earthquake of 22 September. The other period, 25-27 August, corresponds temporally to the English Channel earthquake of 26 August. Each period lasts *ca.* 5-7 days and, in each, correlation peaks prior to the earthquakes by *ca.* 1 day. There is no similar period of high positive correlation at the time of the Manchester or North Sea earthquakes.

The coherence, for windows of duration 6 days stepped at 1 day intervals, is shown in Figure 5. The corresponding phase coherence is shown in Figure 6 (phase difference of TS1b relative to TS1a). In these (upper) figures, the coherence information is contour-plotted to show the variation in coherence (Fig. 5) and phase-difference (Fig. 6) at frequencies of 0-3 cycles per day (vertical axes). The 6 day window duration corresponds to the window

series, the shape and phase are preserved unaltered by this normalisation.

**3.2 Correlation results** 

**3.3 Coherence results** 

duration used for the spectrograms and also to the durations of the two periods of high positive correlation.

In both coherence and phase-coherence, it is the 24 h cycles, as revealed by the spectrograms and autocorrelograms, which are the focus as these constitute the main frequency component in each time series and, thus, the main frequency at which the time-series could cohere. The figures also show coherence and phase-coherence for 12 h cycles, i.e. twice the frequency of the main cycles, because coherence at this frequency also reveals some information. There is no significant coherence information outside this frequency range.

Fig. 5. Coherence shown contour-plotted (upper plot, strong red and pale yellow areas indicate high and low coherence respectively) with time-series and earthquake incidence (lower plot, TS1a and TS1b in red and blue respectively, earthquake timings as vertical black lines).

Fig. 6. Phase coherence shown contour-plotted (upper plot, strong red and pale yellow areas indicate positive and negative phase difference respectively, mid-orange indicates small phase difference) with time-series and earthquake incidence (lower plot, TS1a and TS1b in red and blue respectively, earthquake timings as vertical black lines).

Identification of Simultaneous Similar Anomalies in Paired Time-Series 137

the two locations are different. The reasons for this will be differences in the rocks and soils at the two locations and possibly different ventilation characteristics in the two basements where the RAD7s were placed. However, if the time-series are represented in terms of SRIs, effectively standard normal variables, then these different responses are (partially) equalised in terms of probability of occurrence. This is shown in Figure 9 for the September anomalies.

Fig. 8. Late August Anomalies (TS1a in red, TS1b in blue, earthquake timing as a vertical

frequency composition and phase, rather than in the magnitude domain.

Figure 9 shows that these simultaneous daily maxima are more similar in probability than in relative magnitude. In SRIs, these maxima have magnitudes *ca.* 1.5-2.5, i.e. 1.5-2.5 standard deviations from the mean. Thus, some of these would 'pass' a plus-or-minus 2 standard deviations criterion for magnitude anomaly whilst others would not, and none would 'pass' a plus-or-minus 3 standard deviations criterion. Therefore, what the correlation and coherence analyses have identified are anomalies in the time domain, i.e. anomalies in

Fig. 9. Late September Anomalies (TS1a in red, TS1b in blue, earthquake timings as vertical

The late September and late August periods are identified as time-domain anomalies by correlation, coherence and phase-coherence (at 24 h and 12 h cycles), and these periods temporally correspond to the Dudley and English Channel earthquakes. The mid-late November period of coherence (at 24 h and 12 h cycles), but without significant correlation or phase-coherence, is another time-domain anomaly, which has similarities to (coherence) and differences from (correlation, phase-coherence) the late September and late August time-domain anomalies. This mid-late November anomaly occurs a few days prior to the 22 November earthquake but any temporal association with that earthquake is weakened by the absence of corresponding correlation or phase-coherence information. The two periods of 24 h coherence (without 12 h coherence) are clearly ambiguous: the late October period

black line).

black lines).

Figure 5 (upper) shows two conspicuous periods of high coherence (i.e. red) for both 24 h and 12 h cycles; i.e. late August, late September, and another less conspicuous period in mid-late November. The late August and late September periods correspond to the periods of high positive correlation shown in Figure 4 and correspond temporally to the English Channel and Dudley earthquakes respectively. The mid-late November period occurs a few days before the North Sea earthquake of 22 November. Additionally, there is a period of high coherence for 24 h cycles in late October, which corresponds temporally to the end of the Manchester earthquake swarm, and also a period of high coherence for 24 h cycles in late July with no apparent correspondence to any recorded earthquakes in the region.

Figure 6 (upper) shows two periods where the time-series are in phase (i.e. approximately zero phase difference, medium orange) for both 24 h and 12 h cycles, one in late August and the other in late September. These two periods confirm the two periods of high correlation: if the principal periodic content (24 h) and a principal harmonic (12 h) are in phase then there will be underlying in-phase similarities in the envelopes, giving rise to (high) positive correlation. However, across the time-series as a whole, the daily maxima typically occur at *ca.* 18:00 and 16:00 GMT for TS1a and TS1b respectively, i.e. typically the time-series are not in-phase. This is confirmed directly in Figure 6, which shows variations in phase-difference throughout the period, and also confirmed by the short autocorrelation times shown in Figure 3 and the small average cross-correlation coefficients shown in Figure 4.

#### **3.4 Summary of results**

The rolling/sliding windowed correlation, as reported in the initial investigation (Crockett *et al.* 2006a), reveals strong evidence that the two radon time-series correlate positively at the time of the Dudley earthquake, and also evidence that they correlate positively at the time of the English Channel earthquake, but that in general they do not correlate. These results are confirmed by the coherence results, both coherence coefficients and phase-difference. Seven day details from the time-series are shown in Figure 7, for the period around the Dudley earthquake and Figure 8, for the period around the English Channel earthquake.

Fig. 7. Late September Anomalies (TS1a in red, TS1b in blue, earthquake timings as vertical black lines).

The two periods are similar in showing the daily maxima, i.e. spikes of *ca.* 6 h duration, occurring simultaneously (i.e. in-phase), and are dissimilar to the remainder of the timeseries. Some of these simultaneous maxima occur before the earthquakes – and are potentially earthquake precursory phenomena.

It is clear from these figures, however, that the relative magnitudes of these daily maxima are 2-4 times greater in TS1b than TS1a, indicating that the radon emission characteristics in

Figure 5 (upper) shows two conspicuous periods of high coherence (i.e. red) for both 24 h and 12 h cycles; i.e. late August, late September, and another less conspicuous period in mid-late November. The late August and late September periods correspond to the periods of high positive correlation shown in Figure 4 and correspond temporally to the English Channel and Dudley earthquakes respectively. The mid-late November period occurs a few days before the North Sea earthquake of 22 November. Additionally, there is a period of high coherence for 24 h cycles in late October, which corresponds temporally to the end of the Manchester earthquake swarm, and also a period of high coherence for 24 h cycles in late July with no apparent correspondence to any recorded earthquakes in the region. Figure 6 (upper) shows two periods where the time-series are in phase (i.e. approximately zero phase difference, medium orange) for both 24 h and 12 h cycles, one in late August and the other in late September. These two periods confirm the two periods of high correlation: if the principal periodic content (24 h) and a principal harmonic (12 h) are in phase then there will be underlying in-phase similarities in the envelopes, giving rise to (high) positive correlation. However, across the time-series as a whole, the daily maxima typically occur at *ca.* 18:00 and 16:00 GMT for TS1a and TS1b respectively, i.e. typically the time-series are not in-phase. This is confirmed directly in Figure 6, which shows variations in phase-difference throughout the period, and also confirmed by the short autocorrelation times shown in

Figure 3 and the small average cross-correlation coefficients shown in Figure 4.

The rolling/sliding windowed correlation, as reported in the initial investigation (Crockett *et al.* 2006a), reveals strong evidence that the two radon time-series correlate positively at the time of the Dudley earthquake, and also evidence that they correlate positively at the time of the English Channel earthquake, but that in general they do not correlate. These results are confirmed by the coherence results, both coherence coefficients and phase-difference. Seven day details from the time-series are shown in Figure 7, for the period around the Dudley earthquake and Figure 8, for the period around the English Channel earthquake.

Fig. 7. Late September Anomalies (TS1a in red, TS1b in blue, earthquake timings as vertical

The two periods are similar in showing the daily maxima, i.e. spikes of *ca.* 6 h duration, occurring simultaneously (i.e. in-phase), and are dissimilar to the remainder of the timeseries. Some of these simultaneous maxima occur before the earthquakes – and are

It is clear from these figures, however, that the relative magnitudes of these daily maxima are 2-4 times greater in TS1b than TS1a, indicating that the radon emission characteristics in

**3.4 Summary of results** 

black lines).

potentially earthquake precursory phenomena.

the two locations are different. The reasons for this will be differences in the rocks and soils at the two locations and possibly different ventilation characteristics in the two basements where the RAD7s were placed. However, if the time-series are represented in terms of SRIs, effectively standard normal variables, then these different responses are (partially) equalised in terms of probability of occurrence. This is shown in Figure 9 for the September anomalies.

Fig. 8. Late August Anomalies (TS1a in red, TS1b in blue, earthquake timing as a vertical black line).

Figure 9 shows that these simultaneous daily maxima are more similar in probability than in relative magnitude. In SRIs, these maxima have magnitudes *ca.* 1.5-2.5, i.e. 1.5-2.5 standard deviations from the mean. Thus, some of these would 'pass' a plus-or-minus 2 standard deviations criterion for magnitude anomaly whilst others would not, and none would 'pass' a plus-or-minus 3 standard deviations criterion. Therefore, what the correlation and coherence analyses have identified are anomalies in the time domain, i.e. anomalies in frequency composition and phase, rather than in the magnitude domain.

Fig. 9. Late September Anomalies (TS1a in red, TS1b in blue, earthquake timings as vertical black lines).

The late September and late August periods are identified as time-domain anomalies by correlation, coherence and phase-coherence (at 24 h and 12 h cycles), and these periods temporally correspond to the Dudley and English Channel earthquakes. The mid-late November period of coherence (at 24 h and 12 h cycles), but without significant correlation or phase-coherence, is another time-domain anomaly, which has similarities to (coherence) and differences from (correlation, phase-coherence) the late September and late August time-domain anomalies. This mid-late November anomaly occurs a few days prior to the 22 November earthquake but any temporal association with that earthquake is weakened by the absence of corresponding correlation or phase-coherence information. The two periods of 24 h coherence (without 12 h coherence) are clearly ambiguous: the late October period

Identification of Simultaneous Similar Anomalies in Paired Time-Series 139

correspondence. The analysis does not prove that the anomalies are related to the earthquakes, i.e. does not demonstrate that an earthquake-stimulus radon-response relationship exists, but such analysis does provide necessary evidence towards the

With regard to magnitude anomalies, care must be taken to apply criteria used for identifying anomalies correctly dependent upon the probability distribution(s) of the data being investigated. Noting the simplicity and familiarity of the normal distribution, and associated *de facto* standard criteria for determining anomalies, a technique such as the SRI which maps data onto standard normal variables is useful, but this technique is also useful in effectively equalising different, generally non-linear radon-emission characteristics and

The author gratefully acknowledges members of the University of Northampton Radon Research Group for the collection of the data and also DEFRA (UK) for funding the research under which the 2002 data were collected (EPG 1/4/72, RW 8/1/64). The author also acknowledges UNESCO / IUGS / IGCP Project 571 for facilitating preparation and

The data-analysis was performed using the open-source R (http://www.r-project.org)) and Scilab (http://www.scilab.org) software packages, with the EMD and Seewave R libraries.

Asada, T. (1982). *Earthquake Prediction Techniques: Their Application in Japan*, University of

Baykut, S., Akgul, T. & Seyis, C. (2010). Observation and removal of daily quasi-periodic

Bella, F. & Plastino, W. (1999). *Radon time series analysis at LNGS, II*, LNGS Annual Report

Bolt, B.A. (2004). *Earthquakes* (5th edition). W.H. Freeman & Co, New York, USA. ISBN 0-

Boulton, G.S. (1992). Quaternary, In: *Geology of England and Wales*, Duff, P.McL.D. & Smith,

Brockwell, P.J. & Davis, R.A (2009). *Time Series: Theory and Methods*, Springer. ISBN 978-1-

Chyi, L., Chou, C-Y., Yang, F. & Chen, C-H. (2001). *Continuous Radon Measurements in Faults* 

Climent, H., Tokonami, S. & Furukawa, M. (1999). Statistical Analysis Applied to Radon and

Cramer, H. (1974, repub. 1999). *Mathematical Methods of Statistics*, Princeton University Press,

*and Earthquake Precursor Pattern Recognition*, Western Pacific Earth Sciences, Vol.1,

Natural Events, *Proc. Radon in the Living Environment*, Athens, Greece, April 1999,

1999, Gran Sasso National Laboratory, Italy, pp.199-203.

A.J. (Eds.),The Geological Society, London, pp.413-444.

components in soil radon data, *Rad. Meas.*, Vol.45, No.7, pp. 872-879,

demonstration that such a relationship might exist.

**5. Acknowledgements** 

**6. References** 

Tokyo Press, Japan.

7167-5618-8.

4419-0319-8.

pp.241-253.

No.2, pp.227-246.

ISBN 978-06-910-0547-8.

facilitating comparison in terms of probability of occurrence.

dissemination of earlier stages of this and related research.

doi:10.1016/j.radmeas.2010.04.002.

corresponds temporally to the later Manchester earthquakes but the late July period does not correspond temporally to any recorded earthquake in the region.

Noting the relative magnitudes and proximities of the Manchester earthquakes compared to the English Channel earthquake, it is perhaps surprising the there is no similarly welldefined time-domain anomaly in the radon time-series at the time of Manchester earthquakes. The reasons for this are not currently understood but there are essentially two possibilities. First, any such temporal correspondence between earthquake and timedomain anomalies in the radon time-series is coincidental, as discussed below. Second, the temporal correspondence is not coincidental but the natures of the geologies at and between Northampton and Manchester are such that any earthquake-related radonstimuli are blurred and attenuated. For information on geology see, for example, Boulton, 1992; Hains & Horton, 1969; Poole *et al.*, 1968; Smith *et al.*, 2000 and Toghill, 2003. However, having also identified time-domain radon anomalies which temporally correspond to the Market Rasen earthquake of 27 February 2008 (Crockett & Gillmore, 2010), the second reason is arguably more probable but more data are required, both earthquake and radon, to investigate this more fully.

In these data, another potential geophysical explanation is possible: i.e. lunar-tidal influences, which have been reported for TS1b in terms of cyclic variations in radon concentration (Crockett *et al.*, 2006b). Tidal influences might account for the periods of 24 h coherence which temporally correspond to tidal maxima associated with the full moons of 24 July, 22 August, 21 September and 20 November, but not the absence of any coherence around the 21 October full moon. This potential explanation is further weakened by the absence of (a) consistent coherence around the new-moon maxima and (b) consistent correlation or phase coherence around both sets of maxima. Also, a tidal-maximum explanation does not account for the anomalies in the February 2008 time-series, which temporally correspond to the Market Rasen earthquake but not a tidal maximum. Lastly, despite the apparent similarities in timing, it is unknown whether there is any lunar-tidal influence on the English Channel, Dudley, Manchester and North Sea earthquakes and no such influence has been reported for UK earthquakes in general.
