**4. Methods for detecting anomalies in radon time series**

An anomaly in radon concentration is defined as a significant deviation from the mean value. Due to the high background noise of radon time series, it is often impossible to distinguish an anomaly caused solely by a seismic event from one resulting from meteorological or hydrological parameters. For this reason, the implementation of more advanced statistical methods in data evaluation is important (Belyaev, 2001; Cuomo et al., 2000; Negarestani et al., 2003; Sikder & Munakata, 2009; Steinitz et al., 2003). In our research, radon has been monitored in several thermal springs (Gregorič et al., 2008; Zmazek et al., 2002a; Zmazek et al., 2006) and in soil gas (Zmazek et al., 2002b) and different approaches to distinguishing radon anomalies were applied.

#### **4.1 Standard deviation**

182 Earthquake Research and Analysis – Statistical Studies, Observations and Planning

of the bedrock and soil in areas of crustal discontinuities, such as fractures and fault zones, promotes intense degassing fluxes, which causes higher soil gas radon concentrations on the ground surface above active fault zones. Although several measurements, experiments and models have been performed, the understanding of the mechanism of radon anomalies and their connection to earthquakes is still inadequate (Chyi et al., 2010; King, 1978; Ramola et al., 1990). Before an earthquake, stress in the Earth's crust builds up, causing a change in the strain field; the formation of new cracks and pathways under the tectonic stress leads to changes in gas transport and a rise in volatiles from the deep layers to the surface. In fact, fluids play a widely recognised role in controlling the strength of crustal fault zones (Hickman et al., 1995). Anomalous changes of radon concentration are closely linked to changes in fluid flow and, therefore, also to highly permeable areas along fault zones. Large faults are not discrete surfaces but rather a braided array of slip surfaces encased in a highly fractured and often hydrothermally altered transition – or "damage" – zone. Episodic fracturing and brecciation are followed by cementation and crack healing, leading to cycles of permeability enhancement and reduction along faults (Hickman et al., 1995).

Several mechanisms have been proposed which could explain the relationship between radon anomalies and earthquakes. Two models of earthquake precursors are discussed by Mjachkin et al. (1975), with a common principle: at a certain preparation stage, a region of many cracks is formed. According to the dilatancy-diffusion model (Martinelli, 1991; Mjachkin et al., 1975), the increase in tectonic stress causes the extension and opening of favourably-oriented cracks in a porous, cracked, saturated rock. Water flows into the opened cracks, drying the rock near each pore and finally resulting in a decrease of pore pressure in the total earthquake preparation zone. Water from the surrounding medium diffuses into the zone. The increased water-rock surface area, due to cracking, leads to an increase in radon transfer from the rock matrix to the water. At the end of the diffusion period, the appearance of pore pressure and the increased number of cracks leads to the main rupture. According to the crack-avalanche model (Mjachkin et al., 1975), the increasing tectonic stress leads to the formation of a cracked focal rock zone, with slowly altering volume and shape. At a certain stage – when the whole focal zone becomes unstable – the cracks quickly concentrate near the fault surface, triggering the main rupture. An alternate mechanism for earthquake precursory study, based on stress-corrosion theory, has been proposed by Anderson and Grew (1977). According to them, the observed radon anomalies are due to slow crack growth controlled by stress corrosion in a rock matrix saturated by ground waters. King (1978) has proposed a compression mechanism for radon release, whereby anomalous high radon release may be due to an increase of crustal compression before an impending earthquake, that squeezes out soil gas into the atmosphere at an increasing rate.

Toutain and Baubron (1999) observed that gas transfer within the upper crust is affected by strains less than $10^{-7}$, much smaller than those causing earthquakes. According to Dobrovolsky (1979), the radius of the effective precursory manifestation zone depends on the earthquake magnitude and can be calculated using the empirical equation:

$$R_\mathrm{D} = 10^{0.43\,M_\mathrm{L}} \tag{1}$$

where *R*D is the strain radius in km and *M*L is the magnitude of the earthquake. Considering the Earth's crust to be an anisotropic medium, this law can be modified according to the

A very common practice in determining radon anomalies is the use of standard deviation. The average radon concentration is calculated for different periods with regard to the nature of yearly cycles of radon concentration. In the case of radon in soil gas, the mean value of radon concentration is calculated separately for four seasons (spring, summer, autumn and winter) based on the air and soil temperature.

Fig. 1. Continuous radon concentration recorded in soil gas at Krško basin. Straight lines represent the mean value and two standard deviations of the radon concentration. Local seismicity is expressed in terms of local magnitude (*M*L) and the distance between the measuring location and the earthquake epicentre (*D*). Radon anomalies are *C*Rn values outside the ±2σ region.

Radon as an Earthquake Precursor – Methods for Detecting Anomalies 185


In contrast to soil gas, radon in ground or spring water is greatly influenced by the hydrologic cycle, which has to be considered during the data analysis. To define the mean and standard deviation, anomalously high and low values – which may cause unnecessary high deviation and perturb the real anomalies – have to be neglected. The periods when radon concentration deviates by more than ±2σ from the related seasonal value are considered as radon anomalies that are possibly caused by earthquake events and not by meteorological parameters (Ghosh et al., 2007; Gregorič et al., 2008; Virk et al., 2002; Zmazek et al., 2002b). Fig. 1 shows an anomalous radon concentration, exceeding 2σ above the average value, which appeared approximately 10 days before the occurrence of three earthquakes with magnitudes from 1.8 to 3.2.
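The ±2σ criterion can be illustrated with a minimal sketch. It assumes that the series has already been labelled by season and that extreme values are discarded once, using a 3σ cut, before the seasonal mean and standard deviation are computed. The function name, the 3σ trimming rule and the data layout are illustrative assumptions and are not taken from the cited studies.

```python
import numpy as np

def seasonal_two_sigma_anomalies(conc, season, n_sigma=2.0, trim_sigma=3.0):
    """Flag samples deviating by more than n_sigma from their seasonal mean.

    conc   : 1-D sequence of radon concentrations
    season : 1-D sequence of season labels of the same length (e.g. 0..3)
    trim_sigma is an assumed rule for neglecting extreme values before the
    seasonal statistics are computed.
    """
    conc = np.asarray(conc, dtype=float)
    season = np.asarray(season)
    flags = np.zeros(conc.shape, dtype=bool)
    for s in np.unique(season):
        idx = season == s
        x = conc[idx]
        # First pass: provisional statistics, used only to discard extremes.
        mu0, sd0 = x.mean(), x.std()
        keep = np.abs(x - mu0) <= trim_sigma * sd0
        # Second pass: seasonal mean and sigma without the extreme values.
        mu, sd = x[keep].mean(), x[keep].std()
        flags[idx] = np.abs(x - mu) > n_sigma * sd
    return flags
```

Without the first pass, a single very high value inflates σ and may hide the anomaly it belongs to, which is the reason given above for neglecting such values.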

#### **4.2 Relationship between radon exhalation and barometric pressure**

An inverse relationship exists between the time derivative of radon concentration in soil gas and the time derivative of barometric pressure (as was discussed previously in section 2.1). A decrease in barometric pressure causes an increase in radon exhalation from the ground, whereas during periods of rising pressure, air with low radon concentration is forced into the ground, thus diluting the radon concentration. Therefore, deviations from this rule during these periods – when the time gradient of barometric pressure, Δ*P*/Δ*t*, and the time gradient of radon concentration, Δ*C*Rn/Δ*t*, in soil gas have the same sign – can be considered to be radon anomalies (Zmazek et al., 2002b). A clear negative correlation between the time gradient of radon concentration and the time gradient of barometric pressure can be seen in Fig. 2a, when no seismic activity is present. The radon anomaly, characterised by positive correlation of time gradients, is marked in Fig. 2b. Anomalous behaviour in radon concentration started 14 days before the earthquake with a local magnitude of 2.6, and ended a few days after the earthquake.

Fig. 2. The time gradient of radon concentration in soil gas and the time gradient of barometric pressure during two periods at the Krško basin: a) the period without local seismic activity, b) the seismically active period, whereby the radon anomaly 14 days before the earthquake is marked by the green rectangle. The earthquake is expressed in terms of local magnitude (*M*L) and the distance between the measuring location and the earthquake epicentre (*D*).
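The same-sign rule for the two time gradients can be sketched as follows. The sketch assumes equally spaced samples and uses simple finite differences. The function name and the sampling interval `dt` are illustrative assumptions, and the smoothing or minimum-duration requirement that real, noisy series would need is deliberately left out.

```python
import numpy as np

def same_sign_gradient_intervals(radon, pressure, dt=1.0):
    """Return a boolean array marking intervals where dC_Rn/dt and dP/dt
    have the same sign, i.e. where the usual inverse relationship fails."""
    radon = np.asarray(radon, dtype=float)
    pressure = np.asarray(pressure, dtype=float)
    d_c = np.diff(radon) / dt      # time gradient of radon concentration
    d_p = np.diff(pressure) / dt   # time gradient of barometric pressure
    # A positive product means both gradients rise or both fall together.
    return d_c * d_p > 0

# Pressure falls, then rises; radon keeps rising in the third interval.
pressure = [1000, 998, 996, 998, 1000]
radon = [10, 12, 14, 16, 14]
flags = same_sign_gradient_intervals(radon, pressure)
```

In this toy series only the third interval is flagged: pressure has started to rise there while radon is still increasing.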

#### **4.3 Machine learning methods**


Machine learning methods have been successfully applied to many problems in the environmental sciences (Džeroski, 2002). In the case of radon as an earthquake precursor, it must be considered – as discussed in section 2.1 – that the variation in radon concentration is controlled not only by geophysical phenomena in the Earth's crust, but also by the environmental parameters associated with the radon monitoring sites. With machine learning methods, a model for the prediction of radon concentration can be built, taking into account various environmental parameters (e.g., barometric pressure, rainfall, and air and soil temperature). The aim is to identify radon anomalies which might be caused by seismic events. The application of artificial neural networks (Negarestani et al., 2002, 2003; Torkar et al., 2010), regression and model trees (Džeroski et al., 2003; Sikder & Munakata, 2009; Zmazek et al., 2003; Zmazek et al., 2006) and some other methods (Sikder & Munakata, 2009; Steinitz et al., 2003) have proven to be useful means of extracting radon anomalies caused by seismic events.
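The general workflow can be sketched in a few lines: a model is fitted on seismically quiet periods only, radon concentration is predicted for the whole series, and samples where measurement and prediction disagree strongly are flagged. An ordinary least-squares linear model stands in here for the neural networks and trees used in the cited studies; it is not their method. The function names, the reading of the discrepancy signal as the measured-to-predicted ratio minus one, and the reuse of the 0.2 threshold quoted for the ANN case in section 4.3.1 are all assumptions made for illustration.

```python
import numpy as np

def fit_linear_model(X, y):
    """Least-squares fit of y on the columns of X plus an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef

def residual_anomalies(X, y, quiet_mask, threshold=0.2):
    """Fit on quiet periods, predict everywhere, flag large discrepancies.

    X          : (n, k) array of environmental parameters
    y          : (n,) array of measured radon concentration
    quiet_mask : (n,) boolean array, True for seismically non-active samples
    """
    coef = fit_linear_model(X[quiet_mask], y[quiet_mask])
    predicted = predict(coef, X)
    signal = y / predicted - 1.0          # assumed form of the signal
    return signal, np.abs(signal) > threshold
```

Training on quiet periods only is the key design choice: the model then describes the environmentally driven part of the signal, so that a large discrepancy points to a cause that the environmental parameters do not explain.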

#### **4.3.1 Artificial neural networks**

An artificial neural network (ANN) is a well-known computational structure inspired by the operation of the biological neural system (Jain et al., 1996) and is a well-established tool, widely used in signal processing, pattern recognition and other applications. An ANN consists of a set of units (neurons, nodes), and a set of weighted interconnections among them (links). The organisation of neurons and their interconnections defines the net topology. The inputs are grouped in an input layer, the outputs in an output layer and all the other units in so-called hidden layers. The algorithm repeatedly adjusts the weights to minimise the mean square error between the actual output vector and the desired network output vector. The universal approximator functional form of ANNs is well-suited for the requirements of modelling the non-linear dependency of radon concentrations on multiple variables.

Among the many available topologies, training algorithms and architectures of ANNs, the traditional multilayer perceptron (MLP) with a conjugate gradient learning algorithm was chosen for analysing the soil gas radon concentration time series at the Krško basin (Torkar et al., 2010). The series was first split into seismically non-active periods (NSA) and seismically active periods (SA), adjusting the duration of the seismic window from 0 to 10 days before and after the earthquake, with the purpose of investigating the influence of a complete earthquake event on radon concentration (the preparation phase, the earthquake itself and aftershocks). The ANN of the MLP type was trained with each of the NSA datasets, which were divided into three sets: the training set (60%), the cross-validation set (15%) and the test set (25%). The ANN was trained with the training and cross-validation set, while the test set was used to verify its performance. The topology of the ANN generated for each NSA dataset is shown in Fig. 3.

Fig. 3. The ANN topology for learning radon concentration dependency on environmental parameters.

In the testing phase, the correlation between the measured (m-*C*Rn) and predicted (p-*C*Rn) radon concentration in NSA periods was compared to the correlation between the measured and predicted radon concentration in the entire dataset (NSA and SA). The difference between the correlation coefficients might indicate a period of seismically induced radon anomaly. The signal (m-*C*Rn/p-*C*Rn) − 1, derived from the ratio between the measured and predicted values, represents the discrepancy between the two (Fig. 4).

Fig. 4. The signal (m-*C*Rn/p-*C*Rn) − 1, derived from the ratio between the measured and predicted radon concentration using an ANN, in the case of soil gas radon in the Krško basin for a seismic window of ±7 days. A radon anomaly, possibly caused by a seismic event, is observed when the signal exceeds the threshold value of 0.2.

A radon anomaly is identified when the absolute value of the signal (m-*C*Rn/p-*C*Rn) − 1 exceeds the predefined threshold of 0.2. In this case the ANN performed best for a seismic window of ±7 days (indicating the length of the period of pre- and post-seismic changes).

#### **4.3.2 Decision trees**

Decision trees are machine-learning methods for constructing prediction models from data. The models are obtained by recursively partitioning the data space and fitting a simple prediction model within each partition. As a result, the partitioning can be represented graphically as a decision tree, where each internal node contains a test on an attribute, each branch corresponds to an outcome of the test, and each leaf node gives a prediction for the value of the class variable (Džeroski, 2001; Loh, 2011). Regression trees are designed for dependent variables that take continuous or ordered discrete values. Like classical regression equations, they predict the value of a dependent variable (called the class) from the values of a set of independent variables (called attributes).

The model in each leaf can be either a linear equation or just a constant; trees with linear equations in the leaves are also called model trees. Tree construction proceeds recursively, starting with the entire set of training examples. At each step, the most discriminating attribute is selected as the root of the sub-tree and the current training set is split into subsets according to the values of the selected attribute. For continuous attributes, a threshold is selected and two branches are created, based on that threshold. The attribute values that appear in the training set are considered as candidate thresholds. Tree construction stops when the variance of the class values of all examples in a node is small enough. These nodes are called leaves and are labelled with a model for predicting the class value. An important mechanism used to prevent the tree from over-fitting data is tree pruning.

Regression trees (RT) and model trees (MT), as implemented with the WEKA data mining suite (Witten & Frank, 1999), were used for predicting radon concentration from meteorological parameters in the case of radon time series in soil gas at the Krško basin (Zmazek et al., 2003; Zmazek et al., 2005) and in the thermal spring water in Zatolmin (Zmazek et al., 2006).

Fig. 5. A schematic description of the different stages of radon data series analysis with machine learning methods.
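The threshold selection for a continuous attribute, described in section 4.3.2, can be illustrated with a minimal sketch of a single split. The observed attribute values serve as candidate thresholds, and each candidate is scored by the size-weighted variance of the class values in the two resulting branches. This score is one common choice and an assumption here; it is not necessarily the criterion implemented in WEKA. The function name is hypothetical.

```python
import numpy as np

def best_threshold_split(x, y):
    """Return (threshold, score) of the best binary split `x <= threshold`.

    x : values of one continuous attribute (e.g. barometric pressure)
    y : class values (e.g. radon concentration)
    The score is the size-weighted variance of y in the two branches;
    lower is better. Returns (None, inf) if no split is possible.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    best_t, best_score = None, np.inf
    # The largest observed value is skipped: it would leave one branch empty.
    for t in np.unique(x)[:-1]:
        left, right = y[x <= t], y[x > t]
        score = (len(left) * left.var() + len(right) * right.var()) / len(y)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

threshold, score = best_threshold_split([1, 2, 3, 4, 5, 6],
                                        [10, 10, 10, 20, 20, 20])
```

A full regression tree would apply this search to every attribute, split on the best one, and recurse on each branch until the variance within a node is small enough. A model tree would additionally fit a linear equation in each leaf.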
