#### **1.1.2 Telemetry, A/D conversion and data storage**

Seismic stations may be designed to be either portable or permanent. Portable stations are equipped with on-site data storage devices, such as internal memories and external hard drives, and are deployed specifically for medium-length periods. Permanent stations are installed with telemetry technologies in order to avoid periodic visits to collect data in remote areas and to ensure continuity in the historical records; see Fig. 1. A typical analog radio telemetry system comprises, on the transmitting side, a sensor (see Sec. 1.1.1), a modulator, a radio and an antenna; on the receiving side, it is composed of an antenna, a radio, a demodulator or discriminator and an A/D system coupled to a storage device. The modulator is usually a voltage-controlled oscillator with frequency modulation (Havskov & Alguacil, 2004, Chap. 8), followed by a second modulation introduced by the radio in order to transmit the signals in the VHF or UHF bands<sup>1</sup>. When signals are digitized on-site, a digital telemetry system is used with a variety of modulation schemes (Bormann, 2009, Chap. 7). Moreover, recent deployments of seismic arrays have taken advantage of mobile telephone networks and internet technologies (Vargas-Jimenez & Rincón-Botero, 2003; Werner-Allen et al., 2006). Readers who require a thorough introduction to data transmission are referred to Temes & Schultz (1998) and Eskelinen (2004) for the analog case and to Hsu (2003) for both analog and digital cases.

<sup>1</sup> VHF band: 30 to 300 MHz; UHF band: 300 MHz to 3 GHz.

The digital acquisition of seismic signals involves stages for signal conditioning and A/D conversion. The first stage includes amplifiers and anti-alias filters, required to scale the low-level outputs of passive sensors and to fulfill the Nyquist criterion<sup>2</sup>, respectively. The A/D conversion is carried out by analog-to-digital converters (ADCs), typically with sampling rates of 50, 100 or 200 Hz and resolutions between 12 and 24 bits. Individual events are extracted from the continuous records by applying segmentation methods; see Sec. 1.3.1. Further details about A/D conversion and filtering can be found in publications by Scherbaum (1994; 2002; 2007).
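The role of the anti-alias filter and of the ADC resolution can be illustrated with a minimal Python sketch. The function names and the 20 V full-scale range used in the usage lines are assumptions made for illustration; they do not describe any particular digitizer.

```python
def alias_frequency(f_signal_hz, fs_hz):
    """Apparent frequency of a sinusoid of f_signal_hz after sampling at fs_hz.

    Components above fs/2 are folded back into the band [0, fs/2], which is
    why they must be removed by the anti-alias filter before sampling.
    """
    f = f_signal_hz % fs_hz        # fold into [0, fs)
    return min(f, fs_hz - f)       # reflect into [0, fs/2]


def adc_step(full_scale_volts, n_bits):
    """Voltage represented by one count of an ideal n-bit ADC."""
    return full_scale_volts / (2 ** n_bits)


if __name__ == "__main__":
    # A 70 Hz component sampled at 100 Hz is indistinguishable from 30 Hz.
    print(alias_frequency(70.0, 100.0))
    # Resolution of ideal 12-bit and 24-bit converters over an assumed 20 V range.
    print(adc_step(20.0, 12), adc_step(20.0, 24))
```

With a sampling rate of 100 Hz, only components below 50 Hz are represented unambiguously, so the analog filter has to attenuate everything above that limit.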

Segmented seismic events can be stored in a variety of file formats. The choice of a particular format depends on technical convenience for both space and compatibility. Plain text files are simple enough that most programs can read them because they use the ASCII standard to represent characters (Brown & Musil, 2004); however, text files are neither optimized in size according to the number of bits of the corresponding ADC nor suitable to embed codes indicating formatting and additional capabilities. These weaknesses are overcome by special binary formats such as the Seismic Unified Data System (SUDS), the Seismic Analysis Code (SAC), the SEISmic ANalysis system (SEISAN), the Guralp Compressed Format (GCF) and the Standard for the Exchange of Earthquake Data (SEED).

#### **1.2 Seismic waveforms and classes of volcanic earthquakes**

Seismic signals reveal the propagation of elastic waves through the ground. An earthquake generates two different types of such waves, namely body waves and surface waves (Kayal, 2008). The former propagate within a body of rock; the latter travel along the ground surface. Body waves are further divided into the primary wave (P-wave) and the secondary or shear wave (S-wave). The P-wave is faster than the S-wave; therefore, it appears before the S-wave in the seismograph record, as shown in Fig. 3.

The vibrations following the arrival of a wave are called *coda*. Since the coda of the P-wave is often hidden by the onset of the S-wave, the term coda usually refers to S-coda (i.e. the trailing part of the seismogram) unless indicated otherwise. Refer again to Fig. 3.

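Fig. 3. Parts of a seismic signal: amplitude in counts versus time in seconds, with the P-wave onset, the S-wave onset and the coda marked.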

<sup>2</sup> The sampling rate must be greater than twice the highest frequency component of the signal.

Volcanic earthquakes are typically categorized into four classes according to their mode of generation and the time-frequency behavior of their associated seismic signals. The first criterion —the mode of generation— corresponds to two distinct types of processes occurring either in the solid rock or in the magmatic and hydrothermal fluids within the volcanic edifice. A variety of names have been used to describe the four classes of volcanic earthquakes (McNutt, 2005; Zobin, 2003); however, nowadays, the following denominations are widely accepted: *volcano tectonic* (VT) events, *long period* (LP) events, *tremors* (TR), and *hybrid* (HB) events; see Fig. 4. Concise explanations including their geophysical origin, time-frequency characteristics and importance for monitoring and forecasting volcanic activity are given below. Some special events are observed in particular volcanoes, e.g. multiphase (MP) earthquakes at Mt. Merapi volcano (Hidayat et al., 2000); and flute tremors, spasmodic tremor (Gil-Cruz, 1999) and 'tornillo'-type signals at Galeras volcano (Narváez-M. et al., 1997).

Tectonic earthquakes, such as teleseismic (TS), regional (RE) and local (TL) ones, are also observed at volcano seismic stations. Furthermore, rock falls (RF), explosions (EX), landslides (LS), avalanches, icequakes (IC) and even lightning are recorded by the instruments. Descriptions of those non-volcanic events are not given here due to space constraints. Details of the TS, RE and TL classes are available in Kayal (2008).

Fig. 4. Examples of seismic volcanic signals observed at Nevado del Ruiz Volcano, together with their associated spectrograms. Events were recorded at Olleta station in 2006. Spectrograms were scaled to highlight the top 50 dB of the signals.

#### **1.2.1 Volcano Tectonic (VT) earthquakes**

These earthquakes are indicative of fractures in the solid rock, which are caused by either pressure from magmatic intrusion into the volcano or stress relaxation due to a withdrawal of magma in the crust (Guillier & Chatelain, 2006). VT waveforms are characterized by clear and impulsive arrivals of P and S waves and a short coda typically lasting 7 to 15 s. In the spectral domain, VT events are characterized by a relatively high-frequency content with energy peaking in the band from 6 to 8 or 10 Hz (Chouet, 1996; Guillier & Chatelain, 2006), little energy at frequencies below 3.5 Hz and significant components up to 15 or 20 Hz; see Fig. 4(a). It is important to monitor VT events because an increase in such seismic activity has often been found to be a first sign of volcanic unrest (Trombley, 2006); nonetheless, their consideration as eruption precursors may not be reliable since the activity may last from days to months or even years (Chouet, 1996). Therefore, VT events must always be correlated with their locations of occurrence and with the other classes of volcanic earthquakes (Londoño-Bonilla, 2010).

#### **1.2.2 Long Period (LP) earthquakes**

These events are caused by pressure changes in channels filled with magmatic and hydrothermal fluids. Such changes, in turn, are produced by unsteady mass transport and/or thermodynamics of the fluid (Chouet, 1996). The interaction between the surrounding solid and the aforementioned pressure fluctuations constitutes a resonator system (Kumagai & Chouet, 1999) that exhibits decaying harmonic oscillations. LP waveforms are characterized by more or less emergent first arrivals, a lack of clear S waves (Lesage, 2009) and coda waves lasting up to 1.5 minutes (Gil-Cruz & Chouet, 1997). In the spectral domain, energies are concentrated in low frequencies ranging from 0.5 to 3 Hz according to Trombley (2006) or up to 5 Hz according to Chouet (1996). Weak energies at higher frequencies, up to 13 Hz, are only present at the onset. These time and frequency properties can be examined in the sample signal shown in Fig. 4(b).

The forecasting potential of LP events has been pointed out by several studies. They commonly precede and accompany volcanic eruptions (Chouet, 2003) and their analysis may provide an understanding of the dynamic state and mechanical properties of the fluids at their sources.

#### **1.2.3 Tremors (TR)**

Tremors are produced by the same phenomena that cause LP earthquakes but their oscillations may last from minutes to days, and sometimes for months or longer (Chouet, 1996). Such an extended manifestation reveals the presence of a sustained excitation. Trombley (2006) claims that such a sustained excitation is caused by extra pushes that the waves of pressure, traveling through the magma, get as a result of pressure changes coming from below.

There is no significant difference between the signal characteristics of LP and TR events, except for the longer duration of the latter. The study of TR earthquakes is considered crucial for the investigation of gas/liquid within a magma conduit (Martinelli, 1997) and also for improving eruption forecasting since, like LP earthquakes, TR events have been frequently observed prior to volcanic eruptions (Lesage et al., 2002).

#### **1.2.4 Hybrid (HB) earthquakes**

The occurrence of a VT earthquake may trigger an LP event or vice versa (Trombley, 2006). As a result, a combined event, the so-called HB earthquake, appears, containing a mixture of the two former ones. HB earthquakes may be episodic or be related to a steady process such as the interaction between magmatic heat and underground water systems (Guillier & Chatelain, 2006).

The longest HB events last a few tens of seconds (Neuberg, 2000). Chouet (1996) highlights two particular properties of HB seismic signals: a high-frequency onset and an LP-like coda. The first property is caused by a VT event preceding the LP event. The ambiguous physical origin of HB earthquakes limits their use for forecasting purposes (Harrington & Brodsky, 2007).

#### **1.3 Pattern recognition systems**

Duin et al. (2002) define PR as an engineering field that studies theories and methods for designing machines that are able to recognize patterns in noisy data. Many of the techniques and methods in the PR field are borrowed from other fundamental and applied disciplines such as DSP, statistics and machine learning. DSP techniques are mainly applied in the first two stages of the PR system pipeline; see Fig. 5. Statistical and machine learning methods are used in the classification task. The remaining stage, representation, is the focus of interest for PR practitioners and researchers working on the following questions: (1) how can real-world objects or phenomena be represented in such a way that measurements coming from the sensor stage can be appropriately arranged, e.g. in a vector space, to be provided to the classification methods? and (2) is the representation technically suitable in terms of discriminant power and computational complexity? In addition, the PR community is also devoted to modifying classification methods in order to adapt them to the particular technical requirements of the application.

Fig. 5. Building blocks of a PR system.

#### **1.3.1 Sensor subsystem**

Consider the particular case of the automated identification of volcanic earthquakes and refer again to Fig. 5. Sensors, as described in Sec. 1.1.1, are seismometers. The subsequent stage, data processing, includes data storage and/or telemetered transmission, A/D conversion (Sec. 1.1.2), and segmentation. This last task in the data processing stage is carried out with a two-fold purpose: (1) to detect the events of interest in the continuous raw data; (2) to save space for data storage. In real-time implementations, the conventional method for segmenting seismic events is the so-called short-term average/long-term average (STA/LTA) trigger (Havskov & Ottemöller, 2010). Since a detailed discussion of the STA/LTA trigger method is beyond the scope of this chapter, the reader is referred to Havskov & Alguacil (2004).
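The idea behind the STA/LTA trigger can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the procedure of any specific acquisition system: both averages are taken over the absolute amplitude in windows ending at the current sample, and the window lengths and the `on`/`off` thresholds are hypothetical values chosen for the example.

```python
def sta_lta(samples, n_sta, n_lta):
    """STA/LTA ratio per sample; 0.0 until a full long-term window is available."""
    if not 0 < n_sta < n_lta:
        raise ValueError("need 0 < n_sta < n_lta")
    env = [abs(x) for x in samples]
    ratios = [0.0] * len(env)
    sta_sum = lta_sum = 0.0
    for i, v in enumerate(env):
        # Running sums over the last n_sta and n_lta samples.
        sta_sum += v
        lta_sum += v
        if i >= n_sta:
            sta_sum -= env[i - n_sta]
        if i >= n_lta:
            lta_sum -= env[i - n_lta]
        if i >= n_lta - 1 and lta_sum > 0.0:
            ratios[i] = (sta_sum / n_sta) / (lta_sum / n_lta)
    return ratios


def detect_events(ratios, on=3.0, off=1.5):
    """(start, end) index pairs: trigger when ratio >= on, release when ratio < off."""
    events, start = [], None
    for i, r in enumerate(ratios):
        if start is None and r >= on:
            start = i
        elif start is not None and r < off:
            events.append((start, i))
            start = None
    if start is not None:              # record ends while still triggered
        events.append((start, len(ratios) - 1))
    return events


if __name__ == "__main__":
    # Synthetic trace: low-level noise, a 20-sample burst, then noise again.
    noise = [0.1 * (-1) ** i for i in range(200)]
    burst = [5.0 * (-1) ** i for i in range(20)]
    trace = noise + burst + noise
    print(detect_events(sta_lta(trace, n_sta=5, n_lta=50)))
```

The short window reacts quickly to a sudden increase in amplitude, whereas the long window tracks the background level, so their ratio rises sharply at the onset of an event. Practical triggers usually add pre- and post-event time to each detected segment.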

#### **1.3.2 Representation approaches**

The issue of representation has traditionally been addressed by extracting a set of discriminant features from the segmented sensor measurements. Those features span a vector space, which is consequently known as the *feature space*. Good features should allow the building of accurate classifiers that partition the space into decision regions associated with the classes to be distinguished (types of volcanic earthquakes in this case). Let *x*(*t*) be a segment of the continuous record containing a seismic event and let *x*(*n*) be its associated discrete-time sequence. *N* features extracted from *x*(*n*) are arranged in a *feature vector* *x* ∈ **R**<sup>*N*</sup>. Typical features extracted from the morphology of a seismic signal in the time domain are the amplitudes and durations of the waves shown in Fig. 3.

The dissimilarity representation has been proposed as a feasible alternative to represent signals for PR (Pekalska & Duin, 2005). For a given signal *x*(*n*), this representation approach consists in computing a dissimilarity measure between either *x*(*n*) or some associated transform and each of *M* reference signals belonging to a so-called *representation set*. The reference signals are called *prototypes* whenever the set is composed of archetypal examples of each class. As in the feature-based approach, the dissimilarities are arranged in a *dissimilarity vector* *d* ∈ **R**<sup>*M*</sup> in the so-called *dissimilarity space*. Dissimilarity measures typically correspond to metric distances; however, relaxed versions of the metrics are also common in practical applications, e.g. the weighted edit distance and the modified Hausdorff distance, which are asymmetric.
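As a minimal sketch of this construction, the following Python code maps a signal to its dissimilarity vector. The Euclidean distance between equal-length sequences is used only as an example measure, and the function names are illustrative assumptions; any other dissimilarity measure can be passed in its place.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def dissimilarity_vector(x, representation_set, measure=euclidean):
    """Represent x by its dissimilarities to the M reference signals."""
    return [measure(x, r) for r in representation_set]


if __name__ == "__main__":
    # Two hypothetical prototypes (M = 2) and one incoming signal.
    prototypes = [[0.0, 0.0, 0.0], [3.0, 4.0, 0.0]]
    print(dissimilarity_vector([0.0, 0.0, 0.0], prototypes))
```

The resulting vector has one component per reference signal, so its dimension is fixed by the size of the representation set rather than by the length of the signal.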

Pekalska & Duin (2005) advocate the use of dissimilarity representations instead of classical feature-based ones by presenting several conceptual and practical motivations. Here it is worthwhile to mention the following practical ones: dissimilarities can be derived from raw data such as images, spectra or time samples; dissimilarity-based classifiers outperform the nearest-neighbor rule.

#### **1.3.3 Classification approaches**

The last block in Fig. 5 consists in applying classification algorithms to infer a class label *ω̂*(*x*) ∈ Ω, where Ω = {*ω*<sub>1</sub>, ..., *ω*<sub>*K*</sub>} is the set of labels for the *K* different types of volcanic earthquakes to be identified. According to the nature of the classification algorithms, three different approaches can be distinguished (Jain et al., 2000): similarity-based, density-based and geometric classifiers. These approaches are succinctly described below, including the relatively recent strategy of combining multiple classifiers. A thorough presentation of the classification algorithms can be found in several good textbooks on the subject of PR, such as the ones by Duda et al. (2001), Webb (2002), van der Heijden et al. (2004), Theodoridis & Koutroumbas (2006) and Bishop (2006).

#### Similarity-based classifiers

This classification approach is based on the elementary rationale of resemblance, i.e. similar events (volcanic earthquakes in our problem) should be identified as belonging to the same class. Among the classifiers in this category, the following two are widely used: the nearest mean classifier (NMC) and the *k*-nearest neighbor (*k*-NN) rule. In the first one, the decision is taken by examining the class label of the closest vector among the per-class mean vectors; in the second one, the *k* closest events in the vector space define, by majority vote, the class label *ω̂*(*x*) assigned to a new incoming event (for *k* = 1, the label of the single closest event is assigned).
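Both rules can be sketched in Python with the standard library only. The two-dimensional feature values and the labels in the usage example are hypothetical and serve only to exercise the functions; `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter, defaultdict


def nmc_fit(X, y):
    """Compute one mean vector per class label."""
    groups = defaultdict(list)
    for x, label in zip(X, y):
        groups[label].append(x)
    return {label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in groups.items()}


def nmc_predict(means, x):
    """Assign the label of the closest class mean."""
    return min(means, key=lambda label: math.dist(means[label], x))


def knn_predict(X, y, x, k=3):
    """Majority vote among the k training vectors closest to x."""
    nearest = sorted(zip(X, y), key=lambda pair: math.dist(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical feature vectors for two classes.
    X = [[6.0, 1.0], [7.0, 1.2], [8.0, 0.8], [1.0, 5.0], [2.0, 6.0], [1.5, 5.5]]
    y = ["VT", "VT", "VT", "LP", "LP", "LP"]
    print(nmc_predict(nmc_fit(X, y), [6.5, 1.1]))
    print(knn_predict(X, y, [1.2, 5.2], k=3))
```

The NMC stores only one vector per class, whereas the *k*-NN rule keeps the whole training set and therefore costs more at prediction time.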

#### Density-based classifiers

These classifiers are based on the well-known Bayesian decision theory, i.e. on the application of the Bayes decision rule, which consists in maximizing the posterior probability *P*(*ω*<sub>*k*</sub>|*x*) across Ω. By Bayes' theorem, *P*(*ω*<sub>*k*</sub>|*x*) is proportional to the class-conditional probability density *p*(*x*|*ω*<sub>*k*</sub>) weighted by the prior probability *P*(*ω*<sub>*k*</sub>). Costs of misclassification are often included in the rule as an additional weighting parameter.

The key issue in this approach is the estimation of the class-conditional probability densities, i.e. *p̂*(*x*|*ω*<sub>*k*</sub>). A distinction between parametric and nonparametric estimates can be made (Jain et al., 2000). The parametric case corresponds to the assumption of a model for the probability density (e.g. a Gaussian distribution). The nonparametric case consists in estimating the probability densities either by the standard histogramming technique or by defining window functions in the vector space. Such windows determine the contribution of the samples contained in them to the estimate of the probability density. The window-based nonparametric case is further divided into the Parzen window approach and the *k*-nearest-neighbor method, which differ in whether the estimation process is space-invariant (Parzen) or not (*k*-nearest-neighbor).
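The difference between the two window-based estimates can be illustrated with a one-dimensional Python sketch. The Gaussian kernel and the function names are assumptions made for the example: the Parzen estimate uses the same window width `h` everywhere, while the *k*-nearest-neighbor estimate widens or narrows its window until it contains `k` samples.

```python
import math


def parzen_density(x, samples, h):
    """Parzen estimate at x with a Gaussian window of fixed width h."""
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)


def knn_density(x, samples, k):
    """k-NN estimate at x: k / (n * V), where V is the length of the
    smallest interval centered at x that contains k samples."""
    if not 1 <= k <= len(samples):
        raise ValueError("k must be between 1 and the number of samples")
    radius = sorted(abs(x - s) for s in samples)[k - 1]
    if radius == 0.0:
        raise ValueError("k-th neighbor coincides with x; estimate is unbounded")
    return k / (len(samples) * 2.0 * radius)


if __name__ == "__main__":
    data = [-1.0, 1.0, 3.0]
    print(parzen_density(0.0, data, h=1.0))
    print(knn_density(0.0, data, k=2))
```

In both cases the result for each class would play the role of the estimated class-conditional density in the Bayes decision rule.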

Consider again the parametric case and the assumption of Gaussian distributions. Parameters to be estimated are the mean vectors and the covariance matrices. According to the assumptions made about the latter, two well-known decision rules result: (1) the Bayes-normal-linear classifier (LDC), when covariance matrices are assumed to be equal; (2) the Bayes-normal-quadratic classifier (QDC), when the covariance matrices are assumed to be different.
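The effect of the covariance assumption can be seen in a one-dimensional Python sketch, where the covariance matrix reduces to a variance. The class labels `w1` and `w2`, the sample values and the function names are hypothetical, and maximum-likelihood estimates are assumed. With a shared (pooled) variance the rule behaves like the LDC; with one variance per class it behaves like the QDC.

```python
import math
from statistics import mean, pvariance


def fit_gaussians(samples_by_class, shared_variance):
    """Estimate mean, variance and prior per class from 1-D samples."""
    n_total = sum(len(xs) for xs in samples_by_class.values())
    params = {label: {"mean": mean(xs),
                      "var": pvariance(xs),
                      "prior": len(xs) / n_total}
              for label, xs in samples_by_class.items()}
    if shared_variance:
        # Pooled variance: class variances weighted by class frequency.
        pooled = sum(p["prior"] * p["var"] for p in params.values())
        for p in params.values():
            p["var"] = pooled
    return params


def classify(params, x):
    """Bayes rule: pick the class with the largest log posterior (up to a constant)."""
    def log_posterior(p):
        return (math.log(p["prior"])
                - 0.5 * math.log(2.0 * math.pi * p["var"])
                - (x - p["mean"]) ** 2 / (2.0 * p["var"]))
    return max(params, key=lambda label: log_posterior(params[label]))


if __name__ == "__main__":
    data = {"w1": [-1.0, 0.0, 1.0], "w2": [2.0, 6.0, 10.0]}
    print(classify(fit_gaussians(data, shared_variance=True), -4.0))
    print(classify(fit_gaussians(data, shared_variance=False), -4.0))
```

In this example the point -4.0 is closer to the mean of `w1`, so the shared-variance rule assigns it to `w1`; the per-class rule assigns it to `w2`, because `w2` is much more spread out and therefore more likely to produce values far from its mean.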

Seismic volcanic signals are composed of sequential data, analogously to speech records and time series. A widely used tool for modeling and classifying such sequences is the hidden Markov model (HMM). An HMM is composed of a set of states, a matrix of transition probabilities between the states, a vector of initial probabilities and an emission model. HMM-based classification typically consists in training one HMM for each class and, afterwards, using a density-based classifier. Additional details of this method are not given here but can be found in Rabiner (1989) as well as in the reviewed studies referenced in Sec. 2.3.
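The classification step can be sketched in Python as follows, assuming that one HMM per class has already been trained (training, e.g. with the Baum-Welch procedure described by Rabiner (1989), is omitted). The sketch assumes discrete observation symbols; the two models, their probabilities and the meaning of the symbols (0 for a high-frequency frame, 1 for a low-frequency frame) are invented for illustration.

```python
def forward_likelihood(obs, start, trans, emit):
    """P(obs | model) computed with the forward algorithm.

    start[i]    : probability of starting in state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][o]  : probability of emitting symbol o in state i
    """
    if not obs:
        raise ValueError("empty observation sequence")
    n_states = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    for o in obs[1:]:
        alpha = [emit[j][o] * sum(alpha[i] * trans[i][j] for i in range(n_states))
                 for j in range(n_states)]
    return sum(alpha)


def classify_sequence(obs, models, priors=None):
    """Bayes rule with HMM likelihoods as class-conditional densities."""
    if priors is None:                       # assume equal priors
        priors = {label: 1.0 / len(models) for label in models}
    return max(models,
               key=lambda label: priors[label] * forward_likelihood(obs, *models[label]))


# Hypothetical two-state models: (start, trans, emit) per class.
MODELS = {
    "VT": ([0.8, 0.2], [[0.7, 0.3], [0.1, 0.9]], [[0.9, 0.1], [0.6, 0.4]]),
    "LP": ([0.5, 0.5], [[0.9, 0.1], [0.2, 0.8]], [[0.2, 0.8], [0.1, 0.9]]),
}

if __name__ == "__main__":
    print(classify_sequence([1, 1, 1, 1], MODELS))
    print(classify_sequence([0, 0, 0, 0], MODELS))
```

For long sequences the forward variables underflow, so practical implementations rescale them at every step or work with logarithms.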

#### Geometric classifiers

In these classifiers, decision boundaries are built by optimizing a performance criterion instead of considering proximities or densities as in the two previous approaches. Examples of geometric classifiers are Fisher's linear discriminant, decision trees, single- and multi-layer perceptrons (and, in general, artificial neural networks) and the support vector classifier. Here we only describe the last two classifiers in more detail since they are the most widely used in volcano seismology applications, as discussed in Sec. 2.

Artificial neural networks (ANNs) are able to implement linear as well as nonlinear classifiers, depending on their architecture (number of layers and number of neurons) and training method. In spite of their tricky tuning procedures, they are still extensively used due to their flexibility and potentially good performance. Nonetheless, the emergence of the support vector method has progressively displaced ANNs from their consideration as general solutions for classification and regression; indeed, over the last 15 years, the support vector method has gained a solid theoretical development and an overwhelming number of applications. In short, the basic principle of the support vector classifier (SVM) is to maximize the margin between two classes, which is defined by the so-called *support vectors*: the training examples closest to the decision boundary. The SVM is extended to nonlinear and multiclass problems by means of the kernel trick and the one-against-rest approach, respectively. Further details can be found in some of the PR textbooks cited above as well as in the original work by Vapnik (1998).

#### Combination of multiple classifiers

The strategy of combining multiple classifiers aims to exploit (1) the availability of multiple sources of data from different sensors or representations, and (2) the possibilities of training several classifiers on the same training set and of performing different tuning sessions for the same classifier. The data mentioned in item 1 may belong either to the same events or to different ones. Most seismic volcanic data sets are multiple in nature since they are acquired at multiple recording stations and across several months or years; thereby, multiple sources (stations) for the same events are often available, and different sets of examples can be arranged by date of acquisition.

Several strategies for combining classifiers have been proposed. They are typically categorized according to their architecture into parallel, serial and hierarchical; or according to the combination rule into static and trainable (Kuncheva, 2004). PR systems that include these strategies are called *multiple classifier systems*. There has been a sustained interest in this field during the last decade as evidenced by the series of workshops started by Kittler & Roli (2000) and recently organized by Gayar et al. (2010).
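Two simple static (non-trainable) combination rules for a parallel architecture, majority voting over class labels and averaging of posterior estimates, can be sketched in Python as follows. The three base classifiers, e.g. one per recording station, and their outputs are hypothetical.

```python
from collections import Counter


def majority_vote(labels):
    """Combine the class labels proposed by several base classifiers."""
    return Counter(labels).most_common(1)[0][0]


def mean_rule(posteriors):
    """Combine posterior estimates by averaging them per class.

    posteriors: one dict per base classifier, mapping label -> estimated posterior.
    """
    labels = posteriors[0].keys()
    average = {label: sum(p[label] for p in posteriors) / len(posteriors)
               for label in labels}
    return max(average, key=average.get)


if __name__ == "__main__":
    # Hypothetical outputs of three base classifiers for the same event.
    outputs = [{"VT": 0.60, "LP": 0.40},
               {"VT": 0.55, "LP": 0.45},
               {"VT": 0.10, "LP": 0.90}]
    hard_labels = [max(p, key=p.get) for p in outputs]
    print(majority_vote(hard_labels))
    print(mean_rule(outputs))
```

The two rules can disagree: in the example above, two of the three classifiers weakly prefer one class, whereas the third strongly prefers the other, so the vote and the average lead to different decisions.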
