
*Machine Learning in Volcanology: A Review DOI: http://dx.doi.org/10.5772/intechopen.94217*


*Updates in Volcanology – Transdisciplinary Nature of Volcano Science*

Sparse Multinomial Logistic Regression (SMLR) is a class of supervised methods for learning sparse classifiers that incorporate weighted sums of basis functions, with sparsity-promoting priors encouraging the weight estimates to be either significantly large or exactly zero [81]. The sparsity concept is similar to the one underlying Non-negative Matrix Factorization (NMF) [82]. The sparsity-promoting priors result in automatic feature selection, which helps to avoid the so-called "curse of dimensionality"; sparsity in the kernel basis functions and automatic feature selection can thus be achieved at the same time [83]. SMLR methods control the capacity of the learned classifier by minimizing the number of basis functions used, resulting in better generalization. Fast SMLR algorithms scale favorably in both the number of training samples and the feature dimensionality, making them applicable even to large data sets in high-dimensional feature spaces.

A Decision Tree (DT) is an acyclic graph. At each branching node a specific feature *xi* is examined, and the left or right branch is followed depending on the value of *xi* relative to a given threshold; a class is assigned to each datum when a leaf node is reached. As usual, a DT can be learned from labeled data using different strategies. Within the DT class we can mention the Best First Decision Tree (BFT), Functional Tree (FT), J48 Decision Tree (J48DT), Naïve Bayes Tree (NBT) and Reduced Error Pruning Tree (REPT). Ensemble learning techniques such as Random SubSpace (RSS) can be used to combine the results of the different trees [84].

The Boosting concept, a kind of ensemble meta-algorithm mostly (but not only) associated with supervised learning, uses the original training data to iteratively create multiple models with a weak learner. Each model differs from the previous one because the weak learner tries to "fix" the errors made by the previous models; an ensemble model then combines the results of the different weak models. Bootstrap aggregating, also called by the contracted name Bagging, instead consists of creating many "almost-copies" of the training data (each copy slightly different from the others), applying a weak learner to each copy, and finally combining the results. A popular and effective algorithm based on bagging is Random Forest (RF) (**Figure 3d**), which differs from standard bagging in just one way: at each learning step a random subset of the features is chosen. This helps to decorrelate the trees, since correlated predictors are not efficient in improving classification accuracy. Particular care has to be taken to best choose the number of trees and the size of the random feature subsets.

A Hidden Markov Model (HMM) (**Figure 2e**) is a statistical model in which the system being modeled is assumed to be a Markov process: it describes a sequence of possible events in which the probability of each event depends only on the state occupied in the previous event. The states are unobservable ("hidden"), but at each state the model emits a "message" that depends probabilistically on the current state. Applications are wide in scope, from reinforcement learning to temporal pattern recognition, and the approach works well when time is important; speech [85], handwriting and gesture recognition are thus typical fields of application, but so is volcano seismology [69, 86].

**4. Applications to seismo-volcanic data**

Eruptions are usually preceded by some kind of change in seismicity, making seismic data one of the key datasets in any attempt to forecast volcanic activity [4]. As mentioned before, manual detection and classification of discrete events can be very time consuming, to the point of becoming unfeasible during a volcanic crisis.

An automatic classification procedure therefore becomes highly valuable, also as a first step towards forecasting techniques such as the material Failure Forecast Method (FFM) [87, 88]. Feature vectors should be built so as to provide the most information about the source while minimizing, e.g., path and site effects. In many cases features can be independent of a specific physical model describing a phenomenon, which allows ML to work well even when there is no scientific agreement on the generation of a given seismic signal; a good example in volcano seismology is given by the LP events. Standardizing data, making them independent of unwanted variables, is also in general a convenient approach [31]. Time-domain and spectral amplitudes, spectral phases, auto- and cross-correlations, and statistical and dynamical parameters have all been considered as the output of data reduction procedures that can be included in feature vectors [14]. In the literature, these have included linear predictive coding of spectrograms [66], wavelet transforms [89], spectral autocorrelation functions [90], and statistical and cepstral coefficients [91]. The extracted feature vectors then become the input to one ML method or another.
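As a minimal illustration of such a data reduction step, the sketch below computes a small feature vector of time-domain and spectral descriptors from a single synthetic trace with plain numpy. The choice of features is illustrative only, not the feature set of any of the cited works.

```python
import numpy as np

def feature_vector(trace, fs):
    """A small feature vector: RMS amplitude, dominant frequency,
    spectral centroid and kurtosis of a single trace."""
    trace = trace - trace.mean()                      # remove DC offset
    rms = np.sqrt(np.mean(trace ** 2))                # time-domain amplitude
    spec = np.abs(np.fft.rfft(trace))                 # amplitude spectrum
    freqs = np.fft.rfftfreq(trace.size, d=1.0 / fs)
    dominant = freqs[np.argmax(spec)]                 # peak frequency
    centroid = np.sum(freqs * spec) / np.sum(spec)    # spectral centroid
    kurt = np.mean(trace ** 4) / np.mean(trace ** 2) ** 2  # "peakedness"
    return np.array([rms, dominant, centroid, kurt])

# Synthetic 2 Hz decaying wavelet sampled at 100 Hz
fs = 100.0
t = np.arange(0, 10, 1.0 / fs)
trace = np.sin(2 * np.pi * 2.0 * t) * np.exp(-0.3 * t)
fv = feature_vector(trace, fs)
print(fv)
```

The dominant-frequency entry of the resulting vector recovers the 2 Hz source frequency of the synthetic wavelet.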

CA is probably the most used class of unsupervised techniques, and applications to volcano seismology follow this general rule. Spectral clustering was applied, e.g., to seismic data of Piton de la Fournaise [60]. The fact that, e.g., LP seismic signals can be clustered into families indicates that the family members are very similar to each other. The existence of similar events implies a similar location and a similar source process, i.e., the presence of a source that repeats over time in an almost identical way. Clustering data after some kind of normalization forces CA algorithms to look for similar shapes, independently of size. If significant variations in amplitude are then seen within families, this can indicate that the source processes of these events are not only repeatable but also scalable in size, as observed, e.g., at Soufrière Hills Volcano, Montserrat [92] or at Irazú, Costa Rica [93]. The similarity of events within the different classes can then be used to detect other events, e.g., for the purpose of stacking them to obtain more accurate phase arrivals; this was done, e.g., at Kanlaon, Philippines [94]. For this purpose, an efficient open-source package is available: the Repeating Earthquake Detector in Python (REDPy) [95].
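The family-clustering idea can be sketched as follows: pairwise maximum normalized cross-correlations between waveforms are turned into dissimilarities and fed to hierarchical clustering. The synthetic "events" and the single-linkage configuration below are illustrative assumptions, not the setup of any cited study.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.signal import correlate

def max_norm_xcorr(a, b):
    """Maximum of the normalized cross-correlation between two traces."""
    a = (a - a.mean()) / (a.std() * len(a))
    b = (b - b.mean()) / b.std()
    return correlate(a, b, mode="full").max()

# Synthetic "events": two repeating families with different dominant
# frequencies, small random time shifts and additive noise
rng = np.random.default_rng(0)
t = np.arange(0, 5, 0.01)
events = []
for f in (1.5, 1.5, 1.5, 4.0, 4.0, 4.0):
    shift = rng.uniform(0, 0.2)
    events.append(np.sin(2 * np.pi * f * (t - shift)) * np.exp(-t)
                  + 0.05 * rng.standard_normal(t.size))

# Dissimilarity = 1 - max normalized cross-correlation
n = len(events)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        dist[i, j] = dist[j, i] = 1.0 - max_norm_xcorr(events[i], events[j])

# Condensed distances -> single-linkage tree -> cut into two families
labels = fcluster(linkage(dist[np.triu_indices(n, k=1)], method="single"),
                  t=2, criterion="maxclust")
print(labels)
```

Because cross-correlation is computed over all lags, events shifted in time still join the same family, which is the behavior exploited by repeating-event detectors such as REDPy.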

In volcano seismology, SOMs were applied, e.g., to Raoul Island, New Zealand [61]. Hierarchical clustering was applied to the results of SOM tremor analysis at Ruapehu [62] and Tongariro [96] in New Zealand, using the Scilab environment, and a similar combined approach was applied in Matlab to Etna volcanic tremor [97]. Several SOM geometries have been used, with rectangular or hexagonal nearest-neighbor cells, planar, toroidal or spherical maps, etc. [61]. The classic ANN/MLP approach was applied, e.g., to seismic data recorded at Vesuvius [66], Stromboli [98] and Etna [99], while DNN architectures were applied, e.g., to Volcán de Fuego, Colima [100]. The use of genetic algorithms for the optimization of the MLP configuration was proposed for the analysis of seismic data of Villarrica, Chile [101]. CNNs were applied, e.g., to Llaima Volcano (Chile) seismic data, comparing the results to other classification methods [102]. RNNs were applied, together with other methods, to classify signals of Deception Island Volcano, Antarctica [68]; the architectures were trained with data recorded in 1995–2002 and the models were tested on data recorded in 2016–2017, showing good generalization accuracy.
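For illustration, a minimal planar rectangular SOM with a Gaussian neighborhood can be written in a few lines of numpy; the studies cited above used more elaborate geometries and dedicated toolboxes, and all parameters below (grid size, learning-rate and neighborhood schedules) are invented for this sketch.

```python
import numpy as np

def train_som(data, grid=(4, 4), iters=2000, lr0=0.5, sigma0=1.5, seed=0):
    """Train a minimal rectangular, planar SOM with a Gaussian neighborhood."""
    rng = np.random.default_rng(seed)
    h, w = grid
    weights = rng.standard_normal((h, w, data.shape[1]))
    # grid coordinates of each node, used by the neighborhood function
    coords = np.stack(np.meshgrid(np.arange(h), np.arange(w),
                                  indexing="ij"), axis=-1)
    for it in range(iters):
        frac = it / iters
        lr = lr0 * (1 - frac)                 # decaying learning rate
        sigma = sigma0 * (1 - frac) + 0.5     # shrinking neighborhood
        x = data[rng.integers(len(data))]     # random training sample
        d = np.linalg.norm(weights - x, axis=-1)
        bmu = np.unravel_index(np.argmin(d), d.shape)  # best-matching unit
        g = np.exp(-np.sum((coords - np.array(bmu)) ** 2, axis=-1)
                   / (2 * sigma ** 2))
        weights += lr * g[..., None] * (x - weights)   # pull nodes toward x
    return weights

def bmu_of(weights, x):
    d = np.linalg.norm(weights - x, axis=-1)
    return np.unravel_index(np.argmin(d), d.shape)

# Two well-separated synthetic feature clusters
rng = np.random.default_rng(1)
a = rng.normal([0.0, 0.0], 0.1, (50, 2))
b = rng.normal([5.0, 5.0], 0.1, (50, 2))
W = train_som(np.vstack([a, b]))
print(bmu_of(W, a.mean(0)), bmu_of(W, b.mean(0)))
```

After training, the two synthetic clusters map to different best-matching units, which is the basis for the clustering of SOM nodes described above.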

Supervised LR models have been applied to the estimation of landslide susceptibility [103] and to volcano seismic data to estimate the ending date of an eruption at Telica (Nicaragua) and Nevado del Ruiz (Colombia) [104]. SVMs have been applied many times in volcano seismology, e.g., to classify volcanic signals recorded at Llaima, Chile [105] and Ubinas, Peru [106]. Multinomial Logistic Regression was used, together with other methods, to evaluate the feasibility of earthquake prediction using 30 years of historical data in Indonesia, including at volcanoes [107].
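A minimal binary logistic-regression classifier can be sketched with plain numpy; the two-feature synthetic data below (hypothetical "VT-like" vs. "LP-like" feature distributions) simply stand in for real extracted feature vectors.

```python
import numpy as np

def train_logreg(X, y, lr=0.1, iters=500):
    """Binary logistic regression fitted by plain gradient descent."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))    # sigmoid probabilities
        w -= lr * X.T @ (p - y) / len(y)          # gradient of the log-loss
        b -= lr * (p - y).mean()
    return w, b

# Hypothetical 2-D feature vectors, e.g. [dominant frequency, duration],
# for two invented event classes
rng = np.random.default_rng(2)
X = np.vstack([rng.normal([6.0, 10.0], 1.0, (100, 2)),   # "VT-like"
               rng.normal([2.0, 30.0], 1.0, (100, 2))])  # "LP-like"
y = np.r_[np.zeros(100), np.ones(100)]
Xn = (X - X.mean(0)) / X.std(0)                   # standardize features
w, b = train_logreg(Xn, y)
pred = (1.0 / (1.0 + np.exp(-(Xn @ w + b))) > 0.5).astype(float)
print("training accuracy:", (pred == y).mean())
```

On such well-separated classes the fitted model reaches near-perfect training accuracy; multinomial and SVM classifiers generalize this idea to more classes and non-linear boundaries.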

RF was applied to the discrimination of rockfalls and VT events recorded at Piton de la Fournaise in 2009–2011 and 2014–2015; 60 features were used, and excellent results were obtained. However, an RF trained with 2009–2011 data did not perform well on data recorded in 2014–2015, demonstrating how difficult it is to generalize models even at the same volcano [108]. RF, together with other methods, was recently used on volcano seismic data with the specific purpose of determining when an eruption has ended [104], a problem which is far from trivial. RF was also used to derive ensemble mean decision tree predictions of sudden steam-driven eruptions at Whakaari (New Zealand) [109].
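The generalization problem noted at Piton de la Fournaise can be reproduced qualitatively on synthetic data: an RF trained on one "period" performs poorly once the feature distributions drift. The sketch below uses scikit-learn's `RandomForestClassifier` and invented two-class feature distributions; it is not the configuration of the cited study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)

def make_events(n, centers, scale=0.6):
    """Synthetic two-class feature vectors around the given class centers."""
    X = np.vstack([rng.normal(c, scale, (n, len(c))) for c in centers])
    y = np.repeat(np.arange(len(centers)), n)
    return X, y

# "Period 1": hypothetical rockfall vs. VT feature distributions
old_centers = [[2.0, 0.5], [8.0, 2.0]]
X_old, y_old = make_events(200, old_centers)
# "Period 2": the rockfall distribution has drifted towards the VT one
new_centers = [[6.0, 1.5], [8.0, 2.0]]
X_new, y_new = make_events(200, new_centers)

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_old, y_old)
acc_old = rf.score(*make_events(200, old_centers))  # held-out, same period
acc_new = rf.score(X_new, y_new)                    # later period
print(f"same-period accuracy {acc_old:.2f}, later-period accuracy {acc_new:.2f}")
```

The forest is excellent on held-out data from the training period but degrades badly once one class drifts across the learned decision boundary, mirroring the difficulty of transferring models even within a single volcano.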

Most of the methods described so far classify discrete seismic events that have already been extracted from the continuous stream, i.e., that are already characterized by a given start and end; there are therefore in general two separate phases, detection and classification [106]. Continuous HMMs, on the other hand, are able to process continuous data and can therefore extract and classify in a single, potentially real-time, step. HMMs are finite-state machines and model sequential patterns in which the direction of time carries essential information. This is typical of (volcano) seismic data: for instance, P waves always arrive before S waves. HMM-based volcanic seismic data classifiers have therefore been used by many authors [87, 110–113], and HMMs are used routinely in some volcano observatories, e.g., at Colima and Popocatepetl in Mexico [71]. Etna seismic data was processed by HMMs applied to characters generated by the Symbolic Aggregate approXimation (SAX), which maps seismic data into symbols of a given alphabet [114]. HMMs can also be combined with standardization procedures such as Empirical Mode Decomposition (EMD) when classifying volcano seismic data [31].
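The core decoding step of an HMM classifier, the Viterbi algorithm, can be sketched for a toy two-state "tremor vs. discrete event" model with discrete amplitude observations; all probabilities below are invented for illustration and a real classifier would use continuous emissions over feature vectors.

```python
import numpy as np

def viterbi(obs, log_A, log_B, log_pi):
    """Most likely hidden-state path for a discrete-emission HMM."""
    T, N = len(obs), log_pi.size
    delta = log_pi + log_B[:, obs[0]]       # best log-score per state
    back = np.zeros((T, N), dtype=int)      # backpointers
    for t in range(1, T):
        scores = delta[:, None] + log_A     # scores[i, j]: i -> j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_B[:, obs[t]]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):           # follow backpointers
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Two hidden states: 0 = background tremor, 1 = discrete event.
# Observations: 0 = low amplitude, 1 = high amplitude.
A = np.array([[0.95, 0.05],    # tremor tends to persist
              [0.10, 0.90]])   # events also persist for a while
B = np.array([[0.9, 0.1],      # tremor mostly emits low amplitudes
              [0.2, 0.8]])     # events mostly emit high amplitudes
pi = np.array([0.99, 0.01])
obs = np.array([0, 0, 0, 1, 1, 1, 1, 0, 0, 0])
states = viterbi(obs, np.log(A), np.log(B), np.log(pi))
print(states)
```

The decoded path segments the run of high-amplitude observations into an "event" state embedded in tremor, which is exactly the joint extraction-and-classification behavior described above.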

Another characteristic common to many of the applications published in the literature is that feature vectors are extracted from data recorded at a single station; there are relatively few attempts to build multi-station classification schemes. At Piton de la Fournaise a system based on RF was implemented [115]. At the same volcano, a multi-station approach was used to classify tremor measurements and identify fundamental frequencies of the tremor associated with different eruptive behaviors [60]. A scalable multi-station, multi-channel classifier, also using the empirical mode decomposition (EMD) first proposed by [31], was applied to Ubinas volcano (Peru): principal component analysis is used to reduce the dimensionality of the feature vector, and a supervised classification is carried out using various methods, with SVM obtaining the best performance [116]. Of course, with a multi-station approach particular care has to be taken to build a system which is robust with respect to the loss of one or more seismic stations due to volcanic activity or technical failures.
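The dimensionality-reduction step of such a multi-station scheme can be sketched with a plain-numpy PCA via singular value decomposition; the "multi-station" feature matrix below is synthetic, driven by two latent source factors as a stand-in for a common volcanic source seen at several stations.

```python
import numpy as np

def pca_reduce(X, k):
    """Project feature vectors onto the first k principal components (SVD)."""
    Xc = X - X.mean(axis=0)                 # center the features
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)     # explained-variance ratios
    return Xc @ Vt[:k].T, explained[:k]

# Hypothetical concatenated per-station features (3 stations x 4 features),
# generated from 2 latent source factors plus weak noise
rng = np.random.default_rng(4)
latent = rng.standard_normal((300, 2))
mixing = rng.standard_normal((2, 12))
X = latent @ mixing + 0.05 * rng.standard_normal((300, 12))
Z, ev = pca_reduce(X, 2)
print(Z.shape, float(ev.sum()))
```

Because the 12-dimensional vectors are driven by only two latent factors, the first two components capture almost all of the variance, and the reduced matrix `Z` can be passed to any of the supervised classifiers discussed above.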

Open source software and open access papers are fortunately becoming more and more common. For the processing and classification of volcano seismic data, several tools are now available for free download and use, especially within the Python environment. Among the most popular we can cite ObsPy [117] and Msnoise [118], with which researchers and observatories can easily process large quantities of continuous seismic data. Once these tools have produced suitable feature vectors, we can look for open source software implementing the different ML approaches described in this contribution. Many generic ML libraries are available, e.g., on GitHub [59], but very few are dedicated specifically to the classification of volcano seismic data. Among these, we can cite the recent package Python Interface for the Classification of Seismic Signals (PICOSS) [119]. It is a graphical, modular open source tool for the detection, segmentation and classification of seismic data; its modules are independent and adaptable. Classification is currently based on two modules that use either Frequency Index analysis [120] or a multi-volcano pre-trained neural network, in a transfer learning fashion [52]. The concept of a multi-volcano

recognizer is also at the core of the EU-funded VULCAN.ears project [31, 121]. The aim is to build an automatic Volcano Seismic Recognition (VSR) system, conceptually supervised (as it is based on HMMs) but practically unsupervised: once it has been trained on a number of volcanoes with labeled sample data, it can be used on volcanoes without any previous data in an unsupervised fashion. The idea is in fact to build robust models trained on many datasets recorded by different teams on different volcanoes, and to integrate these models into the monitoring system routinely used at any volcano observatory. Also in this case the open source software is made freely available; it includes a command interface called PyVERSO [122], based on HTK, a speech recognition HMM toolkit [123]; a graphical interface called geoStudio; and a script called liveVSR, able to process real-time data downloaded from any online seismic data server [124]; together with some pre-trained ML models [125].

As mentioned before, in order to train supervised models for classifying seismic events, a few events with reliable labels are better than many unreliably labeled examples. To give a rough idea, 20 labeled events per class is a good starting point, and a minimum of 50 labeled events per class is recommended. Labelling discrete events is enough for many methods, but for approaches like HMMs, where the classification runs on continuous data, it is essential to have a sufficient number of continuously labeled time periods, in order to "show" the classifier enough examples of the transition from tremor to a discrete event and back to tremor. It is also important to have many examples of "garbage" events, i.e., events we are not interested in, so that the classifier can recognize and discard them. Finally, it is advisable to have a wide variability of events within each given class rather than many very similar events. There is not yet agreement on a single file format to store these labels. As speech recognition is a much older and more developed field than seismic recognition, it is suggested to adopt the standard labelling format of that domain, i.e., the transcription MLF files: plain text files that include, for each event, the start time, the end time and of course the label. These files can be created manually with a simple text editor, or by using a program with a GUI such as geoStudio [124] or Seismo\_volcanalysis [126]. Other graphical software packages like SWARM [127] store labels in other formats, such as CSV, but it is always possible to write scripts that convert the resulting label files into the MLF format, which remains the recommended one.
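A minimal sketch of writing and re-reading MLF-style transcription files might look as follows. Note the simplifying assumptions: times are stored in seconds for readability, whereas HTK itself uses 100 ns units, and real MLF files support additional fields; the recording names and class labels are invented.

```python
import os
import tempfile

def write_mlf(path, labels):
    """Write an MLF-style transcription file.

    labels: {recording_name: [(start, end, class_name), ...]}
    """
    with open(path, "w") as f:
        f.write("#!MLF!#\n")                 # MLF header line
        for rec, segments in labels.items():
            f.write(f'"*/{rec}.lab"\n')      # pattern naming the recording
            for start, end, name in segments:
                f.write(f"{start} {end} {name}\n")
            f.write(".\n")                   # "." ends each transcription

def read_mlf(path):
    labels, rec = {}, None
    with open(path) as f:
        assert f.readline().strip() == "#!MLF!#"
        for line in f:
            line = line.strip()
            if line.startswith('"'):         # start of a new recording
                rec = line.strip('"').split("/")[-1].removesuffix(".lab")
                labels[rec] = []
            elif line == ".":                # end of this transcription
                rec = None
            elif rec is not None:
                start, end, name = line.split()
                labels[rec].append((float(start), float(end), name))
    return labels

# Hypothetical labels: one LP event, then a tremor-to-VT transition
events = {
    "rec001": [(0.0, 12.5, "LP")],
    "rec002": [(0.0, 30.0, "TREMOR"), (30.0, 41.2, "VT")],
}
path = os.path.join(tempfile.gettempdir(), "labels_demo.mlf")
write_mlf(path, events)
parsed = read_mlf(path)
print(parsed)
```

A converter from, e.g., SWARM CSV labels would only need to populate the same dictionary structure before calling the writer.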

**5. Applications of machine learning to geochemical data**

ML applications to geochemical data of volcanoes have increased in recent years, although most of them are limited to the use of cluster analysis. CA has been used, for example, to identify and quantify mixing processes using the chemistry of minerals [128], for the study of volcanic aquifers [129, 130], or to differentiate magmatic systems, e.g., [131]. Platforms used to carry out these analyses include the Statistical Toolbox in Matlab [132] and the R platform [54]; some geochemical software written on the latter platform, such as GCDkit [33], includes CA. In most ML analyses of geochemical samples it is common to use whole-rock major elements and selected trace elements; some applications also include isotopic ratios. Many ML applications to geochemical data use more than one technique, frequently combining both unsupervised and supervised approaches.

A combination of SVM, RF and SMLR approaches was used by [37] to account for variations in the geochemical composition of rocks from eight different tectonic settings. The authors note that SVM, as used by [34] to discriminate tectonic settings, is a powerful tool. The RF approach is shown to have the advantage, with

