**5. Applications of machine learning to geochemical data**

ML applications to volcano geochemical data have increased in recent years, although most are limited to cluster analysis (CA). CA has been used, for example, to identify and quantify mixing processes using mineral chemistry [128], to study volcanic aquifers [129, 130], and to differentiate magmatic systems, e.g., [131]. Platforms used to carry out these analyses include the Statistical Toolbox in Matlab [132] and the R platform [54]; some geochemical software built on R, such as GCDkit [33], includes CA. Most ML analyses of geochemical samples use whole-rock major elements and selected trace elements; some applications also include isotopic ratios. Many ML applications to geochemical data use more than one technique, frequently combining unsupervised and supervised approaches.

A combination of SVM, RF and SMLR approaches was used by [37] to account for variations in the geochemical composition of rocks from eight different tectonic settings. The authors note that SVM, as used by [34] to discriminate tectonic settings, is a powerful tool. The RF approach is shown to have the advantage, with respect to SVM, of providing the importance of each feature during discrimination. The weakness of applying RF to tectonic setting discrimination is that an evaluation based only on a majority vote of multiple decision trees often makes the quantitative geochemical interpretation of these elements and isotopic ratios difficult. The authors suggest that the best quantitative discriminant is SMLR, as it assigns to each sample a probability of belonging to a given group (a tectonic setting in this case) while still allowing the importance of each feature to be identified. This tool is a notable step forward in discriminating the geochemical signatures of the different tectonic settings, which are commonly assessed with binary or ternary diagrams, e.g., [133, 134]; these are useful with many samples but cannot differentiate a tectonic setting where a complex evolution of magmas has occurred. In the last decade, multielement variation diagrams, e.g., [135], Decision Trees, e.g., [136], and LDA, e.g., [137], were also proposed to accurately assign a tectonic setting based on rock geochemistry. Based on rock sample geochemistry, [37] show that a set of 17 elements and isotopic ratios is needed to clearly identify the tectonic setting. Two new discriminant functions were recently proposed to discriminate the tectonic settings of mid-ocean ridge (MOR) and oceanic plateau (OP). Ten datasets (original concentrations as well as isometric log-ratio transformed variables; all 10 major elements as well as all 10 major and 6 trace elements) were used to evaluate the quality of discrimination from LDA and canonical analysis [138].
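The isometric log-ratio (ilr) transform mentioned above maps a composition of D parts to D-1 unconstrained coordinates before a method such as LDA is applied. The following is a minimal sketch under stated assumptions: it uses one standard pivot-type orthonormal basis, which may differ from the basis used in [138], and the sample values and function names are hypothetical and chosen only for illustration. Zero concentrations would need separate treatment, since the logarithm is undefined there.

```python
import math


def clr(parts):
    """Centered log-ratio: log of each part minus the mean of the logs."""
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def ilr(parts):
    """Isometric log-ratio with a pivot-type basis: D parts -> D-1 coordinates.

    Coordinate i compares the geometric mean of the first i parts
    with part i+1, scaled so that the basis is orthonormal.
    """
    logs = [math.log(p) for p in parts]
    coords = []
    for i in range(1, len(logs)):
        mean_first_i = sum(logs[:i]) / i
        coords.append(math.sqrt(i / (i + 1)) * (mean_first_i - logs[i]))
    return coords


# Hypothetical 4-part composition in wt% (illustrative values only)
sample = [50.0, 15.0, 10.0, 25.0]
coords = ilr(sample)
```

Because only ratios between parts enter the calculation, the coordinates are unchanged if the same composition is expressed in different units (for example wt% versus weight fractions).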

The software package Compositional Data Package (CoDaPack) [139] and a combination of unsupervised (CA) and supervised (LDA) learning approaches were used by [36] to identify compositional variation of ignimbrite magmas in the Central Andes, with the aim of using these methods as a tool for ignimbrite correlation. They used the Statistica software [140] for both CA and LDA.

Correlating tephra and identifying their volcanic sources is a very difficult task, especially in areas where several volcanoes had explosive eruptions within a relatively short period of time. This is particularly challenging when the volcanoes have similar geochemical and petrographic compositions. Electron microprobe analysis of glass compositions and whole-rock geochemical analyses are frequently used to make these correlations. However, correlations may be less accurate when they rely only on geochemical tools that may mask diagnostic variability; in this regard, one of the most important advantages of ML is sometimes the speed at which correlations can be made, rather than the accuracy [35]. Other contributions, however, demonstrate that ML techniques can also make these correlations accurate. Highly accurate results of ML techniques applied to tephra correlation include those of LDA [141, 142] and SVM, e.g., [143]; however, SVM may fail in specific cases, and for the case study of tephra from Alaska volcanoes a combination of ANN and RF is the best ML approach to apply [35]. The authors use the R software [54] to apply these methods, and they underline the advantage of producing probabilistic outputs.
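To illustrate what a probabilistic output can look like, the sketch below shows one common way an ensemble of decision trees can report class probabilities instead of a single label: the fraction of trees voting for each candidate source. This is an assumption made for illustration, not a description of the procedure in [35]; the volcano names, the vote counts and the function names are hypothetical.

```python
from collections import Counter


def vote_probabilities(tree_votes, classes):
    """Fraction of trees voting for each class (0.0 for classes with no votes)."""
    counts = Counter(tree_votes)
    n_trees = len(tree_votes)
    return {c: counts.get(c, 0) / n_trees for c in classes}


def majority_vote(tree_votes):
    """Single hard label: the class chosen by the largest number of trees."""
    return Counter(tree_votes).most_common(1)[0][0]


# Hypothetical votes from a 10-tree ensemble for one tephra sample
votes = ["Volcano A"] * 6 + ["Volcano B"] * 3 + ["Volcano C"] * 1
probs = vote_probabilities(votes, ["Volcano A", "Volcano B", "Volcano C"])
label = majority_vote(votes)
```

The hard label hides the fact that three of the ten trees preferred a different source, whereas the vote fractions keep that uncertainty visible for the analyst.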

SOM was used as an unsupervised neural network approach to analyze geochemical data from Ischia, Vesuvius and Campi Flegrei [144]. The advantage of this method is that it requires no prior knowledge of geochemical or petrological characteristics and that it allows the use of large databases with a large number of variables. The SOM toolbox for Matlab [132] was used by [144] to perform two tests: the first was based on major elements and selected trace elements, to find similar evolution processes; the second investigated the magmatic source, so a vector containing a selection of ratios between major and trace elements was adopted. One strength of this method is that the resulting clusters differentiated rock samples that could otherwise be distinguished comparably well only with 2D diagrams of isotopic ratios; in other words, similar results were obtained using only the less expensive geochemical data.
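The basic SOM training loop can be sketched in a few lines. The example below is a minimal one-dimensional map written with the Python standard library only; it is not the Matlab SOM toolbox used by [144], and the node count, learning-rate schedule, neighbourhood radius and the toy two-variable samples are all assumptions chosen for illustration.

```python
import math
import random


def best_matching_unit(weights, sample):
    """Index of the node whose weight vector is closest to the sample."""
    dists = [sum((w - s) ** 2 for w, s in zip(node, sample)) for node in weights]
    return dists.index(min(dists))


def train_som(samples, n_nodes=4, epochs=50, lr0=0.5, radius0=1.5, seed=0):
    """Train a 1-D chain of nodes; no labels are used at any point."""
    rng = random.Random(seed)
    dim = len(samples[0])
    # Initialise each node from a randomly chosen sample
    weights = [list(rng.choice(samples)) for _ in range(n_nodes)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                      # learning rate decays linearly
        radius = max(radius0 * (1.0 - frac), 0.5)    # neighbourhood shrinks, floor at 0.5
        order = samples[:]
        rng.shuffle(order)
        for sample in order:
            bmu = best_matching_unit(weights, sample)
            for j in range(n_nodes):
                # Nodes near the best matching unit on the chain move more
                influence = math.exp(-((j - bmu) ** 2) / (2.0 * radius ** 2))
                for k in range(dim):
                    weights[j][k] += lr * influence * (sample[k] - weights[j][k])
    return weights


# Hypothetical samples with two scaled variables (illustrative values only)
samples = [[0.1, 0.2], [0.0, 0.3], [0.2, 0.1], [0.9, 0.8], [1.0, 0.7], [0.8, 0.9]]
weights = train_som(samples)
bmus = [best_matching_unit(weights, s) for s in samples]
```

After training, samples that share a best matching unit (or neighbouring units) can be read as belonging to the same cluster. In practice the input vector would hold the chosen major elements, trace elements or element ratios, scaled to comparable ranges.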

*Machine Learning in Volcanology: A Review DOI: http://dx.doi.org/10.5772/intechopen.94217*

One application of ML techniques that may be extremely useful in geochemistry is the apparent possibility of predicting the concentration of unmeasured elements when a large amount of data on other elements is available. A combination of ML techniques was used by [38] to predict Rare Earth Element (REE) concentrations in Ocean Island Basalts (OIB) using RF. They used 1283 analyses, of which 80% were used for training and the remaining 20% to validate the results. They found good estimates only for the Light Rare Earth Elements (LREE), suggesting that the results may be improved by using a larger set of input data for training. One possible solution may be to train not only on major elements but also on other trace elements obtained through the same analytical method as the major elements.

The origin of the volcanoes in Northeast China, analyzed by RF and DNN using the full chemical compositional data, was associated with the Pacific slab, which subducts at Japan, reaches ~600 km depth under eastern China, and extends horizontally as far as Mongolia. The boundary between volcanoes triggered by fluids and melts from the slab and those not related to it was located at the westernmost edge of the deeply buried Pacific slab [145].

As highlighted by [143], ML methods require integration with other techniques such as fieldwork, petrographic observations and classic geochemical studies to obtain a clearer picture of the investigated problem. While in other fields it is relatively easy (and cheap) to acquire large amounts of data (hundreds or more), this is not the case for geochemistry. However, we underline that the application of ML techniques to the geochemistry of volcanic rocks does need a minimum dataset size. In the literature, a set of 250 analyses is described as a sufficiently large amount of data; as usual, one can try using the available data (often even fewer than 50), but thousands of examples would definitely improve the results.

**6. Applications of machine learning to other volcanological data**

ML appears more and more often in the volcanology literature, and its specific fields of application now also span other sub-disciplines.

Mount Erebus in Antarctica has a persistent lava lake showing Strombolian activity, but its location is remote. Automatic methods to detect these explosions are therefore highly needed. A CNN was trained using infrared images captured from the crater rim and "labeled" with the help of accompanying seismic data, which were no longer used during the subsequent automatic detection [146].

Clast morphology is also a fundamental tool for studies of volcanic textures. Texture analysis of clasts provides, in particular, information about genesis, transport and depositional processes. Here, ML has yet to be fully developed, but the application of preprocessing techniques such as the Radon transform, for example, can be a first step towards an efficient definition of feature vectors to be used for classification, as shown e.g., at Colima volcano [147].

The Museum of Mineralogy, Petrography and Volcanology of the University of Catania implemented a communication system based on the visitor's personal experience to learn by playing. It offers a web application called I-PETER: Interactive Platform to Experience Tours and Education on the Rocks. This platform includes a labeled dataset of images of rocks and minerals that can also be used for petrological investigations based on ML [148].

Satellite remote sensing technology is increasingly used for monitoring the surface of the Earth in general, and volcanoes in particular, especially in areas where ground monitoring is scarce or completely missing. For instance, in Latin America


*Updates in Volcanology – Transdisciplinary Nature of Volcano Science*


