self-exciting temporal point process which models events whose rate at time *t* may depend on the history of events at times preceding *t*, allowing events to trigger new events (see [18, 19] and the references within). These models appeared for the first time in applications to population genetics, and for this reason they are also known as epidemic-type models. Ogata [5, 20] introduced the epidemic-type aftershock sequence (ETAS) models for modeling seismic events. These models are characterized by a parametric intensity function which represents the occurrence rate of an earthquake at time *t* conditional on the past history of the occurrence.

ETAS models and their successive extensions have proven to be extremely useful in the description and modeling of earthquake occurrence times and locations. Self-exciting point process models [5, 19] were initially introduced in the temporal domain and later extended to space [19]. The temporal self-exciting point processes can be defined in terms of the conditional ground intensity function (GIF):

$$\lambda_g(t \mid \mathcal{H}_t) = \lim_{\Delta t \downarrow 0} \frac{E\left[N\{(t,\, t + \Delta t)\} \mid \mathcal{H}_t\right]}{\Delta t} \tag{1}$$

where *N*(*A*) is the number of events occurring at times *t* ∈ *A* and {ℋ*t* : *t* ≥ 0} is the history of all events up to time *t*. Denoting by *t<sub>i</sub>* ∈ [0, *T*), with *t<sub>i</sub>* < *t<sub>i+1</sub>*, the occurrence times of a simple point process, the GIF can be written as

$$\lambda_g(t \mid \mathcal{H}_t) = \mu + \sum_{i:\, t_i < t} c(m_i)\, g(t - t_i) \tag{2}$$

where the component μ can be considered the base rate that prevents the process from dying out, *m<sub>i</sub>* is the magnitude of the event occurring at time *t<sub>i</sub>*, and *g* is the triggering function which determines the form of the self-excitation [5]. This process with intensity function λ*g*(*t*|ℋ*t*) is also known as a marked self-exciting point process, where the mark is given by the magnitude associated with each event. For example, the magnitude of an earthquake also influences how many aftershocks there will be.

Different parameterizations have been proposed for the functions *c* and *g*. Ogata [5] proposed the use of $c(m) = e^{\beta(m - M_t)}$ and $g(t) = K/(t + c)^{p}$, where the parameter β measures the effect of magnitude in the production of aftershocks and *g* is the modified Omori formula [12], with *t* representing the elapsed time since the shock, *K* a normalizing constant depending on the lower bound of the aftershocks, and *c* and *p* characteristic parameters of the seismic activity of a given region.

The ground intensity function can be estimated using the PtProcess library available in R [21].
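
The chapter relies on the PtProcess package for the actual GIF estimation; purely as an illustration of Eq. (2) under Ogata's parameterization, a minimal Python sketch of the conditional intensity is given below (function and variable names are ours, and the parameter values are placeholders rather than fitted values):

```python
import numpy as np

def etas_gif(t, times, mags, mu, K, c, p, beta, M0):
    """Conditional ground intensity lambda_g(t | H_t) of Eq. (2), with
    Ogata's parameterization c(m) = exp(beta * (m - M0)) and the modified
    Omori kernel g(s) = K / (s + c)**p."""
    times = np.asarray(times, dtype=float)
    mags = np.asarray(mags, dtype=float)
    past = times < t                                 # only events strictly before t contribute
    elapsed = t - times[past]
    productivity = np.exp(beta * (mags[past] - M0))  # magnitude-dependent aftershock productivity
    return mu + np.sum(productivity * K / (elapsed + c) ** p)

# Illustrative call with placeholder parameter values (not fitted to the Chilean catalog)
history_times = [0.5, 2.0, 3.7]   # occurrence times
history_mags = [4.1, 3.5, 5.0]    # associated magnitudes (the marks)
print(etas_gif(5.0, history_times, history_mags,
               mu=0.2, K=0.8, c=0.01, p=1.1, beta=1.0, M0=3.0))
```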

**3. Estimation module**

Once the GIF databases are obtained for each magnitude category (>3, >4, >5, and >6), they are structured for estimation with the DL models. The database is separated into two groups, training and test (67 and 33% of the data, respectively). A lookback of 3 is used, meaning that the output at time *t* is estimated considering a window of inputs at times *t*−1, *t*−2, and *t*−3. Both models were trained for 100 epochs.
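
The chapter does not include the preprocessing code; a minimal sketch of the 67/33 split and the lookback-3 windowing, assuming the estimated GIF series is available as a plain one-dimensional array (the file name and helper names below are hypothetical), could look as follows:

```python
import numpy as np

def make_windows(series, lookback=3):
    """Pair each value at time t with the values at t-1, ..., t-lookback."""
    X, y = [], []
    for i in range(lookback, len(series)):
        X.append(series[i - lookback:i])
        y.append(series[i])
    return np.array(X), np.array(y)

gif = np.loadtxt("gif_mag3.csv")     # hypothetical file holding the estimated GIF (magnitude >3)
split = int(len(gif) * 0.67)         # 67% training, 33% test
train, test = gif[:split], gif[split:]
X_train, y_train = make_windows(train, lookback=3)
X_test, y_test = make_windows(test, lookback=3)
```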

**3.1 Deep feedforward neural networks (DFANNs)**

The deep feedforward artificial neural network (DFANN), also called feedforward neural network or multilayer perceptron, is the most popular and widely known artificial neural network. In this network, information is propagated in a forward direction, from the input nodes through the hidden nodes (if any) to the output nodes. As stated by [22, 23], DFANNs are universal approximators, and the universal approximation theorem states that "every bounded continuous function with bounded support can be approximated arbitrarily closely by a multilayer perceptron by selecting enough but a finite number of hidden neurons with appropriate transfer function" [22, 24].

The goal of a DFANN is to approximate some function *f*\* by the mapping *ŷ* = *f*(*x*; θ) and to learn the values of the parameters θ that result in the best approximation of *f*\* [25].

The DFANN model consists of a set of elementary processing elements called neurons. These units are organized in an architecture with three types of layers: the input or sensory layer, the hidden layers, and the output layer. The neurons of one layer are linked to the neurons of the subsequent layer without any type of bridge, lateral, or feedback connections. The connections symbolize the flow of information between neurons. **Figure 3** illustrates the architecture of this artificial neural network with *r* hidden layers.

**Figure 3.** *Deep feedforward artificial neural network (DFANN).*

A DFANN operates as follows. The input signal is received by the neurons of the input layer; these neurons are only in charge of propagating the signal to the first hidden layer and do not perform any processing. The first hidden layer processes the signal (applying a nonlinear transformation or transfer function) and transfers it to the subsequent layer; the second hidden layer propagates the signal to the third, and so on. The number of hidden layers gives the depth of the model, hence the term "deep." When the signal is received and processed by the output layer, it generates the response.

The knowledge of the DFANN is registered, by the learning algorithms, in the connections between the neurons of each layer *l* = 1, 2, …, *r*, called weights. Several learning algorithms have been created to estimate the weights, the most popular and the first being backpropagation, also known as the generalized delta rule, popularized by [26]. Backpropagation is a supervised learning method and an implementation of the delta rule: it requires the desired output for any given input in order to compute the output error. The main idea of the algorithm is to propagate the errors backward from the output nodes to the inner nodes; for this, the gradient of the network error with respect to the network's modifiable weights is computed. A DFANN with 4 hidden layers and 12 neurons in each layer was implemented for this work.
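
A minimal Keras sketch of such a network (4 hidden layers of 12 neurons, a lookback-3 input, and one output) is shown below, reusing the window arrays from the earlier sketch; the activation function, optimizer, and loss are assumptions, since the chapter does not specify them:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# 4 hidden layers x 12 neurons; 3 lagged GIF values in, 1 value out.
dfann = Sequential()
dfann.add(Dense(12, activation="relu", input_shape=(3,)))
for _ in range(3):
    dfann.add(Dense(12, activation="relu"))
dfann.add(Dense(1))

# Backpropagation of the error gradient is handled by the optimizer.
dfann.compile(optimizer="adam", loss="mse")
dfann.fit(X_train, y_train, epochs=100, validation_data=(X_test, y_test))
```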

**3.2 Recurrent neural networks with long short-term memory (RNN-LSTM)**

As first proposed by Rumelhart [26], recurrent neural networks (RNNs) have a primitive type of memory, in the form of recurrent layers that operate in time [27]. Each recurrent layer takes as inputs both the output of the previous layer and an internal output of the current layer. Thus, RNNs are well suited to time series data [27]. RNNs handle sequences well when the relevant context is short, but understanding and remembering the context behind longer sequences is not possible with a simple RNN. Long short-term memory (LSTM) networks [28] are a type of RNN designed precisely to escape this long-term dependency issue of recurrent networks. LSTM recurrent networks (RNN-LSTM) have memory cells with an internal recurrence (a self-loop), in addition to the outer recurrence of the RNN, which adds a nonlinear transformation to the inputs [28]. These memory cells, *A*, are controlled mainly by three gates: the memory (input) gate, the forget gate, and the output gate. The memory gate controls the entry of information into the memory cell; the forget gate selectively erases information from the memory cell and enables storing the next input [29]; and the output gate decides what information the memory cell will emit [30]. The LSTM network structure is illustrated in **Figure 4**. Each cell has three gate activation functions σ and two output activation functions given by tanh as a nonlinear transfer function.
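
For reference, the standard LSTM cell updates of [28] can be written as follows, using the conventional notation *f<sub>t</sub>*, *i<sub>t</sub>*, *o<sub>t</sub>* for the forget, input (memory), and output gates (the symbols may differ from the labels used in **Figure 4**):

$$
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f)\\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i)\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o)\\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c)\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t\\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$

with σ the three gate activations and tanh the two output activations mentioned above.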

In addition, LSTMs are well suited to classify and predict from time series data, since there may be delays of unknown duration between important events in a time series. They can clearly retain selected events from far in the past, which contrasts with basic RNNs, for which the memory of an event decays over time [27].

**9**

**Figure 5.**

*Ground intensity function (GIF) estimation.*

*Assessing Seismic Hazard in Chile Using Deep Neural Networks*

A 1-layer RNN-LSTM with 12 cells was implemented for this work. Both DL models

**Figure 5** shows GIF estimation for the data preprocessing module, estimated for magnitudes >3, >4, >5, and >6, respectively. Note that with higher magnitudes, the GIF time series become thinner, due to the decrease of seismic events that fit in the

The structure implemented for both DFANN and RNN-LSTM models is shown

The DFANN model performs slightly better than the RNN-LSTM models, in particular for lesser magnitudes (>3). **Table 1** shows the training and test performance measures (root mean square error, RMSE) for each magnitude group and DL model. Both models show better performances with magnitude >3, that is, when

Also, a representation of the training and test results for the best model are shown in **Figure 7**. The model captures the trend very well; however, it does not perform accordingly in terms of the magnitude of the intensity function.

were implemented using Keras, with TensorFlow as backend, in Python.

*DOI: http://dx.doi.org/10.5772/intechopen.83403*
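
A minimal Keras sketch of this configuration (one LSTM layer with 12 cells over the lookback-3 window) is given below; as before, the optimizer and loss are assumptions, and the window arrays come from the earlier preprocessing sketch:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# One LSTM layer with 12 cells, reading a lookback-3 window of the GIF series.
rnn_lstm = Sequential()
rnn_lstm.add(LSTM(12, input_shape=(3, 1)))   # (timesteps, features)
rnn_lstm.add(Dense(1))
rnn_lstm.compile(optimizer="adam", loss="mse")

# Keras recurrent layers expect 3-D input: (samples, timesteps, features).
X_train_seq = X_train.reshape((-1, 3, 1))
X_test_seq = X_test.reshape((-1, 3, 1))
rnn_lstm.fit(X_train_seq, y_train, epochs=100, validation_data=(X_test_seq, y_test))
```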

**4. Results**

**Figure 5** shows the GIF estimation obtained from the data preprocessing module for magnitudes >3, >4, >5, and >6, respectively. Note that with higher magnitudes the GIF time series become thinner, due to the decrease in the number of seismic events that fall into the category.

**Figure 5.** *Ground intensity function (GIF) estimation.*

The structure implemented for both DFANN and RNN-LSTM models is shown in **Figure 6**.

The DFANN model performs slightly better than the RNN-LSTM model, in particular for lower magnitudes (>3). **Table 1** shows the training and test performance measures (root mean square error, RMSE) for each magnitude group and DL model. Both models show better performance with magnitude >3, that is, when more information is available.

A representation of the training and test results for the best model is also shown in **Figure 7**. The model captures the trend very well; however, it does not perform as well in terms of the magnitude of the intensity function.
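
As a small companion to **Table 1**, a sketch of how the RMSE reported there could be computed for one of the fitted models (the model and array names refer to the earlier sketches):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error, the measure reported in Table 1."""
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

train_rmse = rmse(y_train, dfann.predict(X_train).ravel())
test_rmse = rmse(y_test, dfann.predict(X_test).ravel())
print(f"train RMSE = {train_rmse:.4f}, test RMSE = {test_rmse:.4f}")
```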
