#### **2.4 Long short-term memory (LSTM) and extreme learning machine (ELM) modeling**

A Long Short-Term Memory network (LSTM) is a special kind of recurrent neural network that can learn long-term dependencies; LSTMs are designed specifically to avoid the long-term dependency problem [40]. Retaining information over long periods is effectively their default behavior, which makes them well suited to time series forecasting [40]. This study employs LSTM to develop prediction models for latitude and longitude correction. The hyperparameters used in modeling are presented in **Table 1**. A three-layer LSTM network was simulated with different combinations of hidden neurons in each layer: hidden layer 1 with 500, 1000, or 1500 neurons; hidden layer 2 with 700, 500, or 300 neurons; and hidden layer 3 with 300, 200, or 100 neurons. Each of these neuron combinations was trained for 100, 200, and 300 epochs to generate several LSTM prediction networks, as sketched below. The training optimizer is "sgdm" (Stochastic Gradient Descent with Momentum) with an initial learning rate of 0.001, a mini-batch size of 128 to learn common patterns as important features, and a gradient threshold of 1.
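As a minimal sketch (not the authors' published code), one such configuration can be assembled with MATLAB's Deep Learning Toolbox as follows; `XTrain`, `YTrain`, and `numFeatures` are placeholder names for the training data and input dimensionality:

```matlab
% One LSTM configuration from Table 1: 500/700/300 hidden neurons,
% 100 training epochs; the other models swap in the alternative values.
layers = [
    sequenceInputLayer(numFeatures)   % GPS-IMU input features
    lstmLayer(500)                    % hidden layer 1: 500, 1000, or 1500
    lstmLayer(700)                    % hidden layer 2: 700, 500, or 300
    lstmLayer(300)                    % hidden layer 3: 300, 200, or 100
    fullyConnectedLayer(1)            % latitude (or longitude) correction
    regressionLayer];

options = trainingOptions('sgdm', ...   % SGD with momentum
    'InitialLearnRate',  0.001, ...
    'MiniBatchSize',     128, ...
    'GradientThreshold', 1, ...
    'MaxEpochs',         100);          % also 200 and 300 in the study

net = trainNetwork(XTrain, YTrain, layers, options);
```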

**Table 1.**

*Hyperparameters used in LSTM prediction modeling of latitude and longitude.*

On the other hand, the Extreme Learning Machine (ELM) is also applied to generate prediction models. ELMs are feedforward neural networks with one or more layers of hidden nodes, used to analyze data and predict values [41]. The parameters of these hidden nodes require no tuning for feature selection, compression, clustering, classification, or sparse estimation [41]. To keep error rates low, the weights of these hidden nodes may be assigned by random projection, or they can be inherited from predecessors and left unchanged. In contrast to conventional gradient-based learning methods for feedforward neural networks, ELM offers intriguing and important characteristics [42]: compared with gradient-based learning, ELM learns far more quickly and generalizes well [43]. The hyperparameters used in the simulation of ELM models for latitude and longitude prediction are summarized in **Table 2**. A single-layer ELM is used, producing various models by varying the number of hidden neurons over 100, 200, 300, 400, 500, 600, 700, 800, 900, and 1000; the selected activation function is "radbas" (the radial basis function), chosen for good generalization and fast training. A sketch of such a network follows Table 2.

**Table 2.**

*Hyperparameters used in ELM prediction modeling of latitude and longitude.*
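As a minimal sketch (assuming the standard ELM training procedure, not the authors' code), a single-hidden-layer ELM with "radbas" activation can be written in MATLAB as follows; `X`, `y`, and `Xtest` are placeholder names for the training inputs, training targets, and test inputs:

```matlab
% Single-layer ELM: random, fixed input weights; output weights solved
% in closed form via the Moore-Penrose pseudoinverse.
numHidden = 500;                          % 100, 200, ..., 1000 in the study
W = 2*rand(size(X,2), numHidden) - 1;     % random input weights, never tuned
b = rand(1, numHidden);                   % random biases, never tuned
H = radbas(X*W + b);                      % hidden layer, radbas(n) = exp(-n.^2)
beta  = pinv(H) * y;                      % output weights (least squares)
yPred = radbas(Xtest*W + b) * beta;       % prediction on unseen data
```

Because only `beta` is solved for, in a single linear-algebra step, while the hidden weights stay random, training is much faster than gradient-based backpropagation, which is the speed advantage noted above.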

#### **2.5 Evaluation metrics for prediction model performance**

The performance of the developed prediction models for latitude and longitude correction was evaluated using the mean square error (MSE), root mean square error (RMSE), coefficient of determination (R<sup>2</sup>), and mean absolute error (MAE).

MSE is the average of the squared differences between predicted and true values. It is used to assess the model's quality from predictions made across the entire training dataset against the true label/output values. Lower MSE values indicate a more accurate model. MSE is defined mathematically by (1):

$$MSE = \frac{1}{n} \sum\_{i=1}^{n} \left(\hat{y}\_{i} - y\_{i}\right)^{2} \tag{1}$$

The root mean square error (RMSE) measures the average difference between predicted and actual values in a dataset. It describes how spread out the residuals are, that is, how closely the observed data cluster around the predicted values. Mathematically, RMSE is the square root of the MSE.

The coefficient of determination (R<sup>2</sup>) measures how well the predicted values fit the observed values. It gives the proportion of the total variation in the dependent variable that can be explained by the model's independent variables. The value ranges from 0 to 1; the greater the value, the better the model. It is calculated using (2):

$$R^{2} = 1 - \frac{\sum\_{i=1}^{n} \left(y\_{i} - \hat{y}\_{i}\right)^{2}}{\sum\_{i=1}^{n} \left(y\_{i} - \bar{y}\right)^{2}} \tag{2}$$

where *ȳ* denotes the mean of the observed values.

Finally, MAE indicates the difference between the true and predicted values, calculated by averaging the absolute differences over a given dataset. It is typically used when measuring performance on continuous variable data. It is a linear score, meaning all individual differences are weighted equally. The smaller the value, the better the performance of the model. It is computed using (3):

$$MAE = \frac{\sum\_{i=1}^{n} \left| y\_{i} - \hat{y}\_{i} \right|}{n} \tag{3}$$

where *n* is the number of data points, *y<sub>i</sub>* denotes the observed values, and *ŷ<sub>i</sub>* the predicted values.
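As a minimal sketch (placeholder variable names, not the authors' code), all four metrics can be computed in MATLAB from a vector of observed values `y` and predictions `yHat`:

```matlab
% Evaluation metrics for a prediction model; y and yHat are column
% vectors of observed and predicted values, respectively.
mseVal  = mean((yHat - y).^2);                            % MSE, Eq. (1)
rmseVal = sqrt(mseVal);                                   % RMSE = sqrt(MSE)
r2Val   = 1 - sum((y - yHat).^2)/sum((y - mean(y)).^2);   % R^2, Eq. (2)
maeVal  = mean(abs(y - yHat));                            % MAE, Eq. (3)
```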
