**4. Liquefaction-induced settlement model development**

### **4.1 Preparing training and testing datasets**

The manner in which data are divided into training and testing datasets in data mining procedures has a substantial effect on the results [21–23]. The statistical parameters of the input variables, including the minimum, maximum, mean, and standard deviation of the training and testing datasets, are shown in **Table 2**. The datasets were split in order to assess the generalization efficiency and predictive ability of the developed models. Comparable performance on the training and testing datasets suggests that the developed models can be applied within the trained ranges. As shown in **Table 2**, the ranges of the input and output parameters in the testing dataset fall largely within those of the training dataset. This statistical consistency between the training and testing datasets improves the performance of the developed models and allows them to be assessed properly.
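As a sketch of this consistency check, the snippet below computes Table-2-style statistics (minimum, maximum, mean, standard deviation) for a random 80/20 split. The variable names and value ranges are illustrative assumptions only, not the chapter's actual case-history records.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical stand-in for the case-history database: 200 records with
# four illustrative input variables (names and ranges are assumptions,
# not the chapter's actual predictors).
data = {
    "SPT_N": rng.uniform(2, 40, 200),
    "fines_content": rng.uniform(0, 60, 200),
    "PGA": rng.uniform(0.1, 0.6, 200),
    "depth": rng.uniform(1, 20, 200),
}

# Random 80/20 split into training and testing subsets.
idx = rng.permutation(200)
train_idx, test_idx = idx[:160], idx[160:]

def summarize(values):
    """Return the Table-2 style statistics: min, max, mean, std."""
    return (values.min(), values.max(), values.mean(), values.std())

# Similar statistics for both subsets indicate a statistically
# consistent split of the kind reported in Table 2.
for name, col in data.items():
    tr = summarize(col[train_idx])
    te = summarize(col[test_idx])
    print(f"{name:14s} train min/max/mean/std = "
          f"{tr[0]:6.2f} {tr[1]:6.2f} {tr[2]:6.2f} {tr[3]:6.2f}")
    print(f"{'':14s} test  min/max/mean/std = "
          f"{te[0]:6.2f} {te[1]:6.2f} {te[2]:6.2f} {te[3]:6.2f}")
```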

To ensure comparability, the RF and REP Tree models were developed using the same training and testing datasets. Liquefaction-induced settlements were predicted with both models, and a detailed analysis of their performance was then used to identify the optimum model. If this model performs adequately on both the training and testing datasets, it can be adopted for practical use.
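A minimal sketch of this workflow in scikit-learn is shown below, with synthetic stand-in data. Note that scikit-learn does not ship the REP Tree algorithm (a Weka implementation of reduced-error pruning), so a cost-complexity-pruned decision tree is used here as a rough analogue; all parameter values are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in data (the real case-history records are not shown here).
X = rng.uniform(0, 1, (200, 4))
y = X @ np.array([2.0, -1.0, 0.5, 3.0]) + rng.normal(0, 0.1, 200)

# Identical split for both models so their scores are directly comparable.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=1)

# Random Forest; the values here are illustrative, not the tuned
# parameters reported in Table 3.
rf = RandomForestRegressor(n_estimators=100, random_state=1).fit(X_tr, y_tr)

# scikit-learn has no REP Tree; a cost-complexity-pruned decision tree is
# used as a rough analogue of reduced-error pruning.
rep_like = DecisionTreeRegressor(ccp_alpha=1e-3, min_samples_leaf=2,
                                 random_state=1).fit(X_tr, y_tr)

print("RF   R^2 (test):", rf.score(X_te, y_te))
print("Tree R^2 (test):", rep_like.score(X_te, y_te))
```

Comparing both models on the same held-out split, as above, is what makes their error statistics directly comparable.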

**Table 2.**

*Statistical parameters of the training and testing datasets.*

### **4.2 Evaluation measures**

In this study, three evaluation measures, the mean absolute error (MAE), the root mean square error (RMSE), and the correlation coefficient (*r*), are used to evaluate and compare the performance of the models. These three statistical measures provide useful insight into a prediction model: the MAE is the average of the absolute differences between the values predicted by a model and the actual values, the RMSE is the standard deviation of these differences, and the correlation coefficient (*r*) is a statistical measure of the strength of the linear relationship between the predicted and observed values. Their expressions are as follows [24]:

$$MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - x_i| \tag{4}$$

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - x_i)^2} \tag{5}$$

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}} \tag{6}$$

where *yi* and *xi* are the observed and predicted values of the *i*th sample, respectively, *x̄* and *ȳ* are the mean values of the predicted and observed values, respectively, and *n* is the total number of samples. Compared with the RMSE, the MAE is a more natural and unambiguous index for quantifying the errors between the estimated and actually observed values [25, 26]. The RMSE is a standard statistical metric for assessing the output of a model [27]. A larger correlation coefficient (*r*) together with lower MAE and RMSE values indicates a higher accuracy of the predicted results.

The optimum value for each machine learning parameter, obtained by parameter tuning, is illustrated in **Table 3**. In the proposed RF and REP Tree models, the most significant parameters are the number of seeds and the minimum total weight of the instances in a leaf during the modeling process.

The RF and REP Tree predictive results were obtained for the training and testing datasets. The MAE, RMSE, and correlation coefficient (*r*) were subsequently determined on the basis of Eqs. (4)–(6); **Figure 2** depicts the performance of the RF and REP Tree models, respectively.

**Figure 2.**

*Comparison of MAE, RMSE, and* r *values from the RF and REP tree models.*

*Evaluation of Liquefaction-Induced Settlement Using Random Forest and REP Tree Models:…*

*DOI: http://dx.doi.org/10.5772/intechopen.94274*
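Eqs. (4)–(6) translate directly into code. The sketch below implements them with NumPy; the observed and predicted values are made-up illustrative numbers, not the chapter's settlement data.

```python
import numpy as np

def mae(y, x):
    """Eq. (4): mean absolute error between observed y and predicted x."""
    y, x = np.asarray(y, float), np.asarray(x, float)
    return np.mean(np.abs(y - x))

def rmse(y, x):
    """Eq. (5): root mean square error."""
    y, x = np.asarray(y, float), np.asarray(x, float)
    return np.sqrt(np.mean((y - x) ** 2))

def corr(y, x):
    """Eq. (6): Pearson correlation coefficient r."""
    y, x = np.asarray(y, float), np.asarray(x, float)
    num = np.sum((x - x.mean()) * (y - y.mean()))
    den = np.sqrt(np.sum((x - x.mean()) ** 2) * np.sum((y - y.mean()) ** 2))
    return num / den

# Illustrative observed/predicted settlement values only.
observed = [10.0, 25.0, 40.0, 55.0]
predicted = [12.0, 23.0, 42.0, 50.0]
print(mae(observed, predicted))   # 2.75
print(rmse(observed, predicted))
print(corr(observed, predicted))
```

A model that is accurate in the sense described above yields a low MAE and RMSE together with *r* close to 1.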
