**4. Experiment and discussion**


The study material for this paper is data from the earthquake of September 28, 2018, in Palu, Sigi, and Donggala. This event was chosen because the disaster caused damage over a reasonably broad area. In general, the damage can be divided into several phenomena. The first is damage caused by fault movement, fractures, and earthquake shocks. The fault movement is an offset in which the left side moves north and the right side shifts south. The largest shear on the right side is about 4 m, while the left side shifted north by 3 m. This shift is visible on Google Maps. Buildings traversed by the fault naturally suffer significant damage, as do those affected by soil fractures, which can result from the movement of faults (or reactivated faults) with a smaller offset. Earthquake shocks take the form of both horizontal and vertical vibrations. In Palu City, damage due to shocks was generally limited, except for buildings of low quality.

The next phenomenon is damage due to the tsunami. The impact of a tsunami results from inundation (submerged buildings) and tsunami currents (the speed or force that pushes or pulls buildings). The main effect of current velocity is scouring of the subgrade. If the subgrade is loose sand, the erosion rate is very high. Buildings with shallow foundations generally fail because the scour reaches the base of the foundation. Such buildings are also relatively light, so they are easily carried away by the flow of water. Further damage occurs because the tsunami carries debris, including cars and ships, and collisions with these objects often cause heavy damage.

The last phenomenon is damage due to liquefaction. There are 4–5 locations where it was prominent and widespread, namely Balaroa, Petobo, Jono Oge, Lolu village (also in Jono Oge), and Sibalaya. Liquefaction also occurred at some other spots in the form of sand boils, but there it was not prominent and was not recorded. In addition, liquefaction can induce landslides, including submarine landslides. The landslides in Balaroa and Sibalaya were liquefaction-induced landslides. It is possible that the submarine landslides in Palu Bay, which caused the tsunami impact, had the same mechanism as in Sibalaya.

#### **4.1 Community satisfaction prediction model**

This section presents the modeling framework and procedures used to develop the ANN and SVM models. As in traditional modeling, where the goal is to estimate a set of coefficients for a particular functional form, the main objective of the ANN model in this study is to obtain a set of matrices that abstract the basic knowledge contained in the available data after the training loop. However, to use an ANN to solve real-world problems, it is necessary to design a framework that follows the characteristics of the problem. The framework design defines the required ANN architecture and the relationships between the components in the framework. Once the framework is designed, the next stage is to design the architecture of each ANN sub-model. The ANN architectural design is a decision-making process that includes determining the number of layers, the number of neurons in each layer, and the variables entered into the input and output layers. After the ANN architectural design is complete, the design results need to be tested and validated.

In general, a biological neural network is made up of millions (or even more) of basic neuron structures, interconnected and integrated so that they can carry out activities regularly and continuously as needed. The imitation of a neuron in an artificial neural network is a processing element that functions as a neuron. Each input signal is multiplied by its corresponding weight w. The products are then summed, and the result is passed to the activation function to obtain the output signal f(a, w). Although it is still far from perfect, the behavior of this artificial neuron resembles that of the biological cell as we understand it today. A collection of neurons is assembled into a network that functions as a computational tool. The number of neurons and the network structure differ for each problem to be solved.
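The weighted-sum-then-activate behavior of a single artificial neuron can be sketched as follows. This is a minimal illustration, not the model used in this study: the sigmoid activation, the bias term, and the sample inputs and weights are all assumptions chosen for the example.

```python
import math


def sigmoid(a):
    """Logistic activation: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-a))


def neuron_output(inputs, weights, bias=0.0):
    """Multiply each input by its weight, sum the products, then activate."""
    a = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(a)


# Hypothetical inputs and weights, for illustration only.
out = neuron_output([0.5, -1.0, 2.0], [0.4, 0.3, 0.1])
print(round(out, 4))  # weighted sum is about 0.1, so the output is about 0.525
```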

Furthermore, this model was developed by activating the entire network in the ANN. Activating an artificial neural network means activating every neuron used in that network. Many functions can serve as activators, such as trigonometric and hyperbolic functions, unit step functions, impulses, and the sigmoid. The sigmoid function is among the most commonly used because it is considered closer to the behavior of the human brain. The activation process can be monitored during iteration, and its movement pattern can be observed.

#### *Data Mining Applied for Community Satisfaction Prediction of Rehabilitation… DOI: http://dx.doi.org/10.5772/intechopen.99349*

In contrast to the neural network strategy, which seeks to find a hyperplane that separates the classes, SVM tries to find the best separating hyperplane in the input space. The basic principle of SVM is a linear classifier. It is extended to non-linear problems by incorporating the kernel trick, which lets the classifier work in a high-dimensional feature space. This development has encouraged modeling research to explore the potential of SVM both theoretically and in applications. SVM has been applied successfully to real-world problems and, in general, provides a better solution than conventional methods.
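The kernel trick mentioned above can be illustrated with a small numerical check. The sketch below uses a degree-2 polynomial kernel on two-dimensional inputs purely as an assumed example (the source does not state which kernel was used): evaluating the kernel in the input space gives the same number as an explicit dot product in a three-dimensional feature space, without ever constructing that space.

```python
import math


def poly_kernel(x, z):
    """Degree-2 polynomial kernel k(x, z) = (x . z)^2 for 2-D inputs."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2


def feature_map(x):
    """Explicit feature map whose dot product reproduces poly_kernel."""
    return (x[0] ** 2, math.sqrt(2.0) * x[0] * x[1], x[1] ** 2)


# Hypothetical points, for illustration only.
x, z = (1.0, 2.0), (3.0, -1.0)

k_implicit = poly_kernel(x, z)
k_explicit = sum(a * b for a, b in zip(feature_map(x), feature_map(z)))

# Both routes agree up to floating-point rounding.
print(k_implicit, k_explicit)
```

A linear classifier trained with such a kernel behaves like a linear classifier in the mapped space, which is what allows it to separate data that are not linearly separable in the original inputs.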

#### **4.2 Community satisfaction data**

The model is verified using data collected by questionnaire around the rehabilitation and reconstruction projects. The questionnaire dataset comprises 625 responses from 2 rehabilitation and reconstruction projects and 25 input parameters, referred to as influencing parameters in an empirical study of community satisfaction. These parameters are given sequential codes according to the pre-during-post stage, as shown in **Table 1** below. All data were obtained from the level of importance and level of performance that respondents assigned to each parameter.

#### **4.3 Stages of learning and modeling test**

The data are organized into three datasets that can be used directly for learning, testing, and validation. The database is first divided into two sets. The first set is used for learning and testing; the second, taken from the questionnaires collected, is reserved for validation. The learning-and-test set is then split into two subsets: 80% of the data for learning and 20% for testing. Because the validation set is separated beforehand, it is statistically independent of the data used during learning and testing. Verification of the DM model with this separate dataset can therefore be considered a control on the performance of the DM model. The learning process is run for 10,000 epochs. The iteration process produces an ANN model with optimal weights between neurons.
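The partitioning described above can be sketched as a two-step split. In this minimal sketch, the 80/20 learning/test ratio and the 625 records come from the text, while the size of the validation holdout (20% here), the random shuffling, and the function name `split_dataset` are assumptions made only for illustration.

```python
import random


def split_dataset(records, validation_fraction, train_fraction=0.8, seed=0):
    """Hold out a validation set first, then split the rest into learn/test."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)

    # Step 1: separate the validation set so it stays independent.
    n_val = int(round(len(shuffled) * validation_fraction))
    validation = shuffled[:n_val]
    working = shuffled[n_val:]

    # Step 2: split the remaining data 80/20 into learning and test subsets.
    n_train = int(round(len(working) * train_fraction))
    return working[:n_train], working[n_train:], validation


# 625 placeholder record identifiers; the 20% holdout is an assumed value.
records = list(range(625))
train, test, validation = split_dataset(records, validation_fraction=0.2)
print(len(train), len(test), len(validation))  # 400 100 125
```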

After the learning phase is complete, model development continues to the test stage, which checks the effectiveness of the learning process. The test dataset becomes the DM input, and this stage applies the learned algorithm that was recorded in the DM application while the learning process was running. The test process calculates the error rate. If the test error is within an acceptable level, the DM model is considered reasonable. Model accuracy is compared using the average MSE values from the test phase, and the DM model with the lowest MSE and the highest R<sup>2</sup> is selected. Once the learning and test processes are complete, the resulting community satisfaction prediction model is verified and validated using the data prepared for that purpose. Different dataset details were selected for model validation.


**Table 1.**

*Input code.*

#### **4.4 Model interpretation**

In engineering science, a model must not only be highly accurate; its results must also be interpretable. The ability to interpret DM is greatly influenced by the power of the data-driven model for this purpose. When the DM black box is implemented with ANN, SVM, and MR algorithms that involve complex mathematical expressions, the data-driven application procedure must translate the model. In this case, model interpretation is carried out to obtain a measurement of the input variables of the community satisfaction prediction model.

The first stage of model interpretation is to establish confidence in the ability and accuracy of the model. The prediction model, which uses community satisfaction as the leading prediction parameter, is first checked for modeling accuracy. There are several methods for evaluating predictive models, one of which is based on absolute errors. This measure, often referred to as the mean absolute deviation (MAD), assesses forecasting accuracy by averaging the absolute values of the forecast errors. MAD is useful because it expresses the prediction error in the same unit of measure as the original data. In addition, modeling quality is reported as RMSE: the smaller the RMSE (the closer to 0), the better the prediction model.
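The error measures named here can be computed directly from paired actual and predicted values. The sketch below is illustrative only: the sample values are invented, and the R<sup>2</sup> form used (one minus the ratio of residual to total sum of squares) is an assumption, since the source does not state which definition was applied.

```python
import math


def mad(actual, predicted):
    """Mean absolute deviation: average of |y - f(x)|."""
    return sum(abs(y - f) for y, f in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error: square root of the average squared residual."""
    mse = sum((y - f) ** 2 for y, f in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse)


def r_squared(actual, predicted):
    """Coefficient of determination, assumed here as 1 - SS_res / SS_tot."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((y - f) ** 2 for y, f in zip(actual, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical satisfaction scores, for illustration only.
actual = [3.0, 4.0, 5.0, 4.0]
predicted = [2.5, 4.0, 5.5, 4.0]

print(mad(actual, predicted))        # 0.25
print(rmse(actual, predicted))       # about 0.3536
print(r_squared(actual, predicted))  # 0.75
```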

This model is structured with a confidence level of 95% according to the Student's t-distribution. All DM models with the ANN, SVM, and MR algorithms are trained using 12 input variable attributes. **Figure 1** shows the predictive capacity of all trained models, comparing their performance in predicting the value of community satisfaction based on MAD, RMSE, and R<sup>2</sup>. The figure shows that the value of community satisfaction can be predicted accurately by each of the three DM models, especially by the ANN and SVM models.

**Figure 1.** *Performance measured.*

**Figure 1** above shows the error measures and R<sup>2</sup> for each model developed. The DM model with the SVM algorithm has the smallest MAD and RMSE values and the highest R<sup>2</sup> value. The prediction models with the ANN and SVM algorithms are acceptable and can be used to predict community satisfaction because their R<sup>2</sup> is close to 1. Accordingly, the community satisfaction prediction model used in the remainder of this study is the DM model with the SVM algorithm.

The DM technique known as association rule mining can find associative rules between combinations of items. Two parameters determine the importance of an associative rule: support, the percentage of records in the database that contain the combination of attributes, and confidence, the strength of the relationship between the attributes in the rule. Following the generate-and-test paradigm, the algorithm used in this study generates candidate combinations of attributes according to specific rules and then tests them. A combination of attributes that meets the minimum support requirement is called a frequent itemset, which is then used to create rules that meet the minimum confidence requirement.
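The two rule measures can be made concrete with a few lines of code. This is a minimal sketch with invented records and placeholder item names `a`, `b`, and `c`; it shows only how support and confidence are computed, not the candidate-generation algorithm used in the study.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Support of antecedent and consequent together, relative to antecedent."""
    joint = support(transactions, set(antecedent) | set(consequent))
    base = support(transactions, antecedent)
    return joint / base if base > 0 else 0.0


# Hypothetical records; each set lists the items present in one record.
transactions = [{"a", "b"}, {"a", "b", "c"}, {"a"}, {"b"}, {"a", "b"}]

rule_support = support(transactions, {"a", "b"})         # 3 of 5 records
rule_confidence = confidence(transactions, {"a"}, {"b"})  # about 0.75
print(rule_support, rule_confidence)
```

A rule such as a → b would be kept only if both values exceed thresholds chosen by the analyst.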

**Figure 2** shows the scatterplot of the community satisfaction values predicted by the SVM algorithm against the questionnaire results. It indicates that the selected variables have a significant relationship with changes in the questionnaire community satisfaction value. **Figure 2a** shows the scatterplot of the learning results of the SVM model, and **Figure 2b** shows the results of the validation stage.

In the validation stage, the *rminer* library is used to describe and obtain the relative contribution of each input. The confirmed model has the R<sup>2</sup>, MAD, and RMSE values shown in **Figure 1** for the performance validation stage, with 20 runs performed. The best hyperparameters for fitting the SVM model were ε = 0.07 ± 0.01 and γ = 0.05 ± 0.00, whereas the ANN used H = 3 ± 1.

Furthermore, the regression analysis used in DM is interpreted. The *rminer* package provides a graphical interpretation tool, the REC curve, in which the error tolerance is plotted on the x-axis and the percentage of predictions within that tolerance on the y-axis. The resulting curve describes the error level in the form of a cumulative distribution function (CDF). The error is defined as the difference between the predicted community satisfaction f(x) and the actual community satisfaction y at each coordinate (x, y). Depending on the error metric, it is measured as the squared residual $(y - f(x))^2$ or the absolute deviation $|y - f(x)|$. **Figure 3** shows the REC curves of the community satisfaction model with the MR, ANN, and SVM algorithms.

In **Figure 3**, the REC curve plots the error tolerance on the x-axis and the accuracy of the regression function on the y-axis. Accuracy is defined as the percentage of modeling results that fit within the specified tolerance. If the tolerance is zero, only exact predictions are considered to meet the model requirements. If the maximum tolerance is chosen, the other values also count toward the accuracy. The REC curve makes clear that accuracy has a trade-off with tolerance: the greater the tolerance, the higher the accuracy. Conceptually, the model that reaches the highest accuracy at the lowest tolerance has the best REC curve.

**Figure 3.** *The regression error characteristic curve.*
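The construction of a REC curve can be sketched in a few lines. The example below assumes the absolute deviation as the error metric and uses invented values; for each tolerance it reports the fraction of predictions whose error does not exceed that tolerance.

```python
def rec_curve(actual, predicted, tolerances):
    """Return (tolerance, accuracy) pairs using absolute deviation as error."""
    errors = [abs(y - f) for y, f in zip(actual, predicted)]
    n = len(errors)
    return [(tol, sum(1 for e in errors if e <= tol) / n) for tol in tolerances]


# Hypothetical actual and predicted satisfaction scores.
actual = [3.0, 4.0, 5.0, 4.0, 2.0]
predicted = [3.1, 3.5, 5.0, 4.9, 2.2]

curve = rec_curve(actual, predicted, tolerances=[0.0, 0.25, 0.75, 1.0])
accuracies = [acc for _, acc in curve]
print(accuracies)  # [0.2, 0.6, 0.8, 1.0]
```

Because accuracy can only grow as the tolerance widens, the curve is non-decreasing, and a model whose curve rises faster toward 1 is preferable.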

The REC plot depicts three different models. It shows that the SVM model consistently reaches the highest accuracy at the smallest tolerance. This REC curve summarizes the entire iteration process of 20 runs of the SVM model with the hyperparameters given in the previous section. The shape of the REC curve can change when different hyperparameters or a different number of runs are used.

#### **4.5 Variable contribution**

The DM model developed can assess the contribution of each variable or attribute that serves as input data. In this study, the variables or attributes are coded A1 to C9. All attributes are grouped into three dimensions: pre, during, and post. The parameter vector θ in this DM model is treated as a hyperparameter vector to indicate that it parameterizes a variance function, rather than serving as parameters in the sense of a parametric approach. The only condition on a variance function is that it generates a non-negative definite variance matrix. Several methods can be used to estimate hyperparameter values; in this DM, the value of θ is estimated by cross-validation. The hyperparameter search ranges used are H ∈ {2, 4, …, 10} and γ ∈ {2<sup>−15</sup>, 2<sup>−13</sup>, …, 2<sup>3</sup>}. These ranges produced the most precise model with an optimal run time. For further model development, other hyperparameter values can be tried. The contribution of each attribute and dimension is expressed as its relative importance in composing the model.
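The two search ranges can be written out explicitly, which makes the step sizes easy to check. In the sketch below, the grids follow the ranges given in the text, while `cv_error` is a hypothetical stand-in for a routine that would return the cross-validated error of a model fitted with a given hyperparameter; its toy definition exists only so the example runs.

```python
import math

# H = 2, 4, ..., 10 hidden nodes for the ANN.
hidden_nodes_grid = list(range(2, 11, 2))

# gamma = 2^-15, 2^-13, ..., 2^3 for the SVM (exponent step of 2).
gamma_grid = [2.0 ** k for k in range(-15, 4, 2)]


def select_best(candidates, cv_error):
    """Pick the candidate with the lowest cross-validated error."""
    return min(candidates, key=cv_error)


def toy_cv_error(gamma):
    """Hypothetical error surface with its minimum at gamma = 2^-5."""
    return abs(math.log2(gamma) + 5.0)


best_gamma = select_best(gamma_grid, toy_cv_error)
print(hidden_nodes_grid)  # [2, 4, 6, 8, 10]
print(len(gamma_grid))    # 10
print(best_gamma)         # 0.03125
```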

The contribution values found by the DM are summarized in **Figure 4**. The figure plots relative importance on the x-axis against each attribute and dimension on the y-axis for the community satisfaction prediction models built with the SVM, ANN, and MR algorithms.
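One common way to obtain relative importance from a black-box model is a one-dimensional sensitivity analysis: each input is varied across its range while the others are held at a baseline, and the resulting change in the output is normalized across inputs. The sketch below is a simplified illustration of that idea under stated assumptions (range of the output as the sensitivity measure, a toy linear model, a 1 to 5 input scale); it is not the *rminer* implementation and does not reproduce the values in the figure.

```python
def relative_importance(model, baseline, ranges, levels=5):
    """Vary one input at a time and normalize the resulting output ranges."""
    measures = []
    for i, (lo, hi) in enumerate(ranges):
        outputs = []
        for step in range(levels):
            x = list(baseline)  # all other inputs stay at the baseline
            x[i] = lo + (hi - lo) * step / (levels - 1)
            outputs.append(model(x))
        measures.append(max(outputs) - min(outputs))
    total = sum(measures)
    if total == 0:
        return [0.0] * len(measures)
    return [m / total for m in measures]


def toy_model(x):
    """Hypothetical fitted model: the third input has no effect."""
    return 3.0 * x[0] + 1.0 * x[1] + 0.0 * x[2]


importance = relative_importance(
    toy_model, baseline=[3.0, 3.0, 3.0], ranges=[(1.0, 5.0)] * 3
)
print(importance)  # [0.75, 0.25, 0.0]
```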

Based on **Figure 4** below, each parameter has an almost even effect on community satisfaction in disaster management. With the model considered the best fit, namely SVM, the most important attributes are the comfort of roads and bridges compared to before (C4) and collaboration between local communities in reconstruction and rehabilitation (A5). These are followed by the access road to the residence compared to before the reconstruction and rehabilitation (C8), participation in the reconstruction and rehabilitation process (A4), and community participation in the reconstruction and rehabilitation (B7). Among the dimensions, the pre-rehabilitation and reconstruction stage is the most critical in affecting community satisfaction.

The next step in the model analysis is to compile an algorithm that selects the main dimensions affecting the community satisfaction model and to analyze supporting variables that affect the prediction but are not accommodated in this model. The VEC analysis illustrates how the main attributes act dynamically in the SVM prediction model of community satisfaction. One such attribute is information and socialization about reconstruction and rehabilitation (A1), from the pre-rehabilitation and reconstruction group. Community satisfaction decreased with the time at which the reconstruction program began (A2) and with the role of the facilitator in the reconstruction and rehabilitation process (B1). Conversely, community satisfaction improved with the access road to the residence compared to before the reconstruction and rehabilitation (C8).

**Figure 4.** *Relative importance.*
