**3. Research method**

In developing community satisfaction prediction models, complete information is needed about the characteristics of the type of work carried out. In general, community satisfaction at each stage is relatively easy to obtain if data are collected regularly and routinely. Community satisfaction data are generally easy to compile, and several measurement methods exist to evaluate overall community satisfaction objectively. By contrast, satisfaction data for phases outside the existing standard stages are more challenging to obtain and take a long time to collect. For example, data on community satisfaction before rehabilitation and reconstruction are more difficult to obtain than data for the other stages. Such data are also more subjective, so their quality depends on the ability of stakeholders to observe and analyze the conditions of these stages.

This section describes the methods used to predict community satisfaction. The analysis is not a formal mathematical proof; rather, it provides illustrations supporting the argument that the proposed method yields a more effective model. A community satisfaction prediction model is considered an important component of a complete natural disaster management system. In addition, information on community satisfaction before, during, and after rehabilitation and reconstruction provides variables that are considered to have a significant influence on overall community satisfaction.

The community satisfaction model can be used at each stage to analyze disaster management and to determine the rehabilitation and reconstruction methods needed. Disaster management authorities can use it to analyze the existing conditions of the stages required to complete each disaster management step. This supports management decision-making on the best and alternative methods for implementing post-disaster rehabilitation and reconstruction. In developing this model, the researchers use a DM-based community satisfaction prediction approach with data collected from rehabilitation and reconstruction work locations in Palu, Sigi, and Donggala. The data are divided according to handling area for calibration, learning, testing, and validation purposes.

#### **3.1 Model approach**

This study develops a community satisfaction prediction model with the DM approach, without restrictive assumptions, using input data sourced from questionnaire results. Building a community satisfaction prediction model with DM follows the stages below. First, clean and examine the data that can be used in the prediction model. Data cleaning removes inappropriate and irrelevant data from the database: correcting typing errors, ensuring that the data format remains consistent, and deleting records with incomplete data.
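The cleaning step can be sketched as follows. This is a minimal illustration, not the study's procedure: the field names (`respondent_id`, `area`, `stage`, `satisfaction`) are hypothetical, format consistency is approximated by trimming and lower-casing text, and dropping exact duplicates is an assumed example of "inappropriate" data.

```python
# Hypothetical required fields of one questionnaire record (illustrative only).
REQUIRED = ("respondent_id", "area", "stage", "satisfaction")

def clean_records(records):
    """Normalize text formatting, then drop incomplete and duplicate records."""
    cleaned, seen = [], set()
    for rec in records:
        # Consistent format: trimmed, lower-case strings; numbers unchanged.
        norm = {k: (v.strip().lower() if isinstance(v, str) else v)
                for k, v in rec.items()}
        # Delete records with missing or empty required fields.
        if any(norm.get(f) in (None, "") for f in REQUIRED):
            continue
        # Delete exact duplicates (e.g. a questionnaire entered twice).
        key = tuple(norm[f] for f in REQUIRED)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(norm)
    return cleaned

raw = [
    {"respondent_id": "R1", "area": " Palu ", "stage": "Pre", "satisfaction": 4},
    {"respondent_id": "R1", "area": "palu", "stage": "pre", "satisfaction": 4},
    {"respondent_id": "R2", "area": "Sigi", "stage": "during", "satisfaction": None},
    {"respondent_id": "R3", "area": "Donggala", "stage": "post", "satisfaction": 3},
]
print(len(clean_records(raw)))  # 2
```

The second record collapses into the first once formatting is normalized, and the third is removed because its satisfaction score is missing.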

Second, check the data. The first step is to draw a histogram or bar chart showing the frequency of each variable; the relationships between variables are then examined. Knowing the distribution of and correlation between variables helps researchers choose the proper form of data and evaluate the model to be formed more efficiently. Data checking can also reveal discrepancies and inaccuracies that require further data cleaning. The level of correlation describes the relationship between two variables. A high level of correlation indicates that the two variables are closely related: if one variable changes, the other tends to change proportionally. If the variables are continuous and are plotted against each other, the points lie close to a line. A low level of correlation indicates that the two variables vary without a consistent relationship. Most data fall between these two extremes. The level of correlation is summarized in a correlation matrix.
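The frequency check and the correlation matrix can be illustrated with a short sketch. It assumes Pearson correlation (the text does not name the coefficient) and uses made-up Likert-scale scores, not data from this study.

```python
from collections import Counter
from math import sqrt

def frequencies(values):
    """Frequency table of one variable (the data behind a bar chart)."""
    return dict(sorted(Counter(values).items()))

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def correlation_matrix(columns):
    """Pairwise Pearson correlations for a dict of named numeric columns."""
    names = list(columns)
    return {a: {b: round(pearson(columns[a], columns[b]), 3) for b in names}
            for a in names}

# Hypothetical 1-5 satisfaction scores per stage (illustration only).
data = {
    "pre":    [2, 3, 3, 4, 5],
    "during": [2, 3, 4, 4, 5],
    "post":   [5, 4, 3, 2, 1],
}
print(frequencies(data["pre"]))  # {2: 1, 3: 2, 4: 1, 5: 1}
for name, row in correlation_matrix(data).items():
    print(name, row)
```

Here "pre" and "during" correlate strongly and positively (about 0.92), while "pre" and "post" correlate strongly and negatively (about -0.97); values near 0 would indicate the unrelated case described above.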

Third, choose the type of model. After considering each type of model previously studied (deterministic, probabilistic, and artificial intelligence), this research selected an AI-based model. The community satisfaction model is developed iteratively, changing aspects of the model to form the best model for the available data. Model development is adjusted to the type of model and the available software. Several factors influence the form of the model, including the basic equation, the variables used in the model, and how these variables are grouped.

Fourth, estimate the parameter values. Determining parameter values is required in model development. In general, this step is completed with an optimization algorithm. However, for simple models (for example, a linear regression model fitted by the least squares method), the values can be optimized manually in a spreadsheet program. The *rminer* library provides a complete set of options for determining the parameter values with the `contribution` command.
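For the simple case mentioned above, the least squares parameters of a one-variable linear regression have a closed form that a spreadsheet can reproduce. The sketch below uses hypothetical data and is not the study's fitted model.

```python
def least_squares(x, y):
    """Ordinary least squares fit of y = a + b*x (closed form)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx      # slope that minimizes the sum of squared residuals
    a = my - b * mx    # intercept: the line passes through (mean x, mean y)
    return a, b

# Hypothetical example: overall satisfaction against one stage score.
x = [1, 2, 3, 4, 5]
y = [2.1, 2.9, 4.2, 4.8, 6.0]
a, b = least_squares(x, y)
print(round(a, 2), round(b, 2))  # 1.09 0.97
```

Models with many parameters or nonlinear structure (such as ANN or SVM) have no closed form like this, which is why an optimization algorithm is generally needed.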

Finally, once the parameter values are obtained and the model has been formed, the model must be evaluated. The evaluation method depends on the type of model selected. If the evaluation shows that the model is not feasible, the form of the model must be changed and the model redeveloped. If the model is still inadequate, the evaluation indicates that the model type is unsuitable for the available data, and the type of model must be reconsidered. There are several ways to evaluate statistical models. One of the first checks in evaluating a model is the estimated parameter values, which must be reasonable and significant.

#### **3.2 Model evaluation**

Depending on whether a classification or regression approach is used, different evaluation steps can be taken. For regression, evaluation is based on the difference between the observed value and the estimated value (the error value). In general, the lower the error value, the better the community satisfaction prediction model, with an error value of 0 as the ideal.

In this study, three measures were used: the mean absolute deviation (MAD), the root mean squared error (RMSE), and the coefficient of determination (R²). Models with low MAD and RMSE values and R² values close to 1 can be interpreted as having a high level of predictive accuracy. RMSE is more sensitive to extreme values than MAD because it squares the difference between each measured value and the corresponding model prediction, so large errors carry more weight. For the same set of predictions, RMSE is therefore always greater than or equal to MAD. Because the two metrics weight errors differently, measuring the error with both provides complementary perspectives when comparing the proposed models.
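The three measures can be sketched directly from their definitions. The observed and predicted values below are hypothetical; prediction set A has several small errors and set B has one large error with the same total absolute error.

```python
from math import sqrt

def mad(obs, pred):
    """Mean absolute deviation between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

def rmse(obs, pred):
    """Root mean squared error; squaring gives large errors more weight."""
    return sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def r2(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean) ** 2 for o in obs)
    return 1 - ss_res / ss_tot

obs = [3, 4, 2, 5, 4]
pred_a = [3.5, 3.5, 2.5, 4.5, 4.0]   # many small errors
pred_b = [3.0, 4.0, 2.0, 5.0, 2.0]   # one large error
for name, pred in (("A", pred_a), ("B", pred_b)):
    print(name, mad(obs, pred), round(rmse(obs, pred), 3), round(r2(obs, pred), 3))
```

Both sets have MAD = 0.4, but RMSE rises from about 0.447 (A) to about 0.894 (B) and R² drops from about 0.808 to about 0.231, which shows why reporting both error measures is informative.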

Furthermore, different DM regression models can easily be compared by drawing a regression error characteristic (REC) curve, which plots the error tolerance on the x-axis against the percentage of points predicted within that tolerance on the y-axis. This graphical representation of model feasibility is also used in this study. All outputs are collected for evaluation, and additional scripts can be written to integrate the R application with other reporting applications.
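The points of an REC curve can be computed as in the sketch below, using hypothetical observations and predictions; plotting is omitted. A curve that rises faster toward 100% indicates a better model.

```python
def rec_curve(obs, pred, tolerances):
    """Regression error characteristic: for each error tolerance, the
    percentage of examples whose absolute error is within that tolerance."""
    errors = [abs(o - p) for o, p in zip(obs, pred)]
    n = len(errors)
    return [(t, 100.0 * sum(e <= t for e in errors) / n) for t in tolerances]

obs = [3, 4, 2, 5, 4]
pred = [3.5, 3.5, 2.5, 4.5, 4.0]
for tol, pct in rec_curve(obs, pred, [0.0, 0.25, 0.5, 1.0]):
    print(f"tolerance {tol:.2f}: {pct:.0f}% within tolerance")
```

Computing the curves of several models over the same tolerance grid lets them be drawn on one chart for direct comparison.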

*Data Mining Applied for Community Satisfaction Prediction of Rehabilitation… DOI: http://dx.doi.org/10.5772/intechopen.99349*

#### **3.3 R Tools**

The community satisfaction prediction model is designed to be dynamic, with a choice of algorithms. The Multiple Regression (MR), ANN, and SVM algorithms are expected to provide different approaches to modeling community satisfaction with the rehabilitation and reconstruction stages. The community satisfaction model will be evaluated and adjusted throughout the disaster management stages until it can capture the dynamics of the existing data. The prediction model must be dynamic and respond to changing conditions [25].

To obtain a well-fitting model, the iterations covered combinations of the available variables. In this study, 25 variables were considered and their combinations explored. Model selection, in particular the feature selection stage, is applied only to the SVM algorithm. The advantage of this approach is that the three SVM hyperparameters (C, γ, ε) can be set automatically, which is essential during feature selection, where many candidate models must be fitted.
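The size of the search space explains why automatic hyperparameter setting matters. The sketch below counts the non-empty subsets of 25 input variables and shows greedy backward elimination as one common way to search them. The source does not state which feature selection algorithm was used, so backward elimination is an assumption, and `toy_error` is a hypothetical stand-in for the validation error of an SVM trained on each subset.

```python
def n_subsets(n_vars):
    """Number of non-empty input subsets for an exhaustive search."""
    return 2 ** n_vars - 1

def backward_elimination(features, score):
    """Greedy backward elimination: repeatedly drop the input whose removal
    gives the lowest error; stop when no removal improves the error.
    score(subset) is a hypothetical callback returning a validation error."""
    current = list(features)
    best = score(current)
    while len(current) > 1:
        trials = [(score([f for f in current if f != g]), g) for g in current]
        err, drop = min(trials)
        if err >= best:
            break
        best = err
        current.remove(drop)
    return current, best

RELEVANT = {"x1", "x3"}   # hypothetical "true" inputs for the demo

def toy_error(subset):
    # Penalize irrelevant inputs (1 each) and missing relevant inputs (2 each).
    extra = sum(1 for f in subset if f not in RELEVANT)
    missing = sum(1 for f in RELEVANT if f not in subset)
    return extra + 2 * missing

print(n_subsets(25))  # 33554431
print(backward_elimination(["x1", "x2", "x3", "x4"], toy_error))  # (['x1', 'x3'], 0)
```

Each call to `score` implies training and validating a model, so tuning C, γ, and ε by hand for every candidate subset would be impractical.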

During the learning phase (after the input variables are selected), the ANN algorithm in this study uses a fully connected multilayer perceptron with one hidden layer of H processing units and the logistic activation function $f(x) = 1/(1 + e^{-x})$. The best value of H is searched in the range {2, 4, …, 10} using an internal 5-fold cross-validation that uses only the training data [26]. The value of H that produces the smallest MAD is selected, and the ANN is then retrained using all training data. For the SVM algorithm, to reduce the search space, this study uses the Gaussian kernel and a proposed heuristic to set the complexity penalty parameter $C = 3$ and the width of the ε-insensitive tube:

$$\varepsilon = \frac{\hat{\sigma}}{\sqrt{N}}, \qquad \hat{\sigma} = \frac{1.5}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2,$$

where $y_i$ is the observed value, $\hat{y}_i$ is the estimated value, and $N$ is the amount of data used [27]. The most critical SVM parameter is the kernel parameter γ, which is searched in the range $\{2^{-15}, 2^{-13}, \ldots, 2^{3}\}$ by minimizing the error of an internal 5-fold cross-validation [26].
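The tuning scheme described above can be sketched as follows: the fixed C, the ε heuristic, the γ and H search grids, and a generic 5-fold cross-validation selector. This is an illustration under assumptions, not the rminer implementation: how $\hat{y}_i$ is obtained is not stated, so it is passed in; folds are contiguous, so data should be shuffled first; and `cv_error` is a hypothetical callback that would train an SVM or ANN on `train_idx` and return its MAD on `val_idx`.

```python
from math import sqrt

C = 3                                               # fixed complexity penalty
GAMMA_GRID = [2.0 ** k for k in range(-15, 4, 2)]   # 2^-15, 2^-13, ..., 2^3
H_GRID = list(range(2, 11, 2))                      # hidden units 2, 4, ..., 10

def epsilon_heuristic(y, y_hat):
    """epsilon = sigma_hat / sqrt(N), sigma_hat = 1.5/N * sum (y_i - yhat_i)^2."""
    n = len(y)
    sigma_hat = 1.5 / n * sum((a - b) ** 2 for a, b in zip(y, y_hat))
    return sigma_hat / sqrt(n)

def kfold_indices(n, k=5):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for s in sizes:
        folds.append(list(range(start, start + s)))
        start += s
    return folds

def select_by_cv(grid, cv_error, n, k=5):
    """Return the grid value with the lowest mean validation error over k folds."""
    folds = kfold_indices(n, k)

    def mean_error(value):
        errors = []
        for val_idx in folds:
            held_out = set(val_idx)
            train_idx = [i for i in range(n) if i not in held_out]
            errors.append(cv_error(value, train_idx, val_idx))
        return sum(errors) / k

    return min(grid, key=mean_error)

print(epsilon_heuristic([1, 2, 3, 4], [1, 2, 3, 6]))  # 0.75
# Toy error curve with its minimum at H = 6 (stands in for a real ANN's MAD).
print(select_by_cv(H_GRID, lambda h, tr, va: (h - 6) ** 2, n=50))  # 6
```

Only γ (or H) is searched by cross-validation, while C and ε are computed directly, so each candidate input subset needs one grid search rather than a search over all three SVM hyperparameters.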

To complement the ANN and SVM models, an MR model was also tested as a baseline for comparison. All DM algorithms (ANN, SVM, and MR) are implemented with the R tool (R Development Core Team, 2009) and the *rminer* library [28]. Before fitting the ANN, SVM, and MR models, all data are standardized (zero mean and unit standard deviation), and the model outputs are then converted back with the inverse transformation.
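Standardization and its inverse can be sketched as a z-score transform whose mean and standard deviation are stored so that predictions can be mapped back to the original satisfaction scale. The values are hypothetical; in practice the statistics would be computed on the training data only.

```python
from statistics import mean, stdev

def fit_scaler(values):
    """Store the mean and (sample) standard deviation for the z-score transform."""
    return mean(values), stdev(values)

def transform(values, scaler):
    """Standardize values to zero mean and unit standard deviation."""
    m, s = scaler
    return [(v - m) / s for v in values]

def inverse_transform(values, scaler):
    """Map standardized model outputs back to the original scale."""
    m, s = scaler
    return [v * s + m for v in values]

y = [2.0, 3.0, 4.0, 5.0, 6.0]
scaler = fit_scaler(y)
z = transform(y, scaler)
print([round(v, 3) for v in z])
print(inverse_transform(z, scaler))
```

Applying `inverse_transform` to a model's standardized predictions makes MAD and RMSE interpretable in the units of the original satisfaction scores.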
