Perceptual Structure and Pattern Modeling of Aftershock-Induced Landslides

*Omar F. Althuwaynee, Badal Pokharel, Ali Aydda and Sinan Jasim Hadi* 

#### **Abstract**

 In this study, two unsupervised clustering techniques were used to generate meaningful training and testing landslide sample locations for producing a landslide susceptibility map north of Kathmandu Valley, Nepal. First, the landslide inventory was prepared after the large Mw 7.3 aftershock of May 12, 2015. Second, unsupervised clustering algorithms (k-means and expectation-maximization (EM) using Gaussian mixture models (GMM)) were applied in the R programming environment to generate two sample datasets. A third dataset was generated by random sampling for comparison. Eight terrain-based topographic conditioning factors were then derived from a freely available digital elevation model (slope, aspect, elevation, roughness, topographic wetness index (TWI), stream power index (SPI), area solar radiation (ASR), and curvature). The three datasets were used to run logistic regression models to produce the landslide susceptibility map (LSM). Using uncertainty optimization, statistical tests (ANOVA, Wald test, tenfold cross-validation, and the area under the ROC curve (AUC)) were performed. The cluster-based samples showed more significant and stable behavior than the randomly selected samples. The Gaussian mixture model (GMM) dataset scored highest in model performance and prediction capability. The findings present an alternative sampling strategy that significantly increases the efficiency and stability of landslide susceptibility models.

**Keywords:** landslide prediction, unsupervised clustering, logistic regression, cluster, Nepal

#### **1. Introduction**

Earthquake-induced landslides are a significant natural hazard, especially in mountain areas, and contribute both to mountain-scale erosional budgets and to loss of life [1, 2]. On April 25, 2015, an Mw 7.8 earthquake struck Barpak, Gorkha, triggering landslides, rockfalls, and avalanches [3] that blocked roads and dammed rivers [4]. The largest aftershock, of magnitude Mw 7.3 on May 12, 2015, also triggered many landslides. The main shock and the aftershock sequence produced almost 25,000 landslides with a total area of approximately 87 km², most of them concentrated on slopes throughout central Nepal [5, 6]. Landslides in Rasuwa district alone cover 20.12 km² with a volume of 0.08 km³ [5]. The most affected districts are Rasuwa, Dhading, Nuwakot, Sindhupalchowk, and Gorkha [7]. The landslides and debris triggered by the main shock caused loss of human lives and damage to households, schools, and other infrastructure [4].

Methods of selecting training and validation samples in landslide susceptibility mapping (LSM) with data mining approaches are crucial, especially for reducing uncertainty in the results and achieving a better representation of the spatial distribution of landslides. LSM usually starts with collecting and processing a past or current landslide inventory, followed by selecting a deterministic or stochastic susceptibility mapping technique (clustering, regression, artificial intelligence, etc.), and then producing and validating the map. In the modeling phase, the inventory is split into training samples, which serve as the dependent variable fitted against the independent (conditioning) factors, and testing samples, which are used to evaluate the predictive power of the classification rule [8, 9].

 The distribution of earthquake-triggered landslides and its simple spatial correlations have been analyzed by [10–13] and others. Landslides generally occur in a clustered pattern [14], which indicates a high density of events at specific locations [15]. Earthquake-triggered landslides, in particular, cluster near and around epicenters [11]. One hypothesis holds that landslide events tend to occur in clusters rather than randomly (both spatially and temporally), especially in study areas with complex underlying geology. Spatial data clustering algorithms can therefore be used to extract a logical representation of such clusters.

Machine learning algorithms, which evolved from the study of pattern recognition, can be used in landslide susceptibility assessment [16]. Regression and classification are two types of supervised learning. When the desired output class or event is unknown, unsupervised learning is applied to study patterns in the input data and group similar observations into classes [17]. Clustering and dimensionality reduction are widely used forms of unsupervised learning. Logistic regression is considered one of the most robust multivariate methods for generating landslide susceptibility models, and its performance and stability have been examined and validated by many authors in the landslide research field [8, 18–23].

In this study, the landslide inventory map prepared after the largest aftershock, on May 12, 2015, is processed in the R programming environment to produce three inventory datasets: two cluster-based datasets, using k-means and expectation-maximization (EM) with Gaussian mixture models (GMM), and one random-based dataset. The efficiency of each dataset is checked using statistical tests, and the final LSM is prepared using logistic regression.

#### **2. Study area**

The study area lies in the southern part of Rasuwa district, north of Kathmandu Valley, approximately 55 km east of the epicenter (**Figure 1**). It covers about 329 km², with elevations ranging from 540 to 4320 m. Between April 25 and May 12, 2015, the National Seismological Center recorded 19 aftershocks in Rasuwa and 22 in Nuwakot. Within this period, the epicenters of 15 aftershocks, with magnitudes of 4.0–5.0 ML, were located in the study area. Most landslides appear to have occurred after the main shock of April 25, with only a small number of additional landslides observed after the Mw 7.3 aftershock of May 12 [6].

*Perceptual Structure and Pattern Modeling of Aftershock-Induced Landslides DOI: http://dx.doi.org/10.5772/intechopen.87836* 

**Figure 1.** 

*Map of study area showing major settlements and epicenters for aftershocks of Gorkha earthquake 2015 in between April 25 and May 12, 2015.* 

The peak ground acceleration (PGA) in the study area ranges from 0.48 to 0.76 g [24]. For the Gorkha earthquake, landslide densities show no direct correlation with modeled PGA [6]; rather, they correlate with a combination of factors such as slope, PGA, surface downdrop, specific metamorphic lithologies, and large plutonic intrusions [25]. Land cover in the study area is dominated by forest, followed by cultivated land, grassland, shrubland, debris deposits, barren land, and water bodies. The study area is connected to the capital, Kathmandu, by the Pasang Lhamu Highway, which was obstructed several times by earthquake-induced landslides. The Trishuli is the main river draining the area, and many landslides occurred on both sides of the Trishuli River valley from Dandagaun to the border [4]. The landslides caused significant damage to the headworks, desander, penstock pipe, and powerhouse of the Mailung Khola hydropower project [4]. To date, very few studies have examined the damage caused by coseismic landslides in this area. In Langtang Valley, north of the study area, the most devastating landslide induced by this earthquake occurred: a debris avalanche that buried several villages and killed 350 people [25].

#### **3. Data**

 A geological map was prepared by digitizing the map published by the Department of Mines and Geology (DMG) in 2011 [26]. Settlements, roads, and streams were taken from topographic maps. The spatial database of aftershocks within the study area was prepared from data on the website of the National Seismological Center (NSC), and PGA data were taken from the United States Geological Survey (USGS) website. High-resolution satellite imagery acquired after the major aftershock of May 12, 2015, was used to prepare the landslide inventory. The extent of each landslide (head scarp and displaced body) was checked against Google Earth, and outliers were removed. In total, 683 landslide polygons were mapped and verified both in the field and in Google Earth.

#### **4. Methodology**

 Training and testing samples for the landslide susceptibility analysis were generated using different cluster-based data selection techniques, and a commonly used sampling technique (random selection) was then applied to compare, assess, and justify the novelty of the cluster-based techniques. First, two datasets were generated in R using unsupervised clustering algorithms: k-means and expectation-maximization (EM) with Gaussian mixture models (GMM). Then, using the results of each clustering technique, we randomly extracted 70% and 30% of the total landslide inventory for model training and testing, respectively. To strengthen the test of the research hypothesis, we prepared an additional inventory using the common spatial random selection procedure [8, 23, 27] for comparison.
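The text does not state whether the 70/30 split was drawn from the pooled inventory or within each cluster. A minimal Python sketch of the second reading is shown below, assuming a per-cluster (stratified) draw so that every cluster contributes to both the training and the testing set. The helper name `stratified_split`, the seed, and the rounding rule are illustrative assumptions, not the authors' R code.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_pct=70, seed=42):
    """Hypothetical helper: split sample indices into training and testing
    sets by drawing train_pct percent of the samples within each cluster."""
    rng = random.Random(seed)  # fixed seed for a reproducible draw
    by_cluster = defaultdict(list)
    for idx, label in enumerate(labels):
        by_cluster[label].append(idx)
    train, test = [], []
    for label in sorted(by_cluster):
        members = by_cluster[label][:]
        rng.shuffle(members)
        # Integer rounding of train_pct percent of this cluster's size.
        n_train = (len(members) * train_pct + 50) // 100
        train.extend(members[:n_train])
        test.extend(members[n_train:])
    return sorted(train), sorted(test)


# Usage with made-up cluster labels: 20 landslides in two clusters.
labels = [0] * 10 + [1] * 10
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 14 6
```

Drawing within clusters keeps small clusters from being left out of the training set by chance, which a pooled random draw does not guarantee.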

#### **4.1 Unsupervised clustering algorithms**

Unsupervised clustering, an important data mining technique, is a process of self-organizing data without any prior knowledge of the category to which a particular data point belongs [28].

The main advantage of unsupervised clustering algorithms is that they help identify and better understand similar patterns in the data that can be placed into a specific category or class [17]. We attempted to generate and compare seven different landslide susceptibility models, based either on a random-distribution assumption or on an assumption that focuses on the clustered nature of the events, to analyze landslide distribution patterns and their consequences in the study area.

#### *4.1.1 K-means clustering algorithm*

 In k-means clustering, the dataset is partitioned into k groups, or clusters, as specified by the analyst [29], such that data within a cluster are as similar as possible [28]. First, k points, known as centroids, are chosen at random to serve as the initial cluster centers. The distance from each data point to each centroid is then calculated, and the point is assigned to the cluster whose centroid is nearest. Each centroid is then recomputed as the mean of the points assigned to it, and the assignment and update steps are repeated until the assignments no longer change. The k-means algorithm thereby attempts to minimize the within-cluster sum of squares given in Eq. (1):

$$J = \sum\_{j=1}^{k} \sum\_{i=1}^{n} \left\| \mathbf{x}\_{i}^{(j)} - \mathbf{c}\_{j} \right\|^{2} \tag{1}$$

 In Eq. (1), n is the number of objects to be clustered, k is the number of clusters, and $\left\| \mathbf{x}\_{i}^{(j)} - \mathbf{c}\_{j} \right\|^{2}$ is the squared distance between a data point $\mathbf{x}\_{i}^{(j)}$ assigned to cluster j and the centroid $\mathbf{c}\_{j}$.

K-means is a reliable and fast algorithm for partitioning large datasets, but it is sensitive to outliers [28].
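The assignment and update steps described above can be sketched in plain Python as Lloyd's algorithm. This is an illustrative sketch, not the R `kmeans` function used in the study (whose default algorithm is Hartigan-Wong); the function names and the toy points are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance, the term summed in Eq. (1)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd-style k-means on a list of coordinate tuples.
    Returns (centroids, labels, wss), where wss is the value of Eq. (1)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centroids
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster with the nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    wss = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, wss


# Usage with two made-up, well-separated groups of points.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels, wss = kmeans(pts, k=2)
print(labels, round(wss, 4))
```

Because the result depends on the random initial centroids, implementations usually run several starts and keep the one with the lowest WSS; the single-start version above keeps the logic easy to follow.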

#### *4.1.2 EM clustering using GMM*

 A Gaussian mixture model (GMM) is a parametric statistical model [30] in which each cluster is modeled by a Gaussian distribution characterized by three parameters: a mean vector (*μk*), a covariance matrix (*Σk*), and an associated mixing probability [28, 31]. The expectation-maximization (EM) algorithm is widely used to estimate the parameters of a GMM for a given set of data [32]. Each cluster is centered at its mean *μk*, with density increasing for data points near this mean, while the geometric features of each cluster (shape, volume, and orientation) are determined by its covariance matrix *Σk* [28]. Mclust uses maximum likelihood to fit models with different covariance matrix parameterizations [33], and the best model is selected using the Bayesian information criterion (BIC) [28, 33, 34]; the model with the highest BIC score is selected as the best.
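To make the E-step and M-step concrete, the sketch below fits a one-dimensional Gaussian mixture with EM in plain Python. It is a simplified, assumption-laden illustration: the study used multivariate models fitted by Mclust, whereas here each component has a single mean and variance, the starting values are supplied by hand, and the small variance floor is an added safeguard.

```python
import math


def normal_pdf(x, mu, var):
    """Density of a univariate Gaussian with mean mu and variance var."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(xs, mus, variances, weights, n_iter=500, tol=1e-10):
    """Fit a univariate Gaussian mixture by EM from the given starting values.
    Returns (mus, variances, weights, log_likelihood, responsibilities)."""
    mus, variances, weights = list(mus), list(variances), list(weights)
    n, k = len(xs), len(mus)
    prev_ll = -math.inf
    for _ in range(n_iter):
        # E-step: posterior probability (responsibility) of each component per point.
        resp, ll = [], 0.0
        for x in xs:
            dens = [w * normal_pdf(x, m, v) for m, v, w in zip(mus, variances, weights)]
            total = sum(dens)
            ll += math.log(total)
            resp.append([d / total for d in dens])
        # M-step: weighted re-estimation of mixing weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, 1e-9)  # guard against a collapsing component
        if ll - prev_ll < tol:
            break  # log-likelihood has stopped increasing
        prev_ll = ll
    return mus, variances, weights, ll, resp


# Usage with made-up values forming two groups around -2 and 3.
xs = [-2.1, -2.0, -1.9, -2.05, -1.95, 3.0, 3.1, 2.9, 3.05, 2.95]
mus, variances, weights, ll, resp = em_gmm_1d(xs, [-1.0, 1.0], [1.0, 1.0], [0.5, 0.5])
print([round(m, 3) for m in mus], [round(w, 2) for w in weights])
```

In the multivariate case the variance becomes the covariance matrix *Σk*, and constraining its volume, shape, and orientation gives the different parameterizations (such as VVV) that Mclust compares.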

#### **4.2 Logistic regression**

Logistic regression (LR) builds a multivariate regression relationship between a dependent variable and several independent variables [35]. LR estimates the probability of an event occurring by applying maximum likelihood estimation after transforming the dependent variable into a logit variable [35, 36]. The logistic model is given as

$$P = \frac{1}{1 + e^{-z}} \tag{2}$$

where P is the probability of landslide occurrence, which varies from 0 to 1 along an S-shaped curve, and z, which varies from −∞ to +∞, is given by

$$z = b\_0 + b\_1 x\_1 + b\_2 x\_2 + \dots + b\_n x\_n \tag{3}$$

where b0 is the intercept of the logistic model, n is the number of independent variables, b1 to bn are the slope coefficients of the model, and x1 to xn are the independent variables. Substituting Eq. (3) into Eq. (2) gives the logistic regression model, which relates the presence or absence of landslides to the independent variables [37].
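The coefficients b0 to bn in Eq. (3) are chosen to maximize the likelihood of the observed landslide presence/absence labels. The sketch below does this with plain gradient ascent in Python. Note that R's `glm` uses iteratively reweighted least squares instead, so this is a swapped-in optimizer for the same likelihood, and the toy data are made up.

```python
import math


def sigmoid(z):
    """Logistic function of Eq. (2), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.1, n_iter=5000):
    """Maximum-likelihood logistic regression by plain gradient ascent.
    X: list of feature vectors (x1..xn per sample); y: list of 0/1 labels.
    Returns the coefficient list [b0, b1, ..., bn] of Eq. (3)."""
    n_feat = len(X[0])
    b = [0.0] * (n_feat + 1)
    for _ in range(n_iter):
        grad = [0.0] * (n_feat + 1)
        for xi, yi in zip(X, y):
            z = b[0] + sum(bj * xj for bj, xj in zip(b[1:], xi))  # Eq. (3)
            residual = yi - sigmoid(z)  # d(log-likelihood)/dz for this sample
            grad[0] += residual
            for j, xj in enumerate(xi, start=1):
                grad[j] += residual * xj
        b = [bj + lr * gj for bj, gj in zip(b, grad)]  # step uphill
    return b


# Usage with a tiny made-up sample: one conditioning factor, 0/1 landslide labels.
x = [-2, -1, -1, 0, 0, 1, 1, 2]
y = [0, 0, 1, 0, 1, 0, 1, 1]
b = fit_logistic([[v] for v in x], y)
print([round(c, 3) for c in b])
```

At the maximum-likelihood solution the residuals sum to zero, which is a quick check that the fit has converged.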

 Logistic regression was performed in R, and the coefficients for each independent variable were then imported into GIS to produce the final LSM using Eqs. (2) and (3).
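Once coefficients are available, Eqs. (2) and (3) are evaluated independently for each raster cell. The sketch below assumes each cell is represented as a dictionary of conditioning-factor values. The factor names, coefficients, and grid are hypothetical placeholders, not the fitted model from this study, and the GIS step itself is not reproduced.

```python
import math


def susceptibility_map(grid, coeffs, intercept):
    """Apply Eqs. (2)-(3) cell by cell.
    grid: 2-D list of dicts mapping factor name -> cell value.
    coeffs: dict mapping factor name -> fitted coefficient (b1..bn).
    intercept: fitted b0. Returns a 2-D list of probabilities P."""
    out = []
    for row in grid:
        out_row = []
        for cell in row:
            z = intercept + sum(b * cell[name] for name, b in coeffs.items())  # Eq. (3)
            # Eq. (2), split by the sign of z to avoid overflow in math.exp.
            p = 1.0 / (1.0 + math.exp(-z)) if z >= 0 else math.exp(z) / (1.0 + math.exp(z))
            out_row.append(p)
        out.append(out_row)
    return out


# Usage: hypothetical coefficients and a 1 x 2 grid (values are illustrative only).
coeffs = {"slope": 0.05, "elevation": -0.001}
grid = [[{"slope": 20.0, "elevation": 2000.0}, {"slope": 40.0, "elevation": 1000.0}]]
m = susceptibility_map(grid, coeffs, intercept=1.0)
print(m)
```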

#### **5. Results and discussion**

For the k-means algorithm, the optimum number of clusters obtained with the elbow method was 7. This method analyzes the total within-cluster sum of squares (WSS) as a function of the number of clusters; the bend (elbow) in the curve marks the point beyond which additional clusters do not substantially improve the total WSS [28]. Although a higher number of clusters reduces the grouping error, more than 10 clusters would be impractical in the current application and would result in an overly complex cluster pattern. To illustrate this complexity, **Figure 2** shows that increasing the number of k-means clusters beyond the recommended value still reveals new groups.
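The elbow is often read visually from the WSS curve. One common way to automate it, shown here as an assumption rather than the procedure used in the study, is to pick the k whose point lies farthest below the straight line joining the first and last points of the curve after scaling both axes to [0, 1]. The WSS values in the usage line are invented.

```python
def elbow_k(ks, wss):
    """Pick the elbow of a decreasing WSS curve as the point farthest below
    the straight line joining its first and last points, after scaling both
    axes to [0, 1] (one common heuristic; not necessarily the one used here)."""
    k_lo, k_hi = min(ks), max(ks)
    w_lo, w_hi = min(wss), max(wss)

    def gap(k, w):
        x = (k - k_lo) / (k_hi - k_lo)
        y = (w - w_lo) / (w_hi - w_lo)
        return 1.0 - x - y  # proportional to the distance below the chord

    return max(zip(ks, wss), key=lambda kw: gap(*kw))[0]


# Usage with made-up WSS values for k = 1..6.
print(elbow_k([1, 2, 3, 4, 5, 6], [100.0, 40.0, 10.0, 8.0, 7.0, 6.0]))  # 3
```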

In GMM, each cluster was modeled by a Gaussian distribution with three parameters: mean vector, covariance matrix, and associated mixing probability. The resulting model was then fitted using the EM algorithm. Mclust is a contributed R package for model-based clustering, classification, and density estimation based on GMM [38]. The Bayesian information criterion (BIC) was obtained for the 14 available model parameterizations (**Figure 3**), with the number of clusters on the x-axis and the BIC value on the y-axis. BIC balances the fit of a model against its complexity, and in the Mclust convention, the higher its value, the better the model. Based on this criterion, three top models were recommended (VVV, 7; EVV, 9; EVE, 8, with BIC values of −26628.78, −26647.97, and −26651.12, respectively). VVV (ellipsoidal, with varying volume, shape, and orientation) with seven components was therefore selected automatically, and to allow a consistent comparison between the clustering techniques, we kept the number of clusters the same (7 clusters). Because the highest BIC was obtained with seven clusters, adding further clusters does not increase BIC or improve model performance.
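Mclust reports BIC as 2 log L - p log n, where L is the maximized likelihood, p the number of free parameters, and n the number of observations; this sign convention is why a higher (less negative) BIC indicates a better model. The sketch below computes this quantity for the VVV parameterization. The log-likelihood values and the two-dimensional setting in the usage example are hypothetical and unrelated to the study's results.

```python
import math


def n_params_vvv(n_components, n_dims):
    """Free parameters of a VVV Gaussian mixture: one mean vector and one
    unconstrained covariance matrix per component, plus G - 1 mixing weights."""
    g, d = n_components, n_dims
    return g * d + g * d * (d + 1) // 2 + (g - 1)


def bic_mclust(loglik, n_params, n_obs):
    """BIC in the Mclust sign convention (2 log L - p log n): larger is better."""
    return 2.0 * loglik - n_params * math.log(n_obs)


# Usage with hypothetical maximized log-likelihoods for G = 1, 2, 3 components.
n_obs = 200
candidates = {1: -500.0, 2: -420.0, 3: -415.0}  # G -> maximized log L (made up)
scores = {g: bic_mclust(ll, n_params_vvv(g, 2), n_obs) for g, ll in candidates.items()}
best_g = max(scores, key=scores.get)
print(best_g, round(scores[best_g], 2))
```

The third component raises the log-likelihood only slightly, so its extra parameters are penalized and the two-component model wins.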

**Figure 2.**  *Clusters obtained from k-means algorithm.* 

**Figure 3.**  *GMM BIC.* 

**Figure 4.** 

*GMM classification and uncertainty obtained from EM algorithm: (a) GMM classification plot and (b) uncertainty boundary.* 

By decomposing a non-Gaussian distribution into a combination of a finite number of Gaussian distributions, through unsupervised learning without any a priori knowledge, GMM can model an arbitrary distribution [39]. From the 683 landslides and 7 clusters, we obtained a subspace of dimension d = 6 (**Figure 4a**), which shows the maximal possible separation among the clusters. The EM algorithm also provided a measure of uncertainty for the associated GMM classification (**Figure 4b**). This plot shows the points that could not be clearly assigned to any one of the seven clusters; larger symbols represent more uncertain observations [28]. Where clusters intersect, the uncertainty regions lie on the overlaps [40]. In this case, most landslides occurred within the swath zone between the two largest shocks (Mw 7.8 and 7.3), which appears to have caused the points to overlap among clusters and produced the uncertainty.
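Mclust's classification uncertainty for an observation is 1 minus its largest posterior membership probability, so points lying where clusters overlap receive high values. A minimal sketch with made-up posterior probabilities; the function name is hypothetical.

```python
def map_and_uncertainty(posteriors):
    """For each observation's posterior membership probabilities, return the
    most probable (MAP) cluster and an Mclust-style uncertainty, 1 - max prob."""
    labels, uncertainty = [], []
    for row in posteriors:
        best = max(range(len(row)), key=row.__getitem__)
        labels.append(best)
        uncertainty.append(1.0 - row[best])
    return labels, uncertainty


# Usage with made-up posteriors for three points and two clusters.
labels, unc = map_and_uncertainty([[0.98, 0.02], [0.55, 0.45], [0.10, 0.90]])
print(labels)  # [0, 0, 1]
```

The second point sits between the two clusters, so its uncertainty (about 0.45) is much larger than that of the other two.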

Generalized linear models (GLMs) extend classical linear models, which assume normally distributed responses, to other response distributions [41]. ANOVA applied to one or more GLM fits analyzes the deviance rather than the variance and is the GLM counterpart of the classical analysis of variance [42]. Smaller variances in the conditioning factors reflect a higher likelihood of landslide occurrence. Although this test indicates how much each factor contributes to the likelihood, it does not measure the predictive power of the model.
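The ANOVA table for a logistic regression compares nested models by the drop in deviance when a term is added, referred to a chi-square distribution. The sketch below covers only the single-degree-of-freedom case, which is sufficient for one continuous factor, and uses hypothetical log-likelihoods; for more degrees of freedom a general chi-square tail function would be needed.

```python
import math


def deviance_test_1df(loglik_reduced, loglik_full):
    """Likelihood-ratio (analysis-of-deviance) test for adding ONE term to a
    GLM. Returns (deviance reduction, p-value); the p-value is the upper tail
    of a chi-square distribution with 1 degree of freedom, i.e. Pr(>Chi)."""
    dev = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    # For 1 df, P(chi2 > d) = P(|Z| > sqrt(d)) = erfc(sqrt(d / 2)).
    p_value = math.erfc(math.sqrt(dev / 2.0))
    return dev, p_value


# Usage with hypothetical maximized log-likelihoods with and without a factor.
dev, p = deviance_test_1df(loglik_reduced=-120.0, loglik_full=-110.0)
print(round(dev, 1), p < 0.05)  # 20.0 True
```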

 For all clusters, elevation has a negative relation with landslides, whereas curvature, slope, SPI, aspect, and ASR have a positive relation. In contrast, TWI shows a negative relation with landslide occurrence for k-means and a positive one for the remaining clusters. Likewise, roughness is negative for all other clusters. Areas with low TWI values suffered more landslides in this earthquake [23], since low TWI indicates lower-order drainage, which increases instability [23]. Moreover, a dominant portion of the landslides triggered by the earthquake intersect stream channels [6], indicating a significant correlation between the drainage system and landslides. In this regard, SPI can be considered an important influential factor. In general, steeper slopes and higher relief are considered to be positively correlated with landslide density [43].

Slope and elevation clearly stand out as the most influential factors for all datasets. SPI is an influential factor in the six datasets other than the EM cluster. SPI is a significant conditioning factor for landslides in the study area, given its positive logistic regression coefficients and Pr(>Chi) values below 0.05 in the ANOVA table. The study area is drained by the Trishuli, a major river in central Nepal, which might explain the instability of slopes during intense shaking. Roughness shows an influence on landslides in k-means. TWI, curvature, and aspect are less significant for all clusters.

In most of the validation tests, slope emerged as the most influential factor. The study area is dominated by Lesser Himalayan rocks, and observations made by [25] show a high concentration of landslides on slopes composed of metamorphic rocks of the Lesser Himalayan Sequence. Comparatively few landslides occurred on moderate slopes, and most of them were confined to the confluence of steep slopes [6].

This might not indicate a direct relation between slope and landslides, because landslides were concentrated in the transition from moderate to steeper slopes. Also, on steeper slopes with high relief farther from the fault plane, the number of landslides decreased [6]. Low relief and slope, together with other factors such as the earthquake source, fluvial connectivity, and the direct connection of landslides with streams and rivers, were important for these coseismic landslides [6], which might explain the significant correlation of SPI and TWI with the landslides. Therefore, slope, together with the earthquake source characteristics, SPI, and TWI, influenced the triggering of landslides (**Figure 5**).

**Figure 5.** 

*Variable importance for different datasets.* 

**Figure 6.**

*Prediction rates: (a) prediction rate for original inventory; (b) prediction rate for EM; and (c) prediction rate for k-means.* 

 K-fold cross-validation was performed to assess model accuracy on both the training and the testing data, and the testing results were used to compare model performance across datasets (**Figure 6**). ROC curves were also prepared to analyze the success and prediction rates. Overall, the EM-using-GMM dataset achieved the highest accuracy in predicting landslide occurrence (95.76%), followed by k-means (95.75%), while random sampling of the original inventory gave the lowest accuracy (93.00%). The kappa index is also high for EM, which makes it a better model. The EM-using-GMM dataset also has the highest prediction rate (0.95) (**Figure 7**).
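The accuracy, kappa, and prediction-rate (AUC) figures reported above can be computed from held-out predictions as sketched below. This is an illustrative Python version, not the R code used in the study; the 0.5 probability threshold used to turn susceptibility values into predicted classes is an assumption, and the labels and scores are made up.

```python
import random


def kfold_indices(n, k=10, seed=0):
    """Shuffle sample indices and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def accuracy_and_kappa(y_true, y_pred):
    """Overall accuracy and Cohen's kappa for binary 0/1 labels."""
    n = len(y_true)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = n - tp - tn - fp
    observed = (tp + tn) / n  # accuracy
    # Agreement expected by chance from the predicted and actual class totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (observed - expected) / (1.0 - expected) if expected < 1.0 else 1.0
    return observed, kappa


def roc_auc(y_true, scores):
    """AUC as the probability that a random landslide cell scores higher than
    a random non-landslide cell (ties count one half)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Usage with made-up labels and scores for one held-out fold.
y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold
print(accuracy_and_kappa(y_true, y_pred), roc_auc(y_true, scores))
```

In a tenfold setting, each fold from `kfold_indices` is held out once, the model is refitted on the other nine, and these metrics are averaged over the ten held-out folds.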

**Figure 7.**  *Comparison of accuracy and kappa index for different datasets.*

**Figure 8.**  *Landslide susceptibility map using dataset obtained from GMM.* 

GMM is a feasible tool for preparing landslide susceptibility models [44] because, unlike some other algorithms such as k-means, it models the covariance of each cluster. The LSM was therefore prepared using the best dataset, that is, GMM (**Figure 8**).

High susceptibility values are observed in the transition between low and high elevations, as well as near water bodies.

 The area is underlain by metasedimentary rocks of the Lesser Himalaya; PGA effects were higher in the Lesser Himalaya, whereas slope increases toward the Higher Himalaya [6]. Rocks near water bodies may be weaker than rocks on steeper slopes because of greater erosion. On this basis, neither slope nor SPI alone can be taken as the cause of landslide occurrence. The physiographic transition, the degree of weathering as reflected in rock strength, the effects of PGA in relation to topography, and SPI and TWI together are the major factors influencing the landslides triggered by this earthquake. In this earthquake, south-facing slopes suffered more landslides than others [45]. In this regard, ASR can be considered to have an impact similar to that of slope and aspect, since it is controlled by slope aspect and angle. Because of these controlling factors, roughness and curvature were not very significant.

#### **6. Conclusion**

 This research aimed to identify and relate the best clustering patterns of earthquake-induced landslide locations from April 25 to May 12, 2015, and to use the resulting data patterns with topographic conditioning factors to produce an LSM north of Kathmandu Valley, Nepal. A novel sampling strategy was tested to increase the efficiency of the LSM, rather than relying merely on random spatial selection. The results allow us to conclude, first, that the unsupervised clustering sampling strategy performs significantly better than random sampling, with EM using GMM scoring the highest accuracy in predicting landslide occurrence, and second, that all conditioning factors contribute to landslide occurrence in the study area, with SPI and TWI being significant. In future work, the current findings will be tested and verified using another dataset with a limited inventory; we also encourage further research to test the model hypothesis in areas susceptible to rainfall-induced landslides.

#### **Acknowledgements**

The authors thank the Department of Mines and Geology (DMG), Nepal, for its kindness in making the geological data available.


#### **Author details**

Omar F. Althuwaynee1,2, Badal Pokharel3,4, Ali Aydda5 and Sinan Jasim Hadi2 \*

1 Department of Geoinformation Engineering, Sejong University, Seoul, Republic of Korea

2 Department of Real Estate Development and Management, Ankara University, Ankara, Turkey

3 Department of Geology, Tri-Chandra Multiple Campus, Kathmandu, Nepal


\*Address all correspondence to: sinan.jasim@yahoo.com

© 2019 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### **References**

[1] Dadson S, Hovius N, Chen H, Brian Dade W, Lin JC, Hsu ML, et al. Earthquake-driven increase in sediment delivery from an active mountain belt. Geology. 2004;**32**(8):733-736. DOI: 10.1130/G20639.1

[2] Hovius N, Meunier P, Lin CW, Hongey C, Chen YG, Dadson S, et al. Prolonged seismically induced erosion and the mass balance of a large earthquake. Earth and Planetary Science Letters. 2011;**304**(3-4):347-355. DOI: 10.1016/j.epsl.2011.02.005

[3] Chiaro G, Kiyota T, Pokhrel RM, Goda K, Katagiri T, Sharma K. Reconnaissance report on geotechnical and structural damage caused by the 2015 Gorkha earthquake, Nepal. Soils and Foundations. 2015;**55**(5):1030-1043. DOI: 10.1016/j.sandf.2015.09.006

[4] Shrestha AB, Bajracharya SR, Kargel JS, Khanal NR. The Impact of Nepal's Gorkha Earthquake-induced Geohazards. International Centre for Integrated Mountain Development (ICIMOD); 2015

[5] Martha T, Roy P, Mazumdar R, Govindharaj KB, Vinod Kumar K. Spatial characteristics of landslides triggered by the 2015 Mw 7.8 (Gorkha) and Mw 7.3 (Dolakha) earthquakes in Nepal. Landslides. 2017;**14**(2):697-704. DOI: 10.1007/s10346-016-0763-x

[6] Roback K, Clark MK, West AJ, Zekkos D, Li G, Gallen SF, et al. The size, distribution, and mobility of landslides caused by the 2015 Mw7.8 Gorkha earthquake, Nepal. Geomorphology. 2018;**301**:121-138. DOI: 10.1016/j.geomorph.2017.01.030

[7] Regmi A, Dhital M, Zhang J, Su LJ, Chen XQ. Landslide susceptibility assessment of the region affected by the 25 April 2015 Gorkha earthquake of Nepal. Journal of Mountain Science. 2016;**13**(11):1941-1957. DOI: 10.1007/s11629-015-3688-2

[8] Althuwaynee O, Pradhan B, Park HJ, Lee JH. A novel ensemble bivariate statistical evidential belief function with knowledge-based analytical hierarchy process and multivariate statistical logistic regression for landslide susceptibility mapping. Catena. 2014;**114**:21-36. DOI: 10.1016/j.catena.2013.10.011

[9] Brenning A. Spatial prediction models for landslide hazards: Review, comparison and evaluation. Natural Hazards and Earth System Science. 2005;**5**(6):853-862

[10] Rodríguez CE, Bommer J, Chandler R. Earthquake-induced landslides: 1980-1997. Soil Dynamics and Earthquake Engineering. 1999;**18**(5):325-346. DOI: 10.1016/S0267-7261(99)00012-3

[11] Keefer DK. Landslides caused by earthquakes. Geological Society of America Bulletin. 1984;**95**(4):406-421. DOI: 10.1130/0016-7606(1984)95<406:LCBE>2.0.CO;2

[12] Keefer DK. Statistical analysis of an earthquake-induced landslide distribution—The 1989 Loma Prieta, California event. Engineering Geology. 2000;**58**(3-4):231-249. DOI: 10.1016/S0013-7952(00)00037-5

[13] Keefer DK. Investigating landslides caused by earthquakes—A historical review. Surveys in Geophysics. 2002;**23**(6):473-510. DOI: 10.1023/A:1021274710840

[14] Jarman D. Large rock slope failures in the highlands of Scotland: Characterisation, causes and spatial distribution. Engineering Geology. 2006;**83**(1-3):161-182. DOI: 10.1016/j.enggeo.2005.06.030

[15] Althuwaynee O, Pradhan B. An alternative technique for landslide inventory modeling based on spatial pattern characterization. In: Rahman A et al., editors. Geoinformation for Informed Decisions, Lecture Notes in Geoinformation and Cartography. Cham: Springer; 2014. pp. 35-48

[16] Marjanovic M, Bajat B, Kovacevic M. Landslide susceptibility assessment with machine learning algorithms. In: Proceedings 2009 International Conference on Intelligent Networking and Collaborative Systems 4-6 Nov. 2009. 2009. pp. 273-278

[17] Swamynathan M. Mastering Machine Learning with Python in Six Steps. Berkeley, CA; 2017

[18] Ayalew L, Yamagishi H. The application of GIS-based logistic regression for landslide susceptibility mapping in the Kakuda-Yahiko Mountains, Central Japan. Geomorphology. 2005;**65**(1-2):15-31. DOI: 10.1016/j.geomorph.2004.06.010

[19] Chau KT, Chan JE. Regional bias of landslide data in generating susceptibility maps using logistic regression: Case of Hong Kong Island. Landslides. 2005;**2**(4):280-290. DOI: 10.1007/s10346-005-0024-x

[20] Devkota KC, Regmi AD, Pourghasemi HR, Yoshida K, Pradhan B, Chang Ryu I, et al. Landslide susceptibility mapping using certainty factor, index of entropy and logistic regression models in GIS and their comparison at Mugling-Narayanghat road section in Nepal Himalaya. Natural Hazards. 2013;**65**(1):135-165. DOI: 10.1007/s11069-012-0347-6

[21] Lee S, Pradhan B. Landslide hazard mapping at Selangor, Malaysia using frequency ratio and logistic regression models. Landslides. 2007;**4**(1):33-41. DOI: 10.1007/s10346-006-0047-y

[22] Lombardo L, Mai PM. Presenting logistic regression-based landslide susceptibility results. Engineering Geology. 2018;**244**:14-24. DOI: 10.1016/j.enggeo.2018.07.019

[23] Shrestha S, Kang TS, Suwal KM. An ensemble model for co-seismic landslide susceptibility using GIS and random forest method. ISPRS International Journal of Geo-Information. 2017;**6**(11):365. DOI: 10.3390/ijgi6110365

[24] United States Geological Survey (USGS). M7.8 - 36 km E of Khudi, Nepal [Internet]. 2015. Available from: http://earthquake.usgs.gov/earthquakes/eventpage/us20002926

[25] Kargel JS, Leonard GJ, Shugar DH, Haritashya UK, et al. Geomorphic and geologic controls of geohazards induced by Nepal's 2015 Gorkha earthquake. Science. 2016;**351**(6269). DOI: 10.1126/science.aac8353

[26] Department of Mines and Geology (DMG). Geological Map of Central Nepal. Kathmandu, Nepal; 2011

[27] Althuwaynee O, Pradhan B, Lee S. Application of an evidential belief function model in landslide susceptibility mapping. Computers & Geosciences. 2012;**44**:120-135. DOI: 10.1016/j.cageo.2012.03.003

[28] Kassambara A. Practical Guide to Cluster Analysis in R: Unsupervised Machine Learning. USA: STHDA; 2017

[29] MacQueen J. Some methods for classification and analysis of multivariate observations. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1(14). University of California Press; 1967. pp. 281-297

[30] Shental N, Bar Hillel A, Hertz T, Weinshall D. Computing Gaussian mixture models with EM using equivalence constraints. In: Thrun S, Saul L, Schölkopf B, editors. Advances in Neural Information Processing Systems 16. Cambridge, MA: MIT Press; 2004

[31] Reynolds DA, Rose RC, Smith MJT. A mixture modeling approach to text-independent speaker ID. The Journal of the Acoustical Society of America. 1990;**87**(S1):S109-S109. DOI: 10.1121/1.2027823

[32] Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological). 1977:1-38. Available from: https://www.jstor.org/stable/2984875?seq=1#metadata\_info\_tab\_contents

[33] Fraley C, Raftery A, Murphy T, Scrucca L. MCLUST Version 4 for R: Normal Mixture Modeling for Model-Based Clustering, Classification, and Density Estimation. USA: Department of Statistics, University of Washington; 2012. Available from: http://www.stat.washington.edu/mclust/

[34] Schwarz G. Estimating the dimension of a model. Annals of Statistics. 1978;**6**(2):461-464

[35] Atkinson P, Massari R. Generalised linear modelling of susceptibility to landsliding in the Central Apennines, Italy. Computers & Geosciences. 1998;**24**(4):373-385. DOI: 10.1016/S0098-3004(97)00117-9

[36] Dai FC, Lee CF. Landslide characteristics and slope instability modeling using GIS, Lantau Island, Hong Kong. Geomorphology. 2002;**42**(3-4):213-228. DOI: 10.1016/S0169-555X(01)00087-3

[37] Bai S, Wang J, Lu GN, Zhou PG, Hou SS, Xu SN. GIS-based and logistic regression for landslide susceptibility mapping of Zhongxian segment in the Three Gorge area, China. Geomorphology. 2010;**115**(1-2):23-31. DOI: 10.1016/j.geomorph.2009.09.025

[38] Scrucca L, Fop M, Murphy TB, Raftery A. Mclust 5: Clustering, classification and density estimation using Gaussian finite mixture models. The R Journal. 2016;**8**(1):289

[39] Qiu L, Yuan S, Mei H, Fang F. An improved Gaussian mixture model for damage propagation monitoring of an aircraft wing spar under changing structural boundary conditions. Sensors. 2016;**16**(3):291. DOI: 10.3390/s16030291

[40] Fraley C, Raftery AE. Enhanced model-based clustering, density estimation, and discriminant analysis software: MCLUST. Journal of Classification. 2003;**20**(2):263-286. DOI: 10.1007/s00357-003-0015-3

[41] Lindsey JK. Applying Generalized Linear Models. New York: Springer Science & Business Media; 2000

[42] Hastie TJ, Pregibon D. Generalized linear model. In: Chambers JM, Hastie TJ, editors. Statistical Models in S. London: Chapman and Hall; 1993. pp. 209-247

[43] Lu N, Godt J. Hillslope Hydrology and Stability. Cambridge University Press; 2013. pp. 1-453

[44] Timonin V, Bai SB, Wang J, Kanevski M, Pozdnukhov A. Landslide data analysis with Gaussian mixture model. In: International Congress on Environmental Modelling and Software, Barcelona, Spain. 2008

[45] Gnyawali K, Adhikari B. Spatial relations of earthquake induced landslides triggered by 2015 Gorkha Earthquake Mw=7.8. In: Mikoš M, Casagli N, Yin Y, Sassa K, editors. Advancing Culture of Living with Landslides. WLF 2017. Cham: Springer; 2017. pp. 85-93

*Perceptual Structure and Pattern Modeling of Aftershock-Induced Landslides DOI: http://dx.doi.org/10.5772/intechopen.87836*

**Chapter 51**

Comparing the Accuracy of Energy Prediction Models Based on Hourly and Daily Mean Outdoor Temperature

*Merve Kuru and Gülben Çalış*

#### **Abstract**

It is well known that outdoor temperature strongly affects energy consumption in buildings. Outdoor temperature is therefore an important parameter for constructing energy prediction models; however, the effect of using data with different time intervals on model accuracy needs to be investigated. This chapter investigates the impact of data at hourly and daily time intervals on the performance of energy models. Data were collected between January and December 2015 from a commercial building located in Saint-Quentin-en-Yvelines, France. Daily and hourly HVAC electricity consumption were modeled based on daily mean and hourly outdoor temperature, respectively. The results show that the correlation between daily mean outdoor temperature and daily HVAC electricity consumption is stronger than that of the model based on hourly disaggregated data: the correlation coefficient was 0.82 between daily HVAC electricity consumption and daily mean outdoor temperature, compared with 0.70 between hourly HVAC electricity consumption and hourly outdoor temperature. The results indicate that hourly disaggregated data do not necessarily improve the accuracy of energy prediction models.

**Keywords:** energy prediction models, outdoor temperature, disaggregated data, commercial buildings, regression analysis

#### **1. Introduction**

Electrical energy used for heating and cooling in buildings, businesses, and industry accounts for around half of the energy used in the European Union [1]. Understanding and predicting the energy consumption of heating, ventilating, and air-conditioning (HVAC) systems is therefore valuable to energy engineers, electricity providers, end-users, and policy makers as they address energy sustainability challenges, the expansion of distribution networks, energy pricing, and policy development. Accurate prediction models must include the most influential factors [2]. Many researchers state that outdoor temperature is an important factor affecting HVAC energy consumption in buildings. Wu et al. [3] indicated that outdoor temperature has a significant effect (R = 0.905), whereas solar radiation has little somewhat

