Preface

Since the earliest humans populated the earth, we have gradually tried to understand and control the world around us. In that attempt, humans began making predictions of various kinds: about the planets' motions, eclipses, rainfall cycles, or the periodicity of certain diseases. In the last few decades, however, the complexity of the phenomena we wish to predict has outgrown our unaided abilities.

Fortunately, the advent of electronic computers has profoundly increased our ability to predict nature. However, the problems we face now are far more complex than those of a century ago.

The ability of these machines to demonstrate advanced cognitive skills in making decisions, learning, perceiving the environment, predicting certain behaviors, or processing written or spoken languages, among other skills, makes this discipline of paramount importance in today's world.

I hope that this work is of interest to students and researchers alike, as I have done my best to bring together quality research contributions spanning several different applications.

#### **Marco Antonio Aceves Fernandez**

Universidad Autonoma de Queretaro, Queretaro, Mexico

Topic Editor: Machine Learning and Data Mining

Artificial Intelligence has become a very important topic in recent years owing to its use in companies, business, and everyday life, as ever more data is generated. The automatic interpretation of big data rests on the extraction of patterns, and Artificial Intelligence plays a great role in extracting information and supporting decisions.

In particular, this book presents some of the contemporary and relevant topics in Artificial Intelligence, providing machine learning approaches, deep learning approaches, knowledge-based recognition, case studies and emerging technologies, and applications using Artificial Intelligence. Innovative advances from this field have been included in order to show the reader the newly researched and developed approaches.

> **Carlos M. Travieso-Gonzalez**
> University of Las Palmas de Gran Canaria, Gran Canaria, Spain
> Topic Editor: Applied Intelligence

Section 1

## Machine Learning and Data Mining

### **Chapter 1**

## Machine Learning Algorithm-Based Contraceptive Practice among Ever-Married Women in Bangladesh: A Hierarchical Machine Learning Classification Approach

*Iqramul Haq, Md. Ismail Hossain, Md. Moshiur Rahman, Md. Injamul Haq Methun, Ashis Talukder, Md. Jakaria Habib and Md. Sanwar Hossain*

#### **Abstract**

Contraception enables women to exercise their human right to choose the number and spacing of their children. The present study identified the best model selection procedure and predicted contraceptive practice among women aged 15–49 years in the context of Bangladesh. The required information was collected from a well-known nationally representative secondary dataset, the Bangladesh Demographic and Health Survey (BDHS) 2014. To identify the best model, we applied a hierarchical logistic regression classifier in the machine learning process. Seven well-known ML algorithms, namely logistic regression (LR), random forest (RF), naïve Bayes (NB), least absolute shrinkage and selection operator (LASSO), classification trees (CT), AdaBoost, and neural network (NN), were applied to predict contraceptive practice. The validity computation showed that the highest accuracy, 79.34%, was achieved by the NN method. According to the values obtained from the ROC, NN (AUC = 86.90%) is considered the best method for this study. Moreover, NN (Cohen's kappa statistic = 0.5626) shows the strongest discriminative ability. Based on our research, we suggest using the artificial neural network technique to predict contraceptive use among Bangladeshi women. Our results can help researchers when trying to predict contraceptive practice.

**Keywords:** contraceptive, machine learning algorithms, LASSO, NN, hierarchical

#### **1. Introduction**

Family planning is indispensable in facilitating the prosperity and autonomy of women, their families, and their communities. Contraceptive choices, maternal and newborn health care, sexually transmitted infections, and sexual health are the main components of reproductive health [1]. Among the Millennium Development Goals (MDGs), the member states agreed that target 5b called for universal access to reproductive health by 2015. As reported at the end of the MDG period, global contraceptive prevalence was 64% (41% in low-income countries), and the global unmet need for family planning was 12% (22% in low-income countries). Sustainable Development Goals (SDGs) targets 3.7 and 5.6 call for universal access to sexual and reproductive health care services and for sexual and reproductive health and reproductive rights, respectively [2, 3].

It has been estimated that increased contraceptive use has reduced maternal mortality globally by 30% [4]. Preventing unintended pregnancies, spacing pregnancies, and reducing high-risk pregnancies are among the consequences of contraceptive use [5–7]. Current studies show that contraceptive use could prevent nearly 230 million births every year by stopping unwanted pregnancies [8]. As a result, the use of contraception improves the health of women and their children [6, 9]. However, the prevalence of contraceptive practice varies between 11.3% and 72.1% across countries: Mozambique, 11.3%; Ghana, 21.5%; Bangladesh (modern methods), 54.0%; and Sweden, 72.1% [9–12].

Previous research has shown that various variables are significantly associated with contraceptive use, such as maternal age, maternal and husband's educational level, wealth status, maternal age at first marriage, and so on [11, 13]. Through the promotion of family planning, appropriate diagnostics, and interventions, the prevalence of contraceptive use is increasing. Popular statistical methods, such as binary logistic regression, have been applied to determine important indicators of contraceptive use among women. Our main goal, however, is to predict contraceptive practice among women aged 15–49 in Bangladesh. Machine learning is a scientific approach for building predictive models, and traditional statistical procedures have been shown to be less effective for this form of modeling. Machine learning approaches have long been shown to be more successful and promising in handling a variety of complicated and nonlinear problems [14–16].

However, few studies have explored machine learning methods for developing predictive models of contraceptive practice. Therefore, in this study, various well-known machine learning algorithms were applied to predict contraceptive practice among 15–49-year-old women in Bangladesh. Before prediction, we applied a hierarchical logistic regression classifier within the machine learning framework to select potential risk factors associated with women's contraceptive practice. To the best of our knowledge, this is among the first studies to apply a machine learning classifier approach to contraceptive practice in the Bangladeshi context, which should assist future data scientists.

#### **2. Methods**

#### **2.1 Data source**

In this study, the necessary information was extracted from a nationally representative secondary data set, the Bangladesh Demographic and Health Survey (BDHS) 2014. This survey was carried out through a joint effort of the National Institute of Population Research and Training (Bangladesh), Mitra Associates (Bangladesh), and ICF International (USA).

*Machine Learning Algorithm-Based Contraceptive Practice among Ever-Married Women… DOI: http://dx.doi.org/10.5772/intechopen.103187*

The entire list of enumeration areas (EAs) covering the whole country, provided by the Bangladesh Bureau of Statistics (BBS) for the 2011 population and housing census of the People's Republic of Bangladesh, served as the sampling frame for the 2014 BDHS. An EA was a geographical zone with an average of 120 households. The survey used a two-stage stratified sampling process that incorporated information on the EA region, residence (urban or rural), and the number of residential households counted. Successful interviews were conducted in 98% of the 17,989 selected households. For this study, 17,863 ever-married women aged 15–49 years were included in the final analysis. For details of the sampling procedure of the 2014 BDHS, see the final published report of the survey [17].

#### **2.2 Dependent variable**

Since the main purpose of this study was to predict contraception practice among women aged 15–49 years, the response variable was "current contraception use", which was classified as "Yes or No". If the respondent currently utilizes a contraceptive method, she falls into the "Yes" group, otherwise, she falls into the "No" group.

#### **2.3 Independent variables**

Besides the response variable, a set of 21 demographic and socioeconomic risk factors that were associated with contraceptive practice were included in the analysis and considered predictor variables. Several studies found that demographic and socioeconomic characteristics such as current age, division, religion, residence, respondent's working status, FP media exposure, age at first marriage, current breastfeeding, wealth status, women's education, husband's education, children ever born, number of living children, ideal number of children, fertility preference, marital status, and decision making for using contraception are potential risk factors that determine contraceptive practice among women [10, 11, 18–24]. The list of independent variables and their measures is presented in **Table 1**.



#### **Table 1.**

*Description of independent variables.*

#### **2.4 Statistical analysis**

The frequency distribution was used to describe the background characteristics of the respondents. We developed a hierarchical logistic regression classifier within the machine learning framework to select potential risk factors related to the contraceptive practice of women in Bangladesh, using the largest AUC value (*p* < 0.05). Hierarchical learning, inspired by human learning, is one procedure for enhancing the performance of machine learning [25]. The DeLong test is an extensively used test for comparing the difference between two AUCs [26]. The significant model with the largest AUC value was considered the final model in this analysis. The steps are depicted in **Figure 1**.

To meet the objective of the study, we fitted a number of models, where the full model is denoted by *M*<sub>*i*</sub> (with *i* = 21), using the hierarchical logistic regression classifier in the machine learning process. The steps are described below:

**Step 1:** Consider the *j*th model *M*<sub>*j*</sub> (*j* = 1, 2, 3, …, *i*), which consists of *j* predictors. Thus, the initial model was named Model 1 and defined as *M*<sub>1</sub> (*j* = 1); fit *M*<sub>1</sub> using the machine learning logistic classifier (MLLC).

**Step 2:** Add a variable to the previous model to define *M*<sub>*j*+1</sub>, and fit *M*<sub>*j*+1</sub> using the MLLC approach.

**Step 3:** Identify the better model using DeLong's test, which compares the areas under the curve at the 5% level of significance.

**Step 4:** If the AUC of *M*<sub>*j*+1</sub> is significantly larger than that of *M*<sub>*j*</sub> (*p* < 0.05), the best model is taken to be *M*<sub>*j*+1</sub>; otherwise it remains *M*<sub>*j*</sub>.

**Step 5:** The process is repeated successively until the desired number of risk factors/features is identified.
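The stepwise AUC-based selection above can be sketched in code. This is an illustrative Python sketch on synthetic data only (the authors worked in R and SPSS): it replaces the DeLong significance test with a simple AUC-improvement margin, and the data, margin, and variable names are all hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 5))
# synthetic outcome: depends only on features 0 and 2
logit = 1.2 * X[:, 0] - 0.8 * X[:, 2]
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

def fit_auc(cols):
    """Fit a logistic classifier on the given columns and return its AUC."""
    model = LogisticRegression().fit(X[:, cols], y)
    return roc_auc_score(y, model.predict_proba(X[:, cols])[:, 1])

selected = [0]                    # Step 1: initial model M1 with one predictor
best_auc = fit_auc(selected)
for j in range(1, X.shape[1]):    # Steps 2-5: add one variable at a time
    candidate = selected + [j]
    auc = fit_auc(candidate)
    # The paper tests the AUC difference with DeLong's test at the 5% level;
    # here we keep the variable only if the AUC improves by a small margin.
    if auc > best_auc + 0.005:
        selected, best_auc = candidate, auc

print("selected features:", selected, "AUC:", round(best_auc, 3))
```

On this synthetic data the procedure retains the truly informative features and discards the noise variables, mirroring how the hierarchical classifier discards variables whose models do not significantly improve the AUC.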


**Figure 1.** *Flow diagram for hierarchical logistic regression classifier in the machine learning process.*

After selecting the final model, we applied the seven most popular machine learning classifiers to predict contraceptive practice among ever-married women aged 15–49 in Bangladesh: Logistic Regression (LR), Random Forest (RF), Naïve Bayes (NB), Least Absolute Shrinkage and Selection Operator (LASSO), Classification Trees (CT), AdaBoost, and Neural Network (NN). A detailed description of the algorithms used is available in the literature [27–32].
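A comparison of this kind can be sketched with off-the-shelf implementations. The following Python sketch (the study itself used R) fits counterparts of the seven classifiers on a synthetic binary outcome; LASSO is approximated here by L1-penalized logistic regression, and all data and hyperparameters are illustrative assumptions, not the study's settings.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1500, 6))
y = (X[:, 0] - 0.7 * X[:, 1] + rng.normal(scale=1.0, size=1500) > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

models = {
    "LR": LogisticRegression(),
    "RF": RandomForestClassifier(random_state=1),
    "NB": GaussianNB(),
    # L1-penalized logistic regression as a stand-in for LASSO
    "LASSO": LogisticRegressionCV(penalty="l1", solver="liblinear", cv=5),
    "CT": DecisionTreeClassifier(max_depth=5, random_state=1),
    "AdaBoost": AdaBoostClassifier(random_state=1),
    "NN": MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=1),
}

aucs = {}
for name, m in models.items():
    m.fit(X_tr, y_tr)
    aucs[name] = roc_auc_score(y_te, m.predict_proba(X_te)[:, 1])
    print(f"{name:8s} AUC = {aucs[name]:.3f}")
```

Each classifier exposes the same fit/predict interface, which is what makes this kind of side-by-side AUC comparison straightforward.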

The Statistical Package for Social Science (SPSS) version 25 and R version 4.0.0 software were used for data management and analysis.

#### **2.5 Proposed approach**

Data from ever-married women aged 15–49 years were used in this study; only such women were considered for the final analysis. We then applied data preparation methods, starting with the identification of missing data in the overall dataset. The main drawback of missing information is reduced statistical power: because missingness reduces the number of samples *n*, the estimates have larger standard errors. Numerous methods exist for handling missing values, including direct deletion, mode imputation, hot-deck imputation, and so on [33]. A lower threshold of 5% missingness has been suggested in the literature [34]. We used the direct deletion method because this study had a low rate of missing values; that is, we removed all records with missing values and conducted the analysis on the complete data set. The next step after missing-value processing is typically to normalize/standardize the variables, which is useful when the data distribution is unknown; however, since the predictors in this study are categorical, normalization was not required. Finally, all machine learning classifiers in this study were trained on 70% of the respondents (training data set, *n* = 12,504) and evaluated on the remaining 30% (test data set, *n* = 5358). All models were trained with 10-fold cross-validation: we performed 10-fold cross-validation on the training set and estimated performance on the test set. The development of the seven machine learning classifiers is depicted in **Figure 2**.
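The 70/30 split with 10-fold cross-validation on the training portion can be sketched as follows. This Python sketch uses synthetic categorical-style predictors purely for illustration; the sample sizes, seed, and model are assumptions and do not reproduce the study's data.

```python
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)
n = 1000
X = rng.integers(0, 3, size=(n, 4)).astype(float)  # categorical-coded predictors
y = (X[:, 0] + rng.normal(scale=1.0, size=n) > 1).astype(int)

# 70/30 stratified split, as in the study (training n = 12,504; test n = 5,358)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.30, random_state=42, stratify=y)

clf = LogisticRegression()
cv_scores = cross_val_score(clf, X_tr, y_tr, cv=10)  # 10-fold CV on training set
clf.fit(X_tr, y_tr)
test_acc = accuracy_score(y_te, clf.predict(X_te))   # held-out test performance

print("mean CV accuracy:", round(cv_scores.mean(), 3),
      "test accuracy:", round(test_acc, 3))
```

Cross-validating only on the training split keeps the 30% test set untouched for the final, unbiased performance estimate.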

#### **2.6 Model evaluation**

We used the following criteria to evaluate the ML algorithms' performance: the confusion matrix, the receiver operating characteristic (ROC) curve, and the area under that curve (AUC). A confusion matrix has four possible prediction outcomes: TP = true positives, TN = true negatives, FP = false positives, and FN = false negatives. Several performance measures, including accuracy, precision, recall, and the F1 score, are calculated from these four outcomes to assess a classifier. The ROC curves were calculated using the predicted outcomes together with the true outcomes. To examine the ML algorithms' discriminating power, the AUC of the ROC was averaged over the test data sets [35]. Theoretically, the AUC lies between 0 and 1, with 1 corresponding to an ideal classifier. Since the usual lower bound for random classification is 0.5, an AUC greater than 0.5 indicates at least some capacity to separate cases from non-cases [36]. In addition to these measures, we also used Cohen's kappa statistic, a better measure for examining the agreement between two raters. It is calculated from the predicted and the actual classifications in a data set, and its maximum value of 1 indicates perfect agreement.
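All of these evaluation measures are standard and can be computed directly from predicted and true labels. In this Python sketch the labels and scores are hypothetical values chosen only to demonstrate the calls, not results from the study.

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, cohen_kappa_score)

# hypothetical true labels, hard predictions, and predicted probabilities
y_true  = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
y_pred  = [1, 0, 1, 0, 0, 1, 1, 0, 1, 1]
y_score = [0.9, 0.2, 0.8, 0.4, 0.1, 0.7, 0.6, 0.3, 0.95, 0.85]

print("accuracy :", accuracy_score(y_true, y_pred))    # (TP+TN)/(TP+TN+FP+FN)
print("precision:", precision_score(y_true, y_pred))   # TP/(TP+FP)
print("recall   :", recall_score(y_true, y_pred))      # TP/(TP+FN)
print("F1       :", f1_score(y_true, y_pred))          # harmonic mean of the two
print("AUC      :", roc_auc_score(y_true, y_score))    # uses scores, not labels
print("kappa    :", cohen_kappa_score(y_true, y_pred)) # chance-corrected agreement
```

Note that the AUC is computed from the predicted probabilities while the other measures use the hard class labels.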

#### **3. Results**

#### **3.1 Sociodemographic characteristics of women**

**Table 2** shows the percentage distribution of women according to the selected socio-demographic characteristics of Bangladesh. The largest age group of women (19%) is between the ages of 25 and 29. Most of them are from the Dhaka division (35%), Muslim (90%), living in male-headed households (89%), and in rural areas (72%). In terms of working status, slightly more than two-thirds (67%) of women are not currently involved in any kind of income-generating activity, and 80% of them have no media exposure. The majority of women (77% of them)

**Figure 2.** *Flow chart of the development of the seven machine learning classifiers.*


#### *Artificial Intelligence Annual Volume 2022*





**Table 2.**

*Percentage distribution of ever- married women age between 15 and 49 by selected socio-demographic characteristics.*

married before their 18th birthday, and 79% of them were not breastfeeding their children at the time of the survey. The findings also show that 96% of women are not amenorrheic and 97% are not abstaining. In terms of wealth status, 42% of the women were from rich families. Approximately half of the women (46%) had secondary or higher education, and 44% of the husbands had a secondary or higher level of education. The proportion of women who knew about sexually transmitted infections (STIs) was 67%. The largest group of women (46%) have had 2–3 children, while 53% have 1–2 living children. The ideal number of children was 2–3 for 86% of the women, and more than half (57%) were not interested in having another child. The vast majority of women are currently married (94%), and only 9% can decide to use a contraceptive method on their own. Regarding contraception use, according to the 2014 BDHS, 58.9% of women used it.

#### **3.2 Create model**

In the initial step of the analysis, we applied hierarchical logistic regression to select the final model. Here, each added variable defined a new model: we added a potential risk factor (variable) to the previous model, which was then considered a new model in this analysis (**Table 3**). For example, in the initial model *M*<sub>1</sub> we (arbitrarily) considered respondent age; *M*<sub>1</sub> plus division was considered *M*<sub>2</sub>. Similarly, we formed each subsequent model by adding a variable to the previous model until the desired number of models was reached. The details are presented in **Table 3**.

#### **3.3 Best model selection**

All models were statistically significant (*p* < 0.001) except models *M*<sub>7</sub> and *M*<sub>12</sub>. Based on the DeLong test, we therefore excluded two variables (FP media exposure and wealth status) from our final analysis. The remaining significant variables were considered risk factors for predicting contraceptive practice among women aged 15–49 years in Bangladesh. As shown in **Table 4**, model *M*<sub>21</sub> was the final model, and its selected risk factors were used in the final analysis. The details of the best model selection procedure are given in **Table 4**.

#### **3.4 Performance parameter of machine learning algorithms**

This study used seven different machine learning algorithms to classify contraceptive practice among married women on both a training and a test dataset.



#### **Table 3.**

*Create a model-based hierarchical approach.*


**Table 4.**
*Best model selection based on DeLong's test.*

Performance parameters (accuracy, precision, recall, F1 score, specificity, and AUC) were used to compare the predictive performance of these algorithms. In addition, Cohen's kappa statistic was used to determine the discriminative accuracy of each algorithm. The prediction results with performance parameters for each algorithm are shown in **Table 5** and **Figure 3**.

**Table 5** shows that the logistic regression classifier has an accuracy of 78.52%. The precision and recall of the fitted model were 81.23% and 82.39%, respectively, while the F1 score was 81.81%. The area under the curve (AUC) was calculated to be 86.57%. The random forest achieved an accuracy of 77.57%, with precision, recall, and F1 score of 73.82%, 85.35%, and 81.99%, respectively; the AUC in this case was 84.07%. The final accuracy of the naïve Bayes classifier was 76.56%, with a precision of 75.73% and a recall of 88.32%; the F1 score and AUC were 81.54% and 84.17%, respectively. Using Least Absolute Shrinkage and Selection Operator (LASSO) analysis, the accuracy in the test data set was 79.08%, with precision and recall of 79.39% and 86.85%, respectively, and an F1 score of 82.96%. According


#### **Table 5.**

*Performance evaluation for seven ML algorithms (test data set).*

**Figure 3.** *Area under curve of all seven machine learning classifiers.*


**Figure 4.** *Violin plots of the 10-fold cross-validation.*

to the test observation results, the classification tree method showed 78.57% accuracy in predicting contraceptive practice among married women, with a precision of 78.16%, a recall of 88.06%, an F1 score of 82.81%, and an AUC value of 85.59%. For AdaBoost, these values are 78.05% (accuracy), 80.20% (precision), 84.88% (recall), 82.10% (F1 score), and 86.15% (AUC). Finally, we used an artificial neural network and obtained an accuracy of 79.34%; its precision, recall, F1 score, and AUC are 78.71%, 88.76%, 83.44%, and 86.90%, respectively. Among the seven classifiers, we obtained the best performance from NN in terms of both accuracy and AUC, with a Cohen's kappa value of 0.5626.

**Figure 4** presents violin plots relating the seven classifiers to their 10-fold cross-validation accuracy; the shaded areas detail the distribution of the fold accuracies for each classifier. The figure shows that NN provided the highest mean accuracy, followed by LASSO and AdaBoost. Unlike a boxplot, the violin plot visualizes the entire distribution of the 10-fold accuracies.
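A figure of this kind can be produced directly from per-fold accuracies. The Python sketch below draws violin plots from synthetic fold accuracies whose means are loosely based on the accuracies reported in Table 5; the spread, file name, and values are illustrative assumptions, not the study's data.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
names = ["LR", "RF", "NB", "LASSO", "CT", "AdaBoost", "NN"]
# synthetic 10-fold accuracies per classifier (means roughly echo Table 5)
folds = [rng.normal(loc=m, scale=0.01, size=10)
         for m in [0.785, 0.776, 0.766, 0.791, 0.786, 0.781, 0.793]]

fig, ax = plt.subplots(figsize=(8, 4))
ax.violinplot(folds, showmeans=True)  # one violin per classifier
ax.set_xticks(range(1, len(names) + 1))
ax.set_xticklabels(names)
ax.set_ylabel("10-fold CV accuracy")
fig.savefig("violin_cv_accuracy.png", dpi=150)
```

Because each violin is drawn from all ten fold accuracies, the full distribution is visible rather than only the quartiles a boxplot would show.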

#### **4. Discussion**

This is the first study to use a hierarchical logistic classifier in a machine learning approach; its predictive performance was then compared with that of six other machine learning algorithms. In this study, the use of contraception among ever-married women in Bangladesh has been predicted using sociodemographic factors. The study can provide policymakers and academics with a starting point for examining key outlines in a larger framework and designing noteworthy interventions.

The study found that the prevalence of contraception was almost 59% in Bangladesh. The prevalence rate of contraceptives in India is 54%, while the rates were 47%, 34%, and 65%, respectively, for Nepal, Pakistan, and Sri Lanka [37, 38]. As the government of Bangladesh is committed to the London Summit on Family Planning to improve contraceptive access and use among impoverished people in both urban and rural areas [39], the findings of this study will provide grounding direction for the increase in the prevalence of contraception.

In this study, we used hierarchical LR, RF, NB, LASSO, CT, AdaBoost, and NN machine learning techniques to predict contraceptive practice among ever-married women in Bangladesh. The aim of the current analysis was to evaluate which technique performed better based on the accuracy of predicting contraceptive use from the 2014 BDHS data set. Moreover, we found no prior scientific study that combined a hierarchical logistic classifier with several supervised learning algorithms. In this study, 70% of the respondents were used for model tuning and the remaining 30% were used to check model performance; model tuning was performed using 10-fold cross-validation on the training dataset, as cross-validation is most commonly used to evaluate model performance [40]. The prediction of contraceptive use was measured by performance parameters (accuracy, precision, recall, F1 score, and AUC) to compare the seven machine learning classifiers. Cohen's kappa, which relates predicted to actual classifications in the dataset, was used to assess model agreement. Among the models used, the neural network outperformed the others with an accuracy of 79.34%. Additionally, in terms of Cohen's kappa, the neural network again provided the best predictive performance (Cohen's *κ* = 0.5626). This indicates that the neural network achieved better performance than LR, RF, LASSO regression, NB, CT, and AdaBoost. Hailemariam et al. proposed a J48 decision tree that performed better than naïve Bayes in predicting contraceptive practice among Ethiopian women [41]; however, they did not use a neural network in their study [41]. In a data mining study in India, the CART model produced fairly satisfactory results in finding the predictors of contraceptive use among married women [42].
However, Vaz and colleagues found that the random forest model was the most accurate for predicting women's fertile periods [43]. Machine learning algorithms can also be quite helpful in predicting infertility in women, according to a study conducted in Nigeria [44].

#### **5. Conclusions**

In this paper, we investigated a hierarchical logistic regression classifier within a machine learning framework to identify potential risk factors related to the contraceptive practice of women in Bangladesh. In summary, according to this classifier and the DeLong test, all of the selected covariates were significant determinants of contraceptive practice except FP mass media exposure and wealth status. We then compared seven supervised machine learning algorithms for predicting contraceptive practice among ever-married women aged 15–49 years in Bangladesh. The NN model exhibited the best results on the performance parameters, with an accuracy of 79.34%, a precision of 78.71%, a recall of 88.76%, an F1 score of 83.44%, and an AUC value of 86.90%. Among the seven algorithms, the NN model performs best in terms of accuracy, Cohen's kappa statistic, and the area under the curve (AUC). This study recommends the use of the NN model, and policymakers should support continuing this line of research in the future.


#### **Acknowledgements**

Special thanks go to the Demographic and Health Surveys Program for enabling us to use Bangladesh Demographic and Health Survey data for our study, available from https://dhsprogram.com/data/.

### **Funding**

This study did not receive funding.

### **Conflicts of interest**

The authors declare that they have no conflicts of interest.

### **Data availability**

This study was analyzed using secondary data, which were available at "https://dhsprogram.com/data/".

### **Author details**

Iqramul Haq<sup>1</sup>\*, Md. Ismail Hossain<sup>2</sup>, Md. Moshiur Rahman<sup>3</sup>, Md. Injamul Haq Methun<sup>4</sup>, Ashis Talukder<sup>5</sup>, Md. Jakaria Habib<sup>2</sup> and Md. Sanwar Hossain<sup>2</sup>
1 Department of Agricultural Statistics, Sher-e-Bangla Agricultural University, Dhaka, Bangladesh

2 Department of Statistics, Jagannath University, Dhaka, Bangladesh

3 Department of Pharmacology and Toxicology, Sher-e-Bangla Agricultural University, Dhaka, Bangladesh

4 Department of Statistics, Tejgaon College, Dhaka, Bangladesh

5 Statistics Discipline, Khulna University, Khulna, Bangladesh

\*Address all correspondence to: iqramul.haq@sau.edu.bd

© 2022 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

### **References**

[1] United Nations Population Fund. Sexual and Reproductive Health for all: Reducing Poverty, Advancing Development and Protecting Human Rights. New York, New York, United States: United Nations Population Fund; 2010

[2] United Nations. Transforming our World: The 2030 Agenda for Sustainable Development. 2015. Available from: https://sustainabledevelopment.un.org/content/documents/21252030%20Agenda%20for%20Sustainable%20Development%20web.pdf

[3] World Health Organization. Health-Related Millennium Development Goals. 2015. Available from: https://www.who.int/gho/publications/world\_health\_statistics/EN\_WHS2015\_Part1.pdf?ua=1

[4] Cleland J, Conde-Agudelo A, Peterson H, Ross J, Tsui A. Contraception and health. The Lancet. 2012;**380**(9837):149-156. DOI: 10.1016/s0140-6736(12)60609-6

[5] Ahmed S, Li Q, Liu L, Tsui AO. Maternal deaths averted by contraceptive use: An analysis of 172 countries. The Lancet. 2012;**380**(9837):111-125. DOI: 10.1016/S0140-6736(12)60478-4

[6] Brunner Huber LR, Smith K, Sha W, Vick T. Interbirth interval and pregnancy complications and outcomes: Findings from the pregnancy risk assessment monitoring system. Journal of Midwifery & Women's Health. 2018;**63**(4):436-445. DOI: 10.1111/jmwh.12745

[7] Darroch J, Singh S. Estimating Unintended Pregnancies Averted from Couple-Years of Protection (CYP). 2011. Available from: https://www.guttmacher.org/sites/default/files/page\_files/guttmacher-cyp-memo.pdf

[8] Liu L, Becker S, Tsui A, Ahmed S. Three methods of estimating births averted nationally by contraception. Population Studies. 2008;**62**(2):191-210. DOI: 10.1080/00324720801897796

[9] Yazdkhasti M, Pourreza A, Pirak A, Abdi F. Unintended pregnancy and its adverse social and economic consequences on health system: A narrative review article. Iranian Journal of Public Health. 2015;**44**(1):12-21

[10] Aviisah PA, Dery S, Atsu BK, Yawson A, Alotaibi RM, Rezk HR, et al. Modern contraceptive use among women of reproductive age in Ghana: Analysis of the 2003–2014 Ghana demographic and health surveys. BMC Women's Health. 2018;**18**(1):1-10. DOI: 10.1186/s12905-018-0634-9

[11] Haq I, Sakib S, Talukder A. Sociodemographic factors on contraceptive use among ever-married women of reproductive age: Evidence from three demographic and health surveys in Bangladesh. Medical Science. 2017;**5**(4):31. DOI: 10.3390/medsci5040031

[12] Kopp Kallner H, Thunell L, Brynhildsen J, Lindeberg M, Gemzell Danielsson K. Use of contraception and attitudes towards contraceptive use in Swedish women—A Nationwide survey. PLoS One. 2015;**10**(5):e0125990. DOI: 10.1371/journal.pone.0125990

[13] Mandiwa C, Namondwe B, Makwinja A, Zamawe C. Factors associated with contraceptive use among young women in Malawi: Analysis of the 2015–16 Malawi demographic and health survey data. Contraception and Reproductive Medicine. 2018;**3**(1):12-19. DOI: 10.1186/s40834-018-0065-x

[14] Moazenzadeh R, Mohammadi B, Shamshirband S, Chau K. Coupling a firefly algorithm with support vector regression to predict evaporation in northern Iran. Engineering Applications of Computational Fluid Mechanics. 2018;**12**(1):584-597. DOI: 10.1080/19942060.2018.1482476

[15] Mousa SR, Bakhit PR, Osman OA, Ishak S. A comparative analysis of tree-based ensemble methods for detecting imminent lane change maneuvers in connected vehicle environments. Transportation Research Record: Journal of the Transportation Research Board. 2018;**2672**(42):268-279. DOI: 10.1177/0361198118780204

[16] Zhang Y, Haghani A. A gradient boosting method to improve travel time prediction. Transportation Research Part C: Emerging Technologies. 2015;**58**:308-324. DOI: 10.1016/j.trc.2015.02.019

[17] NIPORT, Mitra and Associates, & ICF International. Bangladesh Demographic and Health Survey 2014. Bangladesh: NIPORT, Mitra and Associates, and ICF International; 2016

[18] Johnson EO. Determinants of modern contraceptive uptake among Nigerian women: Evidence from the National Demographic and health survey. African Journal of Reproductive Health. 2017;**21**(3):89-95. DOI: 10.29063/ajrh2017/v21i3.8

[19] Gebre MN, Edossa ZK. Modern contraceptive utilization and associated factors among reproductive-age women in Ethiopia: Evidence from 2016 Ethiopia demographic and health survey. BMC Women's Health. 2020;**20**(1):1-14. DOI: 10.1186/s12905-020-00923-9

[20] Islam AZ, Mondal MNI, Khatun ML, Rahman MM, Islam MR, Mostofa MG, et al. Prevalence and determinants of contraceptive use among employed and unemployed women in Bangladesh. International Journal of MCH and AIDS. 2016;**5**(2):92-102. DOI: 10.21106/ijma.83

[21] Kidayi PL, Msuya S, Todd J, Mtuya CC, Mtuy T, Mahande MJ. Determinants of modern contraceptive use among women of reproductive age in Tanzania: Evidence from Tanzania demographic and health survey data. Advances in Sexual Medicine. 2015;**5**(3):43-52. DOI: 10.4236/asm.2015.53006

[22] Solanke BL. Factors influencing contraceptive use and non-use among women of advanced reproductive age in Nigeria. Journal of Health, Population and Nutrition. 2017;**36**(1):1-14. DOI: 10.1186/s41043-016-0077-6

[23] Sridhar A, Salcedo J. Optimizing maternal and neonatal outcomes with postpartum contraception: Impact on breastfeeding and birth spacing. Maternal Health, Neonatology and Perinatology. 2017;**3**(1):1-10. DOI: 10.1186/s40748-016-0040-y

[24] Vu LTH, Oh J, Bui QT-T, Le AT-K. Use of modern contraceptives among married women in Vietnam: A multilevel analysis using the multiple indicator cluster survey (2011) and the Vietnam population and housing census (2009). Global Health Action. 2016;**9**(1):29574. DOI: 10.3402/gha.v9.29574

[25] Zhang L, Zhang B. Hierarchical machine learning—a learning methodology inspired by human intelligence. In: International Conference on Rough Sets and Knowledge Technology. Berlin, Heidelberg: Springer; 2006. pp. 28-30. DOI: 10.1007/11795131_3

[26] DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: A nonparametric approach. Biometrics. 1988;**44**(3):837. DOI: 10.2307/2531595

[27] Anuse A, Vyas V. A novel training algorithm for convolutional neural network. Complex & Intelligent Systems. 2016;**2**(3):221-234. DOI: 10.1007/s40747-016-0024-6

[28] Buntine W. Learning classification trees. Statistics and Computing. 1992; **2**(2):63-73. DOI: 10.1007/bf01889584

[29] Jang W, Lee JK, Lee J, Han SH. Naïve Bayesian classifier for selecting good/bad projects during the early stage of international construction bidding decisions. Mathematical Problems in Engineering. 2015;**2015**:1-12. DOI: 10.1155/2015/830781

[30] Talukder A, Ahammed B. Machine learning algorithms for predicting malnutrition among under-five children in Bangladesh. Nutrition. 2020;**78**: 110861. DOI: 10.1016/j.nut.2020.110861

[31] Vasquez MM, Hu C, Roe DJ, Chen Z, Halonen M, Guerra S. Least absolute shrinkage and selection operator type methods for the identification of serum biomarkers of overweight and obesity: Simulation and application. BMC Medical Research Methodology. 2016;**16**(1): 154-172. DOI: 10.1186/s12874-016-0254-8

[32] Wu P, Zhao H. Some analysis and research of the AdaBoost algorithm. In: International Conference on Intelligent Computing and Information Science. Berlin, Heidelberg: Springer; 2011. pp. 1-5

[33] Xu X, Xia L, Zhang Q, Wu S, Wu M, Liu H. The ability of different imputation methods for missing values in mental measurement questionnaires. BMC Medical Research Methodology. 2020;**20**(1):1-9. DOI: 10.1186/s12874-020-00932-0

[34] Madley-Dowd P, Hughes R, Tilling K, Heron J. The proportion of missing data should not be used to guide decisions on multiple imputation. Journal of Clinical Epidemiology. 2019;**110**:63-73. DOI: 10.1016/j.jclinepi.2019.02.016

[35] Liu B, Fang L, Liu F, Wang X, Chou K-C. iMiRNA-PseDPC: microRNA precursor identification with a pseudo distance-pair composition approach. Journal of Biomolecular Structure & Dynamics. 2016;**34**(1):223-235. DOI: 10.1080/07391102.2015.1014422

[36] Liaw A, Wiener M. Classification and regression by random forest. R News. 2002;**2**(3):18-22. Available from: https://cogns.northwestern.edu/cbmg/LiawAndWiener2002.pdf

[37] Family Planning. India: Commitment Maker since 2012. 2018. Available from: https://www.familyplanning2020.org/india

[38] The World Bank. Contraceptive Prevalence, Any Methods (% of Women Ages 15–49) Data. 2019. Available from: https://data.worldbank.org/indicator/SP.DYN.CONU.ZS

[39] Huda FA, Robertson Y, Chowdhuri S, Sarker BK, Reichenbach L, Somrongthong R. Contraceptive practices among married women of reproductive age in Bangladesh: A review of the evidence. Reproductive Health. 2017;**14**(1):69-77. DOI: 10.1186/s12978-017-0333-2

[40] Cawley GC, Talbot NLC. Gene selection in cancer classification using sparse logistic regression with Bayesian regularization. Bioinformatics. 2006;**22**(19):2348-2355. DOI: 10.1093/bioinformatics/btl386

[41] Hailemariam T, Gebregiorgis A, Meshesha M, Mekonnen W. Application of data mining to predict the likelihood of contraceptive method use among women aged 15-49: Case of 2005 demographic health survey data collected by central statistics agency, Addis Ababa, Ethiopia. Journal of Health & Medical Informatics. 2017;**8**(3):274-279. DOI: 10.4172/2157-7420.1000274

[42] Chaurasia AR. Contraceptive use in India: A data mining approach. International Journal of Population Research. 2014;**2014**:1-11. DOI: 10.1155/2014/821436

[43] Vaz F, Silva RR, Bernardino J. Using data mining in a mobile application for the calculation of the female fertile period. In: Proceedings of the 10th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management. Setúbal, Portugal: SciTePress; 2018. DOI: 10.5220/0007228603590366

[44] Balogun JA, Egejuru N, Idowu P. Comparative analysis of predictive models for the likelihood of infertility in women using supervised machine learning techniques. Computer Reviews Journal. 2018;**2**(1):313-330

#### **Chapter 2**

## Evaluating Similarities and Differences between Machine Learning and Traditional Statistical Modeling in Healthcare Analytics

*Michele Bennett, Ewa J. Kleczyk, Karin Hayes and Rajesh Mehta*

#### **Abstract**

Data scientists and statisticians are often at odds when determining the best approach, machine learning or statistical modeling, to solve their analytical challenges and problem statements across industries. In reality, however, machine learning and statistical modeling are more closely related to each other than they are opposing sides of an analytical battleground. The choice between them is often driven by the problem at hand, the expected outcome(s), the real-world application of the results and insights, and the availability and granularity of the data. Overall, machine learning and statistical modeling are complementary techniques that are guided by similar mathematical principles but leverage different tools to arrive at insights. Determining the best approach should consider the problem to be solved, empirical evidence and the resulting hypothesis, data sources and their completeness, the number of variables/data elements, assumptions, and expected outcomes such as the need for prediction or for causality and reasoning. Experienced analysts and data scientists are often well versed in both types of approaches and their applications, and hence use the tools best suited to their analytical challenges. Given the importance and relevance of the subject in the current analytics environment, this chapter presents an overview of each approach and outlines their similarities and differences to provide the understanding needed to select the proper technique for the problem at hand. Furthermore, the chapter provides examples of applications in the healthcare industry and outlines how to decide which approach is best when analyzing healthcare data. Understanding the best-suited methodologies can help the healthcare industry develop and apply advanced analytical tools that speed up diagnostic and treatment processes and improve patients' quality of life.

**Keywords:** machine learning, statistical modeling, data science, healthcare analytics, research design

#### **1. Introduction**

In recent years, machine learning techniques have been used to solve problems across a multitude of industries and topics. In the healthcare industry, these techniques are often applied to healthcare claims and electronic health records data to garner valuable insights into diagnostic and treatment pathways and help optimize patient healthcare access and the treatment process [1]. Unfortunately, many of these applications have produced inaccurate or irrelevant research results because proper research protocols were not fully followed [2]. Statistics, on the other hand, has been the basis of analysis in healthcare research for decades, especially in the areas of clinical trials and health economics and outcomes research (HEOR), where the precision and accuracy of analyses are the primary objectives [3]. Classical statistical methodologies are often preferred in those research areas to ensure that results can be replicated and defended and, ultimately, that the research can be published in peer-reviewed medical journals [3]. The increased availability of data, including data from wearables, has provided the opportunity to apply a variety of analytical techniques and methodologies to identify patterns, often hidden, that could help optimize healthcare access as well as the diagnostic and treatment process [4].

With the rapid increase in data from healthcare and many other industries, it is important to consider how to select the statistical and machine learning methodologies best suited to the problem at hand, the available data type, and the overall research objectives [5]. Machine learning alone, or complemented by statistical modeling, is becoming not just more common but a desired convergence that takes advantage of the best of both approaches for advancing healthcare outcomes [1]. Please note that this chapter was originally posted on Cornell University's preprint server, https://arxiv.org; the content is mostly the same between the two versions [6].

#### **2. Machine learning foundation is in statistical learning theory**

Machine learning (ML) is considered a branch of artificial intelligence and computer science that focuses on mimicking human behaviors through a set of algorithms and methods that use historical values to predict new values [7], without being explicitly coded to do so, thereby learning over time [8, 9]. ML is grounded in statistical learning theory (SLT), which provides the constructs used to create prediction functions from data. One of the first examples of SLT in practice was the support vector machine (SVM), a supervised learning method that can be used for both classification and regression and has become a standard approach for recognizing visual objects [7]. SLT formalizes the model that makes a prediction based on observations (i.e., data), and ML automates the modeling [7].

SLT sets the mathematical and theoretical framework for ML as well as the properties of learning algorithms [7], with the goals of providing mechanisms for studying inference and creating algorithms that become more precise over time [8]. SLT is based on multivariate statistics and functional analysis [8]. Functional analysis is the branch of mathematics that studies shapes, curves, and surfaces, extending multivariate vector statistics to continuous functions and finding functions that describe data patterns [8]. Inductive inference is the process of generalizing and modeling past observations to make predictions about the future; SLT formalizes the modeling concepts of inductive inference, while ML automates them [8].

#### *Evaluating Similarities and Differences between Machine Learning and Traditional Statistical… DOI: http://dx.doi.org/10.5772/intechopen.105116*

For example, pattern recognition, one of the most common applications of ML, is considered a problem of inductive inference and SLT because it is a curve-fitting problem [7–9]. Pattern recognition is not suited to traditional computer programming because the inferences needed are not free of assumptions and the patterns are not easily described or labeled programmatically with deterministic functions. The standard mathematics behind SLT makes no assumptions about distributions, uses stochastic functions that can include humans labeling the "right" classification (i.e., training data), and can assume that the probability of one observation occurring is independent of another, thereby including the concept of randomness [7–9]. These tenets are therefore those of ML as well.

SLT also provides the definitions of terms often used in ML, such as overfitting, underfitting, and generalization. Overfitting occurs when noise in the data negatively affects training and the ultimate model performance because the noise is incorporated into the learning process, producing error when the model sees new data [8, 9]. Underfitting occurs when noise degrades performance on both the training data and new, unseen data [9]. In ML, discussions of underfitting and overfitting often describe models that do not generalize effectively and might not capture the right set of data elements to explain the data patterns and posited hypotheses [9]. Underfitting is often described as a model missing features that would be present in the optimal model, akin to a regression model not fully explaining the variance of the dependent variable [9]. In a similar vein, overfitting occurs when the model contains more features, or different features, than is optimal, like a regression model with autocorrelation or multicollinearity [9].
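The regression analogy above can be made concrete with a small numpy sketch (synthetic data; the linear trend, noise level, and polynomial degrees are illustrative assumptions, not from the chapter): an overly flexible model chases the training noise, driving training error down while generalization suffers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy linear data: the "true" pattern is y = 2x plus noise
x_train = np.linspace(0.0, 1.0, 20)
y_train = 2.0 * x_train + rng.normal(0.0, 0.2, size=x_train.size)
x_test = np.linspace(0.0, 1.0, 50)
y_test = 2.0 * x_test + rng.normal(0.0, 0.2, size=x_test.size)

def fit_and_score(degree):
    """Fit a polynomial of the given degree and return (train MSE, test MSE)."""
    coefs = np.polyfit(x_train, y_train, degree)
    train_mse = float(np.mean((np.polyval(coefs, x_train) - y_train) ** 2))
    test_mse = float(np.mean((np.polyval(coefs, x_test) - y_test) ** 2))
    return train_mse, test_mse

train_lo, test_lo = fit_and_score(1)    # matches the true linear structure
train_hi, test_hi = fit_and_score(15)   # flexible enough to fit the noise
```

Comparing the pairs shows the pattern SLT formalizes: the degree-15 fit achieves a lower training error than the degree-1 fit, but that gain comes from memorizing noise rather than learning structure.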

The general goal of learning algorithms, and therefore of ML model optimization, is to reduce the dimensions, features, or data variables to the fewest needed, as this reduces noise and the impact of trivial variables that can cause overfitting or underfitting [8, 9]. A regularized model can then generalize, performing not just on the past or training data but also on future, yet unseen data [8, 9], although true generalization needs both the right modeling criteria and strong subject matter knowledge [8].

Dimension reduction approaches such as Principal Component Analysis (PCA) or bootstrapping techniques, used along with subject matter expertise, can often help refine models, combat fit challenges, and improve generalization potential [9, 10]. Furthermore, understanding the studied population and data characteristics can further inform the choice of data, variable selection, and proper model setup [10].
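A minimal sketch of PCA via the singular value decomposition, on synthetic data with two true latent factors (cohort size, feature count, and noise level are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic cohort: 100 patients, 10 measurements driven by 2 latent factors
latent = rng.normal(size=(100, 2))
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + rng.normal(0.0, 0.05, size=(100, 10))

# PCA via SVD of the mean-centered data matrix
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = (S ** 2) / np.sum(S ** 2)   # variance share of each component

# Keep only the first two principal components as reduced features
scores = Xc @ Vt[:2].T
```

Because only two factors generate the data, the first two components capture almost all of the variance, and the ten noisy measurements collapse to two features for downstream modeling.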

#### **3. Similarities between machine learning and statistical modeling**

Statistical modeling is based on SLT and the use of mathematical models and statistical assumptions to generate sample data and make predictions about real-world occurrences. A statistical model is often represented as a collection of probability distributions over the set of all possible outcomes. Statistical modeling has evolved over the last few decades and shaped the future of business analytics and data science, including the current use and applications of ML algorithms. Machine learning, in contrast, does not require many assumptions or interventions when running algorithms in order to accurately predict the studied outcomes [7].

There are similarities between ML and statistical modeling that are prevalent across most analytical efforts. Both techniques use historical data as input to predict new output values, but they vary, as noted above, in their underlying assumptions and in the level of analyst intervention and data preparation.

Overall, machine learning's foundations rest on statistical learning theory, and data scientists are advised to apply SLT's guiding rules during analysis. While it may seem that a statistical background and understanding are not required when analyzing the underlying data, this misconception often leads to a data scientist's inability to set up a proper research hypothesis and analysis, due to a lack of understanding of the problem and of the underlying data assumptions and caveats. This can in turn result in biased and irrelevant results as well as unfounded conclusions and insights. With that in mind, it is important to evaluate the problem at hand and consider both statistical modeling and ML as possible methods. Understanding the underlying assumptions of the data and of statistical inference can help support proper technique selection and guide the pathway to a solution [11]. Later sections of the chapter present applications of both techniques, along with the reasoning for selecting the methods, to guide future research.

As mentioned above, the similarities between ML and statistical modeling start with the underlying assumption that data or observations from the past can be used to predict the future [7]. The variables included in the analysis generally represent two types: dependent variables, called targets in ML, and independent variables, called features in ML. The definitions of the variables are the same across both techniques [8]. Furthermore, both ML and statistical modeling leverage the available data in ways that allow the results to generalize to a larger population [7]. The loss and risk associated with a model's accuracy and its representation of real-world occurrences are frequently described in terms of mean squared error (MSE). In statistical modeling, MSE measures the loss in prediction performance as the squared difference between predicted and actual values. In ML, the analogous concept for a classification problem is presented via a confusion matrix, which evaluates the accuracy of the classifications [9].
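The two loss views described above can be sketched side by side in numpy (the values are made up for illustration): MSE for a regression-style prediction, and a 2×2 confusion matrix with accuracy for a classification.

```python
import numpy as np

# Regression-style loss: mean squared error between prediction and actual
y_actual = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 3.0, 8.0])
mse = float(np.mean((y_pred - y_actual) ** 2))

# Classification-style evaluation: 2x2 confusion matrix and accuracy
labels = np.array([1, 0, 1, 1, 0, 1])   # actual class
preds = np.array([1, 0, 0, 1, 1, 1])    # predicted class
tp = np.sum((labels == 1) & (preds == 1))
tn = np.sum((labels == 0) & (preds == 0))
fp = np.sum((labels == 0) & (preds == 1))
fn = np.sum((labels == 1) & (preds == 0))
confusion = np.array([[tn, fp], [fn, tp]])
accuracy = (tp + tn) / labels.size
```

Here the two constructs play the same role, quantifying how far the model's output is from the observed truth, which is the shared foundation the text describes.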

#### **4. Differences between machine learning and statistical modeling**

The differences between machine learning and statistical modeling are distinct and based on the purposes and needs of the analysis as well as the outcomes. Assumptions and purposes for the analysis and approach can differ vastly. For example, statistics typically assumes that predictors or features are known and additive, that models are parametric, and that testing of hypotheses and uncertainty are at the forefront. ML does not make these assumptions [12]. In ML, many models are based on nonparametric approaches: the structure of the model is not specified or is unknown, additivity is not expected, and assumptions about normal distributions, linearity, or residuals, for example, are not needed for modeling [10].

The purpose of ML is predictive performance, using general-purpose learning algorithms to find less-known patterns in complex data without an a priori view of the underlying structures [10]. In statistical modeling, by contrast, inferences, correlations, and the effects of a small number of variables are the drivers [12].

Due to the differences in the methods' characteristics, it is important to understand the variations in application of the techniques when solving healthcare problems. For example, one typical application of statistics is to analyze whether a population has a particular medical condition. Some diseases, such as diabetes, are easily screened for and diagnosed using distinct lab values, such as an elevated and increasing HbA1c over time, high glucose levels, and low insulin levels, often due to insulin depletion from unmanaged diabetes. Conditions such as hypertension can likewise be detected at home or in the healthcare provider's office using simple blood pressure measurement and monitoring, and wearables can identify when patients are experiencing atrial fibrillation, abnormal heart rhythms, and even increased falls (possible syncope). Therefore, analyses of patients with these easily measurable conditions can be done simply by qualifying patients based on lab values or biomarkers falling within or outside of certain ranges. One of the simplest examples is identifying patients with diabetes [13]. This can be accomplished by using A1C levels to group patients as having no diabetes (A1C < 5.7), pre-diabetes (A1C of 5.7–6.4), or diabetes (A1C > 6.4). These ranges are based on the American Diabetes Association diagnosis guidelines and a very high, medically accepted correlation between A1C levels and the diagnosis of diabetes [14].
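The grouping rule above is simple enough to express directly in code (a hypothetical helper; the thresholds follow the ADA cut-offs quoted in the text), which is exactly why this kind of qualification needs no machine learning:

```python
def classify_a1c(a1c):
    """Group a patient by A1C level using the ADA cut-offs cited above."""
    if a1c < 5.7:
        return "no diabetes"
    if a1c <= 6.4:
        return "pre-diabetes"
    return "diabetes"

# Example: three patients with different A1C readings
groups = [classify_a1c(v) for v in (5.2, 6.0, 7.1)]
```

A deterministic rule like this is transparent and auditable, the hallmark of the "easily measurable condition" case the chapter contrasts with prediction problems.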

On the other hand, if the objective of the research is to predict which pre-diabetic patients are most likely to progress to diabetes, a myriad of factors influence diabetes progression, including the extent of chronic kidney disease, high blood pressure, insulin levels over time, body mass index/obesity, age, years with diabetes, success of prior therapy, number and types of prior therapies, family history, coronary artery disease, prior cardiovascular events, infections, etc. A complicated combination of comorbidities, risk factors, and patient behavior can lead to differing diabetes complications and varying outcomes; this makes prediction more challenging and thus a good candidate for machine learning techniques. Classification models such as gradient boosting tree algorithms have been used to successfully predict diabetes progression, especially early in the disease. While there are many diabetes risk factors and comorbidities, these disease characteristics have been well studied over many years, enabling stable predictive models that perform well over time [14].

Overall, machine learning is highly effective when the model uses more than a handful of independent variables/features [10]. ML is required when the number of features (p) is larger than the number of records or observations (n); this situation gives rise to the curse of dimensionality [15, 16], which increases the risk of overfitting but can be overcome with dimensionality reduction techniques (e.g., PCA) as part of modeling [15], together with clinical/expert input on the importance, or lack thereof, of certain features as they relate to the disease or its treatment. Additionally, statistical learning theory teaches that learning algorithms improve their ability to extract complex structure from data at a faster rate than increases in sample size alone can provide [8]. Therefore, statistical learning theory and ML offer methods for addressing high-dimensional or big data (high velocity, volume, and variety) as well as smaller sample sizes [17], such as recursive feature elimination, support vector machines, boosting, and cross-validation, which can also minimize prediction error [18].
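A small numpy sketch of the p > n setting described above (dimensions, coefficients, and the ridge penalty are illustrative assumptions): k-fold cross-validation estimates prediction error for a ridge-penalized fit, which stays well-posed even with more features than observations.

```python
import numpy as np

rng = np.random.default_rng(4)

# "Wide" data: more features (p = 50) than observations (n = 30)
n, p = 30, 50
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = [1.5, -2.0, 1.0]          # only three features actually matter
y = X @ beta + rng.normal(0.0, 0.1, size=n)

def kfold_cv_mse(X, y, k=5, ridge=1.0):
    """Plain k-fold cross-validation of a ridge fit; the penalty keeps
    the p > n least-squares problem well-posed."""
    folds = np.array_split(rng.permutation(len(y)), k)
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        A = X[train].T @ X[train] + ridge * np.eye(X.shape[1])
        w = np.linalg.solve(A, X[train].T @ y[train])
        errors.append(np.mean((X[test] @ w - y[test]) ** 2))
    return float(np.mean(errors))

cv_error = kfold_cv_mse(X, y)
baseline = float(np.mean((y - y.mean()) ** 2))   # predict-the-mean reference
```

Without the ridge term, ordinary least squares would be underdetermined here; the penalty plus held-out evaluation is one standard way of taming high-dimensional data that the text alludes to.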

In the healthcare industry, machine learning models are frequently used in cancer prediction, generally in three areas: (1) predicting a patient's cancer prognosis/diagnosis, (2) predicting cancer progression, and (3) predicting cancer mortality. Of these, predicting whether a patient may have a cancer prognosis/diagnosis can be more or less difficult depending on the tumor type. Certain cancers, such as lung, breast, prostate, and skin cancer, are evaluated based on specific signs and symptoms together with non-invasive imaging or blood tests; these cancers are easier to predict. Conversely, cancers with non-descript symptoms such as fatigue, dizziness, GI pain and distress, and lack of appetite are much more difficult to predict, even with machine learning models, as these symptoms are associated with multiple

tumor types (for example, esophageal, stomach, bladder, liver, and pancreatic cancer) and also mimic numerous other conditions [14].

For cancers with vague symptoms, understanding the patient journey is very important to cancer prediction. If the prediction period is too long and does not reflect the period before diagnosis when symptoms develop, the model may overfit due to spurious variables unrelated to the condition. If the prediction period is too short, key risk factors from the patient record could be missing. Variable pruning is required in these situations. A multi-disciplinary team including business and clinical experts can help trim unrelated variables and improve model performance [14].

Model validation is an inherent part of the ML process: the data is split into training and test sets, with the larger portion used to train the model to learn outputs based on known inputs. This keeps the primary focus on building the ability to predict future outcomes [15]. Beyond initial validation on the test set, the model should be further tested in the real world using a large, representative, and more recent sample of data [19]. This can be accomplished by using the model to score the eligible population and using a look-forward period to assess the incidence or prevalence of the desired outcome. If the model is performing well, probability scores should be directly correlated with incidence/prevalence (the higher the probability score, the higher the incidence/prevalence). Model accuracy, precision, and recall can also be assessed using this approach [20].
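The validation flow described above can be sketched on a toy cohort (all data here is simulated, and a real model would replace the stand-in score): hold out a test set, score it, and check that observed incidence rises with the probability-score decile.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy cohort: a single risk score drives the probability of the outcome
n = 1000
risk = rng.uniform(0.0, 1.0, size=n)
outcome = (rng.uniform(0.0, 1.0, size=n) < risk).astype(int)

# Hold-out validation: train on 70% of patients, test on the rest
idx = rng.permutation(n)
train_idx, test_idx = idx[:700], idx[700:]

# Stand-in "model": use the risk feature itself as the probability score
scores = risk[test_idx]
observed = outcome[test_idx]

# Look-forward-style check: observed incidence by score decile
edges = np.quantile(scores, np.linspace(0.1, 0.9, 9))
decile = np.digitize(scores, edges)
incidence = np.array([observed[decile == d].mean() for d in range(10)])
```

In a well-calibrated model, the `incidence` array increases from the lowest to the highest decile, which is the direct-correlation check the text recommends before trusting the scores in production.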

Epidemiology studies and prior published machine learning research in related areas of healthcare can help benchmark the performance of the model relative to the baseline prevalent or incident population for the condition to be predicted. Machine learning models created using a few hundred or a few thousand patients often do not perform as well in the real world. Careful variable pruning, cohort refinement, and adjustment of modeling periods can often resolve model performance problems. Newer software can be used to more quickly build, test, and iterate models, allowing users to easily transform and combine features, run many models simultaneously, visualize model performance, and diagnose and solve model issues [21].

#### **5. How to choose between machine learning and statistical modeling**

Machine learning algorithms are the preferred technique over a statistical modeling approach under specific circumstances, data configurations, and needed outcomes.

#### **5.1 Importance of prediction over causal relationships**

As noted above, machine learning algorithms are leveraged to predict the outcome rather than to present the inferential and causal relationship between the outcome and the independent variables/data elements [17, 22]. Once a model has been created, statistical analysis can sometimes elucidate and validate the importance of, and relationships between, the independent and dependent variables.

#### **5.2 Application of wide and big dataset(s)**

Machine learning algorithms are learner algorithms that train on large amounts of data, often represented by a large number of data elements but not necessarily many observations [23]. The ability to replicate samples multiple times, cross-validate, or apply bootstrapping techniques allows machine learning to handle wide datasets with many data elements and few observations, which is extremely helpful in predicting rare-disease onset [24], as long as the process is accompanied by real-world testing to ensure the models do not suffer from overfitting [18, 19]. With the advent of less expensive and more powerful computing and storage, multi-algorithm, ensembled models using larger cohorts can be built more efficiently. Larger modeling samples that are more representative of the overall population can help reduce the likelihood of overfitting or underfitting [25]. A large cohort imposes various issues, chief among them the ability to identify the set of independent variables that are most meaningful and impactful. These significant independent variables provide a predictive and/or inferential model that is readily acceptable for real-world application. The variables in such instances may also yield a more realistic magnitude and direction of the causal relationship between the independent and outcome variables of interest.
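Bootstrapping on a small sample can be sketched as follows (the sample is simulated and the biomarker framing is illustrative): resampling with replacement gives a distribution-free uncertainty interval for the sample mean, which is why the technique helps when observations are scarce.

```python
import numpy as np

rng = np.random.default_rng(3)

# Small sample, as in rare-disease settings: 40 observed biomarker values
sample = rng.normal(loc=10.0, scale=2.0, size=40)

# Bootstrap: resample with replacement to estimate the uncertainty of the
# sample mean without any distributional assumptions
boot_means = np.array([
    rng.choice(sample, size=sample.size, replace=True).mean()
    for _ in range(2000)
])
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])
```

The same resampling idea underlies bagging-style ensembles: each resample acts as a fresh pseudo-dataset, letting wide, small-n data support many model replications.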

A recent real-world healthcare example of machine learning application is identifying the likelihood of hospitalization for high-risk patients diagnosed with COVID-19. The dataset leveraged included over 20,000 independent variables across healthcare claims data covering diagnostic and treatment variables. The optimal ML model consisted of approximately 200 important predictor variables, such as age, diagnoses such as type 2 diabetes, chronic kidney disease, and hypertension, frequency of office visits, and obesity, among others. None of the variables in this example were 'new'; however, the magnitude and direction that result from the ML exercise may illustrate the 'true' impact of each independent variable, something that is a serious limitation in traditional statistical modeling [26].

Furthermore, as explained above, statistical models tend not to operate well on very large datasets and often require manageable datasets with fewer, pre-defined attributes/data elements for analysis [23]. The recommended number of attributes in a statistical model is up to about 12, because these techniques are highly prone to overfitting [25]. This limitation creates a challenge when analyzing large healthcare datasets and requires applying dimension reduction techniques, or expert guidance, to reduce the number of independent variables in the study [23].

#### **5.3 Fewer data and model assumptions are required**

Machine learning algorithms require fewer assumptions about the dataset and its data elements [5]. However, a good model is usually preceded by profiling of the target and control groups and by some domain knowledge; understanding relationships within the data improves outcomes and interpretability [27].

Machine learning algorithms are comparatively more flexible than statistical models, as they do not require assumptions regarding collinearity, normal distribution of residuals, etc. [5]. Thus, they have a high tolerance for uncertainty in variable performance (e.g., confidence intervals, hypothesis tests) [28]. In statistical modeling, by contrast, emphasis is placed on uncertainty estimates, and a variety of assumptions have to be satisfied before the outcome of a statistical model can be trusted and applied [28]. As a result, statistical models have a low uncertainty tolerance [25].

Machine learning algorithms tend to be preferred over statistical modeling when the outcome to be predicted does not have a strong component of randomness, e.g., in visual pattern recognition an object must be an E or not an E [5], and when the learning algorithm can be trained on an unlimited number of exact replications [29].

ML is also appropriate when overall prediction is the goal, with less visibility into the impact of any one independent variable or the relationships between variables [30], and when estimating uncertainty in forecasts or in the effects of selected predictors is not a requirement [28]. However, data scientists and data analysts often leverage regression analytics to understand the estimated impact, including the directionality of the relationships between the outcome and data elements, to help with model interpretation, relevance, and validity [27]. ML is also preferred when the dataset is wide and very large [23] and the underlying variables are not fully known and previously described [5].

#### **6. Machine learning extends statistics**

Machine learning requires no prior assumptions about the underlying relationships between the data elements. It is generally applied to high dimensional data sets and does not require many observations to create a working model [5]. However, understanding the underlying data will support building representative modeling cohorts, deriving features relevant for the disease state and population of interest, as well as understanding how to interpret modeling results [19, 27].

In contrast, a statistical model requires a deeper understanding of how the data were collected, the statistical properties of the estimator (p-value, unbiased estimators), the underlying distribution of the population, etc. [17]. Statistical modeling techniques are usually applied to low dimensional data sets [25].

#### **7. Machine learning can extend the utility of statistical modeling**

Robert Tibshirani, a statistician and machine learning expert at Stanford University, calls machine learning "glorified statistics," which highlights the dependence of machine learning techniques on statistics: successful execution not only allows for a high level of prediction but also interpretation of the results to ensure their validity and applicability in healthcare [17]. Understanding this association, and knowing the differences, enables data scientists and statisticians to expand their knowledge and apply a variety of methods outside their domain of expertise. This is the notion of "data science," which aims to bridge the gap between the two areas as well as bring in other important aspects of research [5]. Data science is evolving beyond statistics or simpler ML approaches to incorporate self-learning and autonomy, with the ability to interpret context, assess and fill in data gaps, and make modeling adjustments over time [31]. While these modeling approaches are not perfect and are more difficult to interpret, they provide exciting new options for difficult-to-solve problems, especially where the underlying data or environment is rapidly changing [27].

Collaboration and communication among not only data scientists and statisticians but also medical and clinical experts, public policy creators, epidemiologists, etc. allows for designing successful research studies that not only provide predictions and insights into the relationships between the vast number of data elements and health outcomes [30], but also allow for valid, interpretable, and relevant results that can be applied with confidence to the project objectives and future deployment in the real world [30, 32].

*Evaluating Similarities and Differences between Machine Learning and Traditional Statistical… DOI: http://dx.doi.org/10.5772/intechopen.105116*

Finally, it is important to remember that machine learning's foundations are based in statistical theory and learning. It may seem that machine learning can be done without a sound statistical background, but this leads to not really understanding the different nuances in the data and the presented results [17]. Well-written machine learning code does not negate the need for an in-depth understanding of the problem, assumptions, and the importance of interpretation and validation [29].

#### **8. Specific examples in healthcare**

As mentioned earlier in the chapter, machine learning algorithms can be leveraged in the healthcare industry to help evaluate a continuum of access, diagnostic and treatment outcomes, including prediction of patient diagnoses, treatment, adverse events, side effects, and improved quality of life as well as lower mortality rates [24].

As shown in **Figure 1**, these algorithms can often be helpful in predicting a variety of disease conditions and shortening the time from awareness to diagnosis and treatment, especially in rare and underdiagnosed conditions; estimating the 'true' market size; and predicting disease progression, such as identifying fast vs. slow progressing patients as well as determinants of a suitable next line change [32]. Finally, the models can be leveraged for patient and physician segmentation and clustering to identify appropriate targets for in-person and non-personal promotion [30].

There are, however, instances in which machine learning might not be the right tool to leverage, including when the condition or the underlying condition has only a few known variables, when the market is mature and has a known, predetermined diagnostic and treatment algorithm, and when understanding correlations and inference is more important than making predictions [5].

One aspect of the machine learning process is to involve a cross-functional team of experts in the healthcare area to ensure that the questions and problem statement, along with the hypothesis, are properly set up [33, 34]. Many therapeutic areas require an in-depth understanding of clinical and medical concepts (e.g., the diagnostic process, treatment regimens, potential adverse effects), which can help with the research design and the selection of proper analytical techniques. If expert knowledge is not considered or properly captured in the research design, it might lead to irrelevant, invalid, and biased results, and ultimately invalidate the entire research study [33, 34].


#### **Figure 1.**

*Examples of Machine Learning Applications in Healthcare Analytics [22].*

#### **9. A practical guide to the predominant approach**

Using a real example of a project whose goal was predicting the risk of hypertension due to underlying comorbid conditions or induced by medication, the decision to lead with machine learning vs. statistical modeling can be based on explicit criteria that can be weighed and ranked based on the desired outcome of the work [17, 32]. **Figure 2** presents an example of the approach.

As shown in **Figure 2**, depending on the research objectives, machine learning, statistical modeling, or both techniques could be the right method(s) to apply. For example, shifts in market trends, including shifts in patient volumes of diagnosis and treatment, present a suitable case for a statistical modeling type of analysis. On the other hand, trying to predict patients at high risk for hypertension requires the utilization of ML approaches. Leveraging both methods is best suited when both predictive power and explanatory reasoning are needed to understand the important factors driving the outcome and their relative magnitudes and inferences.
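One way to operationalize such explicit, weighted criteria is a simple scoring sheet. The criteria, weights, and 1-5 scores below are invented placeholders for illustration, not the actual contents of Figure 2:

```python
# Each criterion maps to (weight, ML score, statistical-modeling score),
# scored 1-5; weights sum to 1. All values here are invented placeholders.
criteria = {
    "predictive power needed":        (0.3, 5, 3),
    "interpretability needed":        (0.2, 2, 5),
    "dataset width (many variables)": (0.3, 5, 2),
    "uncertainty estimates needed":   (0.2, 2, 5),
}

ml_score = sum(w * ml for w, ml, _ in criteria.values())
stat_score = sum(w * st for w, _, st in criteria.values())
predominant = "machine learning" if ml_score > stat_score else "statistical modeling"
```

With these made-up weights the ML score edges out the statistical one; changing the weights to favor interpretability and uncertainty estimates would flip the decision, which is exactly the point of making the criteria explicit.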


#### **Figure 2.**

*Criteria for Choosing the Predominant Approach for a Project.*

#### **10. Conclusions**

Machine learning requires fewer assumptions about the underlying relationships between the data elements. It is generally applied to high dimensional data sets and requires fewer observations to create a working model [5]. In contrast, a statistical model requires an understanding of how the data were collected, the statistical properties of the estimator (p-value, unbiased estimators), the underlying distribution of the population, etc. [17]. Statistical modeling techniques are usually applied to low dimensional data sets [25]. Statistical modeling and ML are not at odds but rather complementary approaches that offer a choice of techniques based on needs and desired outcomes. Data scientists and analysts should not have to choose between machine learning and statistical modeling as a mutually exclusive decision. Instead, selected approaches from both areas should be considered, as both types of methodologies are based on the same mathematical principles but expressed somewhat differently [5, 10].


**Note:** This book chapter was originally posted on arXiv, Cornell University's research preprint website: https://arxiv.org. The content of the book chapter is mostly the same as the version posted on https://arxiv.org [6].

#### **Funding**

Authors work for Symphony Health, ICON plc Organization.

#### **Conflict of interest**

The authors declare no conflict of interest.

### **Author details**

Michele Bennett1,2, Ewa J. Kleczyk1,3\*, Karin Hayes4 and Rajesh Mehta1

1 Symphony Health, ICON, plc Organization, Blue Bell, PA, USA

2 Data Science, Computer Science, and Business Analytics, Grand Canyon University, USA

3 The School of Economics, The University of Maine, Orono, ME, USA

4 Symphony Health, ICON, plc Organization, Phoenix, AZ, USA

\*Address all correspondence to: ewa.kleczyk@symphonyhealth.com

© 2022 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

### **References**

[1] Beam AL, Kohane IS. Big data and machine learning in health care. JAMA. 2018;**319**(13):1317-1318. DOI: 10.1001/jama.2017.18391

[2] Shelmerdine et al. Review of study reporting guidelines for clinical studies using artificial intelligence in healthcare. BMJ Health & Care Informatics. 2021;**28**(1):e100385. DOI: 10.1136/bmjhci-2021-100385

[3] Romano R, Gambale E. Statistics and medicine: The indispensable know-how of the researcher. Translational Medicine @UniSa. 2013;**5**:28-31

[4] Razzak et al. Big data analytics for preventive medicine. Neural Computing and Applications. 2020;**32**:4417-4451. DOI: 10.1007/s00521-019-04095-y

[5] Bzdok D, Altman N, Krzywinski M. Statistics versus machine learning. Nature Methods. 2018;**15**(4):233-234. DOI: 10.1038/nmeth.4642

[6] Bennett M, Hayes K, Kleczyk EJ, Mehta R. Analytics in healthcare: Similarities and differences between machine learning and traditional advanced statistical modeling. Cornell University. 2022:1-16. Available from: https://arxiv.org/abs/2201.02469

[7] Von Luxburg U, Scholkopf B. Inductive logic. In: Handbook of the History of Logic. Vol. 10. New York: Elsevier; 2011

[8] Bousquet et al. Introduction to Statistical Learning. 2003. Available from: http://www.econ.upf.edu/~lugosi/mlss\_slt.pdf

[9] Field A. Discovering Statistics Using R. London: Sage; 2012

[10] Carmichael I, Marron JS. Data science vs. statistics: Two cultures? Japanese Journal of Statistics and Data Science. 2018;**1**(1):117-138

[11] Cahn A, Shoshan A, Sagiv T, Yesharim R, Goshen R, Shalev V, et al. Prediction of progression from prediabetes to diabetes: Development and validation of a machine learning model. Diabetes/Metabolism Research and Reviews. 2020;**36**(2):e3252. DOI: 10.1002/dmrr.3252

[12] Breiman L. Statistical modeling: The two cultures (with comments and a rejoinder by the author). Statistical Science. 2001;**16**(3):199-231

[13] Mehta R, Uppunuthula S. Use of machine learning techniques to identify the likelihood of hospitalization for highrisk patients diagnosed with COVID-19. In: ISPOR Conference; Washington DC. 2022

[14] American Diabetes Association. Understanding A1C Diagnosis. 2022. Available from: https://www.diabetes.org/diabetes/a1c/diagnosis#:~:text=Diabetes%20is%20diagnosed%20at%20fasting,equal%20to%20126%20mg%2Fdl

[15] Bzdok et al. Machine learning: A primer. Nature Methods. 2017;**14**(12):1119-1120. DOI: 10.1038/nmeth.4526

[16] Bellman RE. Adaptive Control Processes. Princeton, NJ: Princeton University Press; 1961

[17] Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. Stanford, CA: Springer; 2016


[18] Chapman et al. Statistical learning theory for high dimensional prediction: Application to criterion-keyed scale development. Psychological Methods. 2016;**21**(4):603-620. DOI: 10.1037/met0000088

[19] Argent et al. The importance of real-world validation of machine learning systems in wearable exercise biofeedback platforms: A case study. Sensors (Basel). 2021;**21**(7):2346. DOI: 10.3390/s21072346

[20] Parikh et al. Understanding and using sensitivity, specificity and predictive values. Indian Journal of Ophthalmology. 2008;**56**(1):45-50. DOI: 10.4103/0301-4738.37595

[21] Mendis A. Statistical Modeling vs. Machine Learning. 2019. Available from: https://www.kdnuggets.com/2019/08/statistical-modelling-vs-machinelearning.html

[22] Hayes K, Rajabathar R, Balasubramaniam V. Uncovering the machine learning "Black Box": Discovering latent patient insights using text mining & machine learning. In: Conference Paper Presented at Innovation in Analytics via Machine Learning & AI; Las Vegas, NV. 2019. Available from: https://www.pmsa.org/other-events/past-symposia

[23] Belabbas M, Wolfe PJ. Spectral methods in machine learning and new strategies for very large datasets. Proceedings of the National Academy of Sciences. 2009;**106**(2):369-374. DOI: 10.1073/pnas.0810600105

[24] Kempa-Liehr et al. Healthcare pathway discovery and probabilistic machine learning. International Journal of Medical Informatics. 2020;**137**:104087. DOI: 10.1016/j.ijmedinf.2020.104087

[25] Wasserman L. Rise of the machines. In: Past, Present, and Future of Statistical Science. Chapman and Hall; 2013. pp. 1-12. DOI: 10.1201/b16720-49

[26] Ranjan R. Calibration in machine learning. 2019. Available from: https://medium.com/analytics-vidhya/calibration-in-machine-learninge7972ac93555

[27] Childs CM, Washburn NR. Embedding domain knowledge for machine learning of complex material systems. MRS Communications. 2019;**9**(3):806-820. DOI: 10.1557/mrc.2019.90

[28] Hüllermeier E, Waegeman W. Aleatoric and epistemic uncertainty in machine learning: An introduction to concepts and methods. Machine Learning. 2021;**110**:457-506. DOI: 10.1007/s10994-021-05946-3

[29] Goh et al. Evaluating human versus machine learning performance in classifying research abstracts. Scientometrics. 2020;**125**:1197-1212. DOI: 10.1007/s11192-020-03614-2

[30] Chicco D, Jurman G. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genomics. 2020;**21**(6). DOI: 10.1186/s12864-019-6413-7

[31] Ansari et al. Rethinking human-machine learning in Industry 4.0: How does the paradigm shift treat the role of human learning? Procedia Manufacturing. 2018;**23**:117-122. DOI: 10.1016/j.promfg.2018.04.003

[32] Morganstein et al. Predicting population health with machine learning: A scoping review. BMJ Open. 2020;**10**(10). DOI: 10.1136/bmjopen-2020-037860

[33] Terranova et al. Application of machine learning in translational medicine: Current status and future opportunities. The AAPS Journal. 2021;**23**(74). DOI: 10.1208/s12248-021-00593-x

[34] Kleczyk E, Hayes K, Bennett M. Building organization AI and ML acumen during the COVID Era. 2022. In: PMSA Annual Conference. Louisville, KY. 2022

#### **Chapter 3**

## Image-Based Crop Leaf Disease Identification Using Convolution Encoder Networks

*Indira Bharathi and Veeramani Sonai*

#### **Abstract**

Nowadays, agriculture plays a major role in the progress of our nation's economy. However, the advent of various crop-related infections has a negative impact on agricultural productivity. Crop leaf disease identification plays a critical role in addressing this issue and educating farmers on how to prevent the spread of diseases in crops. Researchers have already used methodologies such as decision trees, random forests, deep neural networks, and support vector machines. In this chapter, we propose a hybrid method using a combination of convolutional neural networks and an autoencoder for detecting crop leaf diseases. With the help of convolutional encoder networks, this chapter presents a unique methodology for detecting crop leaf infections. Using the PlantVillage dataset, the model is trained to recognize crop infections based on leaf images and achieves an accuracy of 99.82%. When compared with existing work, this chapter achieves better results with a suitable selection of hyper-tuning parameters of convolutional neural networks.

**Keywords:** crop leaf, convolution neural network, autoencoder, ReLU, deep neural network, hyper tuning

#### **1. Introduction**

In agriculture, the crop leaf plays an important role in providing information about the healthy growth of the plant. Various climatic factors affect the growth of the plant. Besides natural calamities, crop leaf disease is a major hazard to agricultural yields and causes economic losses. If we fail to analyze the infections in the crops, pesticide usage may be inadequate. Therefore, crop leaf identification based on the biological features of diseases present in the crop is considered a major issue. When required, expert visual inspections and biological reviews are normally carried out for plant diagnosis. This strategy, on the other hand, is usually time-consuming and ineffective. To solve these difficulties, sophisticated and intelligent systems for detecting plant diseases are required.

To provide an intelligent system to identify crop leaf diseases, we propose a convolutional neural network combined with image processing methodologies such as image segmentation and filtering. In existing works, most researchers applied conventional machine learning algorithms to predict or identify the crop diseases present in the leaf; however, such algorithms are better suited to tasks such as plant recognition and weed discrimination. As a result, crop leaf disease identification is critical to maintaining agricultural productivity. In general, plant leaf disease analysis is also done manually using visual inspection, but this is time-consuming and potentially error-prone. As a result, diagnosing crop disease using automated procedures is beneficial, since it reduces a significant amount of effort associated with crop monitoring on large farms and detects disease symptoms at an early stage, i.e., when the disease first appears. Plant leaf health monitoring and early detection of symptoms are required to limit disease transmission, which aids farmers in effective management methods.

To develop an accurate image classifier for crop leaf disease identification, we need image samples of damaged and healthy crops. The PlantVillage dataset has thousands of images of healthy and infected crop leaves. In this dataset, six diseases across three crop species are labeled. Hence, we use 54,306 image samples with a convolution encoder network to identify crop leaf diseases more accurately. The main contributions of this chapter are summarized as follows:


The rest of the chapter is organized as follows: The literature is briefly reviewed in Section 2. The techniques used are elaborated in Section 3. The results are discussed in Section 4, and the conclusion and future work are provided in Section 5.

#### **2. Related works in the literature**

An existing literature survey categorized plant diseases by using several convolutional neural networks (CNNs) [1, 2]. On the PlantVillage dataset, another CNN-based architecture was presented to classify disease, and it outperformed DL models such as AlexNet, VGG-16, Inception-v3, and ResNet [3]. A CNN model has also been proposed in a study to classify tea leaf data. A CNN-based approach was developed for groundnut disease categorization in a recent publication [4]. Similarly, little literature has examined sophisticated training strategies; for example, [5] focused on the performance of AlexNet and GoogLeNet. By comparing state-of-the-art and fine-tuning techniques, a comparison study demonstrated the importance of the fine-tuning technique.

A random forest-based classifier to identify healthy and affected leaves was proposed [6]. The authors described dataset creation, feature extraction, and training. An AlexNet classification technique was applied to detect rice leaf diseases, namely bacterial blight, brown spot, and leaf smut [7]. For regular monitoring, automatic disease detection from remote sensing images was proposed [8]. Using Canny edge detection and a machine learning algorithm, a disease identification system was proposed [9]. A convolutional neural network-based autoencoder was used to detect crop leaf diseases; convolution filter sizes of 2×2 and 3×3 give different accuracies for different epochs [10].

*Image-Based Crop Leaf Disease Identification Using Convolution Encoder Networks DOI: http://dx.doi.org/10.5772/intechopen.106989*

A state-of-the-art deep convolutional neural network for image classification is proposed in [11]. A DenseNet model is shown to perform better than other models. The authors propose activation functions that perform better than ReLU at various scales [12]. For the early detection of European wheat diseases, an automatic plant disease diagnosis system was proposed [13, 14]. To increase the robustness of crop detection, a multi-target tracking algorithm was proposed [15].

To classify leaf images, deep learning approaches have been studied [16]. For leaf segmentation, the images are trained using a Mask Region-based Convolutional Neural Network (Mask R-CNN). The average accuracy obtained for the VGG16 images is 91.5%. Through deep learning methodologies, leaf images are classified as healthy or affected [17]. A method to dynamically analyze images of the disease is proposed in [18]; the output is sent to the farmer, and the feedback is reflected in the model. Using deep learning, diseases of strawberry fruits and leaves are diagnosed [18]. A Convolutional Neural Network (CNN) model and Learning Vector Quantization (LVQ) algorithm-based method has been proposed for tomato leaf disease detection and classification [19, 20].

To categorize healthy and affected leaves, a deep learning model was applied over public images [1]. For the sustainable development of farming, it is essential to use artificial intelligence and machine learning approaches [21]. To solve current agricultural problems, computer vision technology has been combined with deep learning models [22]. Using images of plants, a state-of-the-art deep learning model has been applied to detect disease [5, 23]. To enhance accuracy, a depthwise separable convolution has been adopted [3]. For the automatic detection of infection in tomato leaves, an enhanced deep learning architecture has been adopted over the PlantVillage dataset [4]. To classify crops, a novel three-dimensional (3D) convolutional neural network (CNN) has been applied to remote sensing images [24].

#### **3. Proposed hybridized convolution neural network with variational autoencoders system**

In this chapter, a hybridized convolutional neural network with variational autoencoders is proposed to classify crop leaf diseases; hence, it is named the V-Convolution encoder network. To extract the informative features of the leaf, we use an autoencoder. It is a type of neural network that performs two functions, namely encoding and decoding. The encoding part plays a role in extracting the high-dimensional features of the leaf, and the decoding part reconstructs the inputs. In general, a CNN consists of three important types of layers, namely encoder (convolutional) layers, pooling layers, and fully connected layers, as shown in **Figure 1**.

#### **3.1 Building blocks of CNN**

The convolutional layer is the core part of a convolutional network and contains a structure of learnable channels. In the forward pass, the width and height information of the images is passed over each channel, and the product of the kernel and image pixels is calculated. In the backward pass, the gradient of the loss with respect to the input, weights, and bias is computed. Various levels of filters are used to extract the needed features from the matrix of the original images. As the filter levels go deeper, more specific features can be captured. To retain the important border features, zero padding is added around the image matrix.

**Figure 1.** *Proposed architecture*.

The ReLU activation function is used within the convolution layer, which adds nonlinearity to the network. It computes the weighted informative features faster than the tangent or sigmoidal functions. Next is the pooling layer, which is inserted between successive convolutional layers. Its role is to progressively reduce the spatial size of the image, in order to minimize the number of parameters and the amount of computation and consequently to control overfitting. When the image size is large, the pooling layer reduces the number of training features. The important significance of adding pooling layers is to lessen the spatial size of the input image. Here, min-max pooling is used in our implementation. After the pooling layer, the fully connected layer is essential to produce an output equivalent to the number of classes we want. In this layer, the feature maps are flattened into a vector of neurons, and all neurons are fully connected with the neurons in the previous layer. At last, a softmax layer is used to calculate the class probabilities, which lie in the range 0-1 and sum to 1.
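The building blocks above can be sketched in plain numpy. This is a minimal illustration rather than the chapter's implementation: a 'valid' convolution with the 2×2 filter size mentioned in the text, ReLU, standard 2×2 max pooling, and a softmax output; the image and filter values are made up.

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' convolution (cross-correlation) with stride 1."""
    h, w = kernel.shape
    out = np.zeros((image.shape[0] - h + 1, image.shape[1] - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (image[i:i + h, j:j + w] * kernel).sum()
    return out

def relu(x):
    return np.maximum(x, 0)          # nonlinearity used in the chapter

def max_pool(x, size=2):
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

def softmax(z):
    e = np.exp(z - z.max())          # shift for numerical stability
    return e / e.sum()               # probabilities in [0, 1], summing to 1

image = np.arange(36, dtype=float).reshape(6, 6)     # made-up 6x6 "image"
kernel = np.array([[-1.0, 0.0], [0.0, 1.0]])         # 2x2 diagonal-difference filter
feature_map = max_pool(relu(conv2d(image, kernel)))  # (6,6) -> (5,5) -> (2,2)
probs = softmax(np.array([1.0, 2.0, 3.0]))
```

A real network stacks many such filters per layer and learns the kernel values by backpropagating the loss gradient described above.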

#### **3.2 Variational autoencoder**

A variational autoencoder is proposed to extract the features of given input images. It is a neural network intended for unsupervised learning. It comprises two sections: an encoder and a decoder. The encoder encodes the input features into an encoding vector, while the decoder recovers the output features from the encoding vector. The encoder is designed so that it produces a compressed form of the input; on the other side, the decoder decompresses the result back to the original size. The difference between an autoencoder and a variational autoencoder is that the autoencoder represents the features by applying a deterministic function, whereas the variational autoencoder represents the features through a probability distribution. The encoder is designed as a neural network that parameterizes an approximate posterior distribution q(z | x) over latent features z given an input image x, and the variational decoder then reconstructs the input samples by producing the parameters of the distribution p(x | z). This model consists of two phases, namely feature learning and classification. The learning of features is done in an unsupervised network, whereas the leaf diseases are classified by training the samples using a CNN classifier. The overall architecture of the proposed system is shown in **Figure 1**.
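A minimal sketch of the variational step, under the standard VAE formulation (which the chapter's description appears to follow): the encoder outputs the mean and log-variance of q(z|x), a latent sample is drawn via the reparameterization trick, and the KL divergence to the standard normal prior has a closed form. The linear maps below stand in for real encoder networks and are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)

x = rng.normal(size=4)                # tiny stand-in "input features"
W_mu = rng.normal(size=(2, 4))        # stand-in encoder weights (illustrative)
W_logvar = rng.normal(size=(2, 4))

mu = W_mu @ x                         # encoder output: mean of q(z|x)
logvar = W_logvar @ x                 # encoder output: log-variance of q(z|x)

eps = rng.normal(size=2)              # reparameterization trick:
z = mu + np.exp(0.5 * logvar) * eps   # z = mu + sigma * eps

# Closed-form KL divergence between q(z|x) and the standard normal prior
kl = 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar)
```

Training minimizes this KL term plus a reconstruction loss from the decoder; the reparameterization keeps the sampling step differentiable.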

To perform crop leaf disease identification, we considered the PlantVillage dataset. To improve the performance of the proposed system, a segmentation process is performed on the original data samples before feature learning (**Table 1**).


#### **Table 1.**

*Classes of various crop diseases.*


#### **Table 2.**

*Training, test, and validation values used for each category of data sets.*

We acquired our results based on the training and testing samples listed in **Table 2**. For classification, we considered different types of crop diseases from tomato leaves. **Table 1** describes the various crops and their diseases; it has six different classes, numbered 1 to 6. The proposed network has been trained to recognize crop infections based on leaf images. Different convolution filter levels are used in the proposed work, and to train the network more efficiently, the ReLU activation function is used. The proposed architecture achieved better accuracy for various epochs and convolution filter sizes. We applied additional convolution layers with 128 filters and filter size 2×2 with ReLU, followed by two additional convolution layers with 256 filters and filter size 2×2 with ReLU. After all this, a flattening layer is used to acquire a vector of neurons that uses the ReLU function. Then two dense layers are used: one uses ReLU, while the other uses the softmax function and outputs the class.
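As a sanity check on the stack just described, the following sketch traces output shapes through the three convolution layers, assuming a hypothetical 64×64 input, 'valid' convolutions with stride 1, and no pooling between the listed layers. The chapter does not state the input resolution or padding, so these numbers are illustrative only.

```python
def conv_shape(h, w, kernel=2):
    """Output height/width of a 'valid' convolution with stride 1."""
    return h - kernel + 1, w - kernel + 1

h, w = 64, 64                    # hypothetical input resolution
for filters in (128, 256, 256):  # the conv layers listed in the text
    h, w = conv_shape(h, w)      # each 2x2 'valid' conv shrinks H and W by 1

flat = h * w * 256               # neurons entering the dense layers after flattening
```

Tracing shapes this way makes the size of the flattening layer, and hence the dense-layer parameter count, explicit before any training is run.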

#### **4. Results and discussion**

The proposed system is implemented using Python scikit-learn packages and executed on an Intel i5 processor. The proposed approach is evaluated using the PlantVillage dataset [25]. The testing and training splits used for the leaf image dataset are illustrated in **Table 2**. The following performance parameters are considered in our implementation: precision, recall, and F1-score. The results are taken for different numbers of epochs and compared with existing approaches. By varying the epochs, the error on the testing and training samples is plotted in **Figure 2**.
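For reference, the performance parameters named above can be computed from binary confusion-matrix counts as follows; the tp/fp/fn values are made up for illustration.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute the three metrics from binary confusion-matrix counts."""
    precision = tp / (tp + fp)               # of predicted positives, how many are real
    recall = tp / (tp + fn)                  # of real positives, how many are found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return precision, recall, f1

p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)  # illustrative counts
```

For multi-class problems such as the six disease classes here, these are typically computed per class and then averaged.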

We achieved 98% accuracy when the network is trained for 150 epochs. It is also observed that as the filter size increases, we get 100% accuracy. **Table 3** shows the training and testing accuracy for different convolutional filter sizes, such as 2×2 and 3×3. The best training accuracy for the 2×2 filter size is 97.21%, and the best testing accuracy for the 2×2 filter size is 87.12%. When compared with existing work, this chapter achieves better results with a suitable selection of hyper-tuning parameters of a convolutional neural network (**Table 3**).

**Figure 2.** *Comparison of training and testing error for various epochs*.



#### **Table 3.**

*Training and testing accuracy for different filters.*

**Figure 3.** *Comparison of various performance parameters*.

**Figure 4.** *Accuracy comparison of various classifiers.*


*Artificial Intelligence Annual Volume 2022*

**Table 4.**

*Precision, Recall, F1-score, and Accuracy value of various datasets.*


The performance of the resulting implementation is illustrated in **Figure 3**. **Figure 4** shows the comparison of the proposed classifier with existing classifier approaches. The proposed CNN approach shows superior performance in terms of accuracy compared with other existing approaches (**Table 4**).

#### **5. Conclusion and future work**

Crop leaf diseases have been responsible for reducing production, resulting in economic losses. Recently, crop leaves have faced several diseases from various insects and pests. This chapter proposed a unique methodology for detecting crop leaf infections. With the PlantVillage dataset, the model is trained to recognize crop infections based on leaf images and achieves an accuracy of 99.82%. This chapter presented a feature selection algorithm to identify essential features from crop leaf images. The chosen features are given to the hybrid method using a combination of convolutional neural networks and autoencoders. Among all the existing classifiers, the proposed approach shows an average of 84.54% execution time improvement in performing the classification. This work can be enhanced further to give recommendations to the farmer to apply proper insecticides prior to the spread of such diseases.

#### **Author details**

Indira Bharathi\* and Veeramani Sonai School of Computing, Amrita Vishwa Vidhyapeetham, Vengal, Tamil Nadu, India

\*Address all correspondence to: b\_indira@ch.amrita.edu

© 2022 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### **References**

[1] Kulkarni O. Crop disease detection using deep learning. In: IEEE Fourth International Conference on Computing Communication Control and Automation (ICCUBEA). New York: IEEE; 2018. pp. 1-4

[2] Kaushik M, Prakash P, Ajay R, Veni S. Tomato leaf disease detection using convolutional neural network with data augmentation. In: IEEE 2020 5th International Conference on Communication and Electronics Systems (ICCES). New York: IEEE; 2020. pp. 1125-1132

[3] Kamal K, Yin Z, et al. Depthwise separable convolution architectures for plant disease classification. Computers and Electronics in Agriculture. 2019;**165**: 104948

[4] Karthik R, Hariharan M, Anand S, Mathikshara P, Johnson A. Attention embedded residual CNN for disease detection in tomato leaves. Applied Soft Computing. 2020;**86**:105933

[5] Bedi P, Gole P, Agarwal SK. Using deep learning for image-based plant disease detection. In: Internet of Things and Machine Learning in Agriculture. Lausanne, Switzerland: Frontiers; 2021. pp. 369-402

[6] Maniyath SR, Ram H. Plant disease detection using machine learning. In: IEEE International Conference on Design Innovations for 3Cs Compute Communicate Control (ICDI3C). New York: IEEE; 2018. pp. 41-45

[7] Matin MH, Uddin MS. An efficient disease detection technique of Rice leaf using AlexNet. Journal of Computer and Communications. 2020;**8**(12):4

[8] Badage A. Crop disease detection using machine learning: Indian agriculture. International Research Journal of Engineering and Technology. 2018;**5**(9):866-869

[9] Kranth PR, Lalitha H, Basava L, Mathur A. Plant disease prediction using machine learning algorithm. International Journal of Computer Applications. 2018;**182**(25):41-45

[10] Khamparia A, Saini G, Gupta D, Khanna A, Tiwari S. Seasonal crops disease prediction and classification using deep convolutional encoder network. Circuits, Systems and Signal Processing. 2020;**39**:818-836

[11] Too EC, Yujian L, Njuki S, Yingchun L. A comparative study of fine-tuning deep learning models for plant disease identification. Computers and Electronics in Agriculture. 2018;**22**:135-152

[12] Qian S, Liu H, Liu C, Wu S, Wong HS. Adaptive activation functions in convolutional neural networks. Neurocomputing. 2018;**272**:204-212

[13] Picon A, Alvarez-Gila A, Seitz M, Ortiz-Barredo A, Echazarra J, Johannes A. Deep convolutional neural networks for mobile capture device-based crop disease classification in the wild. Computational Electronic Agriculture. 2018;**138**:200-209

[14] Lu J, Hu J, Zhao G, Mei F, Zhang C. An in-field automatic wheat disease diagnosis system. Computers and Electronics in Agriculture. 2017;**142**: 369-379

[15] Hamuda E, Mc Ginley B, Glavin M, Jones E. Improved image processing-based crop detection using Kalman filtering and the Hungarian algorithm. Computers and Electronics in Agriculture. 2018;**148**:37-44


[16] Yang K, Zhong W. Leaf segmentation and classification with a complicated background using deep learning. Agronomy. 2020;**10**:1721

[17] Ferentinos KP. Deep learning models for plant disease detection and diagnosis. Computers and Electronics in Agriculture. 2018;**145**:311-318

[18] Park H, JeeSook E, Kim S-H. Crops disease diagnosing using image-based deep learning mechanism. In: International Conference on Computing and Network Communications (CoCoNet). New York: IEEE; 2018. pp. 23-26

[19] Sardogan M, Tuncer A, Ozen Y. Plant leaf disease detection and classification based on CNN with LVQ algorithm. In: IEEE International Conference on Computer Science and Engineering (UBMK). New York: IEEE; 2018. pp. 382-385

[20] Reddy JN, Vinod K, Ajai AR. Analysis of classification algorithms for plant leaf disease detection. In: IEEE International Conference on Electrical, Computer and Communication Technologies (ICECCT). New York: IEEE; 2019. pp. 1-6

[21] Jha K, Doshi A, Patel P, Shah M. A comprehensive review on automation in agriculture using artificial intelligence. Artificial Intelligence Agriculture. 2019; **2**:1-12

[22] Tian H, Wang T, Liu Y, Qiao X, Li Y. Computer vision technology in agricultural automation – A review. Agriculture. 2020;**7**(1):1-19

[23] Too EC, Yujian L, Njuki S, Yingchun L. A comparative study of fine-tuning deep learning models for plant disease identification. Computers and Electronics in Agriculture. 2019;**161**:272-279

[24] Ji S, Zhang C, Xu A, Shi Y, Duan Y. 3D convolutional neural networks for crop classification with multi-temporal remote sensing images. Remote Sensing. 2018;**10**(1):75-84

[25] https://knowyourdata-tfds.withgoogle.com/

#### **Chapter 4**

## Perspective Chapter: Deep Reinforcement Learning for Co-Resident Attack Mitigation in The Cloud

*Suxia Cui and Soamar Homsi*

#### **Abstract**

Cloud computing brings convenience and cost efficiency to users, but multiplexing virtual machines (VMs) on a single physical machine (PM) introduces various cybersecurity risks. For example, a co-resident attack can occur when malicious VMs use shared resources on the hosting PM to control or gain unauthorized access to other benign VMs. Most task schedulers do not contribute to both resource management and risk control. This article studies how to minimize co-resident risks while optimizing VM completion time by designing efficient VM allocation policies. To support this goal, a zero-trust threat model is defined with a set of co-resident risk mitigation parameters, and all VMs are assumed to be malicious. To reduce the chances of co-residency, deep reinforcement learning (DRL) is adopted to decide the VM allocation strategy. An effective cost function is developed to guide the reinforcement learning (RL) policy training. Compared with traditional scheduling paradigms, the proposed system achieves plausible mitigation of co-resident attacks with a relatively small VM slowdown ratio.

**Keywords:** cloud computing, risk mitigation, resource management, co-resident attack, reinforcement learning

#### **1. Introduction**

Cloud computing, which has its origins in the expansion of the Internet, aims to provide remote and scalable computing and storage resources to its customers. Users from small businesses in locally resource-limited environments can manipulate and store large datasets for real-time processing with cloud services. The cloud platform has gradually reshaped daily life because it is recognized as a convenient way to transmit and store data in the big data era. Organizations can choose from the public, private, or hybrid cloud, which combines features of the public and private deployment models. The term "XaaS" was coined for this service-oriented architecture, emphasizing that anything can be treated as a service in the cloud computing environment. Examples of cloud delivery services include infrastructure as a service (IaaS), platform as a service (PaaS), and software as a service (SaaS) [1]. Recently, function as a service (FaaS) further expanded the backend as a service (BaaS) offering. Under each delivery model, a cloud service provider (CSP) is responsible for allocating enough resources to maintain quality of service (QoS) for its users and to protect their data from security risks.

Virtualization has been adopted by most cloud computing platforms to profit from the "pay as you go (PAYG)" model [2]. Virtualization is an idea that originated on IBM's mainframe platform in the early 1960s. After entering the twenty-first century, it was successfully utilized in cloud computing, where it can bring down the cost of maintaining a large-scale system. It converts a physical server into numerous VMs, rented out to several occupants [3–7]. This VM-PM relationship is illustrated in **Figure 1**.

The apparent relationship between the number of running PMs and power consumption places a high demand on a strategy for energy minimization in this configuration [8]. Security and data privacy are further concerns for cloud computing platforms. Attackers will seek to exploit any vulnerability to achieve various malicious goals on the victim's network, software, and databases. The co-resident attack is one of the prevalent cybersecurity risks resulting from virtualization. Ideally, two neighboring VMs are isolated from each other when running their tasks. In reality, however, each co-located VM depends on the same PM, whose hardware, such as CPUs or memory elements, is shared by all the VMs. Therefore, a VM's private information may be accessed by a neighboring VM launching side-channel attacks [9–12], as shown in **Figure 2**. Here, a hypervisor or virtual machine monitor (VMM) creates and runs VMs on a hosting PM. The arrows illustrate the route of side-channel attacks. The side-channel attack is a significant security challenge that prevents many organizations from adopting cloud computing technology. Although deep learning algorithms have recently proven effective in cloud resource management [13–16], few works have paid attention to side-channel attack avoidance at the same time.

To fill this gap, we developed a novel deep reinforcement learning (DRL) based dynamic VM allocation approach to optimize the trade-off between VM completion time and co-resident risk mitigation. The main contributions of this paper are as follows:

• Threat model design: A time-sensitive zero trust threat model is developed for co-resident vulnerability analysis. The model enables the tracking of VM co-existent pairs on the same PM.

**Figure 1.**

*VMs and PMs in a data center via virtualization.*

*Perspective Chapter: Deep Reinforcement Learning for Co-Resident Attack Mitigation… DOI: http://dx.doi.org/10.5772/intechopen.105991*

**Figure 2.** *Side-channel attacks in cloud computing.*


#### **2. Related works: Secure cloud resource management**

Resource management alone, without security considerations, is already very challenging in a cloud computing environment. Multi-tenant environments allow attackers to begin a co-resident infiltration and steal a victim's information through side channels. The risks that attackers pose to VMs on the same hypervisor are a growing security concern and are being addressed in different ways. This section discusses current established resource management approaches with and without security awareness. Three categories of methodologies are commonly used: heuristics, game theory, and machine learning (ML).

The exploration of heuristics and meta-heuristics to solve nondeterministic polynomial time (NP) problems is growing amid the difficulties of solving them with traditional methods [17]. Gawali and Shinde [18] combined a modified analytic hierarchy process (MAHP), bandwidth aware divisible scheduling (BATS), and the longest expected processing time preemption (LEPT) to achieve improved performance. Qin *et al.* [20] took the idea of "ant colony optimization" from [19] and proposed a probabilistic algorithm that can simultaneously maximize the revenue of communications and minimize the power consumption of PMs. Similarly, Tawfeek *et al.* also adopted the ants' nature and presented a random optimization search approach for allocating the incoming jobs to the virtual machines [21]. The proposed method outperformed the popular first come first serve method. Patel introduced a hybrid algorithm that used a modified honeybee behavior-inspired algorithm for priority-based tasks and an enhanced weighted round-robin algorithm for non-priority-based tasks [22] to balance the workload over the cloud dynamically. When using heuristic methods with the consideration of co-resident attacks, various policies were compared. Jia *et al.* [23] proposed a VM allocation method to optimize load balancing and reduce energy consumption and security risks by managing the CPU utilization of the hosting PMs. Miao *et al.* offered two metrics to outline co-residency and conflict in the cloud [24]. Both placement and migration algorithms mediate differences between tenants to proactively alleviate co-resident attacks in the cloud. Han *et al.* formulated a set of security metrics and a quantitative model to assign new VMs to the server with the most VMs [25]. The research uncovered that the server's configuration, oversubscription, and background traffic had a substantial impact on the ability to stop attackers from co-locating with their targets.

Yang *et al.* [26] explored a simplified algorithm for energy management in cloud computing. The paper centered on establishing a mathematical model to calculate computing nodes' stability, configuring a game-theoretic cooperative model for the task of scheduling cloud computing, and examining the problem as a multi-stage sequential game. Patra *et al.* presented the task as a player and the VM as a strategy in [27]. A non-cooperative game scheduling and a task balance scheduling algorithm were compared to collect the nodes' average task processing speed. It was determined that the proposed game-theoretic algorithm could improve energy management in cloud computing. In [28], the cooperative behavior of multiple cloud servers was studied. An evolutionary mechanism was presented in the hierarchical cooperative game model for the VM deployment strategy to improve efficiency in the public cloud environment. Jia *et al.* modeled several basic VM allocation policies using game theory to achieve a quantitative analysis, while also presenting the attack effectiveness, coverage, power consumption, workload balance, and cost under the VM allocation policies and solving the mathematical solution in CloudSim [23]. Their results found that to reduce the attacker's efficiency rates, the cloud provider should apply a probabilistic VM allocation policy. Narwal *et al.* proposed a payoff matrix and a decision tree for any number of users [29, 30]. When a unique user was selected, the choices of investing in security were assessed until equilibrium was reached. Security games are a way of blocking the attacker's ability to locate the VMs they are searching for. Han *et al.* proposed a policy pool with multiple VM allocation policies, from which the policy to be used is selected with a certain probability [31].

Difficulties regarding energy efficiency in cloud computing can also be addressed using machine learning-based techniques [32]. Witanto *et al.* employed a neural network-based adaptive selector procedure to arrange the VMs on the physical servers in data centers [33]. Pahlevan *et al.* presented a hyper-heuristic algorithm that exploits both heuristic and ML-based VM allocation methods by selecting the best one at run-time [32]. Zhang *et al.* [34] suggested an auction-based resource allocation scheme represented as a machine learning classification or regression problem. They outlined machine learning classification and posed two resource allocation prediction algorithms rooted in linear and logistic regression. Liu *et al.* presented a reinforcement learning-based approach that allows complex scenarios to be managed efficiently [35]. To do so, they used neural networks to grasp the goal of the research model, RL to enhance the model, and an ε-greedy methodology to expand the RL process. Their approach lowered job delay for hybrid scenarios. ML-based methods have been proposed to fight against co-resident attacks by focusing on different factors, such as minimizing the time of a malicious VM co-location. Joseph *et al.* [36] used traditional ML algorithms, such as support vector machines (SVM), naïve Bayes, and random forests, to detect malware, following a self-healing methodology to power off the attacked VMs and restore them to healthy conditions. In reality, there is a concern about the amount of time it takes to implement a solution to migrate VMs in the event of co-resident attacks. To the best of our knowledge, no current ML-based approach has succeeded in mitigating co-resident attacks through VM migration while minimizing the VM downtime.

#### **3. Threat model**

There are many approaches to fight against co-resident attacks, including hardware modification, intrusion detection, secure VM allocation, and migration. Threat model building is crucial to guide proper defense. This section goes through the study of existing models and presents our proposed threat model with detailed variable selections.

#### **3.1 Modeling co-resident attacks**

Many optimization models have been proposed to fight against co-resident attacks. Abazari *et al.* suggested a multi-objective optimization method to calculate alternative responses with the least amount of threat through graphics and proper attack countermeasures [37]. Liu *et al.* considered the three main factors that lead to the likelihood of malicious VMs co-locating with normal users [38]. Berrima *et al.* used a VM placement strategy to reduce co-location attacks with complete resource optimization. Their approach presents a trade-off between security and VM startup delay [39]. Hasan *et al.* proposed a co-resident attack mitigation and prevention (CAMP) model to separate malicious and benign VMs, comparing existing models on data security, data survivability, and user storage overhead [5]. Other works focused on a probabilistic co-residence coverage optimization model, combined with a data partition technique that involves arranging servers randomly [40, 41].

#### **3.2 Proposed threat model with detailed design components**

Our proposed approach takes the time-sensitive risk level of co-resident attacks into account and searches for a solution to the dynamic VM allocation problem through DRL. Research shows that a co-resident attack has a total cycle of *t*₃, consisting of three stages: probe, construct, and launch. Probe and construct make up the configuration interval, as illustrated in **Figure 3**. To avoid the attack, the defender must take action before the launch starts, in other words, before the configuration interval *t*₂ is reached [42].

Our co-resident risk model is developed in a similar scenario to [43]. The choice of variables is listed in **Table 1**.

The co-resident risk indicator can be obtained through the following equations:

$$\textit{rcr}(v\_i, v\_j) = \textit{ts}(v\_i) \times \textit{CoRes}\left(v\_i, v\_j\right) \times \textit{ts}(v\_j) \tag{1}$$

$$\textit{CoResFactor} = \begin{cases} \alpha\_0 & \text{for } \textit{CoRes}\left(v\_i, v\_j\right) < t\_1 \\ \alpha\_1 & \text{for } \textit{CoRes}\left(v\_i, v\_j\right) \in [t\_1, t\_2) \\ \alpha\_2 & \text{for } \textit{CoRes}\left(v\_i, v\_j\right) \ge t\_2 \end{cases} \tag{2}$$
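As a concrete sketch of Eqs. (1) and (2) (our illustration, not the authors' code), the risk indicator and the piecewise rewards factor can be computed as follows; the default thresholds *t*₁ = 5 and *t*₂ = 10 and the forms of *α*₁ and *α*₂ are taken from the simulation setup described in Section 4.3:

```python
def rcr(ts_i, ts_j, cores_ij):
    """Co-resident risk indicator, Eq. (1): rcr = ts(vi) * CoRes(vi, vj) * ts(vj)."""
    return ts_i * cores_ij * ts_j

def co_res_factor(cores_ij, t1=5, t2=10, k=2.0, k2=1.0):
    """Piecewise rewards factor, Eq. (2); alpha_1 and alpha_2 use the
    mitigation functions of Section 4.3.4 with parameters k and k2."""
    if cores_ij < t1:
        return 0.0                    # alpha_0: too short even to finish probing
    if cores_ij < t2:
        return k * (cores_ij - t1)    # alpha_1: constructing stage, risk accumulates
    return k * (cores_ij - t1) + k2   # alpha_2: attack could already be launched
```

Under the zero-trust assumption (threat scores fixed to 1), `rcr` reduces to the co-resident duration itself, which is what Eq. (9) later exploits.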



**Table 1.** *Variable definition.*

**Figure 3.**

*The timeline of attacks [42].*

The threat score, *ts*(*v*ᵢ), reflects the potential risk to *VM*ᵢ. It is a floating-point number between 0 and 1, with 0 representing no risk and 1 representing the highest risk. As illustrated in Eq. (1), the risk is also proportional to the co-resident duration time recorded in a matrix, *CoRes*(*v*ᵢ, *v*ⱼ).

The VM's co-resident rewards factor, *CoResFactor*, is the crucial parameter guiding the RL training. It is determined by where the VM pair stands in the co-resident attack cycle. For example, a VM pair co-existing on the same PM for less than *t*₁ is considered safe: if one of the VMs is malicious, it has not yet passed the probing stage, so the risk of a co-resident attack is low. In this case, *α*₀ = 0 is chosen. If *CoRes*(*v*ᵢ, *v*ⱼ) is between *t*₁ and *t*₂, the system needs to be aware that a malicious VM in the pair would have reached the constructing stage and moved closer to launching the attack, so *α*₁ needs to be non-zero. When the VM pair co-exists on the same PM for more than *t*₂ time steps, an attack could be launched. This is the situation to be avoided, so *α*₂ is assigned a more aggressive value.

#### **3.3 Assumptions**

Two assumptions guide the proposed model:



The system will be simulated under the above assumptions. The overall co-resident risk level, the number of co-resident attacks, and the VM slowdown ratio is the proposed system's evaluation metrics.

#### **4. DRL-based VM scheduling system design and simulation**

#### **4.1 Mathematical background of RL and schematics**

RL problems can be modeled as a Markov decision process (MDP) to find a policy that maximizes the accumulated rewards. An MDP is a four-tuple (*S*, *A*, *P*ₐ, *R*ₐ), where *S* is a set of states called the state space, *A* is a set of actions called the action space, *P*ₐ is the probability of transition from state *s* to *s*′ under action *a*, and *R*ₐ is the immediate reward received right after action *a*. There are two major methods to solve the reinforcement learning iteration problem: one is called *value function*, and the other *policy gradient*. Q-learning is an example of a value-function method, which has a function *Q* : *S* × *A* → ℝ. Before learning begins, *Q* is initialized to 0 or a base value. The core of the algorithm is the Bellman equation, which updates the Q-value with new information:

$$Q^{new}(s\_t, a\_t) \leftarrow Q(s\_t, a\_t) + \alpha \left[ r\_t + \gamma \max\_a Q(s\_{t+1}, a) - Q(s\_t, a\_t) \right] \tag{3}$$

Here, *α* and *γ* are the learning rate and discount factor, respectively, and *r*ₜ is the reward at time step *t*. We adopt the DeepRM [44] framework, which follows the policy gradient approach with a deep neural network added to the system to solve large-scale RL tasks. This portion of the deep RL is illustrated in **Figure 4**.
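Returning to Eq. (3), a single Bellman update on a toy tabular Q-function looks like this (a minimal sketch of Q-learning, not DeepRM code):

```python
def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One Bellman update, Eq. (3):
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[s_next].values())
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])

# Toy two-state, two-action table initialized to 0, as the text describes.
Q = {s: {a: 0.0 for a in (0, 1)} for s in (0, 1)}
q_update(Q, s=0, a=1, r=1.0, s_next=1)
print(Q[0][1])  # 0.1 * (1.0 + 0.9 * 0.0 - 0.0) = 0.1
```

DeepRM takes the policy-gradient route instead, because a tabular Q becomes intractable for the large state spaces of cluster scheduling.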

The nature of the policy gradient is to maximize the expected cumulative discounted reward $E\_{\pi\_\theta}\left[\sum\_{t=0}^{\infty} \gamma^{t} r\_{t}\right]$, whose gradient can be expressed as:

$$\nabla\_{\theta} E\_{\pi\_\theta} \left[ \sum\_{t=0}^{\infty} \gamma^{t} r\_{t} \right] = E\_{\pi\_\theta} \left[ \nabla\_{\theta} \log \pi\_{\theta}(s, a) \, Q^{\pi\_\theta}(s, a) \right] \tag{4}$$

Here, *γ* ∈ (0, 1] is a discount factor for future rewards, and *r*ₜ is the reward at time step *t*. The VMM picks actions based on a *policy* *π* : *π*(*s*, *a*) → [0, 1], defined as the probability that action *a* is taken in state *s*. A manageable number of adjustable parameters, *θ*, is called the policy parameter. The policy can thus be represented as *π*θ(*s*, *a*), and *θ* is updated via gradient descent:

$$
\theta \gets \theta + \beta \sum\_{t} \nabla\_{\theta} \log \pi\_{\theta}(s\_t, a\_t) v\_t \tag{5}
$$


*Approved for Public Release on 01 June 2022; Distribution Unlimited; Case number: AFRL-2022–2581.*

**Figure 4.** *Reinforcement learning with policy represented via DNN [44].*

where *β* is the step size. The corresponding expected cumulative discounted reward *Q*^πθ(*s*, *a*) can be estimated by the empirically computed cumulative discounted reward *v*ₜ.
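A minimal numerical sketch of the gradient-ascent update in Eq. (5), assuming a toy tabular softmax policy (our illustration, not the DeepRM implementation, where a neural network represents the policy):

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def reinforce_step(theta, states, actions, returns, beta=0.01):
    """One update of Eq. (5): theta += beta * sum_t grad log pi_theta(s_t,a_t) * v_t.
    theta has shape (n_states, n_actions); pi_theta(s, .) = softmax(theta[s])."""
    grad = np.zeros_like(theta)
    for s, a, v in zip(states, actions, returns):
        pi = softmax(theta[s])
        g = -pi           # grad of log softmax w.r.t. logits: one_hot(a) - pi
        g[a] += 1.0
        grad[s] += g * v  # weight by the empirical discounted return v_t
    return theta + beta * grad

theta = np.zeros((2, 3))                                # toy: 2 states, 3 actions
theta = reinforce_step(theta, [0, 1], [2, 0], [1.0, -0.5])
```

Actions followed by positive returns have their log-probability pushed up, and those followed by negative returns pushed down, which is exactly the effect Eq. (5) encodes.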

#### **4.2 RL components design**

Reinforcement learning is a unique type of machine learning paradigm, which has been successfully applied to task scheduling [45–49]. It contains several detailed components that need clarification. Here, we first define our state space, action space, and rewards before introducing the simulation system.

#### *4.2.1 State space*

RL is a model-free machine learning method; an agent learns to interact with the environment through a trial-and-error process. The state of the environment is defined as a vector of several components, as shown in **Table 2**. They build the data structure of a *VM*, which can be classified as:

1.Computing resources factors;

2. Security awareness factors (already introduced in **Table 1**).

The current allocation of the cluster resources can be retrieved by the mapping between VMs and resource slots available on the PM, which can be expressed as a matrix *X*.

#### *4.2.2 Action space*

**Table 2.** *Data structure of a VM.*

It is assumed that VMs will be assigned to a PM if the requested resources are available at each time step. The action space is defined as {0, 1, ..., *n*}, where 0 means no action is taken, and 1 through *n* means allocating a new VM on the *n*-th PM slot. After each action, the next state is obtained by updating the current mapping *X*.
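A minimal sketch of how such an action could update the allocation matrix *X* (the matrix shape and the `apply_action` helper are our illustrative assumptions, not the chapter's code):

```python
import numpy as np

def apply_action(X, action, vm_id):
    """Apply one scheduling action: 0 = no action (the VM waits in the backlog),
    action a in 1..n = allocate vm_id on PM slot a-1. X is the VM-to-slot mapping
    from Section 4.2.1; a real scheduler would also check resource availability."""
    if action == 0:
        return X
    X = X.copy()
    X[vm_id, action - 1] = 1   # mark vm_id as allocated on that slot
    return X

X0 = np.zeros((4, 3), dtype=int)          # toy sizes: 4 VMs, 3 PM slots
X1 = apply_action(X0, action=2, vm_id=0)  # allocate VM 0 on slot index 1
```

The updated matrix then feeds back into the state vector for the next decision step.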

#### *4.2.3 Rewards*


A reward strategy is designed to guide the VM allocation agent toward our goal: sufficiently utilize the current resources to complete jobs on time while simultaneously minimizing co-resident attacks. The reward function must be carefully designed to avoid contradiction. In the proposed system with a total of *J* active VMs, the rewards function consists of two terms: a VM delay reward and a co-resident risk reward. They can be calculated accordingly as:

$$R\_{\text{VMD}} = \sum\_{j \in J} \text{VM}\_{\text{Delayed}}(v\_j) = \sum\_{j \in J} \frac{-1}{T\_j} \tag{6}$$

$$R\_{RC} = \sum\_{i,j \in J} (-1) \times \textit{rcr}(v\_i, v\_j) / 2 \tag{7}$$

The full rewards are calculated as a weighted sum of the two terms with weights *ω*₁ and *ω*₂. The overall rewards can be obtained by:

$$\textit{TotalRewards} = \omega\_1 \times R\_{VMD} + \omega\_2 \times R\_{RC} \tag{8}$$
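The reward computation of Eqs. (6)–(8) can be sketched as follows (our illustration; excluding the diagonal of the rcr matrix is an assumption, and the division by 2 undoes the double counting of symmetric pairs):

```python
def total_rewards(ideal_lengths, rcr_matrix, w1=1.0, w2=1.0):
    """Weighted reward of Eq. (8): w1 * R_VMD (Eq. 6) + w2 * R_RC (Eq. 7).
    ideal_lengths holds T_j for each active VM; rcr_matrix holds the
    symmetric pairwise co-resident risk indicators."""
    r_vmd = sum(-1.0 / t for t in ideal_lengths)          # Eq. (6)
    n = len(rcr_matrix)
    r_rc = sum(-rcr_matrix[i][j] / 2.0                    # Eq. (7)
               for i in range(n) for j in range(n) if i != j)
    return w1 * r_vmd + w2 * r_rc
```

Both terms are negative, so the agent maximizes the reward by shortening VM lifetimes and minimizing risky co-residency at the same time.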

This rewards equation implies that the objective of this novel VM allocation system is twofold: minimizing the average VM delay and minimizing the accumulated co-resident risk.


#### **4.3 Simulation system design**

#### *4.3.1 System model*

Similar to the DeepRM [44] framework, the proposed simulation system is illustrated in **Figure 5**. CPU and memory (MEM) are the two resources considered as limiting constraints. When a VM is assigned to a PM, a time-step counter starts to track the duration of the VM. If other VMs are simultaneously assigned to the same PM, a co-resident counter also starts to accumulate time steps, recorded in *CoRes*(*v*ᵢ, *v*ⱼ). VM requests arrive according to a Bernoulli process. The backlog queue houses all the incoming VMs waiting for allocation.

#### *4.3.2 Co-resident duration matrix*

**Figure 5.** *Resource, time steps, job slots, and backlog queue in [44].*

The time steps are recorded in the co-resident duration matrix shown below. This small-scale example limits each of the CPU and MEM resources to five slots. Here, five VMs {*VM*₁, …, *VM*₅} are illustrated, with their life cycles marked by start and end time steps as *VM*₁: [0, 2], *VM*₂: [0, 4], *VM*₃: [0, 10], *VM*₄: [2, 4], *VM*₅: [4, 11]. Their life cycle lengths are {2, 4, 10, 2, 7}, respectively. The overlapping time steps, collected in the co-resident duration matrix, are:

$$\text{CoRes}(v\_i, v\_j) = \begin{bmatrix} 2 & 2 & 2 & 0 & 0 \\ 2 & 4 & 4 & 2 & 0 \\ 2 & 4 & 10 & 2 & 6 \\ 0 & 2 & 2 & 2 & 0 \\ 0 & 0 & 6 & 0 & 7 \end{bmatrix}$$
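The matrix above can be reproduced directly from the five life cycles (a small verification sketch, not part of the original system):

```python
import numpy as np

# Start/end time steps of the five example VMs from the text.
spans = [(0, 2), (0, 4), (0, 10), (2, 4), (4, 11)]

def co_res_matrix(spans):
    """Pairwise overlapping time steps; the diagonal holds each VM's life cycle."""
    n = len(spans)
    m = np.zeros((n, n), dtype=int)
    for i, (si, ei) in enumerate(spans):
        for j, (sj, ej) in enumerate(spans):
            m[i, j] = max(0, min(ei, ej) - max(si, sj))
    return m

print(co_res_matrix(spans))  # reproduces the CoRes matrix shown above
```

For instance, *VM*₃ and *VM*₅ overlap for min(10, 11) − max(0, 4) = 6 time steps, matching entry (3, 5) of the matrix.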

The proposed zero-trust strategy means that all VMs could be malicious, so the threat scores of all VMs are set to 1 (*ts*(*v*ᵢ) = 1). Thus, the co-resident risk indicator matrix is:

$$\textit{rcr}(v\_i, v\_j) = 1 \times \textit{CoRes}\left(v\_i, v\_j\right) \times 1 = \textit{CoRes}\left(v\_i, v\_j\right) \tag{9}$$

#### *4.3.3 Configuration interval*

In the simulated system, five time steps are chosen to represent *t*₁ and ten time steps to represent the configuration interval *t*₂. In **Figure 5**, time runs in the vertical direction. To mitigate co-resident attacks, the system is designed to train the agent to avoid two VMs sharing the same PM for more than the *t*₂ time interval. Based on the timeline of attacks illustrated in **Figure 3**, different values are assigned to the *CoResFactor*, as shown in Eq. (2).

#### *4.3.4 Risk mitigation strategies*

When two VMs have overlapping time steps of less than *t*₁, there is only a minor risk of co-resident attacks, so *α*₀ = 0. When two VMs have resided on the same PM for more than *t*₁ time steps but less than *t*₂, co-resident attack risks start to accumulate; thus, the first risk mitigation function is set to *α*₁ = *k* × (*t* − *t*₁), where *k* is chosen from {0, 0.25, 0.5, 1, 2} to explore the efficiency of different choices. When two VMs have co-resided on the same PM for more than *t*₂ time steps, there has been enough construction time for an attack to take place, so a more aggressive term *k*₂ is added to the reward function. The proposed system applies the second risk mitigation function *α*₂ = *k* × (*t* − *t*₁) + *k*₂, where *k*₂ has been tested in the pool of {0, 1, 2, 3, …}. A portion of the risk mitigation function design can be found in **Figure 6**, where all the *k* values are presented; only *k*₂ = 0, *k*₂ = 1, and *k*₂ = 2 on top of *k* = 2 are shown on the graph.

#### *4.3.5 Software*

The system is programmed in Python with the flowchart illustrated in **Figure 7**. First, the arriving VMs are placed in a backlog queue. If the queue is not empty, the scheduling system operates to find the optimized solution to assign VMs to PMs. At each time step, the system will update the co-resident duration matrix which reflects the current risk level and will guide the choice of risk mitigation strategies.

**Figure 6.** *Risk mitigation function design.*

**Figure 7.** *The flowchart of the proposed system.*


#### **5. Simulation results analysis**

The co-resident simulation program is built upon the Python-based DeepRM [44] open-source platform. The neural network consists of a fully connected hidden layer with 20 neurons and a total of 89,451 parameters. A Poisson distribution with an arrival rate of 0.7 is chosen to simulate the VMs' dynamic arrival. All results are obtained over 2500 iterations.

Our proposed system introduces co-resident risk mitigation to task scheduling by adding Eq. (7) to the total rewards calculation of Eq. (8). During the investigation, it was observed that the system performed differently, while manipulating the risk mitigation function parameters illustrated in **Figure 6**. The effectiveness of the proposed mitigation scheme can be analyzed by the RL rewards, VM slowdown ratio, and attack reduction.

#### **5.1 Total rewards affected by risk mitigation factors**

As illustrated in Eq. (8), the total discounted rewards are captured by taking both the VM delay and the co-resident risks into consideration. Since there is no preference between the two, *ω*₁ and *ω*₂ in Eq. (8) are both set to 1. In the first experiment, *k*₂ is set to 0, and *k* is chosen from 0, 1, and 2. The total accumulated rewards recorded in **Figure 8** show that a smaller *k* value leads to a larger reward (note: the rewards are negative). When *k* = 0, there is no risk mitigation. As *k* increases, more mitigation influence is placed on the system, and the total discounted reward decreases.

DeepRM provides DRL and other heuristic VM allocation methods, such as Tetris, random allocation, and shortest job first (SJF), for comparison. Although those methods have no co-resident risk mitigation features, their total discounted rewards illustrate the severity of the cybersecurity risks they are exposed to. Two use cases are

**Figure 8.** *The total rewards accumulated from different k values.*


**Table 3.** *Total discounted rewards.*

shown in **Table 3**. A negative number with a larger absolute value indicates a worse situation.

#### **5.2 Slowdown ratio affected by risk mitigation factors**

The metric to measure the efficiency of VM scheduling is the *VMDelayed* parameter defined in **Table 2**. In the program, the slowdown ratio is used. Each VM has its own expected life cycle, shown as the "Ideal length of finishing" in **Table 2**, and a length marked in time steps when it is generated. When the VM is assigned to a PM, the "Start time" is marked; at completion, a "Finish time" is recorded. The parameter *Slowdown* is calculated by Eq. (10). Ideally, if there is no delay in execution, *Slowdown* = 1, but in actual applications many factors can cause delays; thus, *Slowdown* ≥ 1.

$$\textit{Slowdown} = \left(\textit{FinishTime} - \textit{StartTime}\right)/\textit{VMLength} \tag{10}$$
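Eq. (10) translates directly into code:

```python
def slowdown(finish_time, start_time, vm_length):
    """Slowdown ratio of Eq. (10): the elapsed execution span divided by
    the VM's ideal length. Equals 1 when there is no delay; >= 1 otherwise."""
    if vm_length <= 0:
        raise ValueError("VM length must be positive")
    return (finish_time - start_time) / vm_length
```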

As the *k* value increases, more reward is devoted to mitigating potential co-resident risks through the risk mitigation function. This, however, sacrifices VM completion time, so the slowdown ratio increases. Experiments show that "Random" allocation of VMs yields the largest average slowdown ratio. Using the "Random" slowdown ratio as a baseline, the percentage reduction in slowdown ratio from the baseline is shown in **Table 4**.
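The entries of Table 4 can be derived from the average slowdown ratios as a percentage reduction relative to the baseline:

```python
def reduction_from_baseline(avg_slowdown, baseline_slowdown):
    """Percentage reduction of an average slowdown ratio relative to the
    'Random' allocation baseline, as tabulated in Table 4."""
    return 100.0 * (baseline_slowdown - avg_slowdown) / baseline_slowdown
```

For example, a method averaging a slowdown of 1.5 against a Random baseline of 2.0 yields a 25% reduction.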

#### **5.3 Co-resident attacks reduction by risk mitigation factors**

Considering the goal of mitigating co-resident attacks, a group of experiments is conducted to show the effectiveness of different risk mitigation function parameters under the RL scenario. **Figure 9** illustrates the total counts of co-resident attacks when *k* and *k*2 are set as in **Figure 6**. When *k* = 2 and *k*2 = 1, the count drops dramatically compared with *k* = 0 and *k*2 = 0, where no mitigation is applied.
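One plausible counting rule for these totals, assuming every VM pair whose co-resident duration reaches a threshold is treated as one potential attack under the chapter's no-trust model, can be sketched from the co-resident duration matrix:

```python
def count_potential_attacks(duration_matrix, threshold=1):
    """Count VM pairs whose accumulated co-resident duration reaches the
    threshold. Treating each such pair as one potential co-resident attack
    is an illustrative assumption, not the chapter's exact metric.

    duration_matrix[i][j] (i < j) holds the co-resident time of VMs i and j."""
    n = len(duration_matrix)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if duration_matrix[i][j] >= threshold
    )
```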

#### **6. Conclusions and future work**

This chapter addresses the importance of cybersecurity awareness in cloud computing resource management. The proposed RL-based scheduling method takes both


**Table 4.** *VM slowdown ratio over random method.*


**Figure 9.** *The potential co-resident attacks by different k and k2 selection strategies.*

VM slowdown time and co-resident attack risk mitigation into consideration. The co-resident risk model under no-trust conditions is formulated, which narrows the problem down to minimizing co-tenancy on the same PM among all active VMs. Finally, a DRL-based task scheduling system is simulated with the proposed risk mitigation factors.

This chapter shows that much remains to be explored in resource management and risk mitigation for cloud computing. ML has attracted much attention recently, and more applications are being developed in this direction. Although training cost is a concern for deep learning algorithms, DRL outperforms other methods in adapting to a more dynamic environment, which makes it stand out; if designed properly, the computational burden can be shifted offline. The experimental results above were obtained on a MacBook Air with a 2.2 GHz dual-core Intel i7 processor and 8 GB of memory. Training the policies takes 3 minutes per 2500 iterations, and when the pre-trained model is applied during runtime testing, even the longest allocation decision takes no more than 2 seconds. The results show that applying reinforcement learning to co-resident risk mitigation is feasible. Different mitigation strategies lead to different VM completion ratios and risk levels, and the proposed strategies proved helpful in searching for improved VM allocations under both VM completion constraints and co-resident risk awareness.

In the future, a more in-depth investigation of the reward equation design will be conducted. A thorough search, accompanied by mathematical models to discuss convergence, will be explored. An advanced cost function will be developed with resource and security constraints. Multi-agent reinforcement learning will be applied to extend the model of this research, and its efficiency will be tested and compared.

#### **Acknowledgements**

This work was supported in part by the Air Force Research Laboratory and Department of Education MSEIP grant award no. P120A180114, Texas A & M Engineering Experiment Station Annual Research Conference Project (TEES TARC) Award 28-235980-00020, and the National Science Foundation grant award no. OAC 1827243. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the Air Force Research Laboratory or the U.S. Government.

#### **Conflict of interest**

The authors declare no conflict of interest.

### **Author details**

Suxia Cui<sup>1</sup>\*† and Soamar Homsi<sup>2</sup>†


© 2022 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


#### **References**

[1] Hossain S. Chapter 1 Cloud Computing Terms, Definitions, and Taxonomy. In: Cloud Computing Service and Deployment Models: Layers and Management. IGI Global. 2013. pp. 1-25

[2] Barbosa AFP, Charão AS. Impact of pay-as-you-go cloud platforms on software pricing and development: A review and case study. Computational Science and Its Applications. 2012;**7336**: 404-417

[3] Smith JE, Nair R. Virtual Machines: Versatile Platforms for Systems and Processes. Amsterdam, Netherlands: Elsevier; 2005

[4] Tank D, Aggarwal A, Chaubey N. Virtualization vulnerabilities, security issues, and solutions: a critical study and comparison. International Journal of Information Technology. 2022;**14**:847-862

[5] Hasan MM, Rahman MA. Protection by Detection: A Signaling Game Approach to Mitigate Co-Resident Attacks in Cloud. In: 2017 IEEE 10th International Conference on Cloud Computing (CLOUD). Honolulu, HI, USA; 2017. pp. 552-559

[6] Ding J, Sha L, Chen X. Modeling and evaluating IaaS cloud using performance evaluation process algebra. In: 2016 22nd Asia-Pacific Conference on Communications (APCC). Yogyakarta, Indonesia. 2016. pp. 243-247

[7] Addya SK, Turuk AK, Satpathy A, Sahoo B, Sarkar M. A strategy for live migration of virtual machines in a cloud federation. IEEE Systems Journal. 2019; **13**(3):2877-2887

[8] Velayudhan Kumar MR, Raghunathan S. Heterogeneity and thermal aware adaptive heuristics for energy efficient consolidation of virtual machines in infrastructure clouds. Journal of Computer and System Sciences. 2016;**82**(2):191-212

[9] Mthunzi SN, Benkhelifa E, Alsmirat MA, Jararweh Y. Analysis of VM communication for VM-based cloud security systems. In: 2018 Fifth International Conference on Software Defined Systems (SDS). 2018. pp. 182-188

[10] Sane BO, Niang I, Fall D. A review of virtualization, hypervisor and VM allocation security: Threats, vulnerabilities, and countermeasures. In: 2018 International Conference on Computational Science and Computational Intelligence (CSCI). Las Vegas, NV, USA. 2018. pp. 1317-1322

[11] Navamani BA, Yue C, Zhou X. Discover and Secure (DaS): An Automated Virtual Machine Security Management Framework. In: 2018 IEEE 37th International Performance Computing and Communications Conference (IPCCC). Orlando, FL, USA. 2018. pp. 1-6

[12] Kong T, Wang L, Ma D, Xu Z, Yang Q, Chen K. A Secure Container Deployment Strategy by Genetic Algorithm to Defend against Co-Resident Attacks in Cloud Computing. In: 2019 IEEE 21st International Conference on High Performance Computing and Communications; IEEE 17th International Conference on Smart City; IEEE 5th International Conference on Data Science and Systems (HPCC/ SmartCity/DSS). Zhangjiajie, China: IEEE; 2019. pp. 1825-1832

[13] Qiao A, Choe SK, Subramanya SJ, Neiswanger W, Ho Q, Zhang H, et al. Pollux: Co-adaptive cluster scheduling for goodput-optimized deep learning. In: 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI 21). Virtual Conference; 2021. pp. 1-18

[14] Xiao W, Ren S, Li Y, Zhang Y, Hou P, Li Z, et al. AntMan: Dynamic scaling on GPU clusters for deep learning. In: 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20). Virtual Conference; 2020. pp. 533-548

[15] Gu J, Chowdhury M, Shin KG, Zhu Y, Jeon M, Qian J, et al. Tiresias: A GPU cluster manager for distributed deep learning. In: 16th USENIX Symposium on Networked Systems Design and Implementation (NSDI 19). Boston, MA: USENIX Association; 2019. pp. 485-500

[16] Zhao J, Rodríguez MA, Buyya R. A deep reinforcement learning approach to resource management in hybrid clouds harnessing renewable energy and task scheduling. In: 2021 IEEE 14th International Conference on Cloud Computing (CLOUD). Chicago, IL, USA; 2021. pp. 240-249

[17] Gupta K, Katiyar V. Survey of resource provisioning heuristics in cloud and their parameters. International Journal of Computational Intelligence Research. 2017;**13**(5):1283-1300

[18] Gawali MB, Shinde SK. Task scheduling and resource allocation in cloud computing using a heuristic approach. Journal of Cloud Computing. 2018;**7**(1):4

[19] Dorigo M, Birattari M, Stutzle T. Ant colony optimization. IEEE Computational Intelligence Magazine. 2006;**1**(4):28-39

[20] Qin Y, Wang H, Zhu F, Zhai L. A multi-objective ant colony system algorithm for virtual machine placement in traffic intense data centers. IEEE Access. 2018;**6**:58912-58923

[21] Tawfeek MA, El-Sisi A, Keshk AE, Torkey FA. Cloud task scheduling based on ant colony optimization. In: 2013 8th International Conference on Computer Engineering Systems (ICCES). Cairo, Egypt; 2013. pp. 64-69

[22] Patel KD, Bhalodia TM. An efficient dynamic load balancing algorithm for virtual machine in cloud computing. In: 2019 International Conference on Intelligent Computing and Control Systems (ICCS). Madurai, India; 2019. pp. 145-150

[23] Jia H, Liu X, Di X, Qi H, Cong L, Li J, et al. Security strategy for virtual machine allocation in cloud computing. Procedia Computer Science. 2019;**147**: 140-144

[24] Miao F, Wang L, Wu Z. A VM placement based approach to proactively mitigate co-resident attacks in cloud. In: 2018 IEEE Symposium on Computers and Communications. Natal, Brazil: IEEE; 2018. pp. 00285-00291

[25] Han Y, Chan J, Alpcan T, Leckie C. Virtual machine allocation policies against co-resident attacks in cloud computing. In: 2014 IEEE International Conference on Communications (ICC). Sydney, NSW, Australia: IEEE. 2014. pp. 786-792

[26] Yang J, Jiang B, Lv Z, Choo KKR. A task scheduling algorithm considering game theory designed for energy management in cloud computing. Future Generation Computer Systems. 2020; **105**:985-992

[27] Patra MK, Sahoo S, Sahoo B, Turuk AK. Game theoretic approach for real-time task scheduling in cloud computing environment. In: 2019 International Conference on Information Technology (ICIT). Bhubaneswar, India: IEEE; 2019. pp. 454-459

[28] Han K, Cai X, Rong H. An evolutionary game theoretic approach for efficient virtual machine deployment in green cloud. In: 2015 International Conference on Computer Science and Mechanical Automation (CSMA). Hangzhou, China; 2015. pp. 1-4

[29] Narwal P, Singh SN, Kumar D. Predicting strategic behavior using game theory for secure virtual machine allocation in cloud. In: Networking Communication and Data Knowledge Engineering. Singapore: Springer; 2018. pp. 83-92

[30] Narwal P, Kumar D, Singh SN. A hidden markov model combined with Markov Games for intrusion detection in cloud. Journal of Cases on Information Technology (JCIT). 2019;**21**(4):14-26

[31] Han Y, Alpcan T, Chan J, Leckie C. Security games for virtual machine allocation in cloud computing. In: International Conference on Decision and Game Theory for Security. Fort Worth, TX, USA: Springer; 2013. pp. 99-118

[32] Pahlevan A, Qu X, Zapater M, Atienza D. Integrating heuristic and machine-learning methods for efficient virtual machine allocation in data centers. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. 2017;**37**(8): 1667-1680

[33] Witanto JN, Lim H, Atiquzzaman M. Adaptive selection of dynamic VM consolidation algorithm using neural network for cloud resource management. Future Generation Computer Systems. 2018;**87**:35-42

[34] Zhang J, Xie N, Zhang X, Yue K, Li W, Kumar D. Machine learning based resource allocation of cloud computing in auction. Comput Mater Continua. 2018;**56**(1):123-135

[35] Liu Z, Zhang H, Rao B, Wang L. A reinforcement learning based resource management approach for time-critical workloads in distributed computing environment. In: 2018 IEEE International Conference on Big Data (Big Data). Seattle, WA, USA: IEEE; 2018. pp. 252-261

[36] Joseph L, Mukesh R. To detect malware attacks for an autonomic self-heal approach of virtual machines in cloud computing. In: Fifth International Conference on Science Technology Engineering and Mathematics (ICONSTEM). Chennai, India: IEEE; 2019. pp. 220-231

[37] Abazari F, Analoui M, Takabi H. Multi-objective response to co-resident attacks in cloud environment. International Journal of Information and Communication Technology Research. 2017;**9**(3):25-36

[38] Liu Y, Ruan X, Cai S, Li R, He H. An optimized VM allocation strategy to make a secure and energy-efficient cloud against co-residence attack. In: International Conference on Computing, Networking and Communications (ICNC). Maui, HI, USA: IEEE; 2018. pp. 349-353

[39] Berrima M, Nasr AK, Ben RN. Co-location resistant strategy with full resources optimization. In: Proceedings of the 2016 ACM on Cloud Computing Security Workshop. Hofburg Palace, Vienna, Austria; 2016. pp. 3-10

[40] Levitin G, Xing L, Dai Y. Coresidence based data vulnerability vs. security in cloud computing system with random server assignment. European Journal of Operational Research. 2018; **267**(2):676-686

[41] Xing L, Levitin G. Balancing theft and corruption threats by data partition in cloud system with independent server protection. Reliability Engineering & System Safety. 2017;**167**:248-254

[42] Zhang Y, Li M, Bai K, Yu M, Zang W. Incentive compatible moving target defense against vm-colocation attacks in clouds. In: IFIP International Information Security Conference. Heraklion, Crete, Greece: Springer; 2012. pp. 388-399

[43] Wang X, Wang L, Miao F, Yang J. SVMDF: A secure virtual machine deployment framework to mitigate coresident threat in cloud. In: 2019 IEEE Symposium on Computers and Communications (ISCC). Barcelona, Spain; 2019. pp. 1-7

[44] Mao H, Alizadeh M, Menache I, Kandula S. Resource management with deep reinforcement learning. In: HotNets '16: Proceedings of the 15th ACM Workshop on Hot Topics in Networks. Atlanta, GA, USA; 2016. pp. 50-56

[45] Mao H, Schwarzkopf M, Venkatakrishnan SB, Meng Z, Alizadeh M. Learning scheduling algorithms for data processing clusters. In: Proceedings of the 2019 ACM Special Interest Group on Data Communication (SIGCOMM). Beijing, China; 2019. pp. 270-288

[46] Tuli S, Ilager S, Ramamohanarao K, Buyya R. Dynamic scheduling for stochastic edge-cloud computing environments using A3C learning and residual recurrent neural networks. IEEE Transactions on Mobile Computing. 2022 March;**21**(3):940-954

[47] Tuli S, Poojara SR, Srirama SN, Casale G, Jennings NR. COSCO: Container Orchestration Using Co-Simulation and Gradient Based Optimization for Fog Computing Environments. IEEE Transactions on Parallel and Distributed Systems. 2022;**33**(1):101-116

[48] Paeng B, Park IB, Park J. Deep reinforcement learning for minimizing tardiness in parallel machine scheduling with sequence dependent family setups. IEEE Access. 2021;**9**(10):1390-1401

[49] Asheralieva A, Niyato D, Xiong Z. Auction-and-learning based Lagrange coded computing model for privacy-preserving, secure, and resilient mobile edge computing. IEEE Transactions on Mobile Computing. 2021; early access. pp. 1-2
