*Data Mining - Methods, Applications and Systems*

**Figure 5.**

**Figure 6.**

**Table 6.**
*Description of popular repositories used in studies.*

| Repository | Area | URL |
|---|---|---|
| PROMISE | Defect pred., effort est. | http://promise.site.uottawa.ca/SERepository/datasets-page.html |
| Software Defect Pred. Data | Defect pred. | http://www.seiplab.riteh.uniri.hr/?page_id=834&lang=en |
| PMART | Design pattern mining (22 patterns, 9 projects, 139 instances; XML format; manually detected) | http://www.ptidej.net/tools/designpatterns/ |
| McCabe/Halsted defect data | Defect pred. | …nsciences.github.io/tree/master/repo/defect/mccabehalsted/_posts |


### **Author details**

Elife Ozturk Kiyak, Graduate School of Natural and Applied Sciences, Dokuz Eylul University, Turkey

\*Address all correspondence to: elife.ozturk@ceng.deu.edu.tr

© 2020 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

*Data Mining and Machine Learning for Software Engineering DOI: http://dx.doi.org/10.5772/intechopen.91448*

#### **References**

[1] Halkidi M, Spinellis D, Tsatsaronis G, Vazirgiannis M. Data mining in software engineering. Intelligent Data Analysis. 2011;**15**(3):413-441. DOI: 10.3233/IDA-2010-0475

[2] Dhamija A, Sikka S. A review paper on software engineering areas implementing data mining tools & techniques. International Journal of Computational Intelligence Research. 2017;**13**(4):559-574

[3] Minku LL, Mendes E, Turhan B. Data mining for software engineering and humans in the loop. Progress in Artificial Intelligence. 2016;**5**(4):307-314

[4] Malhotra R. A systematic review of machine learning techniques for software fault prediction. Applied Soft Computing. 2015;**27**:504-518. DOI: 10.1016/j.asoc.2014.11.023

[5] Mayvan BB, Rasoolzadegan A, Ghavidel Yazdi Z. The state of the art on design patterns: A systematic mapping of the literature. Journal of Systems and Software. 2017;**125**:93-118. DOI: 10.1016/j.jss.2016.11.030

[6] Sehra SK, Brar YS, Kaur N, Sehra SS. Research patterns and trends in software effort estimation. Information and Software Technology. 2017;**91**:1-21. DOI: 10.1016/j.infsof.2017.06.002

[7] Taylor Q, Giraud-Carrier C, Knutson CD. Applications of data mining in software engineering. International Journal of Data Analysis Techniques and Strategies. 2010;**2**(3):243-257

[8] Coelho RA, Guimarães FRN, Esmin AA. Applying swarm ensemble clustering technique for fault prediction using software metrics. In: 2014 13th International Conference on Machine Learning and Applications (ICMLA); IEEE; 2014. pp. 356-361

[9] Prasad MC, Florence L, Arya A. A study on software metrics based software defect prediction using data mining and machine learning techniques. International Journal of Database Theory and Application. 2015;**8**(3):179-190. DOI: 10.14257/ijdta.2015.8.3.15

[10] Zhang Y, Lo D, Xia X, Sun J. Combined classifier for cross-project defect prediction: An extended empirical study. Frontiers of Computer Science. 2018;**12**(2):280-296. DOI: 10.1007/s11704-017-6015-y

[11] Yang X, Lo D, Xia X, Zhang Y, Sun J. Deep learning for just-in-time defect prediction. In: International Conference on Software Quality, Reliability and Security (QRS); 3–5 August 2015; Vancouver, Canada: IEEE; 2015. pp. 17-26

[12] Zhang F, Zheng Q, Zou Y, Hassan AE. Cross-project defect prediction using a connectivity-based unsupervised classifier. In: Proceedings of the 38th International Conference on Software Engineering (ICSE '16); 14–22 May 2016; Austin, TX, USA: ACM; 2016. pp. 309-320

[13] Di Nucci D, Palomba F, Oliveto R, De Lucia A. Dynamic selection of classifiers in bug prediction: An adaptive method. IEEE Transactions on Emerging Topics in Computational Intelligence. 2017;**1**(3):202-212. DOI: 10.1109/TETCI.2017.2699224

[14] Zimmermann T, Nagappan N, Gall H, Giger E, Murphy B. Cross-project defect prediction: A large scale experiment on data vs. domain vs. process. In: Proceedings of the 7th Joint Meeting of the European Software Engineering Conference and the Symposium on the Foundations of Software Engineering (ESEC/FSE '09); August 2009; Amsterdam, Netherlands: ACM; 2009. pp. 91-100


[15] Turhan B, Menzies T, Bener AB, Di Stefano J. On the relative value of cross-company and within-company data for defect prediction. Empirical Software Engineering. 2009;**14**(5):540-578. DOI: 10.1007/s10664-008-9103-7

[16] Herbold S, Trautsch A, Grabowski J. A comparative study to benchmark cross-project defect prediction approaches. IEEE Transactions on Software Engineering. 2017;**44**(9):811-833. DOI: 10.1109/TSE.2017.2724538

[17] Ghotra B, McIntosh S, Hassan AE. Revisiting the impact of classification techniques on the performance of defect prediction models. In: IEEE/ACM 37th IEEE International Conference on Software Engineering; 16–24 May 2015; Florence, Italy: IEEE; 2015. pp. 789-800

[18] Wang T, Li W, Shi H, Liu Z. Software defect prediction based on classifiers ensemble. Journal of Information & Computational Science. 2011;**8**:4241-4254

[19] Wang S, Yao X. Using class imbalance learning for software defect prediction. IEEE Transactions on Reliability. 2013;**62**:434-443. DOI: 10.1109/TR.2013.2259203

[20] Rodriguez D, Herraiz I, Harrison R, Dolado J, Riquelme JC. Preliminary comparison of techniques for dealing with imbalance in software defect prediction. In: Proceedings of the 18th International Conference on Evaluation and Assessment in Software Engineering; May 2014; London, United Kingdom: ACM; 2014. p. 43

[21] Laradji IH, Alshayeb M, Ghouti L. Software defect prediction using ensemble learning on selected features. Information and Software Technology. 2015;**58**:388-402. DOI: 10.1016/j.infsof.2014.07.005

[22] Malhotra R, Raje R. An empirical comparison of machine learning techniques for software defect prediction. In: Proceedings of the 8th International Conference on Bioinspired Information and Communications Technologies; December 2014; Boston, Massachusetts. pp. 320-327



[23] Malhotra R. An empirical framework for defect prediction using machine learning techniques with Android software. Applied Soft Computing. 2016;**49**:1034-1050. DOI: 10.1016/j.asoc.2016.04.032

[24] Tantithamthavorn C, McIntosh S, Hassan AE, Matsumoto K. Automated parameter optimization of classification techniques for defect prediction models. In: Proceedings of the 38th International Conference on Software Engineering (ICSE '16). Austin, Texas; May 2016. pp. 321-332

[25] Kumar L, Misra S, Rath SK. An empirical analysis of the effectiveness of software metrics and fault prediction model for identifying faulty classes. Computer Standards & Interfaces. 2017;**53**:1-32. DOI: 10.1016/j.csi.2017.02.003

[26] Yang X, Lo D, Xia X, Sun J. TLEL: A two-layer ensemble learning approach for just-in-time defect prediction. Information and Software Technology. 2017;**87**:206-220. DOI: 10.1016/j.infsof.2017.03.007

[27] Chen X, Zhao Y, Wang Q, Yuan Z. MULTI: Multi-objective effort-aware just-in-time software defect prediction. Information and Software Technology. 2018;**93**:1-13. DOI: 10.1016/j.infsof.2017.08.004

[28] Zimmermann T, Premraj R, Zeller A. Predicting defects for eclipse. In: Third International Workshop on Predictor Models in Software Engineering (PROMISE'07); 20-26 May 2007; Minneapolis, USA: IEEE; 2007. p. 9

[29] Prakash VA, Ashoka DV, Aradya VM. Application of data mining techniques for defect detection and classification. In: Proceedings of the 3rd International Conference on Frontiers of Intelligent Computing: Theory and Applications (FICTA); 14–15 November 2014; Odisha, India; 2014. pp. 387-395


[30] Yousef AH. Extracting software static defect models using data mining. Ain Shams Engineering Journal. 2015;**6**:133-144. DOI: 10.1016/j.asej.2014.09.007

[31] Gupta DL, Saxena K. AUC based software defect prediction for objectoriented systems. International Journal of Current Engineering and Technology. 2016;**6**:1728-1733

[32] Kumar L, Rath SK. Application of genetic algorithm as feature selection technique in development of effective fault prediction model. In: IEEE Uttar Pradesh Section International Conference on Electrical, Computer and Electronics Engineering (UPCON); 9-11 December 2016; Varanasi, India: IEEE; 2016. pp. 432-437

[33] Tomar D, Agarwal S. Prediction of defective software modules using class imbalance learning. Applied Computational Intelligence and Soft Computing. 2016;**2016**:1-12. DOI: 10.1155/2016/7658207

[34] Ryu D, Baik J. Effective multi-objective naïve Bayes learning for cross-project defect prediction. Applied Soft Computing. 2016;**49**:1062-1077. DOI: 10.1016/j.asoc.2016.04.009

[35] Ali MM, Huda S, Abawajy J, Alyahya S, Al-Dossari H, Yearwood J. A parallel framework for software defect detection and metric selection on cloud computing. Cluster Computing. 2017;**20**:2267-2281. DOI: 10.1007/s10586-017-0892-6

[36] Wijaya A, Wahono RS. Tackling imbalanced class in software defect prediction using two-step cluster based random undersampling and stacking technique. Jurnal Teknologi. 2017;**79**:45-50

[37] Singh PD, Chug A. Software defect prediction analysis using machine learning algorithms. In: 7th International Conference on Cloud Computing, Data Science & Engineering-Confluence; 2–13 January 2017; Noida, India: IEEE; 2017. pp. 775-781

[38] Hammouri A, Hammad M, Alnabhan M, Alsarayrah F. Software bug prediction using machine learning approach. International Journal of Advanced Computer Science and Applications. 2018;**9**:78-83

[39] Akour M, Alsmadi I, Alazzam I. Software fault proneness prediction: A comparative study between bagging, boosting, and stacking ensemble and base learner methods. International Journal of Data Analysis Techniques and Strategies. 2017;**9**:1-16

[40] Bowes D, Hall T, Petric J. Software defect prediction: Do different classifiers find the same defects? Software Quality Journal. 2018;**26**:525-552. DOI: 10.1007/s11219-016-9353-3

[41] Watanabe T, Monden A, Kamei Y, Morisaki S. Identifying recurring association rules in software defect prediction. In: IEEE/ACIS 15th International Conference on Computer and Information Science (ICIS); 26–29 June 2016; Okayama, Japan: IEEE; 2016. pp. 1-6

[42] Zhang S, Caragea D, Ou X. An empirical study on using the national vulnerability database to predict software vulnerabilities. In: International Conference on Database and Expert Systems Applications. Berlin, Heidelberg: Springer; 2011. pp. 217-223

[43] Wen J, Li S, Lin Z, Hu Y, Huang C. Systematic literature review of machine learning based software development effort estimation models. Information and Software Technology. 2012;**54**:41-59. DOI: 10.1016/j.infsof.2011.09.002

[44] Dave VS, Dutta K. Neural network based models for software effort estimation: A review. Artificial Intelligence Review. 2014;**42**:295-307. DOI: 10.1007/s10462-012-9339-x

[45] Kultur Y, Turhan B, Bener AB. ENNA: Software effort estimation using ensemble of neural networks with associative memory. In: Proceedings of the 16th ACM SIGSOFT; November 2008; Atlanta, Georgia: ACM; 2008. pp. 330-338

[46] Kultur Y, Turhan B, Bener A. Ensemble of neural networks with associative memory (ENNA) for estimating software development costs. Knowledge-Based Systems. 2009;**22**:395-402. DOI: 10.1016/j.knosys.2009.05.001

[47] Corazza A, Di Martino S, Ferrucci F, Gravino C, Mendes E. Investigating the use of support vector regression for web effort estimation. Empirical Software Engineering. 2011;**16**:211-243. DOI: 10.1007/s10664-010-9138-4

[48] Minku LL, Yao X. A principled evaluation of ensembles of learning machines for software effort estimation. In: Proceedings of the 7th International Conference on Predictive Models in Software Engineering; September 2011; Banff, Alberta, Canada: ACM; 2011. pp. 1-10

[49] Minku LL, Yao X. Software effort estimation as a multiobjective learning problem. ACM Transactions on Software Engineering and Methodology (TOSEM). 2013;**22**:35. DOI: 10.1145/2522920.2522928

[50] Minku LL, Yao X. Can cross-company data improve performance in software effort estimation? In: Proceedings of the 8th International Conference on Predictive Models in Software Engineering (PROMISE '12); September 2012; New York, United States: ACM; 2012. pp. 69-78


[51] Kocaguneli E, Menzies T, Keung JW. On the value of ensemble effort estimation. IEEE Transactions on Software Engineering. 2012;**38**:1403-1416. DOI: 10.1109/TSE.2011.111

[52] Dejaeger K, Verbeke W, Martens D, Baesens B. Data mining techniques for software effort estimation. IEEE Transactions on Software Engineering. 2011;**38**:375-397. DOI: 10.1109/TSE.2011.55

[53] Khatibi V, Jawawi DN, Khatibi E. Increasing the accuracy of analogy based software development effort estimation using neural networks. International Journal of Computer and Communication Engineering. 2013;**2**:78

[54] Subitsha P, Rajan JK. Artificial neural network models for software effort estimation. International Journal of Technology Enhancements and Emerging Engineering Research. 2014;**2**:76-80

[55] Maleki I, Ghaffari A, Masdari M. A new approach for software cost estimation with hybrid genetic algorithm and ant colony optimization. International Journal of Innovation and Applied Studies. 2014;**5**:72

[56] Huang J, Li YF, Xie M. An empirical analysis of data preprocessing for machine learning-based software cost estimation. Information and Software Technology. 2015;**67**:108-127. DOI: 10.1016/j.infsof.2015.07.004

[57] Nassif AB, Azzeh M, Capretz LF, Ho D. Neural network models for software development effort estimation. Neural Computing and Applications. 2016;**27**:2369-2381. DOI: 10.1007/s00521-015-2127-1


[58] Zare F, Zare HK, Fallahnezhad MS. Software effort estimation based on the optimal Bayesian belief network. Applied Soft Computing. 2016;**49**:968-980. DOI: 10.1016/j.asoc.2016.08.004


[59] Azzeh M, Nassif AB. A hybrid model for estimating software project effort from use case points. Applied Soft Computing. 2016;**49**:981-989. DOI: 10.1016/j.asoc.2016.05.008

[60] Hidmi O, Sakar BE. Software development effort estimation using ensemble machine learning. International Journal of Computing, Communication and Instrumentation Engineering. 2017;**4**:143-147

[61] Ghaffarian SM, Shahriari HR. Software vulnerability analysis and discovery using machine-learning and data-mining techniques. ACM Computing Surveys (CSUR). 2017;**50**:1-36. DOI: 10.1145/3092566

[62] Jimenez M, Papadakis M, Le Traon Y. Vulnerability prediction models: A case study on the Linux kernel. In: IEEE 16th International Working Conference on Source Code Analysis and Manipulation (SCAM); 2–3 October 2016; Raleigh, NC, USA: IEEE; 2016. pp. 1-10

[63] Walden J, Stuckman J, Scandariato R. Predicting vulnerable components: Software metrics vs text mining. In: IEEE 25th International Symposium on Software Reliability Engineering; 3–6 November 2014; Naples, Italy: IEEE; 2014. pp. 23-33

[64] Wijayasekara D, Manic M, Wright JL, McQueen M. Mining bug databases for unidentified software vulnerabilities. In: 5th International Conference on Human System Interactions; 6–8 June 2012; Perth, WA, Australia: IEEE; 2013. pp. 89-96

[65] Hovsepyan A, Scandariato R, Joosen W, Walden J. Software vulnerability prediction using text analysis techniques. In: Proceedings of the 4th International Workshop on Security Measurements and Metrics (ESEM '12); September 2012; Lund, Sweden: IEEE; 2012. pp. 7-10

[66] Chernis B, Verma R. Machine learning methods for software vulnerability detection. In: Proceedings of the Fourth ACM International Workshop on Security and Privacy Analytics (CODASPY '18); March 2018; Tempe, AZ, USA: ACM; 2018. pp. 31-39

[67] Li X, Chen J, Lin Z, Zhang L, Wang Z, Zhou M, et al. Mining approach to obtain the software vulnerability characteristics. In: 2017 Fifth International Conference on Advanced Cloud and Big Data (CBD); 13–16 August 2017; Shanghai, China: IEEE; 2017. pp. 296-301

[68] Dam HK, Tran T, Pham T, Ng SW, Grundy J, Ghose A. Automatic feature learning for vulnerability prediction. arXiv preprint arXiv:1708.02368; 2017

[69] Scandariato R, Walden J, Hovsepyan A, Joosen W. Predicting vulnerable software components via text mining. IEEE Transactions on Software Engineering. 2014;**40**: 993-1006

[70] Tang Y, Zhao F, Yang Y, Lu H, Zhou Y, Xu B. Predicting vulnerable components via text mining or software metrics? An effort-aware perspective. In: IEEE International Conference on Software Quality, Reliability and Security; 3–5 August 2015; Vancouver, BC, Canada: IEEE; 2015. pp. 27-36

[71] Wang Y, Wang Y, Ren J. Software vulnerabilities detection using rapid density-based clustering. Journal of Information and Computing Science. 2011;**8**:3295-3302

[72] Medeiros I, Neves NF, Correia M. Automatic detection and correction of web application vulnerabilities using data mining to predict false positives. In: Proceedings of the 23rd International Conference on World Wide Web (WWW '14); April 2014; Seoul, Korea; 2014. pp. 63-74

[73] Yamaguchi F, Golde N, Arp D, Rieck K. Modeling and discovering vulnerabilities with code property graphs. In: 2014 IEEE Symposium on Security and Privacy; 18-21 May 2014; San Jose, CA, USA: IEEE; 2014. pp. 590-604

[74] Perl H, Dechand S, Smith M, Arp D, Yamaguchi F, Rieck K, et al. Vccfinder: Finding Potential Vulnerabilities in Open-source Projects to Assist Code Audits. In: 22nd ACM Conference on Computer and Communications Security (CCS'15). Denver, Colorado, USA; 2015. pp. 426-437

[75] Yamaguchi F, Maier A, Gascon H, Rieck K. Automatic inference of search patterns for taint-style vulnerabilities. In: 2015 IEEE Symposium on Security and Privacy; San Jose, CA, USA: IEEE; 2015. pp. 797-812

[76] Grieco G, Grinblat GL, Uzal L, Rawat S, Feist J, Mounier L. Toward large-scale vulnerability discovery using machine learning. In: Proceedings of the Sixth ACM Conference on Data and Application Security and Privacy; March 2016; New Orleans, Louisiana, USA; 2016. pp. 85-96

[77] Pang Y, Xue X, Wang H. Predicting vulnerable software components through deep neural network. In: Proceedings of the 2017 International Conference on Deep Learning Technologies; June 2017; Chengdu, China; 2017. pp. 6-10

[78] Li Z, Zou D, Xu S, Ou X, Jin H, Wang S, et al. VulDeePecker: A deep learning-based system for vulnerability detection. arXiv preprint arXiv:1801.01681; 2018

[79] Imtiaz SM, Bhowmik T. Towards data-driven vulnerability prediction for requirements. In: Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering; November, 2018; Lake Buena Vista, FL, USA. 2018. pp. 744-748


[80] Jie G, Xiao-Hui K, Qiang L. Survey on software vulnerability analysis method based on machine learning. In: IEEE First International Conference on Data Science in Cyberspace (DSC); 13–16 June 2016; Changsha, China: IEEE; 2017. pp. 642-647

[81] Russell R, Kim L, Hamilton L, Lazovich T, Harer J, Ozdemir O, et al. Automated vulnerability detection in source code using deep representation learning. In: 17th IEEE International Conference on Machine Learning and Applications (ICMLA). Orlando, FL, USA: IEEE; 2018, 2019. pp. 757-762

[82] Mayvan BB, Rasoolzadegan A, Yazdi ZG. The state of the art on design patterns: A systematic mapping of the literature. Journal of Systems and Software. 2017;**125**:93-118. DOI: 10.1016/j.jss.2016.11.030

[83] Dong J, Zhao Y, Peng T. A review of design pattern mining techniques. International Journal of Software Engineering and Knowledge Engineering. 2009;**19**:823-855. DOI: 10.1142/S021819400900443X

[84] Fowler M. Analysis Patterns: Reusable Object Models. Boston: Addison-Wesley Professional; 1997

[85] Vlissides J, Johnson R, Gamma E, Helm R. Design Patterns-Elements of Reusable Object-Oriented Software. 1st ed. Addison-Wesley Professional; 1994

[86] Hasheminejad SMH, Jalili S. Design patterns selection: An automatic two-phase method. Journal of Systems and Software. 2012;**85**:408-424. DOI: 10.1016/j.jss.2011.08.031

[87] Alhusain S, Coupland S, John R, Kavanagh M. Towards machine learning based design pattern recognition. In: 2013 13th UK Workshop on Computational Intelligence (UKCI); 9–11 September 2013; Guildford, UK: IEEE; 2013. pp. 244-251

[88] Tekin U, Buzluca F. A graph mining approach for detecting identical design structures in object-oriented design models. Science of Computer Programming. 2014;**95**:406-425. DOI: 10.1016/j.scico.2013.09.015

[89] Zanoni M, Fontana FA, Stella F. On applying machine learning techniques for design pattern detection. Journal of Systems and Software. 2015;**103**: 102-117. DOI: 10.1016/j.jss.2015.01.037

[90] Chihada A, Jalili S, Hasheminejad SMH, Zangooei MH. Source code and design conformance, design pattern detection from source code by classification approach. Applied Soft Computing. 2015;**26**:357-367. DOI: 10.1016/j.asoc.2014.10.027

[91] Dwivedi AK, Tirkey A, Ray RB, Rath SK. Software design pattern recognition using machine learning techniques. In: 2016 IEEE Region 10 Conference (TENCON); 22–25 November 2016; Singapore, Singapore: IEEE; 2017. pp. 222-227

[92] Dwivedi AK, Tirkey A, Rath SK. Applying software metrics for the mining of design pattern. In: IEEE Uttar Pradesh Section International Conference on Electrical, Computer and Electronics Engineering (UPCON); 9–11 December 2016; Varanasi, India: IEEE; 2017. pp. 426-431

[93] Dwivedi AK, Tirkey A, Rath SK. Software design pattern mining using classification-based techniques. Frontiers of Computer Science. 2018;**12**:908-922. DOI: 10.1007/s11704-017-6424-y

[94] Mayvan BB, Rasoolzadegan A. Design pattern detection based on the graph theory. Knowledge-Based Systems. 2017;**120**:211-225. DOI: 10.1016/j.knosys.2017.01.007

[95] Hussain S, Keung J, Khan AA. Software design patterns classification and selection using text categorization approach. Applied Soft Computing. 2017;**58**:225-244. DOI: 10.1016/j. asoc.2017.04.043

[96] Kaur A, Singh S. Detecting software bad smells from software design patterns using machine learning algorithms. International Journal of Applied Engineering Research. 2018;**13**: 10005-10010

[97] Hussain S, Keung J, Khan AA, Ahmad A, Cuomo S, Piccialli F. Implications of deep learning for the automation of design patterns organization. Journal of Parallel and Distributed Computing. 2018;**117**: 256-266. DOI: 10.1016/j. jpdc.2017.06.022

[98] Fowler M. Refactoring: Improving the Design of Existing Code. 2nd ed. Boston: Addison-Wesley Professional; 2018

[99] Kumar L, Sureka A. Application of LSSVM and SMOTE on seven open source projects for predicting refactoring at class level. In: 24th Asia-Pacific Software Engineering Conference (APSEC); 4–8 December 2017; Nanjing, China: IEEE; 2018. pp. 90-99

[100] Ratzinger J, Sigmund T, Vorburger P, Gall H. Mining software evolution to predict refactoring. In: First International Symposium on Empirical Software Engineering and Measurement (ESEM 2007); 20–21 September 2007; Madrid, Spain: IEEE; 2007. pp. 354-363

[101] Ratzinger J, Sigmund T, Gall HC. On the relation of refactoring and software defects. In: Proceedings of the 2008 International Working Conference on Mining Software Repositories; May 2008; Leipzig, Germany: ACM; 2008. pp. 35-38

[102] Amal B, Kessentini M, Bechikh S, Dea J, Said LB. On the Use of Machine Learning and Search-Based software engineering for ill-defined fitness function: A case study on software refactoring. In: International Symposium on Search Based Software Engineering; 26-29 August 2014; Fortaleza, Brazil; 2014. pp. 31-45

[103] Wang H, Kessentini M, Grosky W, Meddeb H. On the use of time series and search based software engineering for refactoring recommendation. In: Proceedings of the 7th International Conference on Management of Computational and Collective intElligence in Digital EcoSystems. Caraguatatuba, Brazil; October 2015. pp. 35-42

[104] Rodríguez G, Soria Á, Teyseyre A, Berdun L, Campo M. Unsupervised learning for detecting refactoring opportunities in service-oriented applications. In: International Conference on Database and Expert Systems Applications; 5–8 September; Porto, Portugal: Springer; 2016. pp. 335-342

[105] Marian Z, Czibula IG, Czibula G. A hierarchical clustering-based approach for software restructuring at the package level. In: 9th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC); 21–24 September 2017; Timisoara, Romania: IEEE; 2018. pp. 239-246

[106] Mourad B, Badri L, Hachemane O, Ouellet A. Exploring the impact of clone refactoring on test code size in object-oriented software. In: 16th IEEE International Conference on Machine Learning and Applications (ICMLA); 18-21 December 2017; Cancun, Mexico. 2018. pp. 586-592

[107] Imazato A, Higo Y, Hotta K, Kusumoto S. Finding extract method refactoring opportunities by analyzing development history. In: IEEE 41st Annual Computer Software and Applications Conference (COMPSAC); 4–8 July 2017; Turin, Italy: IEEE; 2018. pp. 190-195

[108] Yue R, Gao Z, Meng N, Xiong Y, Wang X. Automatic clone recommendation for refactoring based on the present and the past. In: IEEE International Conference on Software Maintenance and Evolution (ICSME); 23–29 September 2018; Madrid, Spain: IEEE; 2018. pp. 115-126

[109] Alizadeh V, Kessentini M. Reducing interactive refactoring effort via clustering-based multi-objective search. In: 33rd ACM/IEEE International Conference on Automated Software Engineering; September 2018; Montpellier, France: ACM/IEEE; 2018. pp. 464-474

[110] Ni C, Liu WS, Chen X, Gu Q, Chen DX, Huang QG. A cluster based feature selection method for cross-project software defect prediction. Journal of Computer Science and Technology. 2017;**32**:1090-1107. DOI: 10.1007/s11390-017-1785-0

[111] Rahman A, Williams L. Characterizing defective configuration scripts used for continuous deployment. In: 11th International Conference on Software Testing, Verification and Validation (ICST); 9–13 April 2018; Vasteras, Sweden: IEEE; 2018. pp. 34-45

[112] Kukkar A, Mohana R. A supervised bug report classification with incorporate and textual field knowledge. Procedia Computer Science. 2018;**132**:352-361. DOI: 10.1016/j.procs.2018.05.194

**Chapter 9**

Data Mining for Student Performance Prediction in Education

*Ferda Ünal*

**Abstract**

The ability to predict the performance tendency of students is very important for improving teaching. Such predictions have become valuable knowledge that can be used for different purposes; for example, a strategic plan can be applied for the development of a quality education. This paper proposes the application of data mining techniques to predict the final grades of students based on their historical data. In the experimental studies, three well-known data mining techniques (decision tree, random forest, and naive Bayes) were employed on two educational datasets related to a mathematics lesson and a Portuguese language lesson. The results showed the effectiveness of data mining techniques when predicting the performances of students.

**Keywords:** data mining, student performance prediction, classification

**1. Introduction**

Recently, online systems in education have increased, and student digital data has grown to big data size. This makes it possible to derive rules and predictions about students by processing educational data with data mining techniques. All kinds of information that affect the success or failure of a student, such as the student's socioeconomic environment, learning environment, or course notes, can be used for prediction.

In this study, the successes of the students at the end of the semester are estimated using student data obtained from secondary education at two Portuguese schools. The aim of this study is to predict the students' final grades to support educators in taking precautions for the children at risk. A number of data preprocessing steps were applied to increase the accuracy rate of the prediction model. A wrapper method for feature subset selection was applied to find the optimal subset of features. After that, three popular data mining algorithms (decision tree, random forest, and naive Bayes) were used and compared in terms of classification accuracy rate. In addition, this study also investigates the effects of two different grade categorizations on data mining: five-level grade categorization and binary grade categorization.

The remainder of this paper is organized as follows. In Section 2, the previous studies in this field are mentioned. In Section 3, the methods used in this study are
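The methodology outlined in the introduction of Chapter 9 (wrapper-based feature subset selection, then a comparison of decision tree, random forest, and naive Bayes classifiers by accuracy) could be sketched as follows. This is a minimal illustration, not the chapter's actual code: it assumes scikit-learn, and a synthetic dataset stands in for the real student records from the two Portuguese schools.

```python
# Hedged sketch of the described pipeline; synthetic data replaces the
# real mathematics/Portuguese student datasets used in the chapter.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for student records: 30 features, binary pass/fail label
X, y = make_classification(n_samples=400, n_features=30,
                           n_informative=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          random_state=0)

# Wrapper feature-subset selection: greedily keep the features that most
# improve a classifier's cross-validated score
selector = SequentialFeatureSelector(
    DecisionTreeClassifier(random_state=0),
    n_features_to_select=10, cv=3)
selector.fit(X_tr, y_tr)
X_tr_sel, X_te_sel = selector.transform(X_tr), selector.transform(X_te)

# Compare the three classifiers on the selected subset by accuracy
models = {"decision tree": DecisionTreeClassifier(random_state=0),
          "random forest": RandomForestClassifier(random_state=0),
          "naive Bayes": GaussianNB()}
for name, model in models.items():
    model.fit(X_tr_sel, y_tr)
    acc = accuracy_score(y_te, model.predict(X_te_sel))
    print(f"{name}: {acc:.3f}")
```

The wrapper step evaluates candidate subsets by actually training the classifier, which is what distinguishes it from cheaper filter methods based on per-feature statistics.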

