**2. Materials and method**

Previous studies have used statistical results to comment on associations among variables from sociological, psychological, neurological, and genetic fields. Statistical analyses are mathematical operations that process and summarize data based on probability, and they guide researchers who examine the association between variables. Machine learning techniques are also based on probabilistic models, yet their output is a prediction over mutually exclusive events in a wider event set. Because of these features, machine learning studies can analyze data and produce predictions from different perspectives. Nevertheless, despite technological developments and improvements in the prediction accuracy of these systems, few machine learning studies have addressed substance abuse.

In this context, the aim of this study was to analyze criminal record history, continuum of substance use, former polysubstance abuse, attempted suicide, and inpatient treatment by using decision trees, and to discuss how the findings relate to the literature.

#### **2.1 Machine learning methods**

Machine learning is the component of artificial intelligence applications that handles learning. It can be defined as the set of algorithms that imitate human intelligence without requiring rules that people must interpret and enter manually.

Machine learning applications learn the desired task by assimilating the datasets presented to them, just as people learn the concepts they see and hear on their own. Over time, they can make predictions about new data that were not part of what they learned. The training set is used in the learning process, and the test set is used in the prediction process. For example, consider the design of a system that picks an orchid out of a vase of different flowers and moves it to a separate vase. As much data as possible about orchids and other flowers is included in the training set, covering features such as height, color, leaf shape, leaf curl, flower shape, color distribution, and folds, so that the machine can derive the distinguishing features. After these distinctive classes have been formed from the training set, the predictive ability of the machine is tested. The dataset used for this, called the test set, contains flowers in a vase with a new arrangement that the machine has not seen before. The machine is expected to find out whether there are orchids in this newly encountered cluster and, if so, to pick them out (differentiate them). Machine learning processes are very similar in principle to human learning processes. In human development, learning is divided into behavioral and cognitive approaches, each with many sub-branches. Machine learning methods are likewise divided into branches such as supervised and unsupervised learning [6].

In its most general form, supervised learning is learning in which the relationship between input and output is learned by matching them under the guidance of a supervisor. Unsupervised learning finds regularities among the inputs entering the system without a supervisor and produces an output from them. Problems solved using supervised learning are generally divided into classification and regression problems. Supervised learning methods require a target attribute in the dataset, and the type of this attribute depends on the problem: it is a class label for classification problems and a numerical value for regression problems.

#### **2.2 The classification method and the decision tree algorithm**

The classification method is one of the most commonly preferred machine learning methods, especially in the field of medicine. Classification algorithms learn the form of the distribution from the given training set and classify accordingly. Examples include the Support Vector Machine (SVM), nonlinear Support Vector Machine, Naive Bayes classifier, decision tree classifier, and k-nearest neighbor classification algorithms. For instance, Zaim Gokbay et al. [7] presented a decision support system design to support the diagnosis of endocrine disease. The classification rules used in the model were based on visible changes, which begin with complaints about physical appearance, as well as on laboratory results. The patients stated their complaints by filling in a questionnaire. Physical changes were investigated by the endocrinologist during the exam, and findings such as mandible evagination and skin cracks were clinically evaluated and entered into the system. All three kinds of data entered into the system were used by the prediction model to classify three different endocrinological diseases. Each class represents a disease. The individual falls into a class according to the sum of his answers to the questions, and it is concluded that he has the potential to have the disease indicated by that class. The symptoms that bring a patient to the doctor serve as indicators. Forming such rules makes it possible to make predictions in subsequent steps. The logic of classification methods is to separate the data in a dataset into classes according to their common features. Numerous algorithms have been developed for this purpose; examples include entropy-based classification, regression and decision trees, memory-based algorithms, and Bayesian classifiers.

In the decision tree classification method, the data are classified by splitting them from the root to the leaves. Each split applies an if-then rule: if condition 1 holds, then condition 2 is checked, then condition 3, and so on, so that the chain of conditions forms a branching structure from root to leaf. The decision tree method was chosen for the classification performed in this study because it is based on rules that people can understand, including visually, and because it makes it easier for multidisciplinary teams to compare the results with the literature.
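The chained if-then structure can be illustrated with a minimal sketch based on the orchid example from Section 2.1. The function name `classify_flower`, the feature names, and the threshold below are hypothetical and chosen only for illustration; they are not rules learned from any dataset, and they are unrelated to the tree built in this study.

```python
# Hypothetical toy tree for the orchid example: the feature names,
# values, and threshold are illustrative assumptions, not learned rules.
def classify_flower(flower):
    # Root node (condition 1): test the flower shape first.
    if flower["flower_shape"] == "bilateral":
        # Internal node (condition 2): then test the leaf shape.
        if flower["leaf_shape"] == "strap":
            # Internal node (condition 3): finally test the height.
            if flower["height_cm"] <= 80:
                return "orchid"  # leaf
            return "other"       # leaf
        return "other"           # leaf
    return "other"               # leaf


sample = {"flower_shape": "bilateral", "leaf_shape": "strap", "height_cm": 45}
label = classify_flower(sample)
print(label)
```

Each path from the first `if` to a `return` corresponds to one root-to-leaf rule, which is what makes a tree of this kind readable without statistical training.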

The model in this study was built with 10-fold cross-validation. The success rates of the decision tree classes were interpreted through the accuracy, class recall, and class precision values, and the association of the sequences obtained with the literature was discussed.
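As a rough illustration of how 10-fold cross-validation partitions a dataset, the sketch below splits sample indices into 10 folds and uses each fold once as the test set while the remaining nine form the training set. The helper name `k_fold_indices`, the shuffling, and the fixed seed are assumptions made for this sketch; the text does not state whether the study shuffled or stratified the folds. The sample count of 211 is the number of participants reported in Section 2.4.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    # Shuffling with a fixed seed is an assumption made for reproducibility.
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        # Every fold except fold i goes into the training set.
        train_idx = [j for n, fold in enumerate(folds) if n != i for j in fold]
        yield train_idx, test_idx


splits = list(k_fold_indices(211, k=10))
fold_sizes = [len(test_idx) for _, test_idx in splits]
print(fold_sizes)
```

Every sample appears in exactly one test fold, so each record contributes once to the evaluation and nine times to training.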

*The Analysis on the Effects of COMT, DRD2, PER3, eNOS, NR3C1 Functional Gene Variants… DOI: http://dx.doi.org/10.5772/intechopen.106313*

#### **2.3 Classification model performance criteria**

The simplest and most common measures of the performance of classification models are the accuracy rate, class precision, and class recall.

$$\text{Accuracy rate} = \frac{TP + TN}{TP + FP + TN + FN} \times 100 \tag{1}$$

$$\text{Class precision} = \frac{TP}{TP + FP} \times 100 \tag{2}$$

$$\text{Class recall} = \frac{TP}{TP + FN} \times 100 \tag{3}$$

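The parameters in formulas (1)-(3) are the true positives (TP), false negatives (FN), true negatives (TN), and false positives (FP). TP and TN count the samples correctly assigned to the positive and negative class, respectively; FP counts negative samples predicted as positive, and FN counts positive samples predicted as negative.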


In other words, accuracy is the percentage of samples that are classified correctly. Recall measures how many of the actually positive samples are predicted as positive, and precision measures how many of the positively predicted outputs are actually positive.
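Formulas (1)-(3) can be checked with a short sketch. The function name `classification_metrics` and the confusion counts below are made-up illustrative values, not results from this study; returning 0.0 when a denominator is zero is a convention assumed here.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (accuracy, precision, recall) in percent for one class."""
    total = tp + fp + tn + fn
    # Formula (1): share of all samples that were classified correctly.
    accuracy = 100 * (tp + tn) / total if total else 0.0
    # Formula (2): share of positive predictions that are truly positive.
    precision = 100 * tp / (tp + fp) if (tp + fp) else 0.0
    # Formula (3): share of truly positive samples that were found.
    recall = 100 * tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Illustrative counts only (assumed, not taken from the study).
accuracy, precision, recall = classification_metrics(tp=40, fp=10, tn=45, fn=5)
print(f"accuracy={accuracy:.1f}% precision={precision:.1f}% recall={recall:.1f}%")
# prints: accuracy=85.0% precision=80.0% recall=88.9%
```

With these counts precision is lower than recall, because the 10 false positives weigh on the precision denominator while only 5 false negatives weigh on the recall denominator.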

#### **2.4 The dataset characteristics**

This study was conducted with retrospective data from 211 male participants known to be addicted to at least one substance. The data were obtained from studies completed and published with the approval (2019/87) of the Ethics Committee for Clinical Research of Istanbul Faculty of Medicine [8–10]. The mean age of the individuals was 28.67 years, with ages ranging from 18 to 51 years. In terms of educational level, the participants included 2 college graduates, 61 secondary school graduates, 49 primary school graduates, and 99 literate or illiterate participants. The marital status of the participants was as follows: 147 were single, 43 were married, 12 were divorced, and 9 were married but living separately. There were 3 students among the participants; 56 individuals were employed, whereas 152 were unemployed. Individuals who used at least one of cigarettes and alcohol and at least one of the addictive substances including cannabinoids, synthetic cannabinoids, cannabis, cocaine, ecstasy, and heroin were included in the study. The age at first use of one of these substances ranged from 10 to 30 years.

The dataset used in this study contains no missing values; therefore, no preprocessing was applied to the data. Descriptions of attributes, values, and variable names are presented in **Table 1**.




#### **Table 1.**

*Attribute descriptions, variable names, types, and values.*
