### *3.6.1 Techniques/method(s)*

Different tools that can be adopted for this type of research include the methods described in the subsections that follow.



### *3.6.2 Data extraction*

Data will be extracted from the medical images, represented by the distribution of pixels. Before the principal component analysis is applied, the data must be standardized so that each feature has a mean of 0 and a variance of 1. Mathematically, this is done by subtracting the mean and dividing by the standard deviation for each value of each variable, as in Eq. (1). PCA then makes the maximum variability in the dataset more visible by rotating the axes.

$$Z = \frac{x - \mu}{\sigma} \tag{1}$$
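As a minimal sketch of this standardization step in Python (the feature matrix below is illustrative, not taken from the study's data):

```python
import numpy as np

# Stand-in for pixel-derived features from medical images:
# rows are images, columns are features (values are illustrative only).
X = np.array([[120.0, 0.8, 35.0],
              [ 95.0, 0.6, 42.0],
              [140.0, 0.9, 30.0],
              [110.0, 0.7, 38.0]])

# Eq. (1): z = (x - mu) / sigma, applied column-wise, so that every
# feature ends up with mean 0 and variance 1.
Z = (X - X.mean(axis=0)) / X.std(axis=0)

print(Z.mean(axis=0))  # ~[0, 0, 0]
print(Z.std(axis=0))   # ~[1, 1, 1]
```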

### *3.6.3 The use of covariance matrix*

The next step is to create a covariance matrix by constructing a square matrix to express the correlation between two or more features in a multidimensional dataset.
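A short numpy sketch of this step (the standardized matrix `Z` here is synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.standard_normal((100, 5))   # stand-in for 100 samples of 5 standardized features

C = np.cov(Z, rowvar=False)         # 5 x 5 covariance matrix of the features
print(C.shape)                      # (5, 5): square
print(np.allclose(C, C.T))          # True: symmetric
```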

### *3.6.3.1 Principal component analysis*

PCA is a dimensionality-reduction technique that operates on measurement vectors for distance-based classification and is widely explored in pattern recognition [39]. It is best used for feature extraction and the removal of redundancy. A simple outline of the PCA algorithm for image classification is given below. PCA employs eigenvectors and eigenvalues: the eigenvectors are sorted from the highest to the lowest eigenvalue, and the number of principal components is then selected using the following equations:

$$\text{PC}\_1 = w\_{1,1} \text{ (Feature A)} + w\_{2,1} \text{ (Feature B)} + \dots + w\_{n,1} \text{ (Feature N)} \tag{2}$$

$$\text{PC}\_2 = w\_{1,2} \text{ (Feature A)} + w\_{2,2} \text{ (Feature B)} + \dots + w\_{n,2} \text{ (Feature N)} \tag{3}$$

$$\text{PC}\_3 = w\_{1,3} \text{ (Feature A)} + w\_{2,3} \text{ (Feature B)} + \dots + w\_{n,3} \text{ (Feature N)} \tag{4}$$

The first principal component (PC1) is a synthetic variable built as a linear combination that determines the magnitude and direction of the maximum variance in the dataset. The second principal component (PC2) is likewise a synthetic linear combination; it captures the largest share of the remaining variance and is uncorrelated with PC1. Each subsequent principal component similarly captures the remaining variation while being uncorrelated with the previous components. PCA allows resizing of medical images, pattern recognition, dimensionality reduction, and visualization of multidimensional data. The covariance matrix is symmetric and has the form:

$$\mathbf{C} = \begin{pmatrix} w\_{11} & w\_{12} & \cdots & w\_{1n} \\ w\_{21} & w\_{22} & \cdots & w\_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ w\_{n1} & w\_{n2} & \cdots & w\_{nn} \end{pmatrix} \tag{5}$$

The eigenvalues of the matrix **C** are the variances of the principal components.

$$\text{PC}\_i = w\_{i1} X\_1 + w\_{i2} X\_2 + \dots + w\_{ip} X\_p \tag{6}$$

where *var*(PC<sub>i</sub>) = λ<sub>i</sub> and the constants *w*<sub>i1</sub>, *w*<sub>i2</sub>, … , *w*<sub>ip</sub> are the elements of the corresponding eigenvector, which is normalized to unit length:

$$w\_{i1}^2 + w\_{i2}^2 + \dots + w\_{ip}^2 = 1 \tag{7}$$

The sum of the variances of the principal components is equal to the sum of the variances of the original variables; taken together, the principal components contain all of the variation in the original data.
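Putting Eqs. (5)-(7) together, the following from-scratch numpy sketch (on synthetic standardized data) sorts the eigenvectors from highest to lowest eigenvalue and verifies the variance-preservation property stated above:

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.standard_normal((200, 4))        # synthetic standardized data

C = np.cov(Z, rowvar=False)              # Eq. (5): covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)     # eigh, since C is symmetric

order = np.argsort(eigvals)[::-1]        # sort highest -> lowest
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

scores = Z @ eigvecs                     # Eq. (6): PC_i = w_i1*X1 + ... + w_ip*Xp

# Eq. (7): the elements of each eigenvector have squared sum 1.
print(np.allclose((eigvecs ** 2).sum(axis=0), 1.0))               # True

# The variance of each PC is its eigenvalue, and total variance is preserved.
print(np.allclose(scores.var(axis=0, ddof=1), eigvals))           # True
print(np.isclose(eigvals.sum(), Z.var(axis=0, ddof=1).sum()))     # True
```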

### *3.6.3.2 Logistic regression*

Logistic regression is a linear algorithm with a non-linear transform applied to its output. It assumes a linear relationship between the input variables and the output, so transformations of the input variables can result in a more accurate model [40]. In Eqs. (8)-(12), *z* is the output variable and *x* is the input variable; the parameters *w* and *b* are initialized as zeros to begin with and are then modified over a number of iterations while the model is trained. The output *z* is passed through a non-linear function, most commonly the sigmoid function, which returns a value between 0 and 1. Logistic regression starts from the basic linear regression formula:

$$0 \le h\_{\theta}(x) \le 1 \text{ (binary)} \tag{8}$$

$$h\_{\theta}(x) = ax + b, \text{ where } \theta \text{ is the vector of parameters } (w, b) \tag{9}$$

$$g(z) = \frac{1}{1 + e^{-z}}, \text{ where } g(z) \text{ is the sigmoid function} \tag{10}$$

$$h\_{\theta}(x) = \frac{1}{1 + e^{-(ax + b)}} \tag{11}$$

Training a logistic classifier *h*<sub>θ</sub>(*x*) for each class of *y*, the prediction is given by:

$$y\_{predict} = g(z) = \frac{1}{1 + e^{-z}} \text{ (sigmoid function), which is optimized during training.} \tag{12}$$
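A compact from-scratch sketch of Eqs. (9)-(12) (the one-dimensional data, learning rate, and iteration count are illustrative choices):

```python
import numpy as np

def sigmoid(z):
    # Eq. (10): g(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

# Tiny synthetic binary problem: x is the input, y the 0/1 label.
rng = np.random.default_rng(0)
x = rng.standard_normal(100)
y = (x + 0.1 * rng.standard_normal(100) > 0).astype(float)

a, b = 0.0, 0.0                    # parameters initialized to zeros, as in the text
lr = 0.5
for _ in range(1000):              # gradient descent on the log-loss
    h = sigmoid(a * x + b)         # Eq. (11): h_theta(x)
    a -= lr * np.mean((h - y) * x)
    b -= lr * np.mean(h - y)

y_predict = (sigmoid(a * x + b) >= 0.5).astype(float)   # Eq. (12), thresholded at 0.5
print((y_predict == y).mean())     # training accuracy
```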

### *3.6.3.3 Decision tree*

The building of the decision tree begins at the root and then moves to the sub-nodes. The nodes represent characteristic features that serve as decision points, and it is at these points that the information is classified. Nodes at different levels are linked to each other by branches that represent the different decisions made by testing the state of the feature in each node. The decision tree is a type of supervised machine learning algorithm that uses if-else statements over a tree data structure for its operation [41], as illustrated in the sketch below.
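As a brief illustration of that if-else structure, a shallow scikit-learn tree can be fitted and printed as rules (the public dataset and the depth limit are demonstration choices, not the study's configuration):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier, export_text

# Fit a small tree on a public tabular dataset of image-derived features.
X, y = load_breast_cancer(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# export_text prints the learned if/else decision points from root to leaves.
print(export_text(clf))
```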

The decision tree algorithm is based on the Shannon entropy, which determines the amount of information carried by an event. Given a probability distribution P = (*p*<sub>1</sub>, *p*<sub>2</sub>, *p*<sub>3</sub>, … , *p*<sub>n</sub>) over a sample of data S, the information carried by this distribution is called the entropy of P and is given by the following equations, where *p*<sub>i</sub> is the probability that value *i* appears during the process.


$$P = (p\_1, p\_2, p\_3, \dots, p\_n) \tag{13}$$

$$\text{Entropy}(P) = -\sum\_{i=1}^{n} p\_i \log(p\_i) \tag{14}$$

$$\text{Gain}(P, T) = \text{Entropy}(P) - \sum\_{j=1}^{n} p\_j \times \text{Entropy}(P\_j) \tag{15}$$

where *P*<sub>j</sub> is the subset of the data corresponding to the *j*-th value of the attribute *T*.
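A small Python sketch of Eqs. (14) and (15) (the label and attribute arrays are made-up examples):

```python
import numpy as np

def entropy(labels):
    # Eq. (14): Entropy(P) = -sum_i p_i * log(p_i), here with log base 2
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(labels, attribute):
    # Eq. (15): Gain(P, T) = Entropy(P) - sum_j p_j * Entropy(P_j)
    weighted = 0.0
    for value in np.unique(attribute):
        subset = labels[attribute == value]
        weighted += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - weighted

y = np.array([0, 0, 1, 1, 1, 0, 1, 1])                    # class labels
T = np.array(["a", "a", "b", "b", "b", "a", "b", "a"])    # candidate attribute
print(entropy(y), information_gain(y, T))
```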

### *3.6.3.4 Artificial neural network*

Neural networks use a set of algorithms to recognize patterns in medical images through machine perception, labelling the raw input. The layers are made of *nodes*, loosely patterned on biological neurons [42]. A node combines input from the data with a set of coefficients, or weights, that either amplify or dampen that input, thereby assigning significance to inputs with respect to the task the algorithm is trying to learn. These input-weight products are summed, and the sum is passed through the node's so-called activation function, which determines whether and to what extent that signal should progress further through the network to affect the outcome (**Figure 5**).

$$\text{The input layer is denoted by } \mathbf{X}\_1, \mathbf{X}\_2, \mathbf{X}\_3, \dots \mathbf{X}\_n. \tag{16}$$

The first input is *X*<sub>1</sub>, and the remaining inputs continue through *X*<sub>n</sub>.

$$\text{The connection weights are likewise denoted by } W\_1, W\_2, W\_3, \dots, W\_n. \tag{17}$$

Each weight signifies the strength of its connection to the node.

$$a = \sum\_{j=1}^{n} w\_j x\_j + b \tag{18}$$

where *a* is the output generated from the multiple inputs: a weighted combination of all the inputs plus the bias *b*. This output *a* is then fed into a transfer function *f* to produce the final output *y*.

**Figure 5.** *Neural network modeling and output function.*
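A minimal sketch of Eqs. (16)-(18) for a single node (the inputs, weights, and the choice of tanh as the transfer function *f* are illustrative):

```python
import numpy as np

def neuron(x, w, b, f=np.tanh):
    # Eq. (18): a = sum_j w_j * x_j + b, then y = f(a)
    a = np.dot(w, x) + b
    return f(a)

x = np.array([0.5, -1.2, 3.0])   # inputs X1 ... Xn, Eq. (16)
w = np.array([0.8, 0.1, -0.4])   # weights W1 ... Wn, Eq. (17)
print(neuron(x, w, b=0.2))       # node output y
```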
