### **2.1 Machine learning**


Computer-aided diagnosis (CAD) has been used in hospitals; it cannot replace the doctor, but it can help professionals diagnose a disease accurately. The main aim of CAD systems is to identify a disease in the early stages of its development. A CAD support tool is built from highly complex recognition techniques and machine learning algorithms. Several CAD systems have been approved by the US Food and Drug Administration, and they can reduce the false-negative rate in disease recognition. Recent research studies have found that CAD performs well in the clinical environment. Even so, establishing CAD systems in medical practice carries some risk and complexity: the interpretation of the given data may not yield a 100% accurate result, so CAD provides only a second opinion to physicians. Machine learning is especially difficult in epileptic seizure detection because the brainwaves must be understood, and brainwave patterns are completely unique to each individual. CAD tools have been useful for diagnosing disease since 1998. This does not mean that they are meant to make the diagnosis themselves, but an approved CAD system can provide accurate results. Early diagnosis of a disease is very important for saving lives. Different kinds of information can be extracted for diagnosis using medical image and signal technologies such as X-ray, computed tomography (CT), positron emission tomography (PET), single-photon emission computed tomography (SPECT), magnetic resonance imaging (MRI), ultrasound, EEG, electrocardiography (ECG), and electromyography (EMG), for diseases like cancer and coronary artery, cardiovascular, and neurological disorders. CAD supports accurate diagnosis in the early stages of a chronic disease. Soft computing techniques are used in both computer-aided diagnosis and computer-aided detection. In the earlier stages of computational approaches, problem-solving was carried out using conventional mathematics and specific analytical models [2].

| Hard computing | Soft computing |
| --- | --- |
| There are different types of conventional methods, such as Boolean logic, crisp analysis, numerical analysis, deterministic search, analytical models, and binary logic; these conventional methods are also called hard computing techniques | To handle uncertainty, imprecision, partial truth, and approximation with robustness and low solution cost, the soft computing techniques were introduced |
| These techniques commonly use arithmetic, science, and computing | It mimics biological procedures, acts efficiently, and plays a greater role in the development of computational science |
| Since conventional methods have been used from the beginning of computational science, these traditional approaches require a lot of computation time | The soft computing methods can be approached in different ways for finding a solution |
| The hard computing techniques can be inaccurate, inadequate, and unreliable | The soft computing techniques are developed mainly to get better results for NP (nondeterministic polynomial)-complete problems |
| The programs written using these techniques are deterministic | Unlike hard computing, the inputs are adjusted to optimize the result |
| It involves precise input data and sequential procedures | For any given information, the process can give the best result by maximizing the desired benefit and minimizing the undesired one at low solution cost |
| The traditional way of computing is less efficient for problem-solving; it can produce an exact but not an approximate answer | It imitates models from nature |

**Table 1.**
*Comparison of hard and soft computing.*

In the growth of computational science, researchers focus on soft computing in order to overcome the drawbacks of hard computing.

Problem-solving is a challenging task for intelligent entities. It has been proved that "a machine can learn new things": it can adapt to new situations and has the ability to learn from stored information. Machine learning techniques include artificial neural networks (ANNs), the perceptron, and the support vector machine (SVM), whereas evolutionary computation includes evolutionary algorithms, meta-heuristics, and swarm intelligence. Just like the human brain, a machine is capable of acquiring knowledge from data. Machine learning developed out of the field of AI, and in order to build intelligent machines, we need machine learning techniques, which can deal with huge amounts of data in minimal time. There are different types of machine learning methods:

• supervised learning;
• unsupervised learning; and
• reinforcement learning.


The supervised learning technique is used in the majority of analyses. In this technique, the system learns from training examples, whereas in unsupervised learning, the system is challenged to discover patterns directly from the given data. Classification and regression are two different supervised learning problems; the next section gives a detailed description of classification using EEG signals for epileptic seizure detection, while regression gives the statistical relationship between two or more variables. Association rule learning and clustering are major examples of unsupervised learning problems: association rule learning is a rule-based machine learning method used to discover interesting relationships between variables in a huge database, whereas clustering discovers patterns by grouping the given data. Reinforcement learning, the third type of machine learning, learns how to behave in an environment merely by interacting with it. It is a dynamic way of learning: it learns directly and controls the data, with no supervisor. Machine learning algorithms have the ability to learn from data and to make predictions and classifications for a model based on the sample inputs.
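To make the distinction concrete, here is a minimal sketch, assuming scikit-learn and a synthetic two-class dataset (both illustrative, not from the source): the same data are given to a supervised classifier that learns from labels, and to an unsupervised clustering method that must discover the groups on its own.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC
from sklearn.cluster import KMeans

# Synthetic two-class data standing in for extracted signal features.
X, y = make_blobs(n_samples=200, centers=2, random_state=0)

# Supervised learning: the classifier is trained on labeled examples.
clf = SVC(kernel="linear").fit(X, y)
print("supervised training accuracy:", clf.score(X, y))

# Unsupervised learning: clustering discovers groupings without labels.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("first cluster assignments:", km.labels_[:10])
```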

ANN is a technique composed of artificial neurons (processing units or elements) that mimics the function of the human brain, whereas SVM is based on the associative learning method and performs data classification, separating the data into corresponding groups using hyperplanes. The perceptron and the support vector machine are very similar linear classifiers. A network with no hidden layers is called a single-layer perceptron. The back propagation algorithm and the perceptron are second-generation neural networks. Back propagation is a technique used to train a neural network so as to minimize an objective function; it can learn from mistakes. It looks for the minimum value of the error function in weight space, and the weights that minimize the error function are then considered a solution to the learning problem. "It is a supervised learning method, and is a generalization of the delta rule or gradient descent" [2]. Neural networks can be classified as follows:

• single-layer neural network;
• multi-layer neural network; and
• competitive neural network.


The back propagation algorithm works as follows: each neuron in the neural network has an activation function with respect to the weights wji, defined as:

$$A_j(\bar{x}, \bar{w}) = \sum_{i=0}^{n} x_i w_{ji} \tag{1}$$

The sigmoid output function is defined as:

$$O_j(\bar{x}, \bar{w}) = \frac{1}{1 + e^{-A_j(\bar{x}, \bar{w})}} \tag{2}$$

Therefore, the error function of each neuron in the output layer is defined as:

$$E_j(\bar{x}, \bar{w}, d) = \left( O_j(\bar{x}, \bar{w}) - d_j \right)^2 \tag{3}$$

where dj denotes the jth element of the desired response vector and the sum of the errors in the output layer from all the neurons is defined as:

$$E(\bar{x}, \bar{w}, d) = \sum_j \left( O_j(\bar{x}, \bar{w}) - d_j \right)^2 \tag{4}$$

Since $\Delta w \propto -\frac{\partial E}{\partial w}$, the overall error is reduced by using the *gradient descent* method. The partial derivative of the error with respect to each weight, using the delta rule, is defined as:

$$\Delta w_{ji} = -\eta \frac{\partial E}{\partial w_{ji}} \tag{5}$$

where *η* denotes the learning rate parameter. Eqs. (1) and (2) provide the dependency with respect to output as:

$$\frac{\partial E}{\partial O_j} = 2 \left( O_j - d_j \right) \tag{6}$$


Also,

$$\frac{\partial O_j}{\partial w_{ji}} = \frac{\partial O_j}{\partial A_j} \frac{\partial A_j}{\partial w_{ji}} = O_j \left( 1 - O_j \right) x_i \tag{7}$$


From (6) and (7):


$$\frac{\partial E}{\partial w_{ji}} = \frac{\partial E}{\partial O_j} \frac{\partial O_j}{\partial w_{ji}} = 2 \left( O_j - d_j \right) O_j \left( 1 - O_j \right) x_i \tag{8}$$

Therefore, the weight adjustment of each neuron (from (5) and (8)) is:

$$\Delta w_{ji} = -2\eta \left( O_j - d_j \right) O_j \left( 1 - O_j \right) x_i \tag{9}$$
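The derivation can be exercised directly in code. The following is a minimal NumPy sketch of Eqs. (1)-(9) for a single sigmoid neuron; the input pattern, desired response, and learning rate are illustrative assumptions, not values from the source.

```python
import numpy as np

# One-neuron delta-rule training: activation (Eq. 1), sigmoid output (Eq. 2),
# squared error (Eq. 3), and the weight update of Eq. (9).

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))                # Eq. (2)

x = np.array([1.0, 0.5, -0.3])                     # one input pattern x_i
d = 1.0                                            # desired response d_j
w = np.zeros_like(x)                               # initial weights w_ji
eta = 0.5                                          # learning rate

for _ in range(200):
    A = x @ w                                      # Eq. (1): activation
    O = sigmoid(A)                                 # Eq. (2): output
    w += -2.0 * eta * (O - d) * O * (1.0 - O) * x  # Eq. (9): weight update

O = sigmoid(x @ w)
print("output:", O, "squared error:", (O - d) ** 2)  # Eq. (3)
```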

Feeding the inputs forward, calculating the error, and propagating it back to the previous layers are the main steps of an ANN classifier. The error is the difference between the desired response and the actual response of the network. Each classifier is based on some learning method; there are different types of learning methods, such as error-correction learning, memory-based learning, associative learning, neural net learning, and genetic learning. SVM is based on the associative learning method. SVM has many advantages, and its performance is very competitive with other methods. A drawback is the problem complexity for large sample sizes, so special optimizers are used for the optimization. Basically, SVM is a linear classifier that separates two different classes (normal and seizure) efficiently. The features of the two classes are categorized by the labels "−1" and "+1." The features that are extracted from the signal are defined as:

$$S = \left\{ (x_i, y_i) \right\}_{i=1}^{n} \tag{10}$$

where yi denotes the label related to the pattern xi and n refers to the number of samples. The dot product, or scalar product, of the linear classifier is defined as:

$$W^T(x) = \sum_i w_i x_i \tag{11}$$

Eq. (11) in function form is:

$$f(x) = W^T(x) + b \tag{12}$$

where w denotes the weight vector and b refers to the bias. For the case b = 0, the set of vectors satisfying *W<sup>T</sup>*(x) = 0 produces a hyperplane through the origin, which divides the features into two classes. A kernel allows the classifier to produce non-linear decision boundaries: replacing the dot product of the normal (linear-kernel) SVM with a kernel function defines a Gaussian radial basis function classifier, which is expressed as

$$k(x_i, x_j) = e^{-\left\| x_i - x_j \right\|^2 / 2\sigma^2} \tag{13}$$

The variables xi and xj represent two samples from the dataset. The default sigma value is 1 and is associated with all the attributes in the dataset. The features are separated into two different classes according to their feature labels. ANN and SVM are both supervised learning methods, but they have different working patterns; SVM with kernels is highly suitable for non-linear mapping functions. The classification process is important because a machine has to learn how to classify the data into groups [3].
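As a concrete illustration, the sketch below implements the kernel of Eq. (13) and fits an RBF-kernel SVM. The two-class data standing in for "normal" (−1) and "seizure" (+1) feature vectors are synthetic, and the use of scikit-learn, whose `gamma` parameter corresponds to 1/(2σ²), is an assumption for demonstration only.

```python
import numpy as np
from sklearn.svm import SVC

def rbf_kernel(xi, xj, sigma=1.0):
    # k(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 * sigma^2)), as in Eq. (13)
    return np.exp(-np.linalg.norm(xi - xj) ** 2 / (2.0 * sigma ** 2))

# Synthetic feature vectors for the two classes labeled -1 and +1.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 4)), rng.normal(3, 1, (50, 4))])
y = np.array([-1] * 50 + [+1] * 50)

print("k(x0, x1) =", rbf_kernel(X[0], X[1]))

# scikit-learn's RBF SVC uses gamma = 1 / (2 * sigma^2); sigma = 1 -> gamma = 0.5
clf = SVC(kernel="rbf", gamma=0.5).fit(X, y)
print("training accuracy:", clf.score(X, y))
```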

### **2.2 Fuzzy logic**

Machine learning, fuzzy logic, and evolutionary computation are applicable to any decision-making problem. Unlike Boolean logic, fuzzy logic is an approach that deals with a problem through levels of truth that lie between 0 and 1; fuzzy refers to vagueness. For the question "Is it raining?" (**Figure 1**), Boolean logic results in true or false, but fuzzy logic gives a number in the range from 0 to 1, where 1.0 represents absolute truth and 0.0 represents absolute falsehood.

**Figure 1.**
*Example for fuzzy logic.*

This logic, used to handle fuzziness, was introduced in 1965 by Lotfi A. Zadeh. A fuzzy classifier is a classification or prediction algorithm that uses fuzzy logic and is based on fuzzy sets (membership functions). The data-driven and the trial-and-error (heuristic) approaches are two different approaches to fuzzy logic, and an automated system can be designed using either one. Of the two, the data-driven approach is the most essential for a model that must learn and update continuously; it is similar to the event-driven approach and is well structured. Fuzzy logic uses the trial-and-error approach in the tuning process to obtain a satisfactory result; it is a technique that can handle imprecise data and, especially, analyze crisp/standard data. In classification processes, appropriate features are required to train and test the system, and the performance of the system depends on selecting apt features from the data when modeling the detection system. The heuristic method is not an optimal approach to problem-solving; it gives a satisfactory solution. Heuristics, hyper-heuristics, and meta-heuristics are commonly used with machine learning and optimization techniques, and most machine learning techniques are heuristic. A genetic algorithm or any other optimization technique can be used to obtain an optimal solution for a given problem. The fuzzy if-then rule is the simplest form of a fuzzy rule-based classifier; fuzzy if-then rule statements are the basic form of fuzzy logic, and any classifier that uses fuzzy logic is a fuzzy rule-based classifier. These classifiers are well suited to linear classification models, whereas ANN can predict better on test data; recently, deep learning has become the popular tool for prediction and detection processes. Fuzzy logic gives multi-valued answers, whereas in machine learning the system learns from data, especially with a control or supervisor [2].
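As a minimal illustration of the membership-function idea applied to the raining example, the sketch below returns a degree of truth in [0, 1] rather than a Boolean; the triangular membership shape and its breakpoints are assumed for the example, not taken from the source.

```python
def rain_membership(mm_per_hour, low=0.0, peak=5.0, high=10.0):
    """Triangular fuzzy membership: 0 at/below `low`, 1 at `peak`, 0 at/above `high`."""
    if mm_per_hour <= low or mm_per_hour >= high:
        return 0.0
    if mm_per_hour <= peak:
        return (mm_per_hour - low) / (peak - low)   # rising edge
    return (high - mm_per_hour) / (high - peak)     # falling edge

# "Is it raining?" answered as a degree of truth rather than true/false.
for mm in (0.0, 2.5, 5.0, 7.5, 12.0):
    print(f"{mm:5.1f} mm/h -> membership {rain_membership(mm):.2f}")
```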

### **2.3 Evolutionary computation**

Evolutionary computation (EC) is a subdiscipline of AI and soft computing. In computational intelligence, evolutionary algorithms are inspired by biological systems and give optimal solutions for problems; meta-heuristics and swarm intelligence may also yield sufficiently good solutions for an optimization problem. EC is a computational intelligence method that encompasses many optimization techniques for problem-solving. Its algorithms are inspired by biological evolution and can give near-optimal solutions for many kinds of problems. Ant colony optimization, the genetic algorithm (GA), genetic programming, self-organizing maps, competitive learning, and swarm intelligence are some examples of EC techniques. The genetic algorithm is a technique used for optimization in problem-solving across various fields. It is derived from natural genetic systems; it gives accurate results, exhibits robustness, and produces an optimal solution for the problem.


In computational intelligence, the application program differs among problems in various fields. GA starts by producing an initial population of chromosomes: binary strings that encode the control parameters of the given problem. As in natural reproductive systems, crossover and mutation processes then take place to generate a new population, and fitness is evaluated in successive iterations called generations. After several generations, GA selects the best chromosome using probabilistic transition rules and obtains the optimal, or closest to optimal, solution to the problem. In the automated epileptic seizure detection problem, the genetic algorithm is used for feature selection, and selecting relevant features is important for the performance of the system [2].
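A minimal sketch of GA-based feature selection follows. The binary chromosomes, single-point crossover, and mutation mirror the description above, while the fitness function (class separation minus a penalty on the number of selected features) is an illustrative stand-in for a real classifier's accuracy, and all data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, pop_size, generations = 12, 20, 40
X = rng.normal(size=(100, n_features))
y = rng.integers(0, 2, size=100)
X[y == 1, :4] += 1.5                          # make the first 4 features informative

def fitness(chrom):
    # Reward class separation on the selected features, penalize chromosome size.
    if chrom.sum() == 0:
        return 0.0
    sel = X[:, chrom.astype(bool)]
    sep = np.abs(sel[y == 0].mean(0) - sel[y == 1].mean(0)).mean()
    return sep - 0.02 * chrom.sum()

pop = rng.integers(0, 2, size=(pop_size, n_features))   # initial population
for gen in range(generations):
    scores = np.array([fitness(c) for c in pop])
    parents = pop[np.argsort(scores)[-pop_size // 2:]]  # selection
    children = []
    while len(children) < pop_size:
        a, b = parents[rng.integers(len(parents), size=2)]
        cut = rng.integers(1, n_features)               # single-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        flip = rng.random(n_features) < 0.05            # mutation
        child[flip] ^= 1
        children.append(child)
    pop = np.array(children)

best = pop[np.argmax([fitness(c) for c in pop])]
print("selected features:", np.flatnonzero(best))
```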

### **2.4 Probabilistic ideas**

Both probabilistic ideas and logic are used in probabilistic reasoning in order to handle situations of uncertainty. Most problems use probability and statistics; "clean data is greater than more data": a machine learns from data, and the quality of the data matters more than its quantity. Bayesian analysis is one of the most important approaches to probabilistic reasoning. Uncertainty is the situation of unknown or imperfect information. Bayesian inference is a statistical inference based on Bayes' theorem that can be used for accurate prediction; it is very useful when the available data are insufficient for solving the problem. Data analysis is the procedure of evaluating data gathered from various sources, and the soft computing techniques play a challenging part in it. For example, data mining techniques are especially used for discovering new information in a huge database, whereas soft computing techniques mimic the processes of the human brain in order to find effective solutions for any NP-complete problem.
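A small worked example of Bayes' theorem follows; the numbers for a hypothetical seizure detector (prior prevalence, sensitivity, and false-positive rate) are illustrative assumptions, not results from the source.

```python
# Bayes' theorem: P(seizure | +) = P(+ | seizure) P(seizure) / P(+)
p_seizure = 0.01            # assumed prior P(seizure epoch)
p_pos_given_seizure = 0.95  # assumed detector sensitivity P(+ | seizure)
p_pos_given_normal = 0.05   # assumed false-positive rate P(+ | normal)

p_pos = (p_pos_given_seizure * p_seizure
         + p_pos_given_normal * (1 - p_seizure))        # total probability
p_seizure_given_pos = p_pos_given_seizure * p_seizure / p_pos

print(f"P(seizure | positive) = {p_seizure_given_pos:.3f}")  # ~0.161
```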

### **3. Epileptic seizure prediction and detection**

There is a link between data analysis and soft computing. Data may be qualitative or quantitative; quantitative data can give an exact solution for the problem. Once collected, the data are pre-processed: in the pre-processing stage, the raw data are transformed effectively for the purpose of analysis, and any type of data has to be pre-processed before analysis. The main principle of data pre-processing is to eliminate irrelevant and redundant (noisy) data in order to improve the detection accuracy of the system. In signal processing, the error is referred to as an artifact or noise, and unwanted information can be removed from the raw data using noise reduction; different types of algorithms are available for data pre-processing. For example, in EEG signal processing for epileptic seizure detection, artifacts can arise from physiological or mechanical sources; respiratory, cardiac/pulse, eye movement, and electromyography signals are biological artifacts [4]. These artifacts should be recognized and eliminated for a proper diagnosis, and more than one variety of artifact can appear in the recorded EEG. Pre-processing, in which the artifacts are removed, is therefore the first step in classification and diagnostics. After pre-processing, the signals are filtered and free from noise, and these filtered signals are used in the next step, the feature extraction process.
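As one common concrete instance of such pre-processing (the 0.5-40 Hz pass band, the sampling rate, and the synthetic trace are assumptions for illustration, not the source's protocol), a raw EEG-like signal can be band-pass filtered with SciPy:

```python
import numpy as np
from scipy.signal import butter, filtfilt

fs = 256.0                                     # assumed sampling rate in Hz
t = np.arange(0, 10, 1 / fs)
# Toy "EEG": a 10 Hz rhythm plus broadband noise standing in for artifacts.
raw = np.sin(2 * np.pi * 10 * t) + 0.5 * np.random.randn(t.size)

# 4th-order Butterworth band-pass, 0.5-40 Hz, applied zero-phase.
b, a = butter(4, [0.5, 40.0], btype="bandpass", fs=fs)
clean = filtfilt(b, a, raw)
print("raw std:", raw.std(), "filtered std:", clean.std())
```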

### **3.1 Feature extraction/selection and classification**

The process that converts the huge number of samples into a set of features is called feature extraction, and feature selection is the process that filters out redundant or irrelevant features. These methods are used to reduce the actual dimension of the given data.
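The sketch below illustrates both steps on synthetic windowed signals; the particular features (mean, variance, line length) and the variance-based selection rule are illustrative choices, not the source's method.

```python
import numpy as np

rng = np.random.default_rng(0)
windows = rng.normal(size=(50, 512))          # 50 single-channel signal windows

def extract_features(w):
    # Feature extraction: reduce one window to a small feature vector.
    return np.array([
        w.mean(),                             # mean amplitude
        w.var(),                              # signal power
        np.abs(np.diff(w)).sum(),             # line length
    ])

F = np.vstack([extract_features(w) for w in windows])   # 50 x 3 feature matrix

# Feature selection: keep the k features with the highest variance across windows.
k = 2
keep = np.argsort(F.var(axis=0))[-k:]
print("selected feature indices:", sorted(keep.tolist()))
```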

