298 Artificial Intelligence - Emerging Trends and Applications

achieved an accuracy of 83.66% with a specificity of 83.84% and a sensitivity of 83.43% when

To improve the classification performance further, this study extends our prior work in the following aspects. (1) Two different LCNN models are obtained through different filtering methods (a low-pass filter and a band-pass filter) and different training methods (the explicit method and the implicit method), and then the multipoint-prediction technology and the Bayesian fusion method are successively applied to them. (2) Four effective disease rules based on R peaks are developed for further analysis. (3) The final classification result is determined by utilizing the bias-average method.

testing 151,274 records. Figure 2 depicts the whole process flow.

Figure 2. LCNN-based method for record classification.

The rest of the chapter is organized as follows: in Section 2, the CCDD used in this study is first introduced; in Section 3, the proposed method is described in detail; then in Section 4, the experimental results, as well as a comparison with the published results, are presented; discussions and conclusions are provided in Sections 5 and 6, respectively.

## 2. Dataset

There are 193,690 standard 12-lead ECG records in the CCDD, each about 10–20 s in duration with a sampling frequency of 500 Hz. These data were successively obtained from hospitals (i.e., real clinical environments) located in different districts of Shanghai, Suzhou and Changsha. For data1–251, all the records have detailed heartbeat annotations including P-onset, P-peaks, P-offset, QRS-onset, R-peaks, QRS-offset, T-onset, T-peaks, T-offset and disease type. For data252–943, all the records only have heartbeat annotations for R-peak position and disease type. Just like the MIT-BIH-AR database, all the records in data1–943 were chosen purposefully and two or more cardiologists provided the annotations. As heartbeat annotations are available, we can carry out research on heartbeat classification. For data944–193,690, all the records only have diagnostic conclusions provided by one cardiologist. Note that a diagnostic conclusion may contain more than one disease type.

As shown in Figure 3, we use hexadecimal codes, that is, codes in the form "0dddddd", to encode disease types divided into three grades. There are 12 first-grade types (i.e., Invalid ECG, Normal ECG, Sinus Rhythm, Atrial Arrhythmia, Junctional Rhythm, Ventricular Arrhythmia, Conduction Block, Atrial Hypertrophy, Ventricular Hypertrophy, Myocardial Infarction, ST-T Change and Other Abnormalities), 72 second-grade types and 335 third-grade types, covering all the possible diagnostic conclusions provided by cardiologists in clinic. More details can be seen on our website (http://58.210.56.164:88/ccdd/).

Figure 3. Disease types in the CCDD.

In telemedicine centers, ECG records that are not explicitly diagnosed as "normal" all need to be further interpreted by domain experts, so that the potential cardiovascular disease of a patient can be detected as early as possible. Therefore, in this study we only regard ECG records whose diagnostic conclusions are "001" or "0020101" or "0020102" as "normal" (denoted as 0-class) and all the others as "abnormal" (denoted as 1-class). Moreover, we discard exception data, that is, records whose diagnostic conclusion is "000" or whose duration is less than 9.625 s.

The training and testing sets are organized as follows [26]: "data944–25,693" is partitioned into three parts, where the numbers of training samples, validation samples and testing samples (i.e., the small-scale testing set) are 12,320, 560 and 11,789, respectively, and the large-scale testing set is composed of 151,274 ECG records from data25,694–179,130. Note that we will combine training samples and validation samples together for implicit training. Table 1 summarizes the detailed information, where "irregular" denotes the abnormal ECG records whose heart rhythms are irregular (their diagnostic conclusions are "00202").

Table 1. Data distribution.

| Dataset | Data | Normal | Abnormal | Irregular | Total |
| --- | --- | --- | --- | --- | --- |
| Training samples | data944–25,693 | 8800 | 3520 | 763 | 12,320 |
| Validation samples | data944–25,693 | 280 | 280 | 56 | 560 |
| Small-scale testing set | data944–25,693 | 8387 | 3402 | 822 | 11,789 |
| Large-scale testing set | data25,694–179,130 | 85,141 | 66,133 | 9172 | 151,274 |


Normal Versus Abnormal ECG Classification by the Aid of Deep Learning

http://dx.doi.org/10.5772/intechopen.75546



## 3. Methodologies

Figure 4 shows the overall framework of the proposed method. As we can see, it consists of three parts, namely statistical learning, rule inference and summarizing. In the statistical-learning part, the ECG record is first preprocessed by two different methods, and then two probability values are outputted by utilizing LCNNs and the multipoint-prediction technology (LCNN[A] and LCNN[B] have the same architecture and are obtained by "explicit training" and "implicit training", respectively). After that, we use the Bayesian fusion method to incorporate the two outputs. In the rule-inference part, all R-peak positions in the ECG record are detected first [27], and then four disease rules based on them are used for further analysis. Finally, in the summarizing part, we utilize the bias-average method to determine the classification result, that is, "normal" or "abnormal."

Figure 4. Overall framework of the proposed method.

Essentially, Figure 4 describes an ensemble method including two homogeneous classifiers (i.e., LCNN[A] and LCNN[B]) and five heterogeneous classifiers (i.e., RI[A], RI[B], RI[C], RI[D], and LCNN[AB], which consists of LCNN[A] and LCNN[B]). For the homogeneous ensemble, different base classifiers must be obtained if we want to enhance classification performance. In fact, our generation strategy has certain advantages over some well-known ensemble methods such as Bagging and AdaBoost [28]. For the heterogeneous ensemble, different classifiers should be complementary to each other. Since LCNNs are not good at detecting abnormal heart rate and irregular heart rhythm, we use simple disease rules. Next, we will introduce each of the steps in detail.

### 3.1. Statistical learning

#### 3.1.1. Preprocessing


Generally speaking, ECG records collected in clinics are often contaminated by several types of interfering noise, such as low-frequency waves (less than 0.5 Hz) caused by breathing movements, high-frequency waves (50 or 60 Hz) caused by electricity, and biological waves (about 33 Hz) caused by physical activities. Although many methods can be used to de-noise, some useful information may be lost in the process. Since LCNNs, which belong to deep learning [29, 30], have the ability to capture useful information while ignoring interfering noise after learning from a certain number of training samples, we do not perform special de-noising.

An effective strategy for homogeneous ensemble learning is to make the input data different, so we apply a low-pass filter and a band-pass filter of 0.5–40 Hz [31] to the ECG record, respectively. Of course, there is no problem if we exchange the low-pass filter for the band-pass filter; here we just make one path consistent with our prior work. After that, we conduct the down-sampling operation (from the original 500 to 200 Hz) and extract a data segment of 9.5 s from the incoming ECG record after ignoring the first 0.125 s. Only eight basic leads, namely II, III, V1, V2, V3, V4, V5 and V6, are reserved, since the remaining four leads can be linearly derived from them. As a result, each ECG record consists of 8 × 1900 sampling points.
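A minimal sketch of this preprocessing path follows. The FFT-domain band-pass is an illustrative stand-in for the filter of [31], and the ordering of the eight basic leads inside the raw 12-lead record is an assumption that depends on the file format:

```python
import numpy as np

FS_IN, FS_OUT = 500, 200          # original and target sampling rates

def bandpass_fft(x, fs, lo=0.5, hi=40.0):
    # Crude FFT-domain band-pass; an illustrative stand-in for the 0.5-40 Hz filter.
    X = np.fft.rfft(x)
    f = np.fft.rfftfreq(x.size, d=1.0 / fs)
    X[(f < lo) | (f > hi)] = 0.0
    return np.fft.irfft(X, n=x.size)

def preprocess(record):
    # record: (12, n) raw 500 Hz ECG; keep the 8 basic leads II, III, V1-V6
    # (assumed to be rows 1-2 and 6-11 here; the true order depends on the format).
    leads = record[[1, 2, 6, 7, 8, 9, 10, 11], :]
    out = np.empty((8, 1900))
    t_new = 0.125 + np.arange(1900) / FS_OUT      # skip first 0.125 s, keep 9.5 s
    for i in range(8):
        filtered = bandpass_fft(leads[i], FS_IN)
        t_old = np.arange(filtered.size) / FS_IN
        out[i] = np.interp(t_new, t_old, filtered)  # resample 500 -> 200 Hz
    return out
```

Note that the last sample sits at 0.125 + 1899/200 ≈ 9.62 s, which is why records shorter than 9.625 s are discarded.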

#### 3.1.2. Lead convolutional neural network

Convolutional neural networks (CNNs) [32] have been successful in the field of ECG signal classification. Zhu used a CNN-based method for both heartbeat classification and record classification, and obtained an accuracy of 99.20% on the MIT-BIH-AR database with 47,190 heartbeats in the "intra-patient" evaluation, and an accuracy of 83.49% and 0.8819 AUC on the CCDD with 11,760 ECG records, respectively [24]. Kiranyaz applied CNNs on 1-lead ECGs for patient-specific heartbeat classification and achieved good results [33, 34]. However, multi-lead ECGs are different from two-dimensional images: data in the horizontal direction (intra-lead) are relevant while data in the vertical direction (inter-lead) are independent. For this reason, our prior work proposed lead convolutional neural networks (LCNNs) for multi-lead ECGs, which have better classification performance. Figure 5 shows an example of a three-stage LCNN.

Figure 5. Architecture of a three-stage LCNN.

In Figure 5, CU denotes a convolution unit consisting of a convolutional layer and a sub-sampling (max-pooling) layer; 1DCov and SubSamp denote the one-dimensional convolution operation and the sub-sampling operation, respectively. The computational process consists of the data for each lead going through three different CUs, and then the information from all leads being fed into a fully connected (FC) layer. Finally, the logistic-regression (LR) layer outputs the predictive value. In fact, we can regard each CU as a feature extractor, and the subsequent multilayer perceptron consisting of the FC layer and the LR layer as a classifier.

To describe the LCNN clearly, we present its computational formula as follows: let [$\mathbf{x}_1$, $\mathbf{x}_2$, …, $\mathbf{x}_8$] be the incoming ECG record, where $\mathbf{x}_i$ (a vector, 1 ≤ i ≤ 8) is the data of the i-th lead; then

$$\begin{cases}
f(\mathbf{x}) = g_E\!\left( g_D\!\left( \bigcup_{i=1}^{8} g_{C_i}\!\left( g_{B_i}\!\left( g_{A_i}(\mathbf{x}_i) \right) \right) \right) \right) \\
g_D(\mathbf{x}) = \varphi\!\left( W^{T}\mathbf{x} + b \right) \\
g_E(\mathbf{x}) = \dfrac{1}{1 + e^{-\left( W^{T}\mathbf{x} + b \right)}}
\end{cases} \tag{1}$$

Here, $g_E$ and $g_D$ are the computational formulas for the LR layer and the FC layer, respectively, $W$ and $b$ are the weights and the biases of the corresponding layer, and $\varphi(x)$ is the sigmoid function. $g_{A_i}$, $g_{B_i}$ and $g_{C_i}$ (1 ≤ i ≤ 8) are the computational formulas for the CUs, whose expressions are all $f_{\mathrm{sub}}(f_{\mathrm{cov}}(x))$; the only differences between them exist in weights and biases. For the i-th lead, ECG data is inputted into $g_{A_i}$ first, and then the output of $g_{A_i}$ continues to be inputted into $g_{B_i}$ and $g_{C_i}$. After that, the outputs of $g_{C_1}, g_{C_2}, \dots, g_{C_8}$ (namely 8 vectors) are concatenated together (namely 1 vector) using the union operation ∪. Finally, the resulting vector is inputted into $g_D$ and $g_E$ successively and a value ranging from 0 to 1 is outputted. Note that both the input and the output of each function are vectors and the expressions of $f_{\mathrm{sub}}(x)$ and $f_{\mathrm{cov}}(x)$ are given by


$$\begin{cases}
f_{\mathrm{cov}}(\boldsymbol{v}) = \bigcup_{j,k} v_{ij}^{k} = \bigcup_{j,k} \left( b_{ij} + \sum_{m} \sum_{p=0}^{P_i - 1} w_{ij,(i-1)m}^{p}\, v_{(i-1)m}^{(k+p)} \right) \\
f_{\mathrm{sub}}(\boldsymbol{v}) = \bigcup_{j,k} v_{ij}^{k} = \bigcup_{j,k} \varphi\!\left( \max_{q = (k-1)Q_i + 1,\, \dots,\, kQ_i} v_{(i-1)j}^{q} \right)
\end{cases} \tag{2}$$

Here, $P_i$ and $Q_i$ denote the size of the convolutional kernel and the sub-sampling step in the $i$-th layer, respectively, and ∪ denotes the union operation. $v_{ij}^{k}$ denotes the output value of the $k$-th neuron (starting from 1) in the $j$-th feature map (unit) of the $i$-th layer. $w_{ij,(i-1)m}^{p}$ is the weight that connects the $j$-th unit of the $i$-th layer with the $m$-th unit of the $(i-1)$-th layer, and $b_{ij}$ is the bias in the $j$-th unit of the $i$-th layer.

The setting of each parameter (such as $P_i$, $Q_i$ and the number of CUs for each lead) will greatly influence the classification performance, and values that are either too large or too small result in an unfavorable outcome. In practical applications, they can be determined by a trial-and-error method. To keep things simple and ensure a fair comparison, we use two three-stage LCNNs with the same architecture, namely LCNN[A] and LCNN[B] in Figure 4, and the corresponding parameters are the same as those in our prior work. The number of neurons in the input layer is 8 × 1700; the sizes of the three convolutional kernels are 21, 13 and 9, respectively; the sizes of the three sub-sampling steps are 7, 6 and 6, respectively; the numbers of the three feature maps are 6, 7 and 5, respectively; and the number of neurons in the FC layer is 50.
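As an illustration, the forward pass of such a three-stage LCNN with the stated parameters can be sketched in NumPy. This is a minimal inference-only sketch, not the training procedure; the random weights stand in for trained ones:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv_unit(v, W, b, Q):
    # One CU, Eq. (2): valid 1-D cross-correlation (f_cov) over all input maps,
    # then non-overlapping max-pooling of step Q followed by a sigmoid (f_sub).
    n_out, n_in, P = W.shape
    L = v.shape[1]
    out = np.empty((n_out, L - P + 1))
    for j in range(n_out):
        acc = np.zeros(L - P + 1)
        for m in range(n_in):
            acc += np.correlate(v[m], W[j, m], mode="valid")
        out[j] = acc + b[j]
    Lp = out.shape[1] // Q
    return sigmoid(out[:, :Lp * Q].reshape(n_out, Lp, Q).max(axis=2))

# Per-lead stages: (n_in maps, n_out maps, kernel P, pooling step Q) as stated above.
STAGES = [(1, 6, 21, 7), (6, 7, 13, 6), (7, 5, 9, 6)]

def init_params(rng):
    lead_stacks = [[(0.1 * rng.standard_normal((o, i, P)), np.zeros(o), Q)
                    for (i, o, P, Q) in STAGES] for _ in range(8)]
    return {"lead": lead_stacks,
            "Wd": 0.1 * rng.standard_normal((50, 200)), "bd": np.zeros(50),
            "We": 0.1 * rng.standard_normal((1, 50)), "be": np.zeros(1)}

def lcnn_forward(x, params):
    # x: (8, 1700). Each lead passes through its own three CUs
    # (1700 -> 240 -> 38 -> 5 samples per map); the eight 5x5 outputs are
    # concatenated (union, 200 values), then fed to the FC and LR layers, Eq. (1).
    feats = [x[i:i + 1] for i in range(8)]
    for i in range(8):
        for (W, b, Q) in params["lead"][i]:
            feats[i] = conv_unit(feats[i], W, b, Q)
    h = np.concatenate([f.reshape(-1) for f in feats])       # length 8 x 25 = 200
    h = sigmoid(params["Wd"] @ h + params["bd"])             # FC layer, 50 neurons
    return float(sigmoid(params["We"] @ h + params["be"])[0])
```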

#### 3.1.3. Training method


To obtain different LCNN models, we develop two training methods for the LCNN based on mini-batch gradient descent [35] in supervised learning, namely the explicit method and the implicit method. The main difference between them lies in their validation mechanisms. Since gradient descent is a typical local search algorithm, we incorporate "translating starting point" and "adding noise" into it to increase the number of training samples presented to the network, so that classifiers with good generalization performance can be obtained.

The explicit method is commonly used for training neural networks; it utilizes independent validation samples to evaluate the obtained classifier during the training phase. As shown in Figure 6(a), the training process can be described as follows: a random 8 × 1700 local segment is extracted from the 8 × 1900 training sample first (translating starting point), then random signals whose maximal amplitude is less than 0.15 millivolts are added to it with high probability (adding noise), and finally back propagation is invoked. When Batchsize training samples have been presented to the network, the weights will be adjusted and the current LCNN model will be tested on the particular 8 × 1700 local segments (starting from the first position) extracted from the 8 × 1900 validation samples. If the accuracy is the best up to the

present, the current LCNN model will be saved. Of course, the training will stop when reaching the maximum number of epochs.

Figure 6. (a) Pseudo-code of the explicit training method. (b) Pseudo-code of the implicit training method.

Besides the explicit method, we develop an implicit method for training the LCNN, whose main characteristic is that a small number of local segments extracted from the training samples are used for evaluating the obtained model. As shown in Figure 6(b), most of the steps are the same as those shown in Figure 6(a). When Batchsize training samples have been presented to the network, the weights will be adjusted and the current LCNN model will be tested on the particular 8 × 1700 local segments (starting from the first position) extracted from the 8 × 1900 training samples used between two adjacent weight-updating processes. At the end of each epoch, the current LCNN model will be saved if the total accuracy is the best up to the present.

One might think this method would result in overfitting, but that is not the case. For training, we extract random local segments and, with high probability, add random signals to them; for validation, by contrast, we extract particular local segments and do not add random signals. Therefore, the probability of overlap is very small.

To be consistent with our prior work, we employ the back-propagation algorithm with inertia moment and variable step [36] and do not use any unsupervised pre-training. The relevant parameters are set as follows: the initial step length is 0.02, the step decay is 0.01 except for the second and the third epochs (set as 0.0505), Batchsize is 560, and the maximum number of training epochs is 500.
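The "translating starting point" and "adding noise" operations shared by both training methods can be sketched as follows. The 0.9 noise probability is an assumed placeholder for the text's "high probability":

```python
import numpy as np

def augment(sample, rng, max_amp=0.15, noise_prob=0.9):
    # sample: one 8 x 1900 training record. "Translating starting point":
    # pick a random 8 x 1700 segment. "Adding noise": with high probability
    # (0.9 here is an assumed value), add random signals bounded by 0.15 mV.
    start = int(rng.integers(0, sample.shape[1] - 1700 + 1))
    segment = sample[:, start:start + 1700].copy()
    if rng.random() < noise_prob:
        segment += rng.uniform(-max_amp, max_amp, size=segment.shape)
    return segment
```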

#### 3.1.4. Multipoint prediction technology

We can immediately get the classification result after inputting an 8 × 1700 local segment extracted from the incoming ECG record into the obtained LCNN model. This is the method our prior work adopted, namely the single-point-prediction technology. However, the output of the LCNN is a probability value ranging from 0 to 1, and the classification confidence is low if the value is around 0.5. For this reason, we develop a new testing method, namely the multipoint-prediction technology.

As shown in Figure 7, nine 8 × 1700 local segments starting from the 1st, 26th, 51st, 76th, 101st, 126th, 151st, 176th and 201st positions are extracted from the ECG record, and their predictive values outputted by the LCNN are aggregated by the average rule (for simplicity's sake), so that the classification confidence can be enhanced.
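A minimal sketch of this multipoint-prediction step, assuming the model is any callable that maps an 8 × 1700 segment to a probability:

```python
import numpy as np

def multipoint_predict(record, model):
    # record: 8 x 1900 array; model maps an 8 x 1700 segment to a probability.
    # Nine segments start every 25 samples (1st, 26th, ..., 201st positions).
    starts = range(0, 225, 25)
    preds = [model(record[:, s:s + 1700]) for s in starts]
    return sum(preds) / len(preds)        # average rule
```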

#### 3.1.5. Bayesian fusion method


In this study, we employ a Bayesian fusion method [37] to incorporate the outputs of the two LCNNs: given M classifiers and K classes, the predicted class k can be determined based on the final probability estimates $P(y = i \mid c_1, c_2, \dots, c_M)$ and is given by

$$\begin{cases}
P(y = i \mid c_1, c_2, \dots, c_M) = \dfrac{1}{M} \sum_{m=1}^{M} P(y = i \mid c_m) \\
k = \underset{1 \le i \le K}{\arg\max} \left\{ P(y = i \mid c_1, c_2, \dots, c_M) \right\}
\end{cases} \tag{3}$$

Here, P(y = i|cm) is the probability value predicted on the i-th class by the classifier cm. What we focus on is a binary classification problem (i.e., "normal" vs. "abnormal"), thus K = 2, and

Normal Versus Abnormal ECG Classification by the Aid of Deep Learning. http://dx.doi.org/10.5772/intechopen.75546

Figure 7. Pseudo-code of testing process.

obviously, M = 2. Of course, other fusion methods can be employed for this purpose; for example, we could train a logistic regression on the concatenation of the outputs of both LCNNs, which might give better results. This statistical-learning part is denoted as the LCNN[AB] classifier.
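Under the average rule of Eq. (3), the fusion step is a small computation. The sketch below assumes each classifier emits a length-K vector of per-class probabilities (here M = 2 and K = 2):

```python
import numpy as np

def bayesian_fuse(probs_per_classifier):
    """Average-rule fusion of Eq. (3).

    probs_per_classifier: M x K array; row m holds P(y = i | c_m).
    Returns the fused distribution and the predicted class index k.
    """
    probs = np.asarray(probs_per_classifier, dtype=float)
    fused = probs.mean(axis=0)           # (1/M) * sum_m P(y = i | c_m)
    return fused, int(np.argmax(fused))  # k = argmax_i
```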

#### 3.2. Rule inference

R-peak detection algorithms have reached a very high level of accuracy at present [38]. On the other hand, there is no doubt that an ECG record is abnormal if it is identified as having a specific disease. Hence, we develop four disease rules based on R peaks to detect abnormal heart rate and irregular heart rhythm.

Let fs be the sampling frequency, Ri (1 ≤ i ≤ n) be the i-th R-peak position in an ECG record, and std(·) be the function calculating the standard deviation. The formula for calculating the average RR interval is given by

$$AvgRR = \frac{1}{n-1} \sum\_{i=2}^{n} \left( R\_i - R\_{i-1} \right) \tag{4}$$

The four disease rules are defined as follows:

#### 1. Heart rate


The formula for calculating heart rate is given by

$$HR = \frac{60 \times f\_s \times (n - 1)}{R\_n - R\_1} \tag{5}$$

Normal heart rate is defined as 60–100 CPM (i.e., counts per minute) clinically; here we regard 59–101 CPM as normal, otherwise as abnormal. This rule is denoted as the RI[A] classifier.

2. Irregular heart rhythm based on local characteristics.

The first rule for detecting irregular heart rhythm is defined as follows: three successive RR intervals all deviate from the average RR interval by more than 15%, that is, ∃k ∈ [1, n − 3] such that ∀j ∈ [1, 3]:

$$\left| \frac{\left(R\_{k+j} - R\_{k+j-1}\right) - AvgRR}{AvgRR} \right| > 0.15 \tag{6}$$

We regard an ECG record as abnormal if Eq. (6) is satisfied, otherwise as normal. This rule is denoted as the RI[B] classifier.

3. Irregular heart rhythm based on local and global characteristics.

The second rule for detecting irregular heart rhythm is defined as follows: one RR interval deviates from the average RR interval by more than 15% and the standard deviation of the ratios between neighboring RR intervals is greater than 0.05, that is, ∃k ∈ [1, n − 1] satisfying

$$\begin{cases} \left| \frac{\left(R\_{k+1} - R\_k\right) - AvgRR}{AvgRR} \right| > 0.15 \\\\ \text{std}\left(\left\{ \frac{R\_{i+2} - R\_{i+1}}{R\_{i+1} - R\_i} \,\middle|\, 1 \le i \le n - 2 \right\}\right) > 0.05 \end{cases} \tag{7}$$

Likewise, we regard an ECG record as abnormal if Eq. (7) is satisfied, otherwise as normal. This rule is denoted as the RI[C] classifier.

4. Irregular heart rhythm based on global characteristics.

The last rule for detecting irregular heart rhythm is defined as follows: the standard deviation of the RR intervals is greater than 0.05 × AvgRR, that is,

$$\text{std}\left(\left\{ R\_{i+1} - R\_i \mid 1 \le i \le n - 1 \right\}\right) > 0.05 \times AvgRR \tag{8}$$

In the same way, we regard an ECG record as abnormal if Eq. (8) is satisfied, otherwise as normal. This rule is denoted as the RI[D] classifier.

Of course, false R-peak detections will influence the classification performance of the four disease rules and lead to unfavorable outcomes. In this study, we utilize the Zhu method [27] to detect R peaks, since its accuracy is 99.83% on the 48 ECG records of the MIT-BIH-AR database and 99.78% on the 251 annotated ECG records of the CCDD. Note that each rule outputs 0 if the classification result is "normal" and 1 otherwise.
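The four rules can be expressed compactly in code. The sketch below follows Eqs. (4)–(8) under our reading of them (e.g., the population standard deviation is used; the original implementation may differ), with `r_peaks` given as sample positions:

```python
import numpy as np

def rule_classifiers(r_peaks, fs=500.0):
    """Sketch of the four R-peak-based rules RI[A]-RI[D] (Eqs. 4-8).

    r_peaks: sample positions R_1..R_n; each rule returns 1 ("abnormal")
    or 0 ("normal").
    """
    r = np.asarray(r_peaks, dtype=float)
    n = len(r)
    rr = np.diff(r)                             # R_{i+1} - R_i, length n-1
    avg_rr = rr.mean()                          # Eq. (4)
    hr = 60.0 * fs * (n - 1) / (r[-1] - r[0])   # Eq. (5), in CPM

    # RI[A]: heart rate outside 59-101 CPM
    ri_a = int(not (59.0 <= hr <= 101.0))

    # RI[B]: three successive RR intervals all deviate > 15% from AvgRR (Eq. 6)
    dev = np.abs(rr - avg_rr) / avg_rr
    ri_b = int(any(np.all(dev[k:k + 3] > 0.15) for k in range(len(dev) - 2)))

    # RI[C]: one RR interval deviates > 15% and the ratios of neighboring
    # RR intervals have standard deviation > 0.05 (Eq. 7)
    ratios = rr[1:] / rr[:-1]
    ri_c = int(np.any(dev > 0.15) and ratios.std() > 0.05)

    # RI[D]: standard deviation of RR intervals > 0.05 * AvgRR (Eq. 8)
    ri_d = int(rr.std() > 0.05 * avg_rr)

    return {"RI[A]": ri_a, "RI[B]": ri_b, "RI[C]": ri_c, "RI[D]": ri_d}
```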

#### 3.3. Summarizing

As we can see, there are five classifiers, namely RI[A], RI[B], RI[C], RI[D] and LCNN[AB], involved in Figure 4. We can regard an ECG record as "abnormal" if any one of the classifiers gives such a result. However, the outputs of the four rule-based classifiers are 0 or 1, while the output of the LCNN[AB] classifier is a probability value ranging from 0 to 1. We hope that the predictive value can be outputted in the form of a probability, so a new fusion method, namely the bias-average method, is developed, given by

$$\begin{cases} ro = \max(ro\_1, ro\_2, ro\_3, ro\_4) \\\\ output = \begin{cases} \frac{ro + nno}{2}, & ro = 1 \\\\ nno, & \text{otherwise} \end{cases} \end{cases} \tag{9}$$

Here, ro1, ro2, ro3 and ro4 are the outputs of the RI[A], RI[B], RI[C] and RI[D] classifiers, nno is the output of the LCNN[AB] classifier, and max(·) selects the maximum value. If we use only the first three disease rules, we simply replace "max(ro1, ro2, ro3, ro4)" with "max(ro1, ro2, ro3)". The final output value ranges from 0 to 1, and the classification result is "normal" if it is less than 0.5, otherwise "abnormal." Figure 7 shows the whole testing process, including the multipoint-prediction technology, the Bayesian fusion method and the bias-average method.

A rough suggestion for measuring classification performance using the AUC is as follows [39]:

1. 0.90–1.00 = excellent classification;
2. 0.80–0.90 = good classification;
3. 0.70–0.80 = fair classification;
4. 0.60–0.70 = poor classification;
5. 0.50–0.60 = failure.

| Model | Sp | NPV | Se | Acc | AUC | TPR (FPR = 1%) | NPV (FPR = 1%) | FPR (NPV = 95%) | TPR (NPV = 95%) | FPR (NPV = 90%) | TPR (NPV = 90%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| LCNN [26] | 88.84 | 90.48 | 76.95 | 85.41 | 0.9034 | 17.5 | 97.8 | 8.22 | 63.3 | 24.7 | 90.1 |
| 0 | 91.81 | 90.04 | 74.96 | 86.95 | 0.9172 | 23.0 | 98.3 | 9.61 | 74.1 | 25.2 | 91.9 |
| 0 + 1 | 91.77 | 90.11 | 75.16 | 86.98 | 0.9176 | 23.3 | 98.3 | 9.63 | 74.2 | 25.2 | 91.9 |
| 0 + 1 + 2 | 91.62 | 90.22 | 75.51 | 86.97 | 0.9180 | 23.3 | 98.3 | 9.63 | 74.2 | 25.2 | 91.9 |
| 0 + 1 + 2 + 3 | 90.28 | 91.11 | 78.28 | 86.82 | 0.9220 | 24.6 | 98.4 | 9.90 | 76.3 | 25.3 | 92.3 |
| 0 + 1 + 2 + 3 + 4 | 80.60 | 93.09 | 85.24 | 81.94 | 0.9196 | 25.8 | 98.5 | 9.71 | 74.8 | 25.1 | 91.7 |

Table 2. The results of different classification models on the small-scale testing set.
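The bias-average rule of Eq. (9) reduces to a few lines; in this sketch, `rule_outputs` holds the 0/1 decisions of RI[A]–RI[D] and `nno` the LCNN[AB] probability:

```python
def bias_average(rule_outputs, nno):
    """Bias-average fusion of Eq. (9).

    rule_outputs: iterable with the 0/1 outputs of RI[A]-RI[D].
    nno:          probability output of the LCNN[AB] classifier.
    """
    ro = max(rule_outputs)
    output = (ro + nno) / 2.0 if ro == 1 else nno
    # "normal" if the fused value is below 0.5, otherwise "abnormal"
    label = "abnormal" if output >= 0.5 else "normal"
    return output, label
```

Note that whenever any rule fires (ro = 1), the fused value is at least 0.5, so the record is always labeled abnormal, which is the intended bias.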

## 4. Result

#### 4.1. Performance metrics

In this study, we use several metrics to investigate the performance of different algorithms: Sp (specificity), NPV (negative predictive value), Se (sensitivity) and Acc (accuracy), given by

$$\begin{cases} Sp = \frac{TN}{TN + FP} \\ NPV = \frac{TN}{TN + FN} \\\\ Se = \frac{TP}{TP + FN} \\ Acc = \frac{TP + TN}{TN + FP + FN + TP} \end{cases} \tag{10}$$

Here, TP is the number of abnormal ECG records correctly classified as abnormal, TN is the number of normal ECG records correctly classified as normal, FP is the number of normal ECG records incorrectly classified as abnormal, and FN is the number of abnormal ECG records incorrectly classified as normal. We also use receiver operating characteristic (ROC) curve-related metrics, including the area under the ROC curve (AUC), the true positive rate (TPR) (i.e., "Sp" in this study) and the false positive rate (FPR) (i.e., "1 − Se" in this study).
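Eq. (10) maps directly onto the confusion counts; a minimal helper (with "abnormal" as the positive class) might read:

```python
def metrics(tp, tn, fp, fn):
    """Sp, NPV, Se and Acc of Eq. (10); abnormal records are positives."""
    return {
        "Sp":  tn / (tn + fp),                 # specificity
        "NPV": tn / (tn + fn),                 # negative predictive value
        "Se":  tp / (tp + fn),                 # sensitivity
        "Acc": (tp + tn) / (tn + fp + fn + tp),
    }
```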



Note that "NPV" in this study is also called the detection precision of normal ECG records.

#### 4.2. Numerical experiment

Our prior work [26] has achieved the best results for record classification on the CCDD so far, so it is meaningful to compare the proposed method with it. As described in Section 2, the same training, validation and testing samples are used. To show the contribution of each strategy intuitively, we present the corresponding results in Tables 2 and 3 in turn. The classification results on the small-scale testing set are only used for reference; what we mainly focus on are the results on the large-scale testing set, which can be deemed a more realistic estimate of potential performance in real applications.

In Tables 2 and 3, "0", "1", "2", "3" and "4" denote the LCNN[AB], RI[A], RI[B], RI[C] and RI[D] classifiers, respectively. From the results, we can see that although NPV and Se are slightly lower, "0" outperformed our prior work in all other metrics on both the small-scale and the large-scale testing sets. On this basis, most metrics continue to increase when "1", "2" and "3" are added, while many metrics start to decrease when "4" is added. The role of "4" is to increase NPV and Se, so that TPR under the condition of NPV being equal to





95% (TPR95) can be increased. Compared with our prior work, the "0 + 1 + 2 + 3" model increased all the metrics and significantly improved the classification performance.

| Model | Sp | NPV | Se | Acc | AUC | TPR (FPR = 1%) | NPV (FPR = 1%) | FPR (NPV = 95%) | TPR (NPV = 95%) | FPR (NPV = 90%) | TPR (NPV = 90%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| LCNN [26] | 83.84 | 86.69 | 83.43 | 83.66 | 0.9086 | 16.0 | 95.4 | 1.81 | 26.7 | 10.6 | 73.9 |
| 0 | 87.49 | 86.30 | 82.12 | 85.15 | 0.9199 | 22.1 | 96.6 | 3.42 | 50.4 | 11.3 | 79.3 |
| 0 + 1 | 87.43 | 86.75 | 82.81 | 85.41 | 0.9225 | 24.4 | 96.9 | 3.60 | 53.1 | 11.5 | 80.4 |
| 0 + 1 + 2 | 87.20 | 87.30 | 83.67 | 85.66 | 0.9251 | 26.2 | 97.1 | 3.83 | 56.5 | 11.7 | 81.6 |
| 0 + 1 + 2 + 3 | 86.03 | 89.11 | 86.46 | 86.22 | 0.9322 | 34.3 | 97.8 | 4.43 | 65.4 | 12.1 | 84.4 |
| 0 + 1 + 2 + 3 + 4 | 80.37 | 91.04 | 89.82 | 84.50 | 0.9320 | 36.7 | 97.9 | 4.60 | 67.9 | 11.8 | 82.4 |

Table 3. The results of different classification models on the large-scale testing set.

## 5. Discussion

As mentioned previously, the aim of this study is to develop a computer-assisted ECG analysis algorithm for telemedicine centers. Specifically, the normal ECG records are filtered out first, and then the remaining abnormal ones are delivered to domain experts for further interpretation, so that their workload can be lessened and diagnostic efficiency can be improved [1]. The key technical indicator is to make TPR95 as high as possible [2]. From Tables 1 and 3 we can see that 38.21% (= 56.28% × 67.9%) of the domain experts' workload can be lessened. The more normal ECG records there are, the less work domain experts do. Although the proposed method increases the computational complexity, we can still use the classification system in practical applications: the total computing time for an ECG record is about 125 ms on an Intel Core2 CPU @ 2.93 GHz with 2 GB RAM and 32-bit Windows 7.

Some may think that the self-compared results are not supportive enough to show the effectiveness. In fact, Zhu [24], Wang [25] and Zhang [13] have proposed methods for this subject, but their results are significantly inferior to those of our prior work. Since the heartbeat classification method [5] has achieved the highest accuracy of 99.3% in the "intra-patient" evaluation to date and a good accuracy of 86.4% in the "inter-patient" evaluation, we reimplemented it on the CCDD and classified each ECG record with any abnormal heartbeats as "abnormal", but the performance was poor. The open-source software ECG-KIT [11] has recently been made available on PhysioNet (http://physionet.org/physiotools/ecg-kit/). However, different from the classification standard in this study, ECG-KIT uses the AAMI recommendation to output the result, so it would not necessarily be meaningful to compare our proposed method with ECG-KIT.

Finally, let us consider industrial applications. A statistical analysis of the classification results given by the General Electric Medical System shows that the total accuracy on 2112 ECG records is 88.0%, where the accuracies of interpreting sinus rhythms and non-sinus rhythms are 95 and 53.5%, respectively [40]. Hence, the total accuracy would only be 74.25% if the ratio of the number of sinus rhythms to the number of non-sinus rhythms were 1:1. A similar statistical analysis also shows that, on 576 ECG records, the accuracies of Philips Medical Systems and Draeger Medical Systems are 80 and 75%, respectively, and the average accuracy of non-experts is 85% [41]. Our proposed method achieved an accuracy of 86.22% and an AUC of 0.9322 on 151,274 ECG records, indicating that it has competitive classification performance.

Due to inter-individual variation in ECG characteristics and the complexity of clinical data, record classification is a highly difficult problem. As a new research direction in recent years, deep learning has already achieved great success in hard artificial-intelligence tasks such as speech recognition [42] and image classification [43]. As an improved deep-learning architecture, the LCNN shows good performance for record classification. However, one LCNN with limited layers and neurons is not strong enough, so we use an ensemble method based on LCNNs in this study, and the experimental results show its effectiveness. In fact, using the same training and testing samples (heartbeat segments) from the MIT-BIH-AR database, the LCNN-based ensemble method has achieved an accuracy of 99.46% with a specificity of 99.69% and a sensitivity of 98.73% for detecting ectopic heartbeats, comparable to the state-of-the-art results for heartbeat classification in the "intra-patient" evaluation [5].

One may doubt the viability of LCNNs since no feature-extraction operations are involved. In fact, if we let f(x) and g(x) be the functions of feature extraction and classification, respectively, the decision-making function of traditional machine-learning methods can be written as g(f(x)). Therefore, we can directly construct the function g(f(x)) with deep-learning techniques [26, 44]. Nevertheless, there is no general-purpose method for all problems, and what one needs to do is choose or develop the most appropriate method for a specific problem. For instance, abnormal ECG records with irregular heart rhythms must be collected first if we use the LCNN to detect irregular heart rhythm. This works well; however, the LCNN will capture not only characteristics of heart rhythm but also other characteristics, such as morphology and intervals, during the training phase. As a result, more types of ECG records have to be collected if we want to obtain a good classifier. In fact, on the small-scale and large-scale testing sets, the sensitivities of the LCNN[AB] classifier for detecting irregular heart rhythm are 58.15 and 58.54%, respectively, while the corresponding results for detecting abnormal ECG records whose heart rhythm is regular are 80.31 and 85.92%. From Table 1, we know that the two testing sets contain only a small proportion of irregular heart rhythms. If there were more abnormal ECG records with irregular heart rhythm, the LCNN[AB] classifier would not achieve such good results. In short, accurate detection of irregular heart rhythm plays an important role in enhancing classification performance. Fortunately, we can achieve this aim by calculations based on R-peak positions, which are also effective for detecting abnormal heart rate; thus, four disease rules are developed in this study.

From the perspective of simulating cardiologists' diagnostic thinking, LCNNs and disease rules express experiential knowledge and intuitive knowledge, respectively, and the combination of them is a complete simulation [1]. This is another reason why the proposed method can achieve good classification results.

In fact, our proposed method can be divided into two parts: the classification of "normal" versus "abnormal" serves as a global classifier and the classification of each specific disease serve as a local classifier. The advantage of the global classifier is that the error accumulation due to introducing each local classifier can be avoided, since the detection performance for many cardiovascular diseases is not very good regardless of whether LCNNs or disease rules are used. There are not enough samples used for training LCNNs in many cases (especially the disease with low detection rate in the general population); while for disease rules, the inaccurate detection of fiducial points can result in subsequent misclassification. Nevertheless, to enhance classification performance further, we can integrate classifiers of easily detectable diseases into Figure 4. For instance, using LCNNs for atrial fibrillation detection, the sensitivity, specificity and accuracy are 98.93, 98.76 and 98.76%, respectively, on 142,167 ECG records in the CCDD, and the detection performance is still excellent on ECG records collected by Shanghai MicroPort Co. Ltd. (almost 100% accuracy). Likewise, using disease rules, we can effectively identify QRS complexes with abnormal amplitude.

## 6. Conclusion

In order to lessen the workload of domain experts in telemedicine centers, we present a systematic approach for record classification in this chapter. Based on LCNNs and disease rules, an effective ensemble method including two homogeneous classifiers and five heterogeneous classifiers is developed. On the CCDD, our method yields an accuracy of 86.22% and an AUC of 0.9322 (excellent), which is a significant improvement on previously reported results [13, 24–26]. Specifically, the TPR under the condition of NPV being equal to 95% reaches 67.9%, which means that the workload of domain experts can be decreased by (N% × 67.9%) in a clinically acceptable scope (especially in the BCMIS of China) if the percentage of normal ECG records is N%. In general, at least 70% × 67.9% = 47.53% of the domain experts' workload can be reduced, since N% can be more than 70%. We have deployed the classification system on the real-time cloud platform of the Shanghai Aerial Hospital Network.

Regarding future work, our aim is to develop effective detection algorithms for other common cardiovascular diseases, such as premature atrial contraction and premature ventricular contraction, and to integrate them into Figure 4. Since both LCNNs and disease rules have advantages and disadvantages, the combination of the two is a preferable research direction.

## Author details

Linpeng Jin1,2 and Jun Dong1*

*Address all correspondence to: jdong2010@sinano.ac.cn

1 Suzhou Institute of Nano-tech and Nano-bionics, Chinese Academy of Sciences, Suzhou, China

2 University of Chinese Academy of Sciences, Beijing, China

## References

[1] Dong J, Zhang JW, Zhu HH, et al. Wearable ECG monitors and its remote diagnosis service platform. IEEE Intelligent Systems. 2012;6(27):36-43. DOI: 10.1109/MIS.2012.4

[2] Liu X. Atlas of Classical Electrocardiograms. 1st ed. Shanghai: Shanghai Science and Technology Press; 2011

[3] de Chazal P, Reilly RB. A patient-adapting heartbeat classifier using ECG morphology and heartbeat interval features. IEEE Transactions on Biomedical Engineering. 2006;53(12):2535-2543. DOI: 10.1109/TBME.2006.883802

[4] Sumathi S, Beaulah HL, Vanithamani R. A wavelet transform based feature extraction and classification of cardiac disorder. Journal of Medical Systems. 2014;38(9):1-11. DOI: 10.1007/s10916-014-0098-x

[5] Ye C, Kumar BV, Coimbra MT. Heartbeat classification using morphological and dynamic features of ECG signals. IEEE Transactions on Biomedical Engineering. 2012;59(10):2930-2941. DOI: 10.1109/TBME.2012.2213253

[6] Llamedo M, Martinez JP. Heartbeat classification using feature selection driven by database generalization criteria. IEEE Transactions on Biomedical Engineering. 2011;58(3):616-625. DOI: 10.1109/TBME.2010.2068048

[7] Osowski S, Hoa LT, Markiewic T. Support vector machine-based expert system for reliable heartbeat recognition. IEEE Transactions on Biomedical Engineering. 2004;51(4):582-589. DOI: 10.1109/TBME.2004.824138

[8] Mar T, Zaunseder S, Martinez JP, et al. Optimization of ECG classification by means of feature selection. IEEE Transactions on Biomedical Engineering. 2011;58(8):2168-2177. DOI: 10.1109/TBME.2011.2113395

[9] de Chazal P, O'Dwyer M, Reilly RB. Automatic classification of heartbeats using ECG morphology and heartbeat interval features. IEEE Transactions on Biomedical Engineering. 2004;51(7):1196-1206. DOI: 10.1109/TBME.2004.827359

[10] Biel L, Pettersson O, Philipson L, et al. ECG analysis: A new approach in human identification. IEEE Transactions on Instrumentation and Measurement. 2001;50(3):808-812. DOI: 10.1109/19.930458

[11] Goldberger AL, Amaral L, Glass L, et al. PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation. 2000;101(23):e215-e220. DOI: 10.1161/01.CIR.101.23.e215

[12] Testing and Reporting Performance Results of Cardiac Rhythm and ST Segment Measurement Algorithms. ANSI/AAMI EC57:1998, Rev; 2012

[13] Zhang ZC, Dong J, Luo XQ, et al. Heartbeat classification using disease-specific feature selection. Computers in Biology and Medicine. 2014;46:79-89. DOI: 10.1016/j.compbiomed.2013.11.019

In order to lessen the workload of domain experts in telemedicine centers, we present a systematic approach for record classification in this chapter. Based on LCNNs and disease rules, an effective ensemble method including two homogeneous classifiers and five heterogeneous classifiers is developed. On the CCDD, our method yields an accuracy of 86.22% and 0.9322 AUC (excellent), which is a significant improvement on previously reported results [13, 24–26]. Specifically, TPR under the condition of NPV being equal to 95% can reach 67.9%, which means that the workload of domain experts can be decreased by (N% 67.9%) in a clinically acceptable scope (especially in the BCMIS of China) if the percentage of normal ECG records is N%. In general, at least 70 67.9 = 47.53% of domain experts' workload can be reduced since N% can be more than 70%. We have deployed the classification system on the

Regarding future work, our aim is to develop effective detection algorithms for other common cardiovascular diseases such as premature atrial contraction and premature ventricular contraction, and to integrate them into Figure 4. Since both LCNNs and disease rules have advantages and disadvantages, the combination of the two is a preferable research direction.

1 Suzhou Institute of Nano-tech and Nano-bionics, Chinese Academy of Sciences, Suzhou,

effectively identify QRS complexes with abnormal amplitude.

312 Artificial Intelligence - Emerging Trends and Applications

real-time cloud platform of Shanghai Aerial Hospital Network.

\*

2 University of Chinese Academy of Sciences, Beijing, China

\*Address all correspondence to: jdong2010@sinano.ac.cn

6. Conclusion

Author details

China

Linpeng Jin1,2 and Jun Dong<sup>1</sup>


[14] Escalona-Moran MA, Soriano MC, Fischer I, et al. Electrocardiogram classification using reservoir computing with logistic regression. IEEE Journal of Biomedical and Health Informatics. 2015;19(3):892-898. DOI: 10.1109/JBHI.2014.2332001

[15] Bortolan G, Degani R, Pedrycz W. A fuzzy pattern matching technique for diagnostic ECG classification. In: Proceedings of Computers in Cardiology; 25–28 September 1988; Washington, DC. New York: IEEE; 1988. pp. 551-554

[16] Willems JL, Abreu-Lima C, Arnaud P, et al. The diagnostic performance of computer programs for the interpretation of electrocardiograms. The New England Journal of Medicine. 1991;325(25):1767-1773. DOI: 10.1056/NEJM199112193252503

[17] Rahman QA, Tereshchenko LG, Kongkatong M, et al. Utilizing ECG-based heartbeat classification for hypertrophic cardiomyopathy identification. IEEE Transactions on Nanobioscience. 2015;14(5):505-512. DOI: 10.1109/TNB.2015.2426213

[18] Willems JL, Arnaud P, Bemmel JHV, et al. A reference database for multilead electrocardiographic computer measurement programs. Journal of the American College of Cardiology. 1987;10(6):1313-1321. DOI: 10.1016/S0735-1097(87)80136-5

[19] American Heart Association Database [Internet]. 1985. Available from: http://www.physionet.org/physiobank/other.shtml [Accessed: January 06, 2018]

[20] Taddei A, Distante G, Emdin E, et al. The European ST-T database: Standard for evaluating systems for the analysis of ST-T changes in ambulatory electrocardiography. European Heart Journal. 1992;13(9):1164-1172. DOI: 10.1093/oxfordjournals.eurheartj.a060332

[21] Zhang JW, Liu X, Dong J. CCDD: An enhanced standard ECG database with its management & annotation tools. International Journal on Artificial Intelligence Tools. 2012;21(5):1-26. DOI: 10.1142/S0218213012400209

[22] Minhas FU, Arif M. Robust electrocardiogram beat classification using discrete wavelet transform. Physiological Measurement. 2008;29(5):555-570. DOI: 10.1109/ICBBE.2008.796

[23] Martis RJ, Chakraborty C, Ray AK. A two-stage mechanism for registration and classification of ECG using Gaussian mixture model. Pattern Recognition. 2009;42(11):2979-2988. DOI: 10.1016/j.patcog.2009.02.008

[24] Zhu HH. Research on ECG recognition critical methods and development on remote multi body characteristic signal monitoring system [Thesis]. Beijing: University of Chinese Academy of Sciences; 2013

[25] Wang LP. Study on Approach of ECG Classification with Domain Knowledge [Thesis]. Shanghai: East China Normal University; 2013

[26] Jin LP, Dong J. Deep learning research on clinical electrocardiogram analysis. Science China Information Sciences. 2015;45(3):398-415. DOI: 10.1360/N112014-00060

[27] Zhu HH, Dong J. An R-peak detection method based on peaks of Shannon energy envelope. Biomedical Signal Processing and Control. 2013;8(5):466-474. DOI: 10.1016/j.bspc.2013.01.001

[28] Jin LP, Dong J. Ensemble deep learning for biomedical time series classification. Computational Intelligence and Neuroscience. 2016. Article ID: 6212684. DOI: 10.1155/2016/6212684

[29] Hinton GE, Salakhutdinov RR. Reducing the dimensionality of data with neural networks. Science. 2006;313(5786):504-507. DOI: 10.1126/science.1127647

[30] Bengio Y. Learning deep architectures for AI. Foundations and Trends in Machine Learning. 2009;2(1):1-127. DOI: 10.1561/2200000006

[31] Thakor NV, Webster JG, Tompkins WJ. Estimation of QRS complex power spectra for design of a QRS filter. IEEE Transactions on Biomedical Engineering. 1984;31(11):702-706. DOI: 10.1109/TBME.1984.325393

[32] LeCun Y, Bottou L, Bengio Y, et al. Gradient-based learning applied to document recognition. Proceedings of the IEEE. 1998;86(11):2278-2324. DOI: 10.1109/5.726791

[33] Kiranyaz S, Ince T, Hamila R, et al. Convolutional neural networks for patient-specific ECG classification. In: The 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society; 25-29 August 2015; Milan, Italy. New York: IEEE; 2015. pp. 2608-2611

[34] Kiranyaz S, Ince T, Hamila R, et al. Real-time patient-specific ECG classification by 1D convolutional neural networks. IEEE Transactions on Biomedical Engineering. 2016;63(3):664-675. DOI: 10.1109/TBME.2015.2468589

[35] Li M, Zhang T, Chen YQ, et al. Efficient mini-batch training for stochastic optimization. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 24–27 August 2014. New York: ACM; 2014. pp. 661-670

[36] Vogl TP, Mangis JK, Rigler AK, et al. Accelerating the convergence of the back-propagation method. Biological Cybernetics. 1988;59(4):257-263. DOI: 10.1007/BF00332914

[37] Kittler J, Hatef M, Duin RPW, et al. On combining classifiers. IEEE Transactions on Pattern Analysis and Machine Intelligence. 1998;20(3):226-239. DOI: 10.1109/34.667881

[38] Kohler BU, Hennig C, Orglmeister R. The principles of software QRS detection. IEEE Engineering in Medicine and Biology Magazine. 2002;21(1):42-57. DOI: 10.1109/51.993193

[39] Gorunescu F. Data Mining Concepts Techniques and Models. 1st ed. Berlin: Springer; 2011. pp. 185-317. DOI: 10.1007/978-3-642-19721-5

[40] Shah AP, Rubin SA. Errors in the computerized electrocardiogram interpretation of cardiac rhythm. Journal of Electrocardiology. 2007;40(5):385-390. DOI: 10.1016/j.jelectrocard.2007.03.008

[41] Hakacova N, Tragardh-Johansson E, Wagner GS, et al. Computer-based rhythm diagnosis and its possible influence on nonexpert electrocardiogram readers. Journal of Electrocardiology. 2012;45(1):18-22. DOI: 10.1016/j.jelectrocard.2011.05.007

[42] Dahl GE, Dong Y, Li D, et al. Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition. IEEE Transactions on Audio, Speech and Language Processing. 2012;20(1):30-42. DOI: 10.1109/TASL.2011.2134090

[43] Krizhevsky A, Sutskever I, Hinton G. ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems; 3–8 December, Vol. 2012. Vancouver: NIPS; 2012. pp. 153-160

[44] Sutskever I, Hinton GE. Deep narrow sigmoid belief networks are universal approximators. Neural Computation. 2008;20:2629-2636. DOI: 10.1162/neco.2008.12-07-661

**Chapter 16**

**A Quantitative Approach for Web Usability Using Eye Tracking Data**

López-Orozco and Florencia-Juárez

DOI: 10.5772/intechopen.74562

Additional information is available at the end of the chapter

#### **Abstract**

This chapter presents a relatively new approach showing how the classical web usability paradigm can benefit from the quantitative data of a nonclassical approach. In the pilot stage, we used experimental eye tracking data acquired from 11 participants who performed three simple tasks on a web page. The results show the advantages of using eye tracking data to identify and verify some usability problems of such a web page. Some hints are presented for people interested in measuring web usability with this approach. However, a deeper study should be carried out in order to generalize our results toward the construction of a methodology to be followed by web developers and other people interested in this field of research.

**Keywords:** eye tracking, web usability, human-computer interaction, quantitative approach, nonclassical approach

## **1. Introduction**

Nowadays, usability is one of the key factors in the success or failure of a web project. Usability is considered a quality attribute that evaluates the accessibility, readability, navigability and ease of learning of a website or application by the user [1]. Moreover, people do not wish to learn how to use a website, and there is no manual to turn to when a person cannot find the information or products they are looking for. Users should be able to understand how a website works immediately after viewing it. If someone gets lost while browsing a website or has difficulty reading its information, he or she simply stops using that site [2].

© 2018 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Typically, usability has been measured in a qualitative way, using questionnaires, focus groups, interviews, and so on, and, to a lesser extent, in a quantitative manner by measuring time to complete a task, number of errors, number of clicks, and so on [3]. The qualitative methods offer the advantage of having a direct opinion of the users, such as what they like and dislike about the page and how it could be improved, but the opinions of the users could be different from each other. In contrast, the quantitative techniques help us to assign numerical values to statements or observations, in order to study the possible relationships between variables by using statistical methods.
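The quantitative measures mentioned above (time to complete a task, number of errors, number of clicks) can be aggregated with plain descriptive statistics before any relationship between variables is examined. A minimal Python sketch, using invented per-participant values for a single task (the numbers are illustrative only):

```python
from statistics import mean, stdev

# Hypothetical per-participant measurements for one task:
# (completion time in seconds, number of errors, number of clicks)
measurements = [
    (38.2, 1, 12),
    (51.7, 3, 18),
    (29.5, 0, 9),
    (44.0, 2, 14),
]

times = [m[0] for m in measurements]
errors = [m[1] for m in measurements]
clicks = [m[2] for m in measurements]

print(f"mean time   : {mean(times):.1f} s (sd {stdev(times):.1f})")
print(f"mean errors : {mean(errors):.2f}")
print(f"mean clicks : {mean(clicks):.2f}")
```

Per-task summaries like these can then be compared across page designs or participant groups with standard statistical tests.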

It is important to emphasize that we do not consider any of the techniques mentioned earlier to be the one "best technique." None of them solves the problem of measuring usability completely, and both approaches have their advantages and disadvantages, each considering only a limited number of influencing factors [4]. However, techniques that combine the strengths of both are better and more promising, as shown here.

In this chapter, the eye tracking technique was used to obtain quantitative metrics based on what users see on a web page, with the aim of adding these metrics to the traditional qualitative observations used in measuring the usability of web pages.

## **2. Eye tracking technology**

The eye tracker is a tool that detects the user's gaze and provides an accurate representation and understanding of an individual's eye movement behavior. An eye tracker detects the two basic eye movements: saccades and fixations. A saccade is the rapid eye movement that shifts the gaze to a different location, and a fixation is the period of time during which the eye remains practically still on a single location.
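Fixations are typically recovered from the raw gaze samples by a dispersion-threshold procedure: as long as consecutive samples stay within a small spatial window, they are grouped into one fixation. The sketch below is a simplified I-DT-style implementation; the thresholds and the synthetic 60 Hz samples are invented for illustration and do not reflect any particular device's output format:

```python
def detect_fixations(samples, max_dispersion=30.0, min_duration=0.1):
    """Group raw gaze samples (t, x, y) into fixations: consecutive
    samples whose combined x/y spread stays under max_dispersion
    (screen pixels) and which span at least min_duration seconds
    form one fixation (t_start, t_end, centroid_x, centroid_y)."""
    fixations = []
    window = []
    for s in samples:
        window.append(s)
        xs = [p[1] for p in window]
        ys = [p[2] for p in window]
        dispersion = (max(xs) - min(xs)) + (max(ys) - min(ys))
        if dispersion > max_dispersion:
            # The new sample broke the window: close the fixation
            # formed by the previous samples, if it lasted long enough.
            finished = window[:-1]
            if finished and finished[-1][0] - finished[0][0] >= min_duration:
                fx = sum(p[1] for p in finished) / len(finished)
                fy = sum(p[2] for p in finished) / len(finished)
                fixations.append((finished[0][0], finished[-1][0], fx, fy))
            window = [s]
    if window and window[-1][0] - window[0][0] >= min_duration:
        fx = sum(p[1] for p in window) / len(window)
        fy = sum(p[2] for p in window) / len(window)
        fixations.append((window[0][0], window[-1][0], fx, fy))
    return fixations

# Synthetic 60 Hz samples: a fixation near (200, 150), one saccade
# sample, then a fixation near (600, 400)
samples = [(i / 60, 200 + (i % 3), 150 + (i % 2)) for i in range(12)]
samples += [(12 / 60, 400, 280)]
samples += [((13 + i) / 60, 600 + (i % 3), 400) for i in range(12)]
print(detect_fixations(samples))
```

The samples between two detected fixations then constitute the saccades of the scanpath.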

Using an eye tracker allows:

• To see what the user observes on the screen, by detecting the fixations of both the left and the right eye in *X* and *Y* coordinates according to the size of the screen, as well as the duration of each fixation.

• To know the route, or scanpath, that the user's gaze follows. The eye tracker also provides the saccade route, which shows the fixations made and the movements between them, as shown in **Figure 1**.

• To know the diameter of the pupil when making the fixations or at the moment the user clicks the mouse on a certain position of the web page.

**Figure 1.** Saccade route generated by the eye tracker.

In this chapter, a binocular infrared S2 Eye Tracker (Mirametrix Inc.) was used. The eye tracker works with an accuracy of 0.5°, a data rate of 60 Hz, and a head movement box of 25 cm × 11 cm × 30 cm (width × height × depth). It has a system setup time of less than 5 min, a calibration time of less than 30 s, and a suggested screen size of 15″–22″; it supports blink tracking recovery and fast time-to-track recovery, weighs 1 lb, uses the bright-pupil eye tracking technique, and runs on low-power computers.

## **3. Related work**

Some of the most representative research works that have used eye tracking data to solve usability problems in web pages are presented here.

The Centre for Human Computer Interaction Design of the City University of London [5] acquired experimental data using a Tobii ×50 eye tracker. Ocular tracking was performed while participants carried out tasks on two web pages. They found sequences of eye tracking patterns that correlated with sequences of already known usability problems.

differences in movement and eye functioning during website navigation in some of the sections. In particular, in one study, older participants were less precise, and in two studies, they took longer time to complete tasks compared to younger participants. In two studies, they observed the central part of the screen more frequently than younger participants and also observed the peripheral left side of the screen less frequently. Also, they took more time to cast first look at the peripheral top of the screen than the younger participants in the study. These results highlight the potential of age-related differences in performance while browsing

A Quantitative Approach for Web Usability Using Eye Tracking Data

http://dx.doi.org/10.5772/intechopen.74562

321

Tag-clouds are a compact list of keywords used in the web page interface, which allow users to browse and navigate through various documents or sections of those pages. In the article written by Montero et al. [11], a study was conducted to investigate the usability of tag-clouds. The experiment was performed in 17 participants, 6 women and 11 men aged between 29 and 49 years. Tasks asked to participants were divided into stages. First was a semantic search, where subjects were asked to search and click on a concept related to a certain topic, such as *Christianity*, where the expected target was the tag *religion*. The second stage consisted of a syntactic search, in which participants were asked to search and click on a tag that answered a certain question that was asked, such as searching for the name of the best-known operating system in the market, where the expected answer was *windows*. The results showed that both the size and the shape of the tag have a clear influence on the visual exploration of the users. They also showed results on the alphabetical order of the tags, which do not imply an

Most of the works reported here try to solve completely the usability problems either by using only eye tracking data or by a qualitative approach. In general, they only solve some aspects. We are convinced that by combining the best of both approaches (qualitative and quantitative) in usability measurements, results should be better. For this reason, this chapter

In order to study and get some data, we designed an experiment to face people to a simple task on a web page. The web page **R***egistro* **Ú***nico de* **P***roductos de* **I***nvestigacion* (RUPI) of the Universidad Autónoma de Ciudad Juárez (UACJ) was used as it is shown in **Figure 2**. During the experiment, participants were asked to fill forms with informational data provided to them beforehand. An eye tracker was used to get data from the user's gaze while conducting

A total of nine full-time and two half-time professors of the Universidad Autonóma de Ciudad Juarez participated in the experiment. They were four women and seven men, aged between 28 and 43 years, and all of them are from the Electrical Engineering and Computer Science

websites and provide motivation for further exploration.

improvement in terms of efficiency at the time of the location of the tags.

proposes a combined approach of both.

**4. Experimental approach**

the experiment.

**4.1. Participants**

Similarly, a website for the American Society of Clinical Oncology (ASCO) was proposed, designed and evaluated with an eye tracking methodology [6]. Participants were asked to complete certain tasks, both in an old design of the page and in the new design. The paper describes how eye tracking helped to diagnose errors and identify the best of both designs.

Djamasbi et al. [7] conducted a study where they examined possible gender differences in web preference with the help of an eye tracker. Several studies show that both men and women have different esthetic preferences, for example, men prefer using darker colors, such as black and blue, while women prefer lighter colors. These preferences not only apply in colors but also in the website distribution of images and information. According to the previous studies, two hypotheses were presented: H1—participating women will notice more sections that have images of people more than men and H2—female participants will notice sections with a light-colored background more than men. The results did not show any significant difference between genders with respect to the number of times they were fixed in the sections of the web page. The results of the fixation analysis did not show significant differences between men and women with respect to sections with different background colors. Although the results did not support the hypotheses, the analyses yielded other data on the web page banners, suggesting that men and women can also ignore banner-type objects, regardless of whether these objects are designed considering their preferences.

Szymanski et al. [8] presented a method for evaluating web usability by combining kinetic gestures and eye tracking applications. All this was done by performing three tasks in an application that aims to deliver a list to users of all stores, restaurants and other places within a mall. The tasks performed were to find certain stores in a specific point in the commercial mall.

In the paper presented by Russell et al., at the annual British Human Computer Interaction Conference [9], it shows a study to discover the answer to two design issues both in online documents and on web pages. Research questions were: what is the best font size to read online? and what kind of source font is best for reading online? *Serif* or *Sans Serif*? The experiment was carried out with employees of a computer company, using an eye tracking device. Participants were given instructions to read and retain several texts with different sizes and types of source fonts, followed by answering three multiple-choice questions per text. Results show that larger fonts have a slight but not significant advantage in reading speed over small fonts. Results on the type of source between *Serif* and *Sans Serif* do not show significant differences in ocular follow-up and retention of texts.

Understanding the age-related differences in website navigation is instructive for website design, especially given by the growing number of older people using the Internet. Romano Bergstrom et al. [10] present the usability and data tracking of five independent usability studies of a website that included young and old participants. The results revealed age-dependent differences in movement and eye functioning during website navigation in some of the sections. In particular, in one study, older participants were less precise, and in two studies, they took longer time to complete tasks compared to younger participants. In two studies, they observed the central part of the screen more frequently than younger participants and also observed the peripheral left side of the screen less frequently. Also, they took more time to cast first look at the peripheral top of the screen than the younger participants in the study. These results highlight the potential of age-related differences in performance while browsing websites and provide motivation for further exploration.

Tag-clouds are a compact list of keywords used in the web page interface, which allow users to browse and navigate through various documents or sections of those pages. In the article written by Montero et al. [11], a study was conducted to investigate the usability of tag-clouds. The experiment was performed in 17 participants, 6 women and 11 men aged between 29 and 49 years. Tasks asked to participants were divided into stages. First was a semantic search, where subjects were asked to search and click on a concept related to a certain topic, such as *Christianity*, where the expected target was the tag *religion*. The second stage consisted of a syntactic search, in which participants were asked to search and click on a tag that answered a certain question that was asked, such as searching for the name of the best-known operating system in the market, where the expected answer was *windows*. The results showed that both the size and the shape of the tag have a clear influence on the visual exploration of the users. They also showed results on the alphabetical order of the tags, which do not imply an improvement in terms of efficiency at the time of the location of the tags.

Most of the works reported here try to solve usability problems either by using only eye tracking data or by a purely qualitative approach; in general, they address only some aspects of the problem. We are convinced that combining the best of both approaches (qualitative and quantitative) in usability measurement should yield better results. For this reason, this chapter proposes an approach that combines both.

## **4. Experimental approach**

In order to study and collect data, we designed an experiment in which people face a simple task on a web page. The web page *Registro Único de Productos de Investigación* (RUPI) of the Universidad Autónoma de Ciudad Juárez (UACJ) was used, as shown in **Figure 2**. During the experiment, participants were asked to fill in forms with informational data provided to them beforehand. An eye tracker was used to record the user's gaze while conducting the experiment.

#### **4.1. Participants**

A total of nine full-time and two half-time professors of the Universidad Autónoma de Ciudad Juárez participated in the experiment. They were four women and seven men, aged between 28 and 43 years, and all of them are from the Electrical Engineering and Computer Science Department. All participants had normal or corrected-to-normal visual acuity and had no known neurological disorders. They were naive with respect to the purpose of the study, and they gave written consent before the experiment.

Artificial Intelligence - Emerging Trends and Applications

A Quantitative Approach for Web Usability Using Eye Tracking Data

http://dx.doi.org/10.5772/intechopen.74562

#### **4.2. Apparatus**

Eye movements were recorded using a remote binocular infrared S2 Eye Tracker (Mirametrix Inc.), as mentioned in Section 2. The eye tracking system was used in pupil–corneal reflection tracking mode, sampling at 60 Hz with 0.5–1° of accuracy. We ran our experiment using the Viewer software provided by the eye tracker manufacturer. Because the eye tracker generates a special record of eye positions and synchronization events, we had to perform a preprocessing stage to adapt this output into a more accessible format for our further analysis. The movements of both eyes were recorded during the experiment; however, we are only concerned with the analysis of movement data from a single eye.
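As an illustration of this preprocessing step, the sketch below converts a hypothetical CSV export of gaze samples into a list of single-eye records. The column names and format are assumptions for illustration, not the actual Mirametrix output.

```python
import csv
import io

# Hypothetical CSV export: one gaze sample per row, both eyes recorded.
raw = io.StringIO(
    "timestamp,left_x,left_y,right_x,right_y\n"
    "0.000,412,300,415,302\n"
    "0.016,414,301,417,303\n"
)

# Keep only the single-eye (left) coordinates used in the analysis.
samples = [
    {"t": float(row["timestamp"]), "x": float(row["left_x"]), "y": float(row["left_y"])}
    for row in csv.DictReader(raw)
]
print(samples[0])  # first sample as a plain dict
```

In practice, the same loop would read the tracker's log file and could also merge the synchronization events before further analysis.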

#### **4.3. Training before the testing phase**

Before starting the experiment, participants were trained on the tasks they would perform, using an example of the information to be managed during the testing phase. This step was done in an intuitive way, without using the stimulus page of the real experiment. Additionally, some extra indications were given to the participants. Firstly, they were informed that they could enter fictitious data, as similar as possible to real data, if they did not remember the precise information; secondly, that the experiment could be interrupted at any time if they could not find the section where the information should be entered or if they felt lost within the website.

#### **4.4. Calibration**

The participants were asked to sit in front of the PC monitor at approximately 65 cm. Then, a screen with calibration points was presented, as shown in **Figure 3**. The participants had to look at each of these calibration points. Subsequently, the percentage of calibration error was displayed on the screen. The average error percentage was considered acceptable if it was less than 20%. **Figure 3** shows an example of a calibration result considered not acceptable (average error: 35.1% > 20%) for our study purposes.
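The acceptance rule can be sketched as a simple threshold check. The per-point error values below are hypothetical, chosen only to reproduce the 35.1% average of the rejected example:

```python
# Hypothetical percentage error per calibration point.
point_errors = [34.0, 38.2, 33.1]

# A calibration is accepted only if the average error is below 20%.
average_error = sum(point_errors) / len(point_errors)
acceptable = average_error < 20.0
print(round(average_error, 1), acceptable)  # 35.1 False -> recalibrate
```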

#### **4.5. Experiment**

After the training and calibration were completed, the participants were instructed to start the experimental session, in which they should complete the requested task. The initial screen was the home page of RUPI, where the participants had to find the section to enter the information that was given beforehand. The session ended when the participant pressed the button to save their data.

#### **4.6. End of session**


**Figure 2.** RUPI home page.

Once the session was finished, the participants were given a short survey about their experience with the website. This survey had questions concerning the graphical user interface and navigation on the page. They were also asked to suggest any kind of help for future users of the website. Participants were then informed about the purpose of the experiment, and they had the opportunity to ask any question about our research project.

**Figure 3.** Calibration screen.

**Figure 4.** Overall phases performed during the experimental session. Phases bounded by dotted line were repeated three times by participants.

**Figure 4** shows all the phases that were performed during the experimental session. Phases that are inside the dotted line are repeated three times for each task performed by the participant.


## **5. Data analysis**

The main idea was to find usability problems by looking for regions or areas of the RUPI web page where participants made many fixations. Our hypothesis was that if participants made many fixations somewhere, this may be an indication that the region should be analyzed, because people found it unclear or confusing.

In this section, some statistics were computed from the experimental data. These computations are shown in **Table 1**.


**Table 1.** Computation of statistics with experimental data.

#### **5.1. Statistics**


Eqs. (1)–(3) were used to compute, respectively, the total elapsed time and the total number of fixations over the tasks performed by each participant, and the ratio between that number of fixations and the elapsed time.

$$\text{OverallTime} = \sum_{j=1}^{3} \text{Time}_{T_j} \tag{1}$$

where $\text{Time}_{T_j}$ is the elapsed time (s) in each task performed by each participant,

$$\text{TotalFix} = \sum_{j=1}^{3} \text{Fix}_{T_j} \tag{2}$$

where $\text{Fix}_{T_j}$ is the number of fixations in each task performed by each participant, and

$$\overline{\text{Fix}/\text{Time}} = \frac{\text{TotalFix}}{\text{OverallTime}} \tag{3}$$
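As a minimal sketch, assuming hypothetical per-task logs for one participant, Eqs. (1)–(3) reduce to two sums and a ratio:

```python
# Hypothetical per-task records for one participant (tasks j = 1..3).
task_times = [157.4, 236.9, 132.2]   # Time_Tj: elapsed seconds per task
task_fixations = [420, 610, 305]     # Fix_Tj: fixation count per task

overall_time = sum(task_times)             # Eq. (1): OverallTime
total_fix = sum(task_fixations)            # Eq. (2): TotalFix
fix_per_second = total_fix / overall_time  # Eq. (3): "reading" speed

print(round(overall_time, 1), total_fix, round(fix_per_second, 3))
```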

This ratio gives the number of fixations made by each participant per time unit; in other words, it is an indicator of "reading" speed.

#### *5.1.1. Average time of the participants in completing the tasks*

The average time to complete the first task over all participants was 157.38 s (σ = 55.77); for the second task it was 237.00 s (σ = 144.68); and for the third task it was 132.17 s (σ = 72.50).

It is important to mention that two participants performed only one task. For the statistics computation, their data were aggregated with those of the participants who did the three tasks, and they were assigned participant IDs 10 and 11, respectively.

Additionally, **Figure 5** shows the number of fixations made in each task by each participant, and the time taken by each participant to perform the three tasks is shown in **Figure 6**.

**Figure 5.** Number of fixations by participant. Tasks are presented in the order performed.

**Figure 6.** Time to complete the three tasks by participant, ordered from lowest to highest.

#### *5.1.2. Scanpaths and selection of regions of interest*

By using the geometric coordinates *(x, y)* and the duration of the fixations, it is possible to draw the saccadic route of a participant on the web page. This saccadic route is known as a scanpath. Once the scanpaths of all participants were plotted, several regions of interest (ROIs) were selected by focusing on those with a high density of fixations.

**Figure 7** shows the RUPI home page with the fixations of all scanpaths. We decided to analyze only two ROIs in this interface: the *help menu* in the middle of the page (ROI1) and the *right menu* used to enter the data in the right section (ROI2), since those were the two areas with the highest fixation density, as can be noticed in the figure.

Now, the qualitative data obtained from the participants, and how they relate to the previous quantitative data, are presented.

#### **5.2. Qualitative data**

After the experimental session, participants were asked to answer a 10-question short survey. This survey was intended to gather information about the participants' level of familiarity with the RUPI web page, their experience during the experiment, their opinion about the graphical design of the RUPI web page, and so on.

From the data obtained from the participants' survey (qualitative tests), the main results are described as follows:

• Most of the participants (8 of 11) had already used the RUPI web page at least once.

• A total of 45.45% of the participants had a regular experience in finding the option to enter the information of the tasks, while 36.36% had a bad experience. Only one participant could not enter the data and asked to end the session. It is important to mention that none of the participants entered the data in the corresponding section, since the RUPI web page is not well designed (ROI2). For example, the information of one section can be entered in a different section.

• A total of 54.54% of the participants mentioned that the website's *help* section (ROI1) for entering the data is not bad, and the other 45.45% mentioned that it is bad. This confirms our initial hypothesis that the website's *help* section is not clear, or it is not eye-catching.

**Figure 7.** Scanpaths composed of fixations made by participants during the experiment.

#### **5.3. Analysis of participants' behavior**

In ROI2, located to the right of the web page, most of the participants' fixations were observed when performing the very first task. Participants concentrated on the *menu* area to enter the options they were asked for. Clearly, participants moved a lot throughout the designated area, as shown in **Figure 7**. This can be an indication that users get lost in the *menu* when trying to complete the first task.

Another point to mention is that there are very few fixations in the *help* area (ROI1) of the website, which suggests two possible situations: (1) this area was often ignored by users because it is not very eye-catching, or (2) users who had already used the web page knew that this section does not have clear information about the subsections and the data that should be entered in each one. The latter assumption is reinforced by the qualitative tests applied to the participants, since 54.54% of the participants mentioned that the website's *help* section for entering the data was not bad, and the other 45.45% mentioned that it was bad.
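A fixation count for an ROI, like the ones used to compare the *help* and *right menu* areas, can be obtained with a point-in-rectangle test. The ROI bounds and fixation coordinates below are hypothetical:

```python
# Hypothetical rectangular ROI (e.g., the right menu) in pixel coordinates.
roi = {"x_min": 800, "x_max": 1024, "y_min": 150, "y_max": 600}

# Hypothetical fixation centers (x, y) pooled over all participants.
fixations = [(850, 200), (900, 480), (400, 300), (820, 155)]

# A fixation belongs to the ROI if its center falls inside the rectangle.
inside = [
    (x, y) for (x, y) in fixations
    if roi["x_min"] <= x <= roi["x_max"] and roi["y_min"] <= y <= roi["y_max"]
]
density = len(inside) / len(fixations)  # share of fixations in this ROI
print(len(inside), density)
```

Repeating this count per ROI and per task gives the relative fixation densities used to compare regions.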

#### **5.4. Redesigning and testing**


Based on the previous results, and in order to test and confirm our findings quickly, simple modifications to the RUPI web page were proposed; changes were made to both the visual and the functional design. The scanpaths of participants testing our new design are shown in **Figure 8**. Here, fixations were better distributed among the different areas of the web page, as expected.

**Figure 8.** Scanpaths composed of fixations made by all participants during the experiment with the new web page design.

Moreover, 60% of participants rated the help section as good and 40% as regular. A total of 60% of participants entered the data successfully, and 80% mentioned that they had not experienced any problem with the web page content. Finally, 60% of participants valued the fact that the new design has elements that allow users to go forward and back while browsing.

## **6. Conclusions**

In this chapter, a relatively new way of measuring usability, combining qualitative and quantitative techniques, is proposed. The analysis of results showed how both techniques complement each other. For example, an ROI that participants rated poorly or marked as not helpful was the *help* area. On the other hand, most of the works that try to solve usability problems by using eye tracking generally do not focus on the qualitative part, as could be seen in the state of the art; that is, qualitative measurement is not considered. More particularly, we have explored how the visual behavior of users can be combined with qualitative measurements to solve usability problems of the RUPI website. However, due to the complexity of human behavior and of users' reactions to a problem (on a web page), as well as the complexity of the required research, it is necessary to study this subject in more depth.

## **Acknowledgements**

Thanks to PROMEP for the financial support of the project "Modelización Cognitiva Computacional de Baja Complejidad para la Búsqueda de Información en Español basado en el Comportamiento Ocular de los Usuarios" (ID DSA/103.5/15/7004), under which this work was started. Thanks also to the Autonomous University of Ciudad Juárez for project 8074-16: "Usabilidad Web Basada en Datos de los Movimientos Oculares (Eyetracking)".

## **Thanks**

Thanks to the student community involved during the experimental phase, especially to Mario Luna-Maldonado and Ramón G. Ortiz-Bustillos, who carried out the experiment as part of their undergraduate project. We also thank the teaching staff of UACJ for accepting to participate in our experiments.

## **Author details**

López-Orozco\* and Florencia-Juárez

\*Address all correspondence to: francisco.orozco@uacj.mx

Autonomous University of Ciudad Juárez, Ciudad Juárez, Mexico

## **References**

[1] Sanchez W. La usabilidad en Ingeniería de Software: definición y características. Ing-novación. San Salvador: Universidad Don Bosco; Aug 2011

[2] Nielsen J. Designing Web Usability: The Practice of Simplicity. Thousand Oaks, CA, USA: New Riders Publishing; 1999

[3] Nielsen J, Pernice K. Eyetracking Web Usability. 1st ed. Thousand Oaks, CA, USA: New Riders Publishing; 2010

[4] Conte T, Vaz V, Massolar J, Mendes E, Travassos H. Improving a web usability inspection technique using qualitative and quantitative data from an observational study. In: Software Engineering, 2009. SBES'09. XXIII Brazilian Symposium. Fortaleza, Ceará, Brazil: IEEE; Oct 2009

[5] Ehmke C, Wilson S. Identifying web usability problems from eye-tracking data. In: Proceedings of the 21st British HCI Group Annual Conference on People and Computers: HCI…But Not as We Know It. Vol. 1. University of Lancaster, United Kingdom: British Computer Society; Sep 2007. pp. 119-128

[6] Bojko A. Using eye tracking to compare web page designs: A case study. Journal of Usability Studies. 2006;**1**(3):112-120

[7] Djamasbi S, Tullis T, Hsu J, Mazuera E, Osberg K, Bosch J. Gender preferences in web design: Usability testing through eye tracking. In: Proceedings of the Thirteenth Americas Conference on Information Systems (AMCIS), Keystone, Colorado; Aug 2007. pp. 1-8

[8] Szymanski JM, Sobecki J, Chynał P, Anisiewicz J. Eye tracking in gesture based user interfaces usability testing. In: Asian Conference on Intelligent Information and Database Systems. Cham: Springer International Publishing; Mar 2015. pp. 359-366

[9] Beymer D, Russell D, Orton P. An eye tracking study of how font size and type influence online reading. In: Proceedings of the 22nd British HCI Group Annual Conference on People and Computers: Culture, Creativity, Interaction. Vol. 2. Liverpool, United Kingdom: British Computer Society; Sep 2008. pp. 15-18

[10] Romano Bergstrom JC, Olmsted-Hawala EL, Jans ME. Age-related differences in eye tracking and usability performance: Website usability for older adults. International Journal of Human-Computer Interaction. 2013;**29**(8)

[11] Montero YH, Herrero-Solana V, Guerrero-Bote V. Usabilidad de los tag-clouds: estudio mediante eye-tracking. Scire: Representación y organización del conocimiento. 2010;**16**(1):31-41. ISSN: 1135-3716 (in Spanish)


**Chapter 17**

**Deep Learning Models for Predicting Phenotypic Traits and Diseases from Omics Data**

Md. Mohaiminul Islam, Yang Wang and Pingzhao Hu

Additional information is available at the end of the chapter

DOI: 10.5772/intechopen.75311

#### **Abstract**

Computational analysis of high-throughput omics data, such as gene expressions, copy number alterations and DNA methylation (DNAm), has become popular in disease studies in recent decades because such analyses can be very helpful for predicting whether a patient has a certain disease or one of its subtypes. However, due to the high-dimensional nature of the data sets, with hundreds of thousands of variables and very small numbers of samples, traditional machine learning approaches, such as support vector machines (SVMs) and random forests, have limitations in analyzing these data efficiently. In this chapter, we review the progress in applying deep learning algorithms to solve some biological questions. The focus is on potential software tools and public data sources for these tasks. In particular, we show case studies using deep neural network (DNN) models for classifying molecular subtypes of breast cancer, and DNN-based regression models for accounting for interindividual variation in triglyceride concentrations measured at different visits of peripheral blood samples using DNAm profiles. We show that integration of multi-omics profiles into DNN-based learning methods could improve the prediction of the molecular subtypes of breast cancer. We also demonstrate the superiority of our proposed DNN models over the SVM model for predicting triglyceride concentrations.

**Keywords:** deep learning, omics data, phenotypic traits, data integration, support vector machine, random forest

© 2018 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

## **1. Introduction**

#### **1.1. Omics data**

Omics refers to the use of high-throughput experimental technologies to examine genomics, transcriptomics, metabolomics and proteomics for understanding biological and disease mechanisms. The omics data generated by these technologies are high-dimensional and correlated. Different computational and statistical analyses of these data can be used to identify risk factors for different diseases or to build autonomous disease prediction models. Technological development allows researchers to obtain huge amounts of high-dimensional biological data. Omics technologies generate such high-throughput data by detecting numerous alterations in molecular components [1]. These technologies also generate additional biological data for comprehending different types of correlations and dependencies among the molecular components. Bioinformatics is the discipline that emerged to perform computational analysis of high-throughput biological data. It offers tools and methodologies for analyzing different omics data to understand the underlying information about different diseases, and such analyses will help physicians provide early and patient-specific treatment. Schneider and Orchard [2] listed the state-of-the-art technologies available for generating omics data. They also provided a list of the bioinformatics resources available for analyzing omics data and discussed the bioinformatics challenges of handling high-throughput data.

For example, one of the key questions in medical science is to identify the specific mutations for a particular disease in individual patients. To do this, we first need to collect blood samples from patients with the disease and from healthy individuals without it. DNA is then extracted from these blood samples, and DNA microarray (also called DNA chip) or DNA sequencing technologies are used to generate genome-wide genomic data for identifying mutated genes for the disease [2]. Similar technologies and procedures can also be used to generate other types of omics data, such as RNA gene expression data and DNA methylation (DNAm) data.

#### **1.2. Phenotypic traits prediction**

A living biological organism shows a number of observable characteristics, such as its morphology, growth and behavior. Phenotypes are the product of the different genetic expressions of an organism, known collectively as the genotype of that organism. Phenotypic traits are the alternative forms of a phenotype of a particular organism; for example, hair is a phenotype, but the different hair colors are phenotypic traits. The study of phenotypic trait prediction is very important, as it gives us knowledge about how genotype affects an individual's diseases or traits. Lippert et al. [3] used whole-genome sequencing data to identify individuals by predicting their biometric traits. Genome sequencing data were also used by Chen et al. [4] to build a probabilistic Bayesian model for predicting dichotomous traits (e.g., glaucoma, Crohn's disease, prostate cancer). This model incorporates annotated information about different variant genotypes and genes that are associated with diseases. There are other phenotypic trait prediction models, such as for eye color [5, 6], skin color [6] or facial structure [7].

*1.2.1. Breast cancer and its molecular subtypes*

Cancer is a collection of diseases characterized by uncontrolled cell growth in organs, that is, the sites the cells originate from. For example, breast cancer begins in the breast tissue and may start in the duct or lobe of the breast. When breast cells are not working properly, they divide continually and a tumor is formed. Breast cancer is a complex and heterogeneous disease with differing prognostic and clinical outcomes.

In clinical practice, estrogen-receptor expression can be used to classify breast cancer patients into estrogen-receptor positive (ER+) or negative (ER−). A patient is classified as ER+ if the growth of her cancer cells is stimulated through the receptor for the hormone estrogen. Stratification of breast cancer patients into ER+ or ER− is very important, as physicians use this information to determine whether patients need chemotherapy or hormonal treatments. Statistics show that approximately 67% of breast cancer tests are positive for hormone receptors [8].

Perou et al. discovered the intrinsic subtypes of breast cancer using gene expression profiles of frozen tissue samples through unsupervised analysis [9]. They classified breast cancers into five groups: normal-like, luminal A and B (LumA and LumB), basal-like and Her2-enriched (Her2). They established a polymerase chain reaction (PCR)-based test for 50 genes, which are well known as the PAM50 signature. The expression levels of these 50 genes can be used to classify patients into one of four subtypes: luminal A, luminal B, basal-like and Her2-enriched. These subtypes are also known as the PAM50 subtypes. This classification has been shown to be prognostically independent of clinicopathologic factors and can identify patients who are more likely to benefit from adjuvant chemotherapy [10]. It can also help identify the fundamental differences among the PAM50 subtypes at the molecular level [11].

*1.2.2. Prediction of triglyceride concentration in blood*

Triglyceride is a type of fat in human blood. A high concentration of triglycerides in the blood can increase the risk of heart disease, stroke and other disorders. Many genetic loci have been identified by genome-wide association studies, but only a small proportion of the interindividual variability of triglycerides has been explained by these genetic determinants. It is known that the level of triglycerides is heritable. Consequently, the development of new high-throughput genomic technologies makes it natural to extend phenotypic prediction models to complex traits such as triglyceride concentration. Using DNAm profiles to predict disease phenotypic courses has not yet been explored in detail.

#### **1.3. Machine learning for disease phenotype prediction**

Artificial intelligence (AI) is an area of computer science that demonstrates its necessity in our everyday life through machine learning (ML) methods. ML methods can automate data analysis and find hidden intrinsic patterns in big data, which is impossible for a human being. ML methods use these patterns to build predictive models without any explicit programming. These predictive ML models are improving our daily life in various ways, such as product recommendations during online shopping based on our searches, stock price prediction, classification of objects in images, real-time language translation, and so on.

Conventional ML methods, such as support vector machines (SVMs), random forests (RFs), Bayesian networks (BNs), and so on, depend on well-defined, engineered and robust hand-tuned features (or feature vectors) derived from the raw input data in order to make reasonable predictions. Human domain expertise is required to develop these engineered features. However, real biomedical data are often high-dimensional and noisy, and these conventional ML methods do not provide suitable techniques for handling such natural raw data (e.g., normalized gene expression data) directly.

Different ML-based methods were developed to classify breast cancer patients into one of the PAM50 subtypes using gene expression profiles [12]. However, a new class of ML methods called deep learning (DL) can handle such high-dimensional, noisy and natural raw data by following representation learning or hierarchical data-driven approaches.
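As a purely illustrative sketch of such conventional baselines, an SVM and a random forest can be fit to simulated high-dimensional, expression-like data; the sample counts, feature counts and class labels below are arbitrary assumptions, not data from the chapter:

```python
# Hypothetical baseline: SVM and random forest on simulated "gene expression"
# data with 4 classes standing in for molecular subtypes. All sizes are made up.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# 200 samples x 2,000 features: many more variables than samples, as in omics data.
X, y = make_classification(n_samples=200, n_features=2000, n_informative=50,
                           n_classes=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=0, stratify=y)

svm_acc = SVC(kernel="linear").fit(X_tr, y_tr).score(X_te, y_te)
rf_acc = RandomForestClassifier(n_estimators=200,
                                random_state=0).fit(X_tr, y_tr).score(X_te, y_te)
print(f"SVM accuracy: {svm_acc:.2f}, RF accuracy: {rf_acc:.2f}")
```

Note that both models operate directly on the raw feature matrix; in practice their performance on omics data depends heavily on prior feature selection and engineering, which is exactly the limitation motivating deep learning here.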

## **2. Deep learning and its application in bioinformatics**

DL is a family of artificial neural network (ANN)-based ML methods inspired by the working principles of the human brain. In a DL network architecture, a series of hidden layers is connected in a cascade between the input and output of the network. Each of these layers takes the output of the previous layer as input and transforms the data into a more abstract form. Nonlinear layers allow DL methods, like shallow ANNs, to model complex relations between the input and output of the network. DL is a representation learning method, which means it can be fed with raw data and will automatically extract the representations necessary for prediction. A DL network provides representations at different levels: the output of each hidden layer is the representation at that level, and the higher the layer, the more abstract the representation. In different studies, these higher-level representations of raw data have proved very effective for classification and detection problems. The most important point is that these representations, alternatively called feature vectors, are not engineered by humans but are learned directly from the raw input data.
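The cascade of nonlinear layers described above can be sketched in a few lines of numpy; layer sizes are arbitrary and the weights are random (i.e., untrained), so this only illustrates how each layer maps the previous representation to a new one:

```python
# Minimal sketch of a deep cascade: each layer applies an affine map followed
# by a nonlinearity, producing a new (here, lower-dimensional) representation.
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

x = rng.normal(size=(1, 1000))   # raw input, e.g., 1,000 expression values
sizes = [1000, 256, 64, 16]      # input dimension followed by three hidden layers
h = x
for n_in, n_out in zip(sizes[:-1], sizes[1:]):
    W = rng.normal(scale=n_in ** -0.5, size=(n_in, n_out))
    b = np.zeros(n_out)
    h = relu(h @ W + b)          # the representation at this level
    print(h.shape)               # (1, 256) -> (1, 64) -> (1, 16)
```

In a real DL model the weights `W` and biases `b` would be learned by backpropagation rather than drawn at random; the point here is only the layer-by-layer re-representation of the input.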

Unlike other ML methods, DL methods have been shown to handle high-dimensional and noisy data efficiently in many domains, such as computer vision and language processing (**Table 1**). These qualities attract biomedical researchers to use DL instead of conventional ML methods, because biomedical data (e.g., omics data) often suffer from high dimensionality and noisiness.

| Characteristics | Conventional machine learning (including shallow neural networks) | Deep learning |
| --- | --- | --- |
| Feature engineering | Handcrafted features created by experts are required | Performs automatic feature extraction |
| Execution time | Relatively little time is required (ranging from a few seconds to a few hours) | Training requires a long time to optimize thousands of parameters |
| High-throughput data | It is challenging to handle high-throughput data directly | Techniques like dropout make high-throughput data processing possible |
| Interpretability | Relatively easy to interpret the reasoning | Hard to interpret the reasoning |
| Problem-solving approach | Breaks problems into small parts, solves them separately and combines the results for the final output | Can solve problems in an end-to-end fashion |

**Table 1.** Comparison of conventional machine learning and deep learning.

## **2.1. How deep learning evolved**

Technological advancement and the availability of large data sets have allowed researchers to rekindle their interest in deep neural networks. Recently, DNN-based models have achieved state-of-the-art prediction performance at the cost of immense computational power. For example, Krizhevsky et al. [13] trained a deep convolutional neural network (DCNN)-based model on 1.2 million images to classify 1000 different classes, which took approximately 6 days of training. They built this model by optimizing about 60 million learnable parameters, and won the ILSVRC-2012 competition with their state-of-the-art prediction performance in image classification.

In computer vision, there has always been a growing need to train visual recognition systems more generically, so that a system trained on one visual recognition task (e.g., classification) can easily be adapted to another task (e.g., detection). To handle this challenge of adapting a source domain to different target domains, many domain adaptation methods have been proposed [14–16]. For example, Donahue et al. [17] extracted "deep features" from a deep neural network trained on one computer vision task and showed state-of-the-art performance on a variety of other tasks. DCNN-based features are very powerful and generic and can thus be well adapted to different visual domains, as also evidenced by Sharif Razavian et al. [18]. They used a pretrained DCNN model called OverFeat, applying it to different tasks for which OverFeat was not trained, and achieved state-of-the-art prediction performance for object detection [19]. After extracting feature descriptors from the first fully connected layer of OverFeat, they applied a linear SVM classifier to these features for image classification, scene recognition, fine-grained recognition and attribute detection on different data sets. A pretrained CNN is usually followed by domain-specific fine-tuning on data from the target domain, especially when training data are scarce. Following this approach, Girshick et al. [20] fine-tuned a CNN pretrained on the ILSVRC2012 classification data set and achieved substantially better object detection performance on PASCAL VOC compared with standard models based on simple hand-engineered features.

DL-based models trained on a small data set usually show poor prediction performance on test data. To tackle this kind of overfitting problem, a technique called dropout was proposed by Hinton et al. [21]. Dropout is a regularization technique that randomly drops a predefined fraction of the neurons within a layer of a DL-based network during training. This pushes the model to learn more general weights for the neuron connections of that layer, which allows the network to perform well on test data. In addition, a technique called DropConnect was proposed as a generalization of dropout [22]. It randomly drops individual neuron connections (weights) anywhere in the network instead of dropping whole neurons from a layer. DropConnect has provided improved prediction performance over dropout on different image recognition benchmark data sets [22].
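A minimal sketch of (inverted) dropout, assuming nothing beyond the description above: during training, a random mask zeroes a fraction `p` of the activations, and the survivors are scaled by `1/(1-p)` so the expected activation is unchanged and no rescaling is needed at test time:

```python
# Sketch of inverted dropout: randomly zero a fraction p of activations during
# training, scaling the survivors so the expected activation is preserved.
import numpy as np

def dropout(h, p, rng, training=True):
    """Randomly drop a fraction p of the units in activation matrix h."""
    if not training or p == 0.0:
        return h                         # at test time the layer is a no-op
    mask = rng.random(h.shape) >= p      # keep each unit with probability 1 - p
    return h * mask / (1.0 - p)          # inverted scaling preserves E[h]

rng = np.random.default_rng(0)
h = np.ones((4, 10000))
h_drop = dropout(h, p=0.5, rng=rng)
print(h_drop.mean())                     # close to 1.0: the mean is preserved
```

DropConnect would instead apply the random mask to the weight matrix of a layer, dropping individual connections rather than whole units.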

There is another class of DNN known as the deep belief network (DBN). A DBN is composed of multiple layers, with neural connections between layers but not among the hidden units within a layer. Each of these layers is a restricted Boltzmann machine (RBM). A DBN follows an unsupervised layer-wise pretraining approach for these RBMs using a method called contrastive divergence [23]. A DBN is useful for fine-tuning a DNN when the number of training samples is small: the DBN is first trained, and the optimized weights from the DBN are then used to initialize the DNN. The DNN therefore starts training with these learned weights instead of training from scratch. This leads the network to converge at an early stage and improves prediction performance, because these weights are already close to those of a converged model [24].
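The greedy layer-wise pretraining idea can be sketched with scikit-learn's `BernoulliRBM`: each RBM is trained unsupervised (by contrastive divergence) on the output of the previous one, and the resulting representation feeds a supervised model. The data, layer sizes and final classifier below are illustrative assumptions, not the chapter's setup:

```python
# Rough DBN-style sketch: two RBMs pretrained greedily, layer by layer, with
# the top-level representation feeding a supervised classifier. Toy data only.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import BernoulliRBM

rng = np.random.default_rng(0)
X = (rng.random((200, 100)) < 0.3).astype(float)   # binary "input" data
y = (X[:, :10].sum(axis=1) > 3).astype(int)        # toy labels

rbm1 = BernoulliRBM(n_components=64, learning_rate=0.05, n_iter=10, random_state=0)
rbm2 = BernoulliRBM(n_components=32, learning_rate=0.05, n_iter=10, random_state=0)

h1 = rbm1.fit_transform(X)    # layer 1: trained by contrastive divergence on X
h2 = rbm2.fit_transform(h1)   # layer 2: trained on the layer-1 representation

clf = LogisticRegression(max_iter=1000).fit(h2, y)
print(h2.shape, round(clf.score(h2, y), 2))
```

In the full DBN-to-DNN scheme described above, the pretrained RBM weights would initialize the corresponding DNN layers before supervised backpropagation, rather than being frozen as they are in this sketch.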

CNN, convolutional neural network; RNN, recurrent neural network; RBN, radial basis net-

Deep Learning Models for Predicting Phenotypic Traits and Diseases from Omics Data

http://dx.doi.org/10.5772/intechopen.75311

339



A stacked autoencoder is another type of DL-based method, which is used to create a robust high-level representation of its input using unsupervised ML approaches [24]. A stacked autoencoder can be built from a DNN by stacking autoencoder layers on top of one another. With this approach, we can increase or decrease the dimension of the input data. Vincent et al. showed that we can feed a high-level representation of input data learned from a stacked autoencoder into an SVM with improved prediction performance [25].
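The pipeline of Vincent et al. can be mimicked with a short scikit-learn sketch: train an autoencoder, take its hidden activations as the high-level representation and feed them to an SVM. A single hidden layer stands in for the stack here, and the digits data stand in for omics features; the layer size and helper name are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVC

# stand-in for a high-dimensional feature matrix: 64-dimensional digit images
X, y = load_digits(return_X_y=True)
X = X / 16.0  # scale features to [0, 1]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# one autoencoder layer: train the network to reproduce its own input
ae = MLPRegressor(hidden_layer_sizes=(32,), activation="relu",
                  max_iter=500, random_state=0)
ae.fit(X_tr, X_tr)

def encode(X):
    # hidden-layer activations = the learned lower-dimensional representation
    return np.maximum(0.0, X @ ae.coefs_[0] + ae.intercepts_[0])

# feed the high-level representation into an SVM, as in Vincent et al. [25]
clf = SVC().fit(encode(X_tr), y_tr)
acc = clf.score(encode(X_te), y_te)
```

A real stacked autoencoder would train several such layers greedily, each on the previous layer's encoding, before attaching the classifier.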

DL-based methods have already achieved state-of-the-art prediction performance in diverse fields such as image classification [26], object detection [13] and speech recognition [27]. Moreover, DL methods also allow us to build state-of-the-art prediction models for sequential data [28, 29]. A series of functionally powerful deep learning tools have been implemented (**Table 2**), which can be publicly downloaded to perform these applications.


| Software | Website | Open source | Interface | Deep learning algorithms |
| --- | --- | --- | --- | --- |
| Caffe | http://caffe.berkeleyvision.org/ | Yes | Python, MATLAB | CNN, RNN |
| Torch | http://torch.ch/ | Yes | Lua, LuaJIT, C, OpenCL | CNN, RNN, RBN, DBN |
| MatConvNet | http://www.vlfeat.org/matconvnet/ | Yes | C++, MATLAB | CNN, RNN, RBN, DBN |
| MXNet | https://mxnet.incubator.apache.org/ | Yes | R, C++, Python, Julia, Scala, MATLAB | CNN, RNN, RBN, DBN |
| Deeplearning4j | https://deeplearning4j.org/ | Yes | Java, Python | CNN, RNN, RBN, DBN |
| Tensorflow | https://www.tensorflow.org/ | Yes | Python, Java | CNN, RNN, RBN, DBN |
| Keras | https://keras.io/ | Yes | Python, R | CNN, RBN, DBN |
| Theano | http://deeplearning.net/software/theano | Yes | Python | CNN, RNN, RBN, DBN |

**Table 2.** A list of commonly used deep learning tools.

CNN, convolutional neural network; RNN, recurrent neural network; RBN, radial basis network; and DBN, deep belief network.

#### **2.2. Application of deep learning to solve different bioinformatics applications**

338 Artificial Intelligence - Emerging Trends and Applications

In recent years, deep learning has been successfully applied to answer many biological questions using diverse biological data sources (**Figure 1**). We briefly review these advancements in different bioinformatics applications in the following paragraphs.

Maienschein-Cline et al. used a conventional unsupervised machine learning approach to identify target genes regulated by transcription factors (TFs) by integrating high-throughput binding data and gene expression data [30]. Their experimental results showed that the conventional method faced challenges in handling such high-throughput data, even though the data may be very helpful for clinical diagnosis. Therefore, Danaee et al. used a DL-based method called the stacked denoising autoencoder (SDAE) [25] for this kind of analysis [31]. They first took a high-level representation of the input gene expression profiles. They then fed this higher-level representation of the data to a shallow neural network and an SVM. They demonstrated improved prediction performance over baseline principal component analysis (PCA) and kernel principal component analysis (KPCA) approaches. They also identified a set of genes from the connectivity matrices of their DL-based model. These genes can be investigated further to improve clinical diagnosis.

Somatic point mutation-based cancer classification (SMCC) has become an attractive research problem as DNA sequencing technology allows us to generate a huge volume of sequencing data. Efficient analysis of somatic point mutation data can lead to better patient-specific personalized therapy for cancer patients. This kind of somatic point mutation data usually suffers from large sparsity. Therefore, previous SMCC approaches were not able to provide clinically acceptable cancer classification results. However, a DL-based classifier known as DeepGene recently tried to overcome this limitation [32]. This model first filtered the gene data by mutation rate to remove irrelevant genes. Then, it indexed the gene data by their nonzero elements, which let DeepGene overcome the data sparsity problem. Finally, the outputs of these two steps were fed into a DNN which performed automatic feature extraction for SMCC. DeepGene achieved ~67% prediction accuracy, better than or comparable to the baseline classifiers (i.e., SVM ~67%, k-nearest neighbors (KNN) ~42% and naïve Bayes (NB) ~9%).
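The nonzero-element indexing step can be illustrated in a few lines. This is a hypothetical sketch: the toy mutation matrix, the fixed length `max_len` and the pad value are invented for illustration and are not DeepGene's actual parameters:

```python
import numpy as np

# toy somatic point mutation matrix: rows = patients, columns = genes,
# 1 = mutated; real data of this kind are overwhelmingly zeros
rng = np.random.default_rng(1)
M = (rng.random((5, 1000)) < 0.01).astype(np.int8)

def index_nonzero(row, max_len=32):
    """Replace a sparse binary vector by the indices of its nonzero
    elements, padded (with -1) or truncated to a fixed length."""
    idx = np.flatnonzero(row)[:max_len]
    out = np.full(max_len, -1, dtype=np.int64)
    out[: len(idx)] = idx
    return out

compact = np.stack([index_nonzero(r) for r in M])  # shape (5, 32), dense
```

The fixed-length index vector is dense and short, so the downstream network never has to process the thousands of zero entries.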

**Figure 1.** Scopes of deep learning in bioinformatics.

DBN was used to cluster cancer patients by integrating their gene expression and clinical data [33]. This model can capture intra- and cross-modality correlations (i.e., correlations among genomic data from different platforms) and learn a unified representation of the input. Therefore, this model outperformed existing methods in clustering cancer patients. It can also be used to predict missing values in the data and to identify key target genes of miRNAs responsible for different cancer subtypes. Moreover, preliminary clinical screening of a patient with skin disease usually begins with a visual diagnosis by a dermatologist. Since skin cancer is a very common malignancy in humans [34, 35], an automatic system to classify skin diseases would be very helpful for clinical purposes. Kelley et al. introduced a DCNN model to learn the functional activity of DNA sequences for multitask prediction of cell-specific DNA accessibility across 164 cell types, and this model achieved the best result among earlier methods [36]. Recently, Esteva et al. collected 129,450 clinical images of skin diseases and built a DCNN model to classify the diseases [37]. This model achieved better prediction performance than dermatologists. Nonetheless, the large number of nuclei and the variability in their sizes in histopathological images of breast cancer pose a great difficulty for building an automated system for nucleus detection. Xu et al. overcame this challenge by using a deep learning approach called the stacked sparse autoencoder (SSAE). This model outperformed nine previous state-of-the-art nuclear detection methods [38].


Conventional machine learning approaches have been applied to analyze high-content microscopy data for protein subcellular localization from yeast cell images [39]. However, these approaches were not able to perform such analysis without a human expert's intervention and still did not provide accurate classification. Kraus et al. [40] came up with a model called DeepLoc, a DCNN-based approach, to overcome these limitations. DeepLoc outperforms the model ensLOC [39] by 71.4% in mean average precision while using fewer images; ensLOC uses a binary SVM ensemble approach to assign single cells to subcellular compartment classes. Kraus et al. [40] also investigated the reason behind their success over ensLOC by performing 2D visualization of their network's components. They found that DeepLoc generates a unique signal for different inputs. The structure of a protein and its functions can be studied further through protein contact map prediction from sequences. Wang et al. [41] treated this problem as pixel-level labeling by considering a protein contact map as an image. They proposed a novel deep learning-based protein contact map prediction model that handles extremely unbalanced positive and negative labels. Their model integrates both evolutionary coupling (EC) and sequence conservation information into the network and gives state-of-the-art performance in protein contact map prediction. Furthermore, the protein contacts predicted by this model can generate a better 3D structure model than the previous best methods: CCMpred [42] and MetaPSICOV [43]. Besides, many biological processes such as signal transduction and cellular organization can be affected by different protein-protein interactions (PPIs). Hence, it is very important to build a PPI prediction model in order to provide a better design for the therapy of a disease. Sun et al. [44] were the first to build a deep learning-based model, a stacked autoencoder, for sequence-based PPI prediction. They achieved an accuracy of 97.19% with 10-fold cross-validation, which is better than any existing PPI predictor.

Genomics has become rich with many different types of functional genomic data because of the latest sequencing technology. Eser et al. introduced flexible integration of data with deep learning (FIDDLE), an open-source DCNN-based data integration framework [45]. FIDDLE can predict yeast transcription start site sequencing (TSS-seq) output [46] by integrating heterogeneous genomic data such as RNA-seq and DNA sequence data. The excellent performance of FIDDLE demonstrated that a model built by integrating multiple data sets can provide better prediction performance than models built using only one data set.


Chen et al. [47] proposed a deep learning system (D-GEX) which takes a gene expression profile as input and infers the expression profile of a target gene. D-GEX has the ability to generalize across platforms. This model achieves a 15.33% improvement in gene expression prediction over a linear regression (LR) approach. D-GEX proves its cross-platform generalization when the learned model is applied to an RNA-seq-based database for gene expression prediction for each target gene and still outperforms LR by 6.57%.

Existing methods for the classification of cellular phenotypes from cellular images consist of multiple steps. Each of these steps requires manual modification and the tuning of different parameter settings. Godinez et al. [48] introduced a new multiscale CNN (M-CNN) which classifies microscopic images into phenotypes. The prediction accuracy of the M-CNN over eight benchmark data sets is significantly higher than that of previous state-of-the-art methods, including CNN-based approaches.

Gene expression can be regulated by transcription factors (TFs). Therefore, cell-specific TF binding prediction using gold-standard ChIP-seq data is very important. Qin and Feng [49] introduced a DNN model termed TFImpute to achieve this goal. TFImpute can determine whether a specific TF would bind to a given DNA sequence in a specific cell line. The prediction performance of TFImpute proves its superiority in comparison with another recent DNN-based approach called DeepBind [50]. Therefore, biologists can use TFImpute to understand how TF binding is influenced by a specific cell line.

Zhou et al. [51] were the first to propose a DCNN-based approach to predict the effects of noncoding variants from large-scale chromatin-profiling data, and they achieved state-of-the-art predictive performance. They call their method the deep learning-based sequence analyzer (DeepSEA). Experimental results show that DeepSEA can also precisely predict the consequence of specific SNPs on TF binding.

Obtaining precise knowledge about a patient's health condition is crucial to providing early and better treatment. Discovery of good imaging biomarkers can lead clinical research toward this goal. Oakden-Rayner et al. [52] provided a proof-of-concept study showing that computer-based analysis of cross-sectional chest CT images is able to predict 5-year mortality in adults (age > 60 years). Their framework includes a deep learning model whose predictive performance is better than that of models using human-engineered features. Besides, visualization of different components of this deep learning-based model can provide an explanation for its better prediction performance [53].

Gene expression can be controlled by enhancer elements and other cis-acting DNA regulatory elements [54]. However, existing enhancer predictors face a challenge: the lack of large sets of experimentally confirmed enhancers for humans and other species. Yang et al. [55] developed a DNN-based hybrid architecture termed BiRen which takes only the DNA sequence as input to predict enhancers. Experimental results proved that BiRen can predict common enhancers more accurately than previous state-of-the-art methods that are based on DNA sequence only.


Analysis of high-dimensional single-cell RNA-seq data is very important to answer several biological questions, such as quantifying the heterogeneity of cells in a population, discovering biomarkers for specific cells and retrieving similar cell types. Lin et al. [56] introduced an NN-based model to address all these queries without integrating any prior knowledge into the model. Using a database of tens of thousands of single-cell profiles, this method can infer cell types more accurately than any existing method.

Although significant advancements have been made in applying DNN models to different bioinformatics applications as described earlier, here we introduce several DNN-based classification frameworks which take either CNA profiles, gene expression profiles or both as input for the prediction of molecular subtypes of breast cancer. In addition, we also present a DNN-based regression model which takes high-dimensional DNAm data as input to predict triglyceride concentrations (before and after treatment) in human blood.

## **3. Case studies**

We discussed three case studies where deep learning-based approaches were used to address two biological problems: classification of molecular subtypes of breast cancer (**Figure 2(a)** and **(b)**) and prediction of triglyceride concentration in human blood (**Figure 2(c)**).

**Figure 2.** Deep learning architectures of three different case studies: (a) classification of molecular subtypes of breast cancer using single data source; (b) classification of molecular subtypes of breast cancer using multiple data sources; and (c) prediction of triglyceride concentration in human blood.

## **3.1. Prediction of molecular subtypes of breast cancer**


Classification of molecular subtypes of breast cancer using high-throughput genomics data, such as gene expression and copy number alterations (CNAs), is a challenging task because we have a much smaller number of training samples than genomic features. Many traditional machine learning methods often overfit the training data in such a setting. Here, we explored DNN methods to overcome this problem.

We conducted our experiments with two data sets: gene expression and CNA. Both data sets had approximately 2000 patients and more than 16,000 genes. We collected these data sets from a project called the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) [57]. We built a patient-by-gene matrix from the CNA data, in which three different values represent the three copy number statuses of a gene in a patient: 1 = copy number gain, −1 = copy number loss and 0 = no change.
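Building such a matrix from per-patient copy number calls is a short pivot operation. The patient and gene labels below are hypothetical; only the {1, −1, 0} encoding follows the text:

```python
import pandas as pd

# hypothetical CNA calls per (patient, gene) pair
calls = pd.DataFrame(
    [("P1", "BRCA1", "gain"), ("P1", "TP53", "loss"),
     ("P2", "BRCA1", "neutral"), ("P2", "TP53", "gain")],
    columns=["patient", "gene", "status"],
)

# encode the three copy number statuses as 1 / -1 / 0
code = {"gain": 1, "loss": -1, "neutral": 0}
cna = (calls.assign(value=calls["status"].map(code))
            .pivot(index="patient", columns="gene", values="value")
            .fillna(0).astype(int))   # patient-by-gene matrix
```

Each row of `cna` is then one patient's CNA profile, ready to be fed to a classifier.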

Using the CNA profiles of 991 and 984 patients as training and test sets, respectively, which were separated in the original study [56] based on patients collected at different periods, we classified the status of the estrogen receptor as a supervised binary classification problem. Furthermore, using the CNA profiles of 935 and 842 patients as training and test sets, respectively, we classified the status of the PAM50 subtypes (luminal A, luminal B, HER2-enriched and basal-like) as a supervised multiclass classification problem (it should be noted that some patients have no PAM50 subtype status). Our DCNN-based deep learning architecture used for these tasks is shown in **Figure 2(a)** [58].

Our network took the CNA profile of a breast cancer patient, covering approximately 16,000 genes, as the input in the input data layer. The input layer passed this data to a convolutional layer, where a kernel with weights was slid over the input; at each position, we performed an element-wise multiplication between the kernel and the data under it and summed the result to represent that window. This operation helped us capture potential correlations among neighboring genes. Output from the convolutional layer went as input into a max pooling layer. Here, we also passed a kernel, this time without weights, over the input and took the maximum value under the kernel to represent it. This operation helped the model achieve invariance to scaling, translation and rotation. We introduced nonlinearity into the model by passing the output of the max pooling layer through a rectified linear unit (ReLU) layer, which performs a thresholding operation that converts all negative values to zero. At this point, we passed the output of the ReLU layer to a fully connected layer to obtain a higher-level abstraction, and then passed its output, through another ReLU, to a dropout layer. The dropout layer randomly dropped a number of neurons from the network, which enforced more general weights for the remaining neurons, improved generalization to the test data and prevented overfitting. Finally, another fully connected layer produced a prediction score for each class, which we converted into class probabilities using the softmax function. We trained the network with backpropagation.
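The convolution, max pooling and ReLU operations described above can be sketched in plain NumPy. This is a toy illustration of the three operations, not the authors' network code; the profile and kernel values are made up.

```python
import numpy as np

def conv1d(x, kernel, stride=1):
    """Slide a weighted kernel over x; at each position take the
    element-wise product with the window and sum it."""
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel)
                     for i in range(0, len(x) - k + 1, stride)])

def max_pool1d(x, size=2):
    """Unweighted kernel: keep only the maximum value in each window."""
    n = len(x) // size
    return x[:n * size].reshape(n, size).max(axis=1)

def relu(x):
    """Thresholding: every value below zero becomes zero."""
    return np.maximum(x, 0.0)

# Toy CNA profile over 8 "genes" and a 3-gene kernel.
profile = np.array([1., -1., 0., 1., 1., 0., -1., 1.])
kernel = np.array([0.5, 1.0, 0.5])

features = relu(max_pool1d(conv1d(profile, kernel)))
print(features)  # -> [0.  1.5 0. ]
```

Stacking these three operations, followed by fully connected layers and a softmax, gives the pipeline of **Figure 2(a)**.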

We evaluated the prediction performance of our network with two metrics: overall accuracy and area under the curve (AUC). We achieved 84.1% accuracy and 0.904 AUC for binary classification, and 58.19% accuracy and 0.79 AUC for multiclass classification. We compared these outcomes with two other supervised machine learning methods: support vector machine (SVM) and random forest (RF). SVM achieved 76.5% accuracy and 0.702 AUC for binary classification, and 45.0% accuracy and 0.78 AUC for multiclass classification. RF achieved 82.7% accuracy and 0.817 AUC for binary classification, and 49.5% accuracy and 0.729 AUC for multiclass classification. These results showed that the deep learning-based models outperform traditional machine learning methods such as SVM and RF.
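The two evaluation metrics can be computed from predicted class probabilities as below. The labels and probabilities are hypothetical; AUC is written out via its Mann-Whitney (rank) formulation rather than a library call.

```python
import numpy as np

def accuracy(y_true, y_pred):
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

def auc(y_true, scores):
    """AUC as the probability that a random positive is scored
    above a random negative (ties count half)."""
    y = np.asarray(y_true)
    s = np.asarray(scores, dtype=float)
    pos, neg = s[y == 1], s[y == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

# Hypothetical labels and predicted class-1 probabilities.
y_true = [0, 0, 1, 1, 1, 0]
probs = [0.1, 0.4, 0.35, 0.8, 0.9, 0.2]
y_pred = [int(p >= 0.5) for p in probs]

print(accuracy(y_true, y_pred), auc(y_true, probs))
```

For the multiclass PAM50 task, accuracy generalizes directly, while AUC is usually averaged over one-vs-rest comparisons.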


Deep Learning Models for Predicting Phenotypic Traits and Diseases from Omics Data

http://dx.doi.org/10.5772/intechopen.75311

345


We also explored a DCNN-based network (**Figure 2(b)**) to classify PAM50 subtype status using both gene expression and CNA data, since one of these data sets may carry biological knowledge that is absent in the other. The network had two branches, i and j, which took the CNA data and the gene expression data of the same patient as inputs, respectively. Each branch performed the same series of operations over its input. The outputs of the last fully connected layers of both branches were concatenated and fed into the third part of the network, which provided the final predictions. Here, we again used overall accuracy and AUC as performance metrics. Our model achieved 79.2% accuracy and 0.85 AUC. We compared this with two baseline models, SVM and RF, which achieved 69.5 and 70.1% accuracy and 0.804 and 0.781 AUC, respectively. From these comparisons, we can clearly see that our deep learning-based data integration model (**Figure 2(b)**) achieved better prediction performance than all of our baseline models. The data integration model also outperformed the models (**Figure 2(a)**) built using only one data source as input to the architecture.
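The two-branch integration idea, separate per-source branches whose outputs are concatenated before a shared classification head, can be sketched as follows. Each branch is collapsed to a single linear layer plus ReLU for brevity, and all dimensions and weights are illustrative, not the authors' settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def branch(x, w):
    """Stand-in for one branch (its conv/pool/FC stack): a single
    linear layer + ReLU producing a per-source embedding."""
    return np.maximum(x @ w, 0.0)

# Hypothetical per-patient inputs: CNA (i branch) and expression (j branch).
cna = rng.normal(size=20)
expr = rng.normal(size=30)
w_i = rng.normal(size=(20, 8))
w_j = rng.normal(size=(30, 8))

# Concatenate the two branch embeddings, then map to 4 PAM50 class scores.
fused = np.concatenate([branch(cna, w_i), branch(expr, w_j)])
w_out = rng.normal(size=(16, 4))
scores = fused @ w_out
probs = np.exp(scores - scores.max())
probs /= probs.sum()
print(fused.shape, probs)
```

The concatenation is the only point where the two data sources interact; everything after it is trained jointly on the fused representation.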

## **3.2. Prediction of triglyceride concentration**

Triglyceride is a kind of fat found in human blood. Excessive triglyceride concentration can cause heart-related diseases such as stroke. In this experiment, we built a DNN-based regression model [59] that took epigenome-wide DNA methylation profiles as input to predict triglyceride concentration at multiple draws of peripheral human blood samples.

For our experiments, we collected epigenome-wide DNAm profiles and triglyceride concentrations (mg/dL), measured at baseline (pretreatment) at visit 2 and after treatment with fenofibrate (posttreatment) at visit 4, from Genetic Analysis Workshop 20 (GAW20). The DNAm profiles were generated using the Illumina Infinium HumanMethylation450 BeadChip array; the beta value measuring the methylation level of each site lies between 0 and 1. The data cover 993 samples of the Genetics of Lipid Lowering Drugs and Diet Network (GOLDN) study, of which only 499 samples have posttreatment DNAm data. We built three regression models to solve three tasks: (1) using pretreatment data to predict triglyceride concentration before the medication, with 900 randomly selected training samples and 93 test samples; (2) using posttreatment data to predict triglyceride concentration after the medication, with 400 randomly selected training samples and 99 test samples; and (3) using pretreatment data to predict triglyceride concentration after the medication, with 620 randomly selected training samples and 94 test samples.

Our DNN-based regression model (**Figure 2(c)**) is a fully connected neural network with four hidden layers between the input and output layers. The network took a vector of more than 450,000 values representing the epigenome-wide DNA methylation profile of a sample. This vector went through two fully connected layers whose output is a higher-level abstraction of the input. We then used a ReLU layer to introduce nonlinearity into the model, followed by a dropout layer to improve generalization to the test data. The output of the dropout layer went into the final output layer, which provided the predicted triglyceride concentration.
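A condensed forward pass of such a fully connected regression network might look like the sketch below: two hidden layers instead of four and 1000 input features instead of more than 450,000, purely for illustration, with random weights, so the predicted value itself is meaningless.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical dimensions; a real run would use >450,000 CpG beta values.
n_cpg, h1, h2 = 1000, 64, 32
x = rng.uniform(0.0, 1.0, size=n_cpg)  # methylation beta values in [0, 1]
W1 = rng.normal(size=(n_cpg, h1)) * 0.01
W2 = rng.normal(size=(h1, h2)) * 0.1
w_out = rng.normal(size=h2) * 0.1

def forward(x, train=False, drop=0.5):
    h = np.maximum((x @ W1) @ W2, 0.0)   # two FC layers, then ReLU
    if train:                            # (inverted) dropout, training only
        h *= rng.binomial(1, 1.0 - drop, size=h.shape) / (1.0 - drop)
    return float(h @ w_out)              # predicted triglyceride (mg/dL)

print(forward(x))
```

At inference time dropout is disabled, so repeated calls on the same input give the same prediction.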

We used root mean square error (RMSE) and Pearson correlation coefficient (PCC) to measure prediction performance. Our DNN-based regression models achieved the following: for Task 1, RMSE 88.5 and PCC 0.27; for Task 2, RMSE 48.1 and PCC 0.22; and for Task 3, RMSE 47.4 and PCC 0.29. We used SVM as our baseline model; it achieved the following: for Task 1, RMSE 90.3 and PCC 0.13; for Task 2, RMSE 48.7 and PCC 0.19; and for Task 3, RMSE 46.9 and PCC 0.13.
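The two regression metrics can be computed as below; the measured and predicted concentrations are hypothetical values, used only to exercise the formulas.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def pcc(y_true, y_pred):
    """Pearson correlation coefficient."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])

# Hypothetical measured vs. predicted triglyceride concentrations (mg/dL).
measured = [110.0, 95.0, 160.0, 240.0, 130.0]
predicted = [120.0, 90.0, 150.0, 200.0, 140.0]

print(rmse(measured, predicted), pcc(measured, predicted))
```

Lower RMSE and higher PCC indicate better predictions, which is how the DNN and SVM models above are compared.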

Experimental results showed that our DNN-based triglyceride regression models provided better predictions than the baseline SVM-based regression method for all three tasks. Both the DNN-based and SVM-based regression models achieved their best prediction performance when pretreatment data were used to predict triglyceride concentration after the medication, which suggests a long-term epigenetic effect on the phenotypic traits.

## **4. Conclusion**


Classification of molecular subtypes of a disease using omics profiles is a challenging problem because the data sets are high-dimensional and highly correlated. The curse of high dimensionality likewise affects the performance of predicting a phenotype from DNAm data, and traditional machine learning algorithms, such as SVM and RF, struggle to handle such data. Recently, DNN learning has demonstrated advantages over these methods since it does not require any hand-crafted features: it automatically extracts features from the raw data and efficiently analyzes high-dimensional, correlated data. In this chapter, we reviewed the status of DNN applications in bioinformatics and presented case studies introducing several DNN frameworks for classifying molecular subtypes of breast cancer using either one data source or two heterogeneous data sources. In addition, we presented a DNN-based regression framework that took epigenome-wide DNAm data as input to predict triglyceride concentrations in human blood. In summary, our work and that of others have demonstrated that DNN is a promising tool to predict phenotypic traits and diseases from genome-wide omics data.

## **Acknowledgements**

The authors thank the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) and the organizer of genetic analysis workshop 20 (GAW20) for providing the data sets used in this study. This work was supported in part by Canadian Breast Cancer Foundation – Prairies/NWT Region, Natural Sciences and Engineering Research Council of Canada, Manitoba Research Health Council and University of Manitoba.

## **Conflict of interest**

The authors declare that they have no competing interests.

## **Author details**

Md. Mohaiminul Islam1,2,3, Yang Wang<sup>2</sup> and Pingzhao Hu1,2,3,4\*

\*Address all correspondence to: pingzhao.hu@umanitoba.ca

1 Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, Manitoba, Canada

2 Department of Computer Science, University of Manitoba, Winnipeg, Manitoba, Canada

3 George and Fay Yee Centre for Healthcare Innovation, University of Manitoba, Winnipeg, Manitoba, Canada

4 Department of Electrical and Computer Engineering, University of Manitoba, Winnipeg, Manitoba, Canada

## **References**

[1] Knasmüller S, Nersesyan A, Mišík M, Gerner C, Mikulits W, Ehrlich V, Hoelzl C, Szakmary A, Wagner KH. Use of conventional and -omics based methods for health claims of dietary antioxidants: A critical overview. British Journal of Nutrition. 2008;**99**(E-S1):ES3-E52. DOI: 10.1017/S000711450896575

[2] Schneider MV, Orchard S. Omics technologies, data and bioinformatics principles. In: Bioinformatics for Omics Data: Methods and Protocols. 2011. pp. 3-30

[3] Lippert C, Sabatini R, Maher MC, Kang EY, Lee S, Arikan O, Harley A, Bernal A, Garst P, Lavrenko V, Yocum K. Identification of individuals by trait prediction using whole-genome sequencing data. Proceedings of the National Academy of Sciences. 2017 Sep 19;**114**(38):10166-10171

[4] Chen YC, Douville C, Wang C, Niknafs N, Yeo G, Beleva-Guthrie V, Carter H, Stenson PD, Cooper DN, Li B, Mooney S. A probabilistic model to predict clinical phenotypic traits from genome sequencing. PLoS Computational Biology. 2014;**10**(9):e1003825. DOI: 10.1371/journal.pcbi.1003825

[5] Liu F, Wen B, Kayser M. Colorful DNA polymorphisms in humans. Seminars in Cell & Developmental Biology. 2013;**24**(6):562-575. DOI: 10.1016/j.semcdb.2013.03.013

[6] Hart KL, Kimura SL, Mushailov V, Budimlija ZM, Prinz M, Wurmbach E. Improved eye- and skin-color prediction based on 8 SNPs. Croatian Medical Journal. 2013;**54**(3):248-256. DOI: 10.3325/cmj.2013.54.248

[7] Claes P, Liberton DK, Daniels K, Rosana KM, Quillen EE, Pearson LN, McEvoy B, Bauchet M, Zaidi AA, Yao W, Tang H. Modeling 3D facial shape from DNA. PLoS Genetics. 2014;**10**(3):e1004224. DOI: 10.1371/journal.pgen.1004224

[8] Breast Cancer Information and Awareness. Available from: http://www.breastcancer.org. [Accessed: 2017-10-20]

[9] Perou CM, Sørlie T, Eisen MB, van de Rijn M, Jeffrey SS, Rees CA, Pollack JR, Ross DT, Johnsen H, Akslen LA, Fluge Ø. Molecular portraits of human breast tumours. Nature. 2000;**406**(6797):747-752. DOI: 10.1038/35021093

[10] Parker JS, Mullins M, Cheang MC, Leung S, Voduc D, Vickery T, Davies S, Fauron C, He X, Hu Z, Quackenbush JF. Supervised risk predictor of breast cancer based on intrinsic subtypes. Journal of Clinical Oncology. 2009;**27**(8):1160-1167. DOI: 10.1200/JCO.2008.18.1370

[11] Sorlie T, Tibshirani R, Parker J, Hastie T, Marron JS, Nobel A, Deng S, Johnsen H, Pesich R, Geisler S, Demeter J. Repeated observation of breast tumor subtypes in independent gene expression data sets. Proceedings of the National Academy of Sciences of the United States of America. 2003;**100**(14):8418-8423. DOI: 10.1073/pnas.0932692100

[12] Milioli HH, Vimieiro R, Tishchenko I, Riveros C, Berretta R, Moscato P. Iteratively refining breast cancer intrinsic subtypes in the METABRIC dataset. BioData Mining. 2016;**9**(1):2. DOI: 10.1186/s13040-015-0078-9

[13] Krizhevsky A, Sutskever I, Hinton GE. Imagenet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems. 2012. pp. 1097-1105

[14] Khosla A, Zhou T, Malisiewicz T, Efros AA, Torralba A. Undoing the damage of dataset bias. In: European Conference on Computer Vision. 2012. pp. 158-171. DOI: 10.1109/CVPR.2011.5995347

[15] Saenko K, Kulis B, Fritz M, Darrell T. Adapting visual category models to new domains. Computer Vision–ECCV. 2010;**2010**:213-226

[16] Aytar Y, Zisserman A. Tabula rasa: Model transfer for object category detection. In: IEEE International Conference on Computer Vision (ICCV); 2011 Nov 6; IEEE. pp. 2252-2259

[17] Donahue J, Jia Y, Vinyals O, Hoffman J, Zhang N, Tzeng E, Darrell T. Decaf: A deep convolutional activation feature for generic visual recognition. In: International Conference on Machine Learning; 2014. pp. 647-655

[18] Sharif Razavian A, Azizpour H, Sullivan J, Carlsson S. CNN features off-the-shelf: An astounding baseline for recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2014. pp. 806-813

[19] Sermanet P, Eigen D, Zhang X, Mathieu M, Fergus R, LeCun Y. Overfeat: Integrated recognition, localization and detection using convolutional networks. arXiv preprint arXiv:1312.6229. 2013

[20] Girshick R, Donahue J, Darrell T, Malik J. Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2014. pp. 580-587

[21] Hinton GE, Srivastava N, Krizhevsky A, Sutskever I, Salakhutdinov RR. Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580; 2012

[22] Wan L, Zeiler M, Zhang S, Cun YL, Fergus R. Regularization of neural networks using dropconnect. In: Proceedings of the 30th International Conference on Machine Learning (ICML-13). 2013. pp. 1058-1066

[23] Hinton GE. Training products of experts by minimizing contrastive divergence. Neural Computation. 2006;**14**(8):1771-1800. DOI: 10.1162/089976602760128018

[24] Larochelle H, Erhan D, Courville A, Bergstra J, Bengio Y. An empirical evaluation of deep architectures on problems with many factors of variation. In: Proceedings of the 24th International Conference on Machine Learning. 2007. pp. 473-480

[25] Vincent P, Larochelle H, Lajoie I, Bengio Y, Manzagol PA. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research. 2010;**11**:3371-3408

[26] Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M, Berg AC. Imagenet large scale visual recognition challenge. International Journal of Computer Vision. 2015;**115**(3):211-252. DOI: 10.1007/s11263-015-0816-y

[27] Abdel-Hamid O, Mohamed AR, Jiang H, Deng L, Penn G, Yu D. Convolutional neural networks for speech recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing. 2014;**22**(10):1533-1545. DOI: 10.1109/TASLP.2014.2339736

[28] Ren S, He K, Girshick R, Sun J. Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2017;**39**(6):1137-1149. DOI: 10.1109/TPAMI.2016.2577031

[29] Collobert R, Weston J, Bottou L, Karlen M, Kavukcuoglu K, Kuksa P. Natural language processing (almost) from scratch. Journal of Machine Learning Research. 2011;**12**:2493-2537

[30] Maienschein-Cline M, Zhou J, White KP, Sciammas R, Dinner AR. Discovering transcription factor regulatory targets using gene expression and binding data. Bioinformatics. 2011;**28**(2):206-213. DOI: 10.1038/srep20649

[31] Danaee P, Ghaeini R, Hendrix DA. A deep learning approach for cancer detection and relevant gene identification. In: Pacific Symposium on Biocomputing. 2016;**22**:219. DOI: 10.1142/9789813207813\_0022

[32] Yuan Y, Shi Y, Li C, Kim J, Cai W, Han Z, Feng DD. DeepGene: An advanced cancer type classifier based on deep learning and somatic point mutations. BMC Bioinformatics. 2016;**17**(17):476. DOI: 10.1186/s12859-016-1334-9

[33] Liang M, Li Z, Chen T, Zeng J. Integrative data analysis of multi-platform cancer data with a multimodal deep learning approach. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB). 2015;**12**(4):928-937. DOI: 10.1109/TCBB.2014.2377729

[34] American Cancer Society. Cancer facts & figures 2016. Atlanta: American Cancer Society; 2016. Available from: http://www.cancer.org/acs/groups/content/@research/documents/document/acspc047079.pdf. [Accessed: 2017-10-10]

[35] Stern RS. Prevalence of a history of skin cancer in 2007: Results of an incidence-based model. Archives of Dermatology. 2010;**146**(3):279-282. DOI: 10.1001/archdermatol.2010.4

[36] Kelley DR, Snoek J, Rinn JL. Basset: Learning the regulatory code of the accessible genome with deep convolutional neural networks. Genome Research. 2016;**26**(7):990-999. DOI: 10.1101/gr.200535.115

[37] Esteva A, Kuprel B, Novoa RA, Ko J, Swetter SM, Blau HM, Thrun S. Dermatologist-level classification of skin cancer with deep neural networks. Nature. 2017;**542**(7639):115-118. DOI: 10.1038/nature21056

[38] Xu J, Xiang L, Liu Q, Gilmore H, Wu J, Tang J, Madabhushi A. Stacked sparse autoencoder (SSAE) for nuclei detection on breast cancer histopathology images. IEEE Transactions on Medical Imaging. 2016;**35**(1):119-130. DOI: 10.1109/TMI.2015.2458702

[39] Chong YT, Koh JL, Friesen H, Duffy SK, Cox MJ, Moses A, Moffat J, Boone C, Andrews BJ. Yeast proteome dynamics from single cell imaging and automated analysis. Cell. 2015;**161**(6):1413-1424. DOI: 10.1016/j.cell.2015.04.051

[40] Kraus OZ, Grys BT, Ba J, Chong Y, Frey BJ, Boone C, Andrews BJ. Automated analysis of high-content microscopy data with deep learning. Molecular Systems Biology. 2017;**13**(4):924. DOI: 10.15252/msb.20177551

[41] Wang S, Sun S, Li Z, Zhang R, Xu J. Accurate de novo prediction of protein contact map by ultra-deep learning model. PLoS Computational Biology. 2017;**13**(1):e1005324. DOI: 10.1371/journal.pcbi.1005324

[42] Seemayer S, Gruber M, Söding J. CCMpred—Fast and precise prediction of protein residue-residue contacts from correlated mutations. Bioinformatics. 2014;**30**(21):3128-3130

[43] Jones DT, Singh T, Kosciolek T, Tetchner S. MetaPSICOV: Combining coevolution methods for accurate prediction of contacts and long range hydrogen bonding in proteins. Bioinformatics. 2014;**31**(7):999-1006. DOI: 10.1093/bioinformatics/btu79

[44] Sun T, Zhou B, Lai L, Pei J. Sequence-based prediction of protein-protein interaction using a deep-learning algorithm. BMC Bioinformatics. 2017;**18**(1):277. DOI: 10.1186/s12859-017-1700-2

[45] Eser U, Churchman LS. FIDDLE: An integrative deep learning framework for functional genomic data inference. Cold Spring Harbor Laboratory. bioRxiv. 2016. DOI: 10.1101/081380

[46] Malabat C, Feuerbach F, Ma L, Saveanu C, Jacquier A. Quality control of transcription start site selection by nonsense-mediated-mRNA decay. eLife. 2015;**4**:e06722. DOI: 10.7554/eLife.06722

[47] Chen Y, Li Y, Narayan R, Subramanian A, Xie X. Gene expression inference with deep learning. Bioinformatics. 2016;**32**(12):1832-1839. DOI: 10.1093/bioinformatics/btw074

[48] Godinez WJ, Hossain I, Lazic SE, Davies JW, Zhang X. A multi-scale convolutional neural network for phenotyping high-content cellular images. Bioinformatics. 2017;**33**(13):2010-2019. DOI: 10.1093/bioinformatics/btx069

[49] Qin Q, Feng J. Imputation for transcription factor binding predictions based on deep learning. PLoS Computational Biology. 2017;**13**(2):e1005403. DOI: 10.1371/journal.pcbi.1005403

[50] Alipanahi B, Delong A, Weirauch MT, Frey BJ. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nature Biotechnology. 2015;**33**(8):831-838. DOI: 10.1038/nbt.3300

[51] Zhou J, Troyanskaya OG. Predicting effects of noncoding variants with deep learning-based sequence model. Nature Methods. 2015;**12**(10):931-934. DOI: 10.1038/nmeth.3547

[52] Oakden-Rayner L, Carneiro G, Bessen T, Nascimento JC, Bradley AP, Palmer LJ. Precision radiology: Predicting longevity using feature engineering and deep learning methods in a radiomics framework. Scientific Reports. 2017;**7**(1):1648. DOI: 10.1038/s41598-017-01931-w

[53] Zeiler MD, Fergus R. Visualizing and understanding convolutional networks. In: European Conference on Computer Vision. 2014. pp. 818-833

[54] Calo E, Wysocka J. Modification of enhancer chromatin: What, how, and why? Molecular Cell. 2013;**49**(5):825-837. DOI: 10.1016/j.molcel.2013.01.038

[55] Yang B, Liu F, Ren C, Ouyang Z, Xie Z, Bo X, Shu W. BiRen: Predicting enhancers with a deep-learning-based model using the DNA sequence alone. Bioinformatics. 2017 Feb 17;**33**(13):1930-1936. DOI: 10.1093/bioinformatics/btx105

[56] Lin C, Jain S, Kim H, Bar-Joseph Z. Using neural networks for reducing the dimensions of single-cell RNA-Seq data. Nucleic Acids Research. 2017;**45**(17):e156. DOI: 10.1093/nar/gkx681

[57] Curtis C, Shah SP, Chin SF, Turashvili G, Rueda OM, Dunning MJ, Speed D, Lynch AG, Samarajiwa S, Yuan Y, Gräf S. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature. 2012;**486**(7403):346-352. DOI: 10.1038/nature10983

[58] Islam MM, Ajwad R, Chi C, Domaratzki M, Wang Y, Hu P. Somatic copy number alteration-based prediction of molecular subtypes of breast cancer using deep learning model. 30th Canadian Conference on Artificial Intelligence. 2017 May 16:57-63

[59] Islam MM, Tian Y, Cheng Y, Wang Y, Hu P. A deep neural network regression model for triglyceride concentrations prediction using epigenome-wide methylation profiles. BMC Proceedings. 2018 (In press)


**Chapter 18**


**Can Reinforcement Learning Be Applied to Surgery?**


© 2018 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Masakazu Sato, Kaori Koga, Tomoyuki Fujii and Yutaka Osuga

DOI: 10.5772/intechopen.76146

Additional information is available at the end of the chapter

#### **Abstract**

**Background**: Remarkable progress has recently been made in the field of artificial intelligence (AI).

**Objective**: We sought to investigate whether reinforcement learning could be used in surgery in the future.

**Methods**: We created simple 2D tasks (Tasks 1–3) that mimicked surgery. We used a neural network library, Keras, for reinforcement learning. In Task 1, a Mac OS X machine with 8 GB of memory (MacBook Pro, Apple, USA) was used. In Tasks 2 and 3, an Ubuntu 14.04 LTS instance with 26 GB of memory (Google Compute Engine, Google, USA) was used.

**Results**: In the task with a relatively small task area (Task 1), the simulated knife finally passed through all the target areas, and thus, the expected task was learned by AI. In contrast, in the task with a large task area (Task 2), a drastically increased amount of time was required, suggesting that learning was not achieved. Some improvement was observed when the CPU memory was expanded and inhibitory task areas were added (Task 3).

**Conclusions**: We propose the combination of reinforcement learning and surgery. Application of reinforcement learning to surgery may become possible by setting rules, such as appropriate rewards and playable (operable) areas, in simulated tasks.

**Keywords:** hysterectomy, robotic surgical procedures, deep learning, artificial intelligence, deep Q network (DQN), reinforcement learning

## **1. Introduction**

Remarkable progress has recently been made in the field of artificial intelligence (AI), and new advancements are made almost every day [1–3]. Although it is not possible to discuss all developments in AI, such as image recognition, automatic driving or trials of the game of Go, a particularly promising AI task is game playing [4–6]. AI can be trained to play games, such as the game Breakout, and can perform as well as an experienced human subject. In the context of game play, reinforcement learning is a common strategy [6, 7]. For instance, the objective of the game Breakout is to break blocks to score points. While the performance of the player at the instant of breaking a block should not necessarily be evaluated, the performance immediately prior to successfully breaking a block, such as the player's ability to direct the ball, should be evaluated. With this aim, the AI calculates the expected values of the scores (reward) that could be obtained by various actions and learns how to achieve the maximum reward via reinforcement learning [6]. Thus, the AI succeeds in breaking more blocks and obtaining higher scores.


The concept of reinforcement learning has a commonality with surgical procedures. A hysterectomy, which is one of the most basic gynecologic procedures [8], can be considered as an example. The aim of a hysterectomy is to enucleate the uterus; however, there are no specific guidelines regarding precisely how the knife should be used at the moment that the uterus is being removed. Rather, resecting the uterine arteries in advance, for instance, may reduce total blood loss, and this possibility should be evaluated. In addition, the resection of ligaments should also be evaluated, as this step precedes extraction of the uterus.

Altogether, we simulated and investigated whether reinforcement learning could be applied to surgery and whether AI could be possibly used to perform surgeries in the future.

In this study, we created simple 2D tasks that mimicked surgery (such as a hysterectomy simulation) and investigated whether the surgery could be performed as expected via reinforcement learning. During the task, the player (an imaginary knife) moved around the task areas. The player scored when passing target areas, such as imaginary ligaments and arteries, and lost points for other actions. The task was over when the player reached the uterus or moved outside the task area. We established rules and observed the process during which movement of the imaginary knife was learned and improved. In the task with a relatively small task area (Task 1), the knife finally passed all target areas, and the expected learning was achieved. In contrast, in the task with a large task area (Task 2), significantly more time was required to complete the task, suggesting that learning was not achieved. We addressed this problem by expanding the CPU memory and adding inhibitory task areas (Task 3), and some improvements were observed.

In this study, we applied the concept of reinforcement learning to surgical procedures and identified some commonalities between reinforcement learning and the way surgeons approach an operation. We found that various aspects of the techniques of efficient learning should be developed for applying reinforcement learning to surgery. These aspects include choosing a model that closely reproduces the surgical scene using a high-performance computer for deep learning and tuning of neural networks. Surgeons will need to understand the rules, such as when a reward can be earned and where the agent is allowed to move. Efficient learning would be possible by integrating these rules into the AI by engineers.

## **2. Materials and methods**

## **2.1. 2D tasks**


To create the 2D task, a publicly available game code for the game "Snake" was referenced and tuned (https://github.com/farizrahman4u/qlearning4k). The code for the operation task is provided in the **S7 Text**. The snake game was used for this investigation because it could easily be modified into the task we needed; the starting point need not be the snake game in particular, provided one can create a task that mimics surgery.

## **2.2. Task 1**

We created 9 × 9 squares of a simple 2D task representing a surgery (a hysterectomy), as shown in **Figure 1A**. The goal of the task is to remove the object (the uterus, yellow); indirect goals, such as cutting ligaments and arteries, were also established. In the context of surgery, there are preferred cutting points that consider the densities of the arteries and the distance from the object, and these points were set as target points (green). Thus, the objective was for the tip of the knife (orange) to pass through (cut) all the target points. In other words, if surgeons see a mass to remove (a tumor or the uterus), like the yellow areas in **Figure 1A**, they would cut the ligaments (green areas) with the knife in a circular motion, without approaching the areas where bleeding is expected to occur (red). Rewards, such as +1 for passing green areas, −1 for passing red areas, and 0 for passing blue areas (peritoneum), were defined. In addition, −1 was given and the task was defined as over if the knife passed over the yellow area (uterus) or if the knife passed outside the task area. Finally, −1 was given for passing areas that had previously been passed.

**Figure 1.** Creation of task. (A) Rules of the task. The name and reward of each area were as follows: orange, tip of knife; dark blue, path where tip of knife had passed; red, ligament or artery, −1; green, target area, +1; yellow, uterus, −1 (task over); blue, peritoneum, 0; brown, out of area, −1 (task over). Starting point was located in the upper middle of area. (B) Results of learning. The results after 500 epochs of learning (upper) and 1500 epochs of learning (lower) are shown. The knife passed all the green points in the shortest amount of time after learning.
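The scoring rules above can be sketched as a tiny grid environment. The following is only an illustrative reconstruction, not the authors' actual task code (which was adapted from qlearning4k); the cell codes, class name and step logic are our own assumptions.

```python
# Minimal sketch of the 9 x 9 surgical task grid described above.
# Cell codes and the step logic are illustrative assumptions, not the
# authors' qlearning4k-based implementation.
EMPTY, TARGET, ARTERY, UTERUS = 0, 1, 2, 3  # blue, green, red, yellow


class SurgicalGrid:
    def __init__(self, grid):
        self.grid = [row[:] for row in grid]
        self.pos = (0, len(grid[0]) // 2)  # start at the upper middle
        self.visited = {self.pos}

    def step(self, move):
        """Apply a move (dr, dc); return (reward, game_over)."""
        r, c = self.pos[0] + move[0], self.pos[1] + move[1]
        if not (0 <= r < len(self.grid) and 0 <= c < len(self.grid[0])):
            return -1, True          # out of area: -1, task over
        cell = self.grid[r][c]
        self.pos = (r, c)
        if cell == UTERUS:
            return -1, True          # reached the uterus: -1, task over
        if (r, c) in self.visited:
            return -1, False         # previously passed area: -1
        self.visited.add((r, c))
        if cell == TARGET:
            return +1, False         # preferred cutting point: +1
        if cell == ARTERY:
            return -1, False         # bleeding expected here: -1
        return 0, False              # peritoneum: 0
```

Coupling such an environment to an agent only requires exposing the grid as the observed state and enumerating the four knife moves as actions.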



## **2.3. Task 2**

We extended the task to 48 × 48 squares. This imposed significantly heavier burdens on the computer compared to Task 1, and we extended the CPU memory as described below.

## **2.4. Task 3**

We added inhibitory areas (which would trigger the task to be over) to Task 2, and named it Task 3.

## **2.5. Deep Q network**

We used a neural network library, Keras, for the reinforcement learning (https://keras.io) and TensorFlow as its backend (https://www.tensorflow.org) [9]. The codes for the agent can be found at https://github.com/farizrahman4u/qlearning4k.

The code simply obeys the rule 'Q(S, a) = r + gamma \* Q(S′, a′)'.

Q(S, a) means the maximum score the agent will get by the end of the game, if it does action a, when the game is in state S. On performing action a, the game will jump to a new state S′, giving the agent an immediate reward r.

In short, Q(S, a) = Immediate reward from this state + Max-score from the next state onwards.

Gamma is a discounting factor to give more weight to the immediate reward.
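The rule quoted above can be made concrete with a tiny tabular example. The two-state values below are invented for illustration; the chapter's actual agent is a Keras deep Q network, which learns a neural approximation trained toward the same target.

```python
# Tabular illustration of the rule Q(S, a) = r + gamma * max Q(S', a').
# The values below are invented for illustration; a deep Q network
# replaces this table with a neural network trained toward the same target.
gamma = 0.9  # discounting factor, as described above

Q = {("s", "cut"): 0.0, ("s", "wait"): 0.0,
     ("s2", "cut"): 2.0, ("s2", "wait"): 0.5}


def q_target(reward, next_state):
    """Immediate reward plus discounted max-score from the next state onwards."""
    best_next = max(v for (state, _), v in Q.items() if state == next_state)
    return reward + gamma * best_next


# Suppose action "cut" in state "s" yields reward +1 and leads to state "s2":
Q[("s", "cut")] = q_target(reward=1.0, next_state="s2")
print(round(Q[("s", "cut")], 6))  # prints 2.8, i.e. 1 + 0.9 * max(2.0, 0.5)
```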

We are gynecologists and understand only the very minimum of deep Q-learning, and we tuned only nb\_frames (the number of frames the agent should remember), batch\_size (the batch size for training), gamma (the discounting factor) and nb\_epoch (the number of epochs to train). The original neural network contained two convolutional layers and two dense layers. We slightly tuned the network by adding convolutional layers or changing the size of the frames, but we did not see much improvement. Thus, readers might find a more efficient way of learning. However, the objective of the present study is to propose the combination of reinforcement learning and surgery, and we did not investigate the tuning any further after obtaining conclusions.
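The role of the four tuned hyper-parameters can be seen in a stripped-down, DQN-style training loop. This is a schematic sketch under our own assumptions (the epsilon value, memory size, and the `env`/`model` interfaces are placeholders), not the qlearning4k implementation used in the study.

```python
# Schematic DQN-style training loop showing where the four tuned
# hyper-parameters enter. env and model are placeholder interfaces,
# not the chapter's Keras network or task code.
import random
from collections import deque

nb_frames = 4    # how many recent frames the agent remembers as its state
batch_size = 32  # minibatch size drawn from the replay memory
gamma = 0.9      # discounting factor
nb_epoch = 1500  # number of training games


def train(env, model, memory_size=1000):
    memory = deque(maxlen=memory_size)  # experience replay buffer
    for epoch in range(nb_epoch):
        frames = deque([env.reset()] * nb_frames, maxlen=nb_frames)
        game_over = False
        while not game_over:
            state = list(frames)
            if random.random() < 0.1:  # epsilon-greedy exploration
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: model.q(state, a))
            frame, reward, game_over = env.step(action)
            frames.append(frame)
            memory.append((state, action, reward, list(frames), game_over))
            if len(memory) >= batch_size:
                for s, a, r, s2, done in random.sample(list(memory), batch_size):
                    # target = r + gamma * max Q(s2, a'), with no future
                    # term when the game is over
                    target = r if done else r + gamma * max(
                        model.q(s2, a2) for a2 in env.actions)
                    model.update(s, a, target)
```

Increasing nb\_frames enlarges the observed state, while batch\_size and nb\_epoch trade training time against stability; these are exactly the knobs the study adjusted.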

## **2.6. Development environment**

The development environments used in this study were Python 2.7.12, Keras 1.1.0, TensorFlow 0.8.0, and Matplotlib 1.5.3. In Task 1, a Mac OS X machine with an Intel Core i5 processor and 8 GB of memory (MacBook Pro, Apple, USA) was used. In Tasks 2 and 3, an Ubuntu 14.04 LTS instance with an Intel Xeon E5 v2 processor and 26 GB of memory (Google Compute Engine, Google, USA) was used.

## **3. Results**


## **3.1. Learning process and trials for appropriate learning**

We created 9 × 9 squares of a simple 2D task representing the surgical scene (**Figure 1A**). We provide representative movies after 10 epochs, 500 epochs and 1500 epochs of learning (**Figure 1B** and **Movies 1–3**). The knife passed all the green points in the shortest amount of time after learning.

We next extended the task to 48 × 48 squares (**Movies 4** and **5**). This imposed significantly heavier burdens on the computer than in Task 1, and therefore, we expanded the CPU memory as described in Section 2. The knife was located in limited areas after 500 epochs of learning (**Movie 4**), and the area of movement was extended after 1500 epochs of learning (**Movie 5**). Even after extending the CPU memory, the task required a long time, and thus, learning was not necessarily achieved. Therefore, we modified Task 2 into Task 3.

In Task 2, the knife moved to areas where no reward could be obtained (upper right area), which was hypothesized to prevent appropriate learning (**Movie 5**). We considered that some areas are not significant in an actual hysterectomy and added inhibitory areas (Task 3) that would trigger the task to end (dark blue areas in **Movie 6**). By adding these inhibitory areas, the knife reached, after 1500 epochs of learning, the lower areas that it had never reached before (**Movie 6**). However, even with these limitations, learning required a long time within the simulated environments. These results suggest that efficient learning can be achieved by setting appropriate rules, such as adding inhibitory areas. All video materials referenced in this section are available at: https://www.intechopen.com/download/index/process/160/authkey/f5524cabe6c92eff86119b2f0c58ba26

## **4. Discussion**

In this study, we simulated and investigated whether reinforcement learning could be applied to surgery in the future and evaluated the types of hurdles that may exist.

Progress in AI has been remarkable, and AI is currently an essential part of the technology industry. Recent progress in AI can be attributed to the development of deep learning. Deep learning is a form of machine learning characterized by the fact that the user does not have to choose features or representations when inputting data [10]. Deep learning is currently widely used in image recognition and audio recognition. While playing games with reinforcement learning has also been investigated, reinforcement learning has recently been combined with deep learning, resulting in drastic improvements in performance [4–6].

Thus, we sought to investigate whether reinforcement learning could be used in surgery in the future and developed appropriate simulations. Our team is composed of clinicians rather than engineers; however, we performed this study after studying deep learning and reinforcement learning.



**Figure 2.** Choice of the shortest route. Blue and red arrows in the circle represent the shortest routes. Only the route of the blue arrows was chosen after several trials.

In reinforcement learning, it is preferable to obtain a reward during the immediately following action rather than having expectations of obtaining rewards in distant future actions [6]. This can be seen in the results of Task 1, in which the knife passed all the green areas in the shortest possible time, even though no rule inhibited a more roundabout approach (**Figure 1B**). This behavior can be expected to shorten the surgical time. Furthermore, the choice of the shortest route was also of interest. The blue and red arrows in the circle in **Figure 2** represent the shortest routes; the route of the blue arrows was chosen after several trials (**Figure 1B**). The method used to decide the next action in reinforcement learning is to maximize the expected values that would be obtained after subsequent actions [6]. Therefore, actions that have greater expected values are likely to be preferred. Thus, we expected that the red arrows were not preferred because points would be deducted (red area) or the task would be ended (uterus) by subsequent actions. This approach is also considered superior in surgery. In other words, avoiding areas associated with point deductions would result in retaining the appropriate margins during surgery. Therefore, the characteristics of reinforcement learning seemed to be compatible with typical surgical approaches.

The other results obtained were expected. It was concluded that reinforcement learning could solve a simple 2D task; 3D models that more closely replicate surgery should be further considered. In addition, increasing the playable area provides significantly more options for actions, and high-performance computers and tuning of neural networks will be needed for more complex tasks. 3D operation models are currently being developed and are used in practice for endoscopic surgery simulation [11–14]. The speed at which high-performance computers are being developed is astonishing, and deep learning is being thoroughly investigated by engineers both in academia and business [15–17]. Combining the progress in both fields may provide designs allowing AI to realistically perform surgery. In such areas, clinicians can contribute the essential aspects of the surgery, that is, the playable task areas and the appropriate scores. Although increasing reward areas or limiting playable areas would shorten learning time, such restrictions could also prevent the AI from finding alternative paths. Thus, clinicians and engineers should work cooperatively to define the rules.

In this study, we applied the concept of reinforcement learning to surgical procedures and identified common points between reinforcement learning and the way surgeons approach an operation. Reinforcement learning is currently used and studied almost exclusively in the context of game play; however, it could also be applied to performing surgery now that robotic surgery is widely available. Although many hurdles remain, AI could be applied to surgery by setting appropriate rules, such as defining rewards and playable (operable) areas. To realize this goal, it is important for clinicians to further study deep learning and reinforcement learning strategies.

## **Acknowledgements**

We greatly appreciate American Journal Experts for their generous help in editing the manuscript.

## **Conflict of interest**

The authors declare that there is no conflict of interest.

**Figure 2.** Choice of the shortest route. Blue and red arrows in the circle represent the shortest routes. Only the route of the blue arrows was chosen after several trials.

## **Supporting information**

**Movie 1.** Result of Game 1 after 10 epochs of learning.

**Movie 2.** Result of Game 1 after 500 epochs of learning.

**Movie 3.** Result of Game 1 after 1500 epochs of learning.

**Movie 4.** Result of Game 2 after 500 epochs of learning.

**Movie 5.** Result of Game 2 after 1500 epochs of learning.

**Movie 6.** Result of Game 3 after 1500 epochs of learning.

https://www.intechopen.com/download/index/process/160/authkey/f5524cabe6c92eff86119 b2f0c58ba26

**S7 Text.** The game codes for Games 1, 2, and 3 (python scripts). The code of agent.py, memory.py, and game.py can be obtained from the link in Section 2.


Can Reinforcement Learning Be Applied to Surgery? http://dx.doi.org/10.5772/intechopen.76146

## **Author details**

Masakazu Sato\*, Kaori Koga, Tomoyuki Fujii and Yutaka Osuga

\*Address all correspondence to: masakasatou-tky@umin.ac.jp

Department of Obstetrics and Gynecology, Graduate School of Medicine, The University of Tokyo, Bunkyo-ku, Tokyo, Japan

## **References**

[1] Ravi D, Wong C, Deligianni F, Berthelot M, Andreu Perez J, Lo B, et al. Deep learning for health informatics. IEEE Journal of Biomedical and Health Informatics. Jan 2017;**21**(1):4-21. DOI: 10.1109/JBHI.2016.2636665. PubMed PMID: 28055930

[2] Scholkopf B. Artificial intelligence: Learning to see and act. Nature. 2015;**518**(7540):486-487. DOI: 10.1038/518486a. Epub 2015/02/27; PubMed PMID: 25719660

[3] Zhang YC, Kagen AC. Machine learning interface for medical image analysis. Journal of Digital Imaging. Oct 2017;**30**(5):615-621. DOI: 10.1007/s10278-016-9910-0. PubMed PMID: 27730415

[4] Gibney E. DeepMind algorithm beats people at classic video games. Nature. 2015;**518**(7540):465-466. DOI: 10.1038/518465a. Epub 2015/02/27; PubMed PMID: 25719643

[5] Gibney E. Google AI algorithm masters ancient game of Go. Nature. 2016;**529**(7587):445-446. DOI: 10.1038/529445a. Epub 2016/01/29; PubMed PMID: 26819021

[6] Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, et al. Human-level control through deep reinforcement learning. Nature. 2015;**518**(7540):529-533. DOI: 10.1038/nature14236. PubMed PMID: 25719670

[7] Littman ML. Reinforcement learning improves behaviour from evaluative feedback. Nature. 2015;**521**(7553):445-451. DOI: 10.1038/nature14540. PubMed PMID: 26017443

[8] Wallace SK, Fazzari MJ, Chen H, Cliby WA, Chalas E. Outcomes and postoperative complications after hysterectomies performed for benign compared with malignant indications. Obstetrics and Gynecology. 2016;**128**(3):467-475. DOI: 10.1097/AOG.0000000000001591. PubMed PMID: 27500339

[9] Rampasek L, Goldenberg A. TensorFlow: Biology's gateway to deep learning? Cell Systems. 2016;**2**(1):12-14. DOI: 10.1016/j.cels.2016.01.009. PubMed PMID: 27136685

[10] Schmidhuber J. Deep learning in neural networks: An overview. Neural Networks. 2015;**61**:85-117. DOI: 10.1016/j.neunet.2014.09.003. PubMed PMID: 25462637

[11] Beyer-Berjot L, Berdah S, Hashimoto DA, Darzi A, Aggarwal R. A virtual reality training curriculum for laparoscopic colorectal surgery. Journal of Surgical Education. 2016;**73**(6):932-941. DOI: 10.1016/j.jsurg.2016.05.012. Epub 2016/06/28; PubMed PMID: 27342755

[12] Khan ZA, Kamal N, Hameed A, Mahmood A, Zainab R, Sadia B, et al. SmartSIM—A virtual reality simulator for laparoscopy training using a generic physics engine. The International Journal of Medical Robotics + Computer Assisted Surgery: MRCAS. 2016;**16**:437. DOI: 10.1002/rcs.1771. Epub 2016/09/28; PubMed PMID: 27671920

[13] Li XL, Du DF, Jiang H. The learning curves of robotic and three-dimensional laparoscopic surgery in cervical cancer. Journal of Cancer. 2016;**7**(15):2304-2308. DOI: 10.7150/jca.16653. PubMed PMID: 27994668; PubMed Central PMCID: PMCPMC5166541

[14] Romero-Loera S, Cárdenas-Lailson LE, de la Concha-Bermejillo F, Crisanto-Campos BA, Valenzuela-Salazar C, Moreno-Portillo M. Skills comparison using a 2D vs. 3D laparoscopic simulator. Cirugía y Cirujanos (English Edition). 2016;**84**(1):37-44. DOI: 10.1016/j.circen.2015.12.012

[15] Kusy M, Zajdel R. Application of reinforcement learning algorithms for the adaptive computation of the smoothing parameter for probabilistic neural network. IEEE Transactions on Neural Networks and Learning Systems. 2015;**26**(9):2163-2175. DOI: 10.1109/TNNLS.2014.2376703. PubMed PMID: 25532211

[16] Senda K, Hattori S, Hishinuma T, Kohda T. Acceleration of reinforcement learning by policy evaluation using nonstationary iterative method. IEEE Transactions on Cybernetics. 2014;**44**(12):2696-2705. DOI: 10.1109/TCYB.2014.2313655. PubMed PMID: 24733037

[17] Xu B, Yang C, Shi Z. Reinforcement learning output feedback NN control using deterministic learning technique. IEEE Transactions on Neural Networks and Learning Systems. 2014;**25**(3):635-641. DOI: 10.1109/TNNLS.2013.2292704. PubMed PMID: 24807456


**Chapter 19**

**Application of AI in Modeling of Real System in Chemistry**

M. H. Ahmadi Azqhandi and M. Shekari

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/intechopen.75602

> © 2018 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

## **Abstract**

In recent years, the discharge of synthetic dye waste from different industries, leading to aquatic and environmental pollution, has become a serious global problem of great concern. Hence, dye removal prediction plays an important role in wastewater management and the conservation of nature. Artificial intelligence methods are popular owing to their ease of use and high level of accuracy. This chapter provides a detailed review of artificial intelligence-based dye removal prediction methods, particularly multiple linear regression (MLR), artificial neural networks (ANNs), and least squares-support vector machines (LS-SVM). Furthermore, this chapter focuses on ensemble prediction models (EPMs) used for dye removal prediction; EPMs improve prediction accuracy by integrating several prediction models. The principles, advantages, disadvantages, and applications of these artificial intelligence-based methods are explained in this chapter. Finally, future directions of research on artificial intelligence-based dye removal prediction methods are discussed.

**Keywords:** multiple linear regression (MLR), artificial neural networks (ANNs), least squares-support vector machine (LS-SVM)

## **1. Introduction**

Recently, pollution of water sources by various contaminants has become a global environmental issue [1]. Among the different types of water contaminants, dyes, which pervade everyday human life, form a major contamination group [2]. Dyes are widely used as coloring agents in textiles, plastics, paper, leather, food, antiseptics, cosmetics, fungicides, and so forth, and can enter the environment through colored wastewater from these industries [3]. During the coloration processes, a significant amount (20–50%) of these dyes is lost and released into


the environment as colored wastewater. Owing to their toxicity, carcinogenicity, mutagenic and teratogenic properties, and long-standing environmental persistence, these contaminants have become a great environmental concern with potential adverse effects on human health [4]. Dyes can also enter the body via the pulmonary or digestive system through ingestion of contaminated water or food [5–7]. Moreover, they can interfere with photosynthesis by reducing light penetration, lowering oxygen levels in the water and, in severe cases, suffocating aquatic flora and fauna.

Dyes must therefore be removed from wastewater before it is discharged into bodies of water. A number of processes, such as ozonation, filtration, membranes, coagulation, precipitation, adsorption, electrochemical techniques, and biosorption, have been applied to treat colored textile wastewater [8–11]. However, most of these methods have various restrictions, including the generation of huge amounts of sludge by electrochemical and chemical coagulation processes, or the need for advanced, high-cost technology in membrane and advanced oxidation processes [12]. Among the different treatment methods, adsorption has attracted particular attention from researchers worldwide because it is an easy-to-operate, effective, simple, and cost-effective option for pollutant removal from aqueous environments.

Although adsorption is an easy-to-operate technique, it is a complicated chemical process that depends on several factors with a direct impact on process performance. Thus, it is vital to select an appropriate mathematical model for optimizing and predicting the removal process. The modeling and optimization of adsorption are still at the research stage. Commonly used models for describing kinetic and/or equilibrium behavior (e.g., second-order kinetic models or the Langmuir and Freundlich isotherms) may be inadequate for determining the relationship between factors and evaluating their effect on the adsorption process. Optimization is a way of determining the best solution in terms of certain quality criteria, such as process efficiency, and results in improved performance of the process or designed system [13, 14].
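For example, the Langmuir isotherm qe = qmax·KL·Ce/(1 + KL·Ce) is often fitted in its linearized form Ce/qe = 1/(qmax·KL) + Ce/qmax, so a straight-line regression recovers qmax and KL. The sketch below does this on synthetic (not experimental) data, with assumed values qmax = 120 mg/g and KL = 0.05 L/mg chosen purely for illustration:

```python
import numpy as np

# Synthetic equilibrium data generated from an assumed Langmuir model
# (qmax = 120 mg/g, KL = 0.05 L/mg) -- illustrative only, not measurements.
qmax_true, KL_true = 120.0, 0.05
Ce = np.array([5.0, 10, 20, 40, 80, 160])            # equilibrium conc. (mg/L)
qe = qmax_true * KL_true * Ce / (1 + KL_true * Ce)   # adsorbed amount (mg/g)

# Linearized Langmuir: Ce/qe = 1/(qmax*KL) + Ce/qmax, a straight line in Ce.
slope, intercept = np.polyfit(Ce, Ce / qe, 1)
qmax_fit = 1.0 / slope                               # slope = 1/qmax
KL_fit = slope / intercept                           # intercept = 1/(qmax*KL)

print(round(qmax_fit, 1), round(KL_fit, 3))          # recovers 120.0 and 0.05
```

With noise-free synthetic data the fit recovers the assumed constants exactly; with real equilibrium data, the same two-parameter regression yields the experimental qmax and KL.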

Optimization of the adsorption process seeks the design and/or environmental parameters at which the adsorption process gives the best efficiency (**Figure 1**) [15].

**Figure 1.** Framework for AI-based prediction model.

Typically, experiments are carried out so that one factor is varied and analyzed while the other factors are held constant. This one-factor-at-a-time approach, called one variable at a time (OVAT) [16], is generally time-consuming and cannot reach optimal desirability: the researcher must screen all variables independently, which requires a large number of experiments and hence a high cost of study, and varying one variable at a time ignores interactions among the selected parameters.

Multivariate statistics techniques (MST) can significantly reduce the number of experiments and explain the effects of the independent variables (in combination or individually) on the process. MST helps to develop and optimize the operating system, significantly reducing the cost of testing [17].
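To make the contrast concrete, the sketch below enumerates a two-level full factorial design for three hypothetical adsorption factors and compares it with an OVAT screen around a baseline. The factor names are illustrative, not taken from any particular study; the point is that, for a comparable number of runs, the factorial design covers every combination of levels and so can estimate interactions that OVAT cannot.

```python
from itertools import product

# Hypothetical two-level screening design for three adsorption factors.
# Coded levels -1/+1 follow the RSM convention mentioned later in the text.
factors = {"pH": (-1, +1), "adsorbent_dose": (-1, +1), "sonication_time": (-1, +1)}

full_factorial = list(product(*factors.values()))   # every combination of levels
print(len(full_factorial))                          # 2**3 = 8 runs, interactions estimable

# OVAT: vary one factor at a time around a baseline of zeros. It uses a
# similar number of runs here but cannot estimate any interaction effects.
baseline = (0, 0, 0)
ovat = {baseline}
for i, levels in enumerate(factors.values()):
    for level in levels:
        run = list(baseline)
        run[i] = level
        ovat.add(tuple(run))
print(len(ovat))                                    # 1 baseline + 3 factors x 2 levels = 7
```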


The adsorption process is complex; owing to the intricate relationship between the input and output parameters, it is difficult to model with classical statistical approaches. Computational intelligence models are often more flexible than statistical models when modeling complex datasets with possible nonlinearities or missing data [18]. Recently, powerful AI prediction methods, such as random forest (RF), the adaptive neuro-fuzzy inference system (ANFIS), least square-support vector regression (LS-SVR), the radial basis function neural network (RBF-NN), boosted regression trees (BRT), and artificial neural networks (ANN), have been successfully used to model adsorption processes [18–24]. AI is the branch of computer science concerned with making computers behave like humans; the term was coined by John McCarthy at the 1956 Dartmouth conference. AI can be employed to explain and model many complex chemical systems because of its reliability, robustness, simplicity, and ability to handle nonlinearity. AI approaches can learn from experimental data to capture complex nonlinear, multidimensional functional relationships without any prior assumptions about their nature [25].

Previous reviews of adsorption procedures and of the engineering applications of AI confirm that there is no specific review on the use of AI for the adsorption process.

The two main objectives of this study are (i) to summarize research on the adsorption of dyes modeled by AI and (ii) to identify further research needs for AI in dye adsorption.

## **2. AI definition**

AI is a subcategory of computer science. Its goal is to enable the development of computers that are able to do things normally done by people. The term was coined by John McCarthy, later a Stanford researcher, at the 1956 Dartmouth conference, where the mission of the AI field was defined. If we start from this definition, any program can be considered AI if it does something that we would normally regard as intelligent in humans.

As the first reported work was in 2000, conducted by Annadurai. Annadurai [29] developed regression models to predict Direct Scarlet B. The inputs for the regression models include the pH, temperature, and the particle size. The proposed models showed promising features to be easy and efficient forecast tools for calculating removal percentage dye from aqueous solution. More recently, our group simplified their MLR model by introducing only four inputs, namely, pH, sonication time, adsorbent dose, and the initial dye concentration. Their results indicated that

Application of AI in Modeling of Real System in Chemistry

http://dx.doi.org/10.5772/intechopen.75602

387

Ease of use is accounted as one of the main advantages of the MLR method because no parameter needs to be adjusted. Meanwhile, since no detailed physical information is required, this method is efficient and cost-effective. Nonetheless, the MLR is the main constraint due to the inability to deal with nonlinear problems, although the previous research has proven that the

ANNs are computing systems inspired by the biological neural networks that constitute animal brains [25, 49]. An ANN is based on a collection of connected units or nodes called artificial neurons (analogous to biological neurons in an animal brain). Each connection between artificial neurons can transmit a signal from one to another. The artificial neuron (AN) that receives the signal can process it and then signal artificial neurons connected to it. ANs typically have a weight that increases or decreases the strength of the signal at a connection. Signals travel from the first input, to the last output layer, possibly after traversing the layers multiple times. The original goal of the ANN approach was to solve problems in the same way that a human brain would. Over time, attention is focused on matching-specific mental abilities, leading to deviations from biology. ANNs have been used on a variety of tasks, including speech recognition, computer vision, social network filtering, machine translation,

In the past two decades, many studies have been carried out to predict various types of decolorization dye from aqueous solution, such as electrocoagulation process [49], Fenton process [50], and adsorption [51] by applying ANNs. M. Ahmadi and Kh. Naderi applied general regression neural network (GRNN) to predict the removal of methylene blue (MB) and Basic Yellow 28 (BY28) from aqueous solution. Their findings indicated that a well-designed GRNN is able to predict the removal of azo dye based on sonication time, initial dye concentration, and adsorbent mass. Ahmadi and J. Pooralhossini used backpropagation neural network (BPNN) to predict the decolorization of sunset yellow (SY) and disulfine blue (DB) [52]. The obtained results show that the BPNN model outperforms the classical statistical model in terms of R<sup>2</sup>

RMSE, MAE, and AAD for both dyes. Ahmadi and team used BPNN to predict the efficiency of two carcinogenic dye (methylene blue (MB) and malachite green (MG)) adsorption onto Mn@ CuS/ZnS nanocomposite-loaded activated carbon (Mn@ CuS/ZnS-NC-AC) as a novel adsorbent to identify the model parameters in order to improve the prediction performance [35]. Ahmadi and Dastkhoon used neural network to predict Safranin-O (SO) and indigo carmine (IC) adsorption onto Ni:FeO(OH)-NWs-AC. In this work, the influence of process variables (initial dye concentration, adsorbent mass, and sonication time) on the removal of both dyes was investigated by central composite rotatable design (CCRD) of RSM, multilayer perceptron (MLP) neural network, and Doolittle factorization algorithm (DFA). The ANN model

,

MLR can be used as an efficient tool for predicting parentage removal [10, 25, 36–48].

**2.2. Artificial neural network (ANN)**

playing board and video games, and medical diagnosis.

the proposed method cannot well predict the removal dye percentage [13, 30–35].

The typical AI-based prediction method consists of four main steps. The first step is to acquire input and output data. Input data are those aspects that affect or relate to output data. These aspects include but are not limited to pH, sonication time, adsorbent dose, temperature, and initial concentration of contaminant. The output data is removal percentage. The next step is to preprocess the collected data in an appropriate format before using them to train the forecast model. Some data pre-processing techniques such as data normalization, data transfer, and data interpolation are applied at this stage to improve data quality and reduce negative impact. When the data is ready, the third step is to train the prediction model.

Since the crucial concept of empirical modeling is learning from historical data, a training process is essential for the development of the model. This step is achieved by selecting the appropriate parameters for the model. The type of parameters is determined by the algorithms selected by the researcher, while selecting the proper parameter can guarantee the performance of the model. The last step is testing the model. At this stage, the data test is examined to test the prediction of model performance. Performance indicators such as RMSE, R2 , MAE, and AAD are used to evaluate performance.

## **2. AI definition**

AI is a subcategory of computer science. Its goal is to enable the development of computers that are able to do things normally done by people. The field was named in 1956 by the Stanford researcher John McCarthy at the Dartmouth conference, where the mission of AI was defined. By this definition, any program can be considered AI if it does something that we would normally think of as intelligent in humans.

The typical AI-based prediction method consists of four main steps. The first step is to acquire input and output data. Input data are the factors that affect or relate to the output; these include, but are not limited to, pH, sonication time, adsorbent dose, temperature, and the initial concentration of the contaminant. The output is the removal percentage. The second step is to preprocess the collected data into an appropriate format before using them to train the forecast model. Pre-processing techniques such as data normalization, data transfer, and data interpolation are applied at this stage to improve data quality and reduce its negative impact. When the data are ready, the third step is to train the prediction model.

Since the crucial concept of empirical modeling is learning from historical data, a training process is essential for the development of the model. This step is achieved by selecting appropriate parameters for the model. The type of parameter is determined by the algorithm chosen by the researcher, and selecting proper parameters helps guarantee the performance of the model. The last step is testing the model: a held-out data set is used to assess the predictive performance, with indicators such as RMSE, R<sup>2</sup>, MAE, and AAD used to evaluate it.

AI-based prediction models can be further classified into four types based on their learning algorithms: multiple linear regression (MLR), artificial neural network (ANN), least square-support vector machine (LS-SVM), and ensemble prediction models. The rest of this section describes the main techniques used in AI-based prediction models.

## **2.1. Multiple linear regression**

MLR is a statistical technique that uses several explanatory variables to predict a response variable. The goal of MLR is to model the relationship between the input and response variables [26, 27].

The model for MLR, given n observations, is

$$y_i = B_0 + B_1 x_{i1} + B_2 x_{i2} + \dots + B_p x_{ip} + E, \quad \text{where } i \in \{1, \dots, n\}. \tag{1}$$

The response surface methodology (RSM) is one of the most popular methods in the adsorption research literature. RSM determines the mathematical relation between parameters and responses. Modeling (model fitting) in RSM consists of two steps: coding of the experimental data and regression. In the first step, input and output data are coded using a general equation, because RSM operates on coded input values such as +1, 0, and −1 instead of actual values [28].

In the second step, the coded experimental data are fitted to a selected model using multiple linear regression (MLR). Despite its simplicity, MLR has been used in recent work on removal processes. The first reported work, conducted by Annadurai [29] in 2000, developed regression models to predict the removal of Direct Scarlet B. The inputs of the regression models included the pH, temperature, and particle size. The proposed models proved to be easy and efficient forecast tools for calculating the percentage of dye removed from aqueous solution. More recently, our group simplified this MLR approach by introducing only four inputs, namely pH, sonication time, adsorbent dose, and initial dye concentration. The results indicated that the proposed method could not predict the dye removal percentage well [13, 30–35].

Ease of use is one of the main advantages of the MLR method, because no parameters need to be adjusted. Moreover, since no detailed physical information is required, the method is efficient and cost-effective. Nonetheless, the main constraint of MLR is its inability to deal with nonlinear problems, although previous research has shown that MLR can still serve as an efficient tool for predicting percentage removal [10, 25, 36–48].
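The two-step RSM-plus-MLR procedure described above (coding each factor to the −1/0/+1 range, then fitting the linear model of Eq. (1) by least squares) can be sketched as follows. The factor ranges, design points, and removal percentages below are hypothetical values chosen for illustration, not data from the cited studies:

```python
import numpy as np

# Hypothetical factor ranges: pH, sonication time (min), adsorbent dose (g),
# initial dye concentration (mg/L). Illustrative only.
lows  = np.array([2.0, 2.0, 0.005, 4.0])
highs = np.array([10.0, 6.0, 0.025, 20.0])

def code_levels(x, lows, highs):
    """Map actual factor values onto RSM coded levels in [-1, +1]."""
    center = (highs + lows) / 2.0
    half_range = (highs - lows) / 2.0
    return (x - center) / half_range

# Toy design (each row: one experiment, actual factor values) and removal %.
X_actual = np.array([
    [2.0,  2.0, 0.005,  4.0],
    [10.0, 2.0, 0.005, 20.0],
    [2.0,  6.0, 0.005, 20.0],
    [2.0,  2.0, 0.025, 20.0],
    [10.0, 6.0, 0.025, 20.0],
    [6.0,  4.0, 0.015, 12.0],   # center point -> coded to all zeros
])
y = np.array([48.0, 80.0, 67.0, 74.0, 95.0, 72.0])  # made-up removal %

Xc = code_levels(X_actual, lows, highs)          # step 1: coding
A = np.column_stack([np.ones(len(Xc)), Xc])      # intercept column B_0
B, *_ = np.linalg.lstsq(A, y, rcond=None)        # step 2: least-squares fit
y_hat = A @ B                                    # fitted removal percentages
```

The coefficients `B` correspond to B₀…B₄ of Eq. (1) on the coded scale, so their magnitudes are directly comparable across factors.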

## **2.2. Artificial neural network (ANN)**


ANNs are computing systems inspired by the biological neural networks that constitute animal brains [25, 49]. An ANN is based on a collection of connected units or nodes called artificial neurons (analogous to biological neurons in an animal brain). Each connection between artificial neurons can transmit a signal from one to another. The artificial neuron (AN) that receives the signal can process it and then signal the artificial neurons connected to it. ANs typically have a weight that increases or decreases the strength of the signal at a connection. Signals travel from the input layer to the output layer, possibly traversing intermediate layers multiple times. The original goal of the ANN approach was to solve problems in the same way that a human brain would. Over time, attention shifted to matching specific mental abilities, leading to deviations from biology. ANNs have been used on a variety of tasks, including speech recognition, computer vision, social network filtering, machine translation, playing board and video games, and medical diagnosis.

In the past two decades, many studies have applied ANNs to predict various types of dye decolorization from aqueous solution, for example in the electrocoagulation process [49], the Fenton process [50], and adsorption [51]. M. Ahmadi and Kh. Naderi applied a general regression neural network (GRNN) to predict the removal of methylene blue (MB) and Basic Yellow 28 (BY28) from aqueous solution. Their findings indicated that a well-designed GRNN is able to predict the removal of azo dye based on sonication time, initial dye concentration, and adsorbent mass. Ahmadi and J. Pooralhossini used a backpropagation neural network (BPNN) to predict the decolorization of sunset yellow (SY) and disulfine blue (DB) [52]. The obtained results show that the BPNN model outperforms the classical statistical model in terms of R<sup>2</sup>, RMSE, MAE, and AAD for both dyes. Ahmadi and team used a BPNN to predict the adsorption efficiency of two carcinogenic dyes (methylene blue (MB) and malachite green (MG)) onto Mn@ CuS/ZnS nanocomposite-loaded activated carbon (Mn@ CuS/ZnS-NC-AC), a novel adsorbent, and to identify the model parameters in order to improve the prediction performance [35]. Ahmadi and Dastkhoon used a neural network to predict Safranin-O (SO) and indigo carmine (IC) adsorption onto Ni:FeO(OH)-NWs-AC. In this work, the influence of the process variables (initial dye concentration, adsorbent mass, and sonication time) on the removal of both dyes was investigated by central composite rotatable design (CCRD) of RSM, a multilayer perceptron (MLP) neural network, and the Doolittle factorization algorithm (DFA). The ANN model was found to be more precise than the other models. We performed a sensitivity analysis (using the neuron weights) and confirmed that sonication time is the essential factor affecting the removal of SO and IC [33]. Ahmadi et al. developed a BP neural network model and partial least squares (PLS) to predict the ultrasonic-assisted simultaneous removal of fast green (FG), eosin Y (EY), and quinoline yellow (QY) from aqueous media using MOF-5, a metal organic framework and activated carbon hybrid (AC-MOF-5). The obtained results show that the ANN and PLS models are powerful tools for predicting the adsorption of the studied dyes by AC-MOF-5 [53].
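As a concrete illustration of the feedforward-and-backpropagation scheme described above, the following minimal one-hidden-layer network fits a toy removal-percentage function. The data, layer sizes, and learning rate are arbitrary illustrative choices, not the architectures used in the cited studies:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic inputs (e.g., sonication time, dye concentration, adsorbent mass,
# scaled to [0, 1]) and a linear toy removal target scaled to [0, 1].
X = rng.uniform(0.0, 1.0, size=(40, 3))
y = (60 + 30 * X[:, 0] - 20 * X[:, 1] + 10 * X[:, 2]).reshape(-1, 1) / 100.0

W1 = rng.normal(0, 0.5, size=(3, 8)); b1 = np.zeros(8)   # input -> hidden
W2 = rng.normal(0, 0.5, size=(8, 1)); b2 = np.zeros(1)   # hidden -> output

def forward(X):
    H = np.tanh(X @ W1 + b1)        # hidden activations
    return H, H @ W2 + b2           # linear output (predicted removal fraction)

losses = []
lr = 0.1
for _ in range(500):
    H, out = forward(X)
    err = out - y
    losses.append(float(np.mean(err ** 2)))
    # Backpropagation: gradients of the mean-squared error w.r.t. each weight
    dW2 = H.T @ err / len(X); db2 = err.mean(0)
    dH = err @ W2.T * (1 - H ** 2)           # tanh'(z) = 1 - tanh(z)^2
    dW1 = X.T @ dH / len(X); db1 = dH.mean(0)
    W2 -= lr * dW2; b2 -= lr * db2; W1 -= lr * dW1; b1 -= lr * db1
```

Training drives the mean-squared error down over the 500 full-batch gradient steps, mirroring the weight-adjustment process the text describes.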


Application of AI in Modeling of Real System in Chemistry

http://dx.doi.org/10.5772/intechopen.75602



The main advantage of the ANN method is its ability to detect complex nonlinear relationships between the inputs and outputs; this characteristic makes it applicable to real systems. However, the ANN method fails to establish any explicit relationship between the physical parameters and the removal percentage, which limits the model's interpretability and fitting ability.

## **2.3. Least square-support vector machine (LS-SVM)**

SVM, a learning method developed by Vapnik, is a powerful tool [34, 54]. This supervised learning method can be used for regression, classification, and density estimation in nonlinear models; its training leads to complex optimization problems, typically quadratic programming. However, SVM is often time-consuming and difficult to adapt, suffering from large memory requirements and long CPU times when trained in batch mode. This limitation is overcome by LS-SVM, a modified version of SVM that solves a set of linear equations instead of the quadratic programming problem, reducing the complexity of the optimization process. The theory and further details of SVM and LS-SVM can be found in the literature.
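The linear-system formulation that distinguishes LS-SVM from standard SVM can be sketched in a few lines. The RBF kernel, the hyperparameters `gamma` and `sigma`, and the saturating toy removal curve below are illustrative assumptions, not values from the cited studies:

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    """LS-SVM regression: one linear solve instead of quadratic programming."""
    n = len(X)
    K = rbf_kernel(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0                       # top row:    [0, 1^T]
    A[1:, 0] = 1.0                       # left col:   [1, K + I/gamma]
    A[1:, 1:] = K + np.eye(n) / gamma    # regularized kernel block
    rhs = np.concatenate([[0.0], y])
    sol = np.linalg.solve(A, rhs)        # the single linear solve
    return sol[0], sol[1:]               # bias b, dual coefficients alpha

def lssvm_predict(Xq, X, b, alpha, sigma=1.0):
    return rbf_kernel(Xq, X, sigma) @ alpha + b

# Toy 1-D demo: removal % as a smooth function of sonication time (synthetic).
X = np.linspace(0, 6, 30).reshape(-1, 1)
y = 90 * (1 - np.exp(-X[:, 0]))          # saturating removal curve
b, alpha = lssvm_fit(X, y)
y_hat = lssvm_predict(X, X, b, alpha)
```

Because training reduces to solving one (n+1)×(n+1) linear system, the memory and CPU-time issues of batch-mode quadratic programming mentioned above are avoided.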

Liang and team first applied LS-SVM in the area of water quality measurements in 2011. They showed that the model fitted the training data relatively well and that its predictions were satisfactory: the model had not only good learning accuracy but also good generalization ability. The predictive fitting precision on the test data set was more than 90%, the prediction error was minimal, and the RMSE was 0.0028 [55].

Similarly, our group used LS-SVM for the optimization and/or modeling of pH, ZnS-NPs-AC mass, MB concentration, and sonication time to develop predictive equations for simulating the efficiency of MB adsorption. The results show that LS-SVM, as a nonlinear approach, performs better than central composite design (CCD) for the prediction of MB adsorption [34].

Niyaz Mohammad Mahmoodi and team used the least square-support vector machine (LS-SVM) to model dye removal. The graphical plots and the values of the statistical parameters showed LS-SVM to be an intelligent model suitable for modeling dye adsorption [56].

Our group compared LS-SVM with other AI-based prediction methods for dye removal prediction. We compared SVR with several ANN models for predicting the adsorption of methylene blue (MB) from aqueous solutions by zinc sulfide nanoparticles with activated carbon (ZnS-NPs-AC). In addition, a multiple linear regression (MLR) model and an LS-SVM model with principal component analysis (PCA) for pre-processing were used by Ghaedi et al. to predict the efficiency of methylene blue adsorption onto copper oxide nanoparticles loaded on activated carbon (CuO-NP-AC). Both studies indicate that SVM performs better in model building and prediction than the other AI-based prediction methods [57].

## *2.3.1. Advantages and limitations*


The main advantage of LS-SVM, as noted by Foucquier et al. [58], is that it is based on the structural risk minimization principle, which aims to minimize an upper bound on the generalization error rather than merely the training error. Also, SVM provides a better balance between prediction accuracy and computation speed compared with ANNs and RSM [34]. The limitation of the SVM method is the determination of the kernel function: there is no uniform standard for deciding which kernel will result in the most accurate SVM, so researchers have to choose the kernel function based on the characteristics of the data as well as their own experience.

## *2.3.2. Ensemble prediction models*

Since each prediction method has its own limitations, a trend has recently evolved toward new mathematical methods called ensemble learning. These methods have become more widespread in analytical chemistry for complex data analysis. The adaptability of data mining methods makes them able to deal with typical problems: too many descriptors in the model, mixtures of different data types, complex data with missing values, multiple classes, or unbalanced data sets. For instance, data mining methods such as support vector machines [34] and artificial neural networks (ANNs) [59] have been applied in different realms of science and engineering. Ensemble prediction has become increasingly popular in chemistry [60, 61].

This method differs from the individual prediction methods because it creates a composite model that integrates different individual prediction models. Rather than being a prediction algorithm itself, it acts as a framework that reduces prediction errors by combining different algorithms. Ensemble prediction methods have been successfully applied in several areas of chemistry that involve large data volumes, including spectroscopy [62, 63], quantitative structure–activity relationship (QSAR) modeling [64, 65], and the omics sciences [66]. Ensemble prediction can handle many types of responses and predictors, such as categorical or numeric, and loss functions such as Laplace, Gaussian, Poisson, and Bernoulli [67]. De'ath [68] has shown that ensemble prediction methods, unlike many other regression methods, can be used both for prediction and for explanation of the underlying relationships between response and predictors.

The two main outputs of an ensemble prediction model, the partial dependence plots and the variable importance rankings, can be used together for model interpretation. Friedman [69] has proposed that while these outputs might not offer a complete description, they can at least give insight into the relation between the response and the predictors.

Some authors (e.g., Hastie et al.) [70] have shown that ensemble prediction is one of the most powerful machine/statistical learning ideas presented since the 1990s, and it has been suggested [71, 72] that applying ensemble prediction to classification and regression trees yields individual classifiers (e.g., classification trees, regression trees) that are generally competitive with any other method.




In addition, Breiman [71] shows that applying ensemble prediction to classification and regression trees can actually be much quicker than fitting a neural net classifier.

Due to its high prediction accuracy, ensemble learning has become a favored topic in recent years and has already been applied successfully in many fields. For example, our group [73] used boosted regression trees (BRT), an ensemble method for fitting statistical models that differs fundamentally from conventional techniques aiming to fit a single parsimonious model. In this study, response surface methodology (RSM), an artificial neural network (ANN), and BRT were used for the optimization and/or modeling of stirring time (min), pH, adsorbent mass (mg), and the concentrations of MB and Cd2+ ions (mg L−1) to develop predictive equations for simulating the efficiency of MB and Cd2+ adsorption based on an experimental data set obtained in a batch study. All three models gave good predictions, but the BRT model was the most precise, showing that BRT is a powerful tool for modeling and optimizing the removal of MB and Cd(II).

Similarly, our group [32] used adaptive network-based fuzzy inference system (ANFIS) ensemble models as a support tool for examining data and making predictions to recognize and predict the removal percentage in MB and SY dye solutions of different concentrations. The predictive capabilities of MLR and ANFIS were compared in terms of the squared correlation coefficient (R<sup>2</sup>), root mean square error (RMSE), mean absolute error (MAE), and absolute average deviation (AAD) against the empirical data. The ANFIS model shows better prediction accuracy than the CCD model.
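The four performance indicators used throughout these comparisons can be computed as follows. Note that AAD conventions vary across papers; the percentage form used here is one common choice and is an assumption, not necessarily the definition used in [32]:

```python
import numpy as np

def metrics(y_true, y_pred):
    """R^2, RMSE, MAE, and AAD (mean absolute percentage deviation)."""
    y_true = np.asarray(y_true, float)
    y_pred = np.asarray(y_pred, float)
    resid = y_true - y_pred
    ss_res = float((resid ** 2).sum())                       # residual sum of squares
    ss_tot = float(((y_true - y_true.mean()) ** 2).sum())    # total sum of squares
    return {
        "R2":   1 - ss_res / ss_tot,
        "RMSE": float(np.sqrt((resid ** 2).mean())),
        "MAE":  float(np.abs(resid).mean()),
        "AAD":  float(100 * np.abs(resid / y_true).mean()),  # in percent
    }

# Toy comparison of experimental vs. predicted removal percentages.
m = metrics([80, 90, 70, 95], [78, 92, 69, 96])
```

Lower RMSE, MAE, and AAD and higher R<sup>2</sup> indicate a better fit, which is how the model comparisons above should be read.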

In another work by our group [13], random forest (RF) and response surface methodology (RSM) were used to model and predict the efficiency of malachite green removal from aqueous solution by ultrasound-assisted adsorption onto silver hydroxide nanoparticles loaded on activated carbon (AgOH-NPs-AC). The parameters involved in the adsorption process, namely pH, initial MG concentration, sonication time, and adsorbent dosage, were set within the ranges 2.0–10, 4–20 mg L−1, 2–6 min, and 0.005–0.025 g, respectively. The performance of the RF and CCD models in describing the experimental data was evaluated in terms of R<sup>2</sup>, RMSE, MAE, and AAD. The obtained results showed that the RF model outperformed the classical statistical model in modeling the dye adsorption process.

Also, an ensemble prediction approach (a radial basis function neural network (RBF-NN) and random forest (RF)) was developed and evaluated against a quadratic response surface model to predict the maximum removal efficiency of brilliant green (BG) from aqueous media as a function of BG concentration (4–20 mg L−1), sonication time (2–6 min), and ZnS-NP-AC mass (0.010–0.030 g) under ultrasound-assisted adsorption [31]. All three models (RBF network, RF, and polynomial) were compared against the experimental data using four statistical indices, namely R<sup>2</sup>, RMSE, MAE, and AAD; graphical plots were also used for model comparison. The obtained results show that the RBF network and RF exhibit better performance than MLR.

The main advantage of the ensemble method is its improved accuracy. These methods also offer important practical advantages, such as accommodating missing data and handling different types of predictor variables. In addition, they need no prior data transformation or elimination of outliers, can fit complex nonlinear relationships, and can automatically handle interaction effects between predictors. However, compared with other predictive methods, ensemble models require more computation time and a higher level of expertise, since they combine different base models. Another disadvantage is that their predictive performance depends on the selection of base models. In previous studies, researchers selected the base models on the basis of prior knowledge; there is still no systematic approach for determining which base models should be included in an ensemble.
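A minimal sketch of the bagging idea behind such ensembles (bootstrap resampling plus averaging of base-model predictions) is shown below. The low-order polynomial base learners and the synthetic Langmuir-like removal data are illustrative assumptions, not a model from any of the cited studies:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic adsorption-like data: removal % vs. adsorbent dose (illustrative).
x = rng.uniform(0.0, 1.0, 60)
y = 90 * x / (0.2 + x) + rng.normal(0, 4.0, 60)   # Langmuir-like curve + noise

def fit_base(xs, ys, deg=3):
    """Base learner: a low-order polynomial regression."""
    return np.polyfit(xs, ys, deg)

def bagged_predict(x_new, n_models=25):
    """Bootstrap-aggregated (bagging) ensemble: average base-model predictions."""
    preds = []
    for _ in range(n_models):
        idx = rng.integers(0, len(x), len(x))      # bootstrap resample
        coef = fit_base(x[idx], y[idx])            # fit one base model
        preds.append(np.polyval(coef, x_new))
    return np.mean(preds, axis=0)                  # combine by averaging

grid = np.linspace(0.05, 0.95, 10)
ensemble_pred = bagged_predict(grid)
truth = 90 * grid / (0.2 + grid)
```

Averaging over resampled base models reduces the variance of any single fit, which is the accuracy-improvement mechanism referred to above.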

## **3. Discussion**


According to previous research, each type of AI-based prediction method has its own advantages and disadvantages; scientists therefore have to select a suitable method for their problem. For example, MLR is more appropriate than other methods for predicting dye removal when ease of use and high calculation speed matter, while LS-SVM and ANNs are more suitable for real systems with high nonlinearity because of their high prediction accuracy. On the other hand, some researchers have compared AI-based prediction methods with other methods for dye removal prediction. For example, Tanzifi [74] compared ANNs with RSM for predicting the removal of Amido Black 10B. Comparison of the adsorption efficiencies obtained by the ANN model with the experimental data showed that the ANN model could estimate the behavior of the Amido Black 10B dye adsorption process under various conditions, and the study proposed the ANN model as a simpler and more efficient prediction tool compared with simulation software. Based on these studies, the advantages and disadvantages of AI-based prediction methods are summarized below.

## **3.1. Advantages**


The advantages of AI models are:


## **4. Conclusion**

There are many published papers on the prediction of dye adsorption using OVAT. However, in this chapter, we review the important research studies on dye adsorption forecasting using AI methods. The literature survey in this chapter showed that AI approaches can be successfully used for the modeling and prediction of the dye adsorption process with acceptable accuracy compared to conventional linear models such as RSM. The future research proposed for AIs in the field of dye removal is as follows:

**i.** The prediction capability of other AI models, such as the group method of data handling (GMDH), random forest (RF), neural gas network, regression tree, and radial basis function network (RBFN), for dye adsorption needs further research studies.

**ii.** The hybridization of AI methods, such as combining the LS-SVM and GMDH forecasting methods, is proposed regarding their potential to predict dye adsorption.

**iii.** A few studies have been reported on combining ANN approaches with optimization algorithms. However, it is necessary to extend the optimization of network configuration for modeling the adsorption process using evolutionary computation methods such as ant colony, PSO, GA, tabu search, artificial bee colony, firefly algorithm, teaching-learning-based optimization, harmony search, shuffled frog-leaping algorithm, simulated annealing, and invasive weed optimization.

Based on the reported studies and discussions presented in this chapter, it can be concluded that AI methods are excellent approaches for modeling the adsorption of dyes. The information offered in this chapter would be highly useful to the scientists working in the field of dye adsorption and AIs in their investigations.

Application of AI in Modeling of Real System in Chemistry
http://dx.doi.org/10.5772/intechopen.75602

## **Author details**

M. H. Ahmadi Azqhandi\* and M. Shekari

\*Address all correspondence to: mhahmadia58@gmail.com

Applied Chemistry Department, Faculty of Petroleum and Gas (Gachsaran), Yasouj University, Gachsaran, Iran

## **References**

[1] Silva TL, Cazetta AL, Souza PS, Zhang T, Asefa T, Almeida VC. Mesoporous activated carbon fibers synthesized from denim fabric waste: Efficient adsorbents for removal of textile dye from aqueous solutions. Journal of Cleaner Production. 2018;**171**:482-490

[2] Sarvajith M, Reddy GKK, Nancharaiah Y. Textile dye biodecolourization and ammonium removal over nitrite in aerobic granular sludge sequencing batch reactors. Journal of Hazardous Materials. 2018;**342**:536-543

[3] Li L, Qi G, Wang B, Yue D, Wang Y, Sato T. Fulvic acid anchored layered double hydroxides: A multifunctional composite adsorbent for the removal of anionic dye and toxic metal. Journal of Hazardous Materials. 2018;**343**:19-28

[4] Liu C, Omer A. Adsorptive removal of cationic methylene blue dye using carboxymethyl cellulose/k-carrageenan/activated montmorillonite composite beads: Isotherm and kinetic studies. International Journal of Biological Macromolecules. 2018;**106**:823-833

[5] Gao H, Zhao S, Cheng X, Wang X, Zheng L. Removal of anionic azo dyes from aqueous solution using magnetic polymer multi-wall carbon nanotube nanocomposite as adsorbent. Chemical Engineering Journal. 2013;**223**:84-90

[6] Zhu H-Y, Fu Y-Q, Jiang R, Yao J, Xiao L, Zeng G-M. Novel magnetic chitosan/poly (vinyl alcohol) hydrogel beads: Preparation, characterization and application for adsorption of dye from aqueous solution. Bioresource Technology. 2012;**105**:24-30

[7] Madrakian T, Afkhami A, Ahmadi M, Bagheri H. Removal of some cationic dyes from aqueous solutions using magnetic-modified multi-walled carbon nanotubes. Journal of Hazardous Materials. 2011;**196**:109-114

[8] Verma AK, Dash RR, Bhunia P. A review on chemical coagulation/flocculation technologies for removal of colour from textile wastewaters. Journal of Environmental Management. 2012;**93**(1):154-168

[9] Asgher M, Bhatti HN. Evaluation of thermodynamics and effect of chemical treatments on sorption potential of *Citrus* waste biomass for removal of anionic dyes from aqueous solutions. Ecological Engineering. 2012;**38**(1):79-85

[10] Körbahti BK, Artut K, Geçgel C, Özer A. Electrochemical decolorization of textile dyes and removal of metal ions from textile dye and metal ion binary mixtures. Chemical Engineering Journal. 2011;**173**(3):677-688

[11] Gupta V. Application of low-cost adsorbents for dye removal—A review. Journal of Environmental Management. 2009;**90**(8):2313-2342

[12] Azqhandi MHA, Rajabi F, Keramati M. Synthesis of Cd doped ZnO/CNT nanocomposite by using microwave method: Photocatalytic behavior, adsorption and kinetic study. Results in Physics. 2017:1106-1114

[13] Solaymani E, Ghaedi M, Karimi H, Azqhandi A, Hossein M, Asfaram A. Intensified removal of malachite green by AgOH-AC nanoparticles combined with ultrasound: Modeling and optimization. Applied Organometallic Chemistry. 2017:1-12

[14] Amini M, Yazdanbakhsh A, Eslami A, Aghayani E. Optimization of coagulation-flocculation process for dye and COD removal from real dyeing wastewater and evaluation of effluent biodegradability in a carpet factory. Journal of Health in the Field. 2018;**13**(3):24-33

[15] Tuzen M, Sarı A, Saleh TA. Response surface optimization, kinetic and thermodynamic studies for effective removal of rhodamine B by magnetic AC/CeO2 nanocomposite. Journal of Environmental Management. 2018;**206**:170-177

[16] Sleiman M, Vildozo D, Ferronato C, Chovelon J-M. Photocatalytic degradation of azo dye Metanil yellow: Optimization and kinetic modeling using a chemometric approach. Applied Catalysis B: Environmental. 2007;**77**(1):1-11


[17] Cestari AR, Vieira EF, Mota JA. The removal of an anionic red dye from aqueous solutions using chitosan beads—The role of experimental factors on adsorption using a full factorial design. Journal of Hazardous Materials. 2008;**160**(2):337-343

[18] Arabzadeh S, Ghaedi M, Ansari A, Taghizadeh F, Rajabi M. Comparison of nickel oxide and palladium nanoparticle loaded on activated carbon for efficient removal of methylene blue kinetic and isotherm studies of removal process. Human & Experimental Toxicology. 2015;**34**(2):153-169

[19] Jamshidi M, Ghaedi M, Dashtian K, Hajati S, Bazrafshan A. Ultrasound-assisted removal of Al3+ ions and alizarin red S by activated carbon engrafted with Ag nanoparticles: Central composite design and genetic algorithm optimization. RSC Advances. 2015;**5**(73):59522-59532

[20] Roosta M, Ghaedi M, Daneshfar A, Sahraei R. Ultrasound assisted microextraction-nano material solid phase dispersion for extraction and determination of thymol and carvacrol in pharmaceutical samples: Experimental design methodology. Journal of Chromatography B. 2015;**975**:34-39

[21] Azad FN, Ghaedi M, Dashtian K, Hajati S, Pezeshkpour V. Ultrasonically assisted hydrothermal synthesis of activated carbon–HKUST-1-MOF hybrid for efficient simultaneous ultrasound-assisted removal of ternary organic dyes and antibacterial investigation: Taguchi optimization. Ultrasonics Sonochemistry. 2016;**31**:383-393

[22] Ghaedi M, Negintaji G, Marahel F. Solid phase extraction and removal of brilliant green dye on zinc oxide nanoparticles loaded on activated carbon: New kinetic model and thermodynamic evaluation. Journal of Industrial and Engineering Chemistry. 2014;**20**(4):1444-1452

[23] Ghaedi M, Khafri HZ, Asfaram A, Goudarzi A. Response surface methodology approach for optimization of adsorption of Janus green B from aqueous solution onto ZnO/Zn(OH)2. Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy. 2016;**152**:233-240

[24] Azad FN, Ghaedi M, Dashtian K, Hajati S, Goudarzi A, Jamshidi M. Enhanced simultaneous removal of malachite green and safranin O by ZnO nanorod-loaded activated carbon: Modeling, optimization and adsorption isotherms. New Journal of Chemistry. 2015;**39**(10):7998-8005

[25] Khataee AR, Zarei M, Moradkhannejhad L. Application of response surface methodology for optimization of azo dye removal by oxalate catalyzed photoelectro-Fenton process using carbon nanotube-PTFE cathode. Desalination. 2010;**258**(1):112-119

[26] Khanmohammadi M, Azghandi MA. Application of Doolittle algorithm as a multivariate calibration method for infrared spectrometric determination of some ingredients in detergent washing powder. Analytical Chemistry Letters. 2011;**1**(3):202-206

[27] Khanmohammadi M, Azqhandi MA. Introducing an orthogonal-triangular decomposition algorithm and its application in multivariate calibration. Analytical Methods. 2011;**3**(12):2721-2725

[28] Daneshi A, Younesi H, Ghasempouri SM, Sharifzadeh M. Production of poly-3-hydroxybutyrate by *Cupriavidus necator* from corn syrup: Statistical modeling and optimization of biomass yield and volumetric productivity. Journal of Chemical Technology and Biotechnology. 2010;**85**(11):1528-1539

[29] Annadurai G, Sheeja R. Use of Box-Behnken design of experiments for the adsorption of verofix red using biopolymer. Bioprocess and Biosystems Engineering. 1998;**18**(6):463-466

[30] Ghaedi M, Azqhandi MHA, Asfaram A. Application of machine/statistical learning, artificial intelligence and statistical experimental design for modeling and optimization of methylene blue and Cd (II) removal from binary aqueous solution by natural walnut carbon. Physical Chemistry Chemical Physics. 2017;**19**:11299-11317

[31] Azqhandi MA, Ghaedi M, Yousefi F, Jamshidi M. Application of random forest, radial basis function neural networks and central composite design for modeling and/or optimization of the ultrasonic assisted adsorption of brilliant green on ZnS-NP-AC. Journal of Colloid and Interface Science. 2017;**505**:278-292

[32] Porhemmat S, Rezvani AR, Ghaedi M, Azqhandi MHA, Bazrafshan AA. Nanocomposites: Synthesis, characterization and its application to removal azo dyes using ultrasonic assisted method: Modeling and optimization. Ultrasonics Sonochemistry. 2017;**38**:530-543

[33] Dastkhoon M, Ghaedi M, Asfaram A, Azqhandi MHA, Purkait MK. Simultaneous removal of dyes onto nanowires adsorbent use of ultrasound assisted adsorption to clean waste water: Chemometrics for modeling and optimization, multicomponent adsorption and kinetic study. Chemical Engineering Research and Design. 2017;**124**:222-237

[34] Asfaram A, Ghaedi M, Azqhandi MA, Goudarzi A, Dastkhoon M. Statistical experimental design, least squares-support vector machine (LS-SVM) and artificial neural network (ANN) methods for modeling the facilitated adsorption of methylene blue dye. RSC Advances. 2016;**6**(46):40502-40516

[35] Asfaram A, Ghaedi M, Azqhandi MHA, Goudarzi A, Hajati S. Ultrasound-assisted binary adsorption of dyes onto Mn@ CuS/ZnS-NC-AC as a novel adsorbent: Application of chemometrics for optimization and modeling. Journal of Industrial and Engineering Chemistry. 2017;**54**:377-388

[36] Aleboyeh A, Daneshvar N, Kasiri M. Optimization of CI acid red 14 azo dye removal by electrocoagulation batch process with response surface methodology. Chemical Engineering and Processing: Process Intensification. 2008;**47**(5):827-832

[37] Moghaddam SS, Moghaddam MA, Arami M. Coagulation/flocculation process for dye removal using sludge from water treatment plant: Optimization through response surface methodology. Journal of Hazardous Materials. 2010;**175**(1):651-657

[38] Cho I-H, Zoh K-D. Photocatalytic degradation of azo dye (reactive red 120) in TiO2/UV system: Optimization and modeling using a response surface methodology (RSM) based on the central composite design. Dyes and Pigments. 2007;**75**(3):533-543

[39] Ravikumar K, Krishnan S, Ramalingam S, Balu K. Optimization of process variables by the application of response surface methodology for dye removal using a novel adsorbent. Dyes and Pigments. 2007;**72**(1):66-74


[40] Ravikumar K, Ramalingam S, Krishnan S, Balu K. Application of response surface methodology to optimize the process variables for reactive red and acid brown dye removal using a novel adsorbent. Dyes and Pigments. 2006;**70**(1):18-26

[41] Sharma S, Malik A, Satya S. Application of response surface methodology (RSM) for optimization of nutrient supplementation for Cr (VI) removal by Aspergillus lentulus AML05. Journal of Hazardous Materials. 2009;**164**(2):1198-1204

[42] Gönen F, Aksu Z. Single and binary dye and heavy metal bioaccumulation properties of Candida tropicalis: Use of response surface methodology (RSM) for the estimation of removal yields. Journal of Hazardous Materials. 2009;**172**(2):1512-1519

[43] Khataee A, Fathinia M, Aber S, Zarei M. Optimization of photocatalytic treatment of dye solution on supported TiO2 nanoparticles by central composite design: Intermediates identification. Journal of Hazardous Materials. 2010;**181**(1):886-897

[44] Daneshvar N, Salari D, Khataee A. Photocatalytic degradation of azo dye acid red 14 in water on ZnO as an alternative catalyst to TiO2. Journal of Photochemistry and Photobiology A: Chemistry. 2004;**162**(2):317-322

[45] Chatterjee S, Kumar A, Basu S, Dutta S. Application of response surface methodology for methylene blue dye removal from aqueous solution using low cost adsorbent. Chemical Engineering Journal. 2012;**181**:289-299

[46] Körbahti BK. Response surface optimization of electrochemical treatment of textile dye wastewater. Journal of Hazardous Materials. 2007;**145**(1):277-286

[47] Zuorro A, Fidaleo M, Lavecchia R. Response surface methodology (RSM) analysis of photodegradation of sulfonated diazo dye reactive green 19 by UV/H2O2 process. Journal of Environmental Management. 2013;**127**:28-35

[48] Sadeghi-Kiakhani M, Arami M, Gharanjig K. Preparation of chitosan-ethyl acrylate as a biopolymer adsorbent for basic dyes removal from colored solutions. Journal of Environmental Chemical Engineering. 2013;**1**(3):406-415

[49] Daneshvar N, Khataee A, Djafarzadeh N. The use of artificial neural networks (ANN) for modeling of decolorization of textile dye solution containing CI basic yellow 28 by electrocoagulation process. Journal of Hazardous Materials. 2006;**137**(3):1788-1795

[50] Elmolla ES, Chaudhuri M, Eltoukhy MM. The use of artificial neural network (ANN) for modeling of COD removal from antibiotic aqueous solution by the Fenton process. Journal of Hazardous Materials. 2010;**179**(1):127-134

[51] Balci B, Keskinkan O, Avci M. Use of BDST and an ANN model for prediction of dye adsorption efficiency of Eucalyptus camaldulensis barks in fixed-bed system. Expert Systems with Applications. 2011;**38**(1):949-956

[52] Pooralhossini J, Zanjanchi MA, Ghaedi M, Asfaram A, Azqhandi MHA. Statistical optimization and modeling approach for azo dye decolorization: Combined effects of ultrasound waves and nanomaterial-based adsorbent. Applied Organometallic Chemistry. 2018;**32**(3):1-14

[53] Askari H, Ghaedi M, Dashtian K, Azghandi MHA. Rapid and high-capacity ultrasonic assisted adsorption of ternary toxic anionic dyes onto MOF-5-activated carbon: Artificial neural networks, partial least squares, desirability function and isotherm and kinetic study. Ultrasonics Sonochemistry. 2017;**37**:71-82

[54] Yovel Y, Franz MO, Stilz P, Schnitzler HU. Plant classification from bat-like echolocation signals. PLoS Computational Biology. 2008;**4**(3):e1000032

[55] Liang L, Wang Q, Chen Y. In application of support vector machine in online monitoring of wastewater treatment based on combined kernel functions. In: 2011 International Conference on Electrical and Control Engineering (ICECE); IEEE. 2011. pp. 3840-3843

[56] Mahmoodi NM, Hosseinabadi-Farahani Z, Chamani H. Nanostructured adsorbent (MnO2): Synthesis and least square support vector machine modeling of dye removal. Desalination and Water Treatment. 2016;**57**(45):21524-21533

[57] Ghaedi M, Ghaedi A, Hossainpour M, Ansari A, Habibi M, Asghari A. Least square-support vector (LS-SVM) method for modeling of methylene blue dye adsorption using copper oxide loaded on activated carbon: Kinetic and isotherm study. Journal of Industrial and Engineering Chemistry. 2014;**20**(4):1641-1649

[58] Foucquier A, Robert S, Suard F, Stéphan L, Jay A. State of the art in building modelling and energy performances prediction: A review. Renewable and Sustainable Energy Reviews. 2013;**23**:272-288

[59] Azad FN, Ghaedi M, Asfaram A, Jamshidi A, Hassani G, Goudarzi A, Azqhandi MHA, Ghaedi A. Optimization of the process parameters for the adsorption of ternary dyes by Ni doped FeO (OH)-NWs–AC using response surface methodology and an artificial neural network. RSC Advances. 2016;**6**(24):19768-19779

[60] Sayegh A, Tate JE, Ropkins K. Understanding how roadside concentrations of NOx are influenced by the background levels, traffic density, and meteorological conditions using boosted regression trees. Atmospheric Environment. 2016;**127**:163-175

[61] Zhang W, Du Z, Zhang D, Yu S, Hao Y. Boosted regression tree model-based assessment of the impacts of meteorological drivers of hand, foot and mouth disease in Guangdong, China. Science of the Total Environment. 2016;**553**:366-371

[62] Ghasemi JB, Tavakoli H. Application of random forest regression to spectral multivariate calibration. Analytical Methods. 2013;**5**(7):1863-1871

[63] Menze BH, Kelm BM, Masuch R, Himmelreich U, Bachert P, Petrich W, Hamprecht FA. A comparison of random forest and its Gini importance with standard chemometric methods for the feature selection and classification of spectral data. BMC Bioinformatics. 2009;**10**(1):213

[64] Svetnik V, Liaw A, Tong C, Culberson JC, Sheridan RP, Feuston BP. Random forest: A classification and regression tool for compound classification and QSAR modeling. Journal of Chemical Information and Computer Sciences. 2003;**43**(6):1947-1958


[65] Polishchuk PG, Muratov EN, Artemenko AG, Kolumbin OG, Muratov NN, Kuz'min VE. Application of random forest approach to QSAR prediction of aquatic toxicity. Journal of Chemical Information and Modeling. 2009;**49**(11):2481-2488

[66] Brereton RG, Lloyd GR. Support vector machines for classification and regression. Analyst. 2010;**135**(2):230-267

[67] Ridgeway G. Package gbm: Generalized Boosted Regression Models. Version 1.6.3.2. Austria: R Foundation for Statistical Computing Vienna; 2012

[68] De'Ath G. Boosted trees for ecological modeling and prediction. Ecology. 2007;**88**(1):243-251

[69] Friedman JH. Greedy function approximation: A gradient boosting machine. Annals of Statistics. 2001:1189-1232

[70] Hastie T, Tibshirani R, Friedman J, Franklin J. The elements of statistical learning: Data mining, inference and prediction. The Mathematical Intelligencer. 2005;**27**(2):83-85

[71] Breiman L. Arcing classifier (with discussion and a rejoinder by the author). The Annals of Statistics. 1998;**26**(3):801-849

[72] Breiman L. Random forests. Machine Learning. 2001;**45**(1):5-32

[73] Mazaheri H, Ghaedi M, Azqhandi MA, Asfaram A. Application of machine/statistical learning, artificial intelligence and statistical experimental design for the modeling and optimization of methylene blue and Cd (ii) removal from a binary aqueous solution by natural walnut carbon. Physical Chemistry Chemical Physics. 2017;**19**(18):11299-11317

[74] Tanzifi M, Yaraki MT, Kiadehi AD, Hosseini SH, Olazar M, Bhati AK, Agarwal S, Gupta VK, Kazemi A. Adsorption of Amido black 10B from aqueous solution using polyaniline/SiO2 nanocomposite: Experimental investigation and artificial neural network modeling. Journal of Colloid and Interface Science. 2018;**510**:246-261

**Chapter 20**

## **Application of AI in Chemical Engineering**

Zeinab Hajjar, Shokoufe Tayyebi and Mohammad Hosein Eghbal Ahmadi

DOI: 10.5772/intechopen.76027

Additional information is available at the end of the chapter

© 2018 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

## **Abstract**

A major shortcoming of traditional strategies is that, due to the highly nonlinear behavior of chemical processes, solving chemical engineering problems with them is often very difficult or even impossible. Today, artificial intelligence (AI) techniques are becoming useful due to their simple implementation, easy design, generality, robustness and flexibility. AI includes various branches, namely, artificial neural networks, fuzzy logic, genetic algorithms, expert systems and hybrid systems. They have been widely used in various applications of the chemical engineering field, including modeling, process control, classification, and fault detection and diagnosis. In this chapter, the capabilities of AI are investigated in various chemical engineering fields.

**Keywords:** chemical engineering, AI algorithms, classification, process control, modeling, optimization, fault detection and diagnosis

## **1. Introduction**

Artificial intelligence (AI) applications in chemical engineering have increased dramatically in recent years. This chapter deals with various applications of AI in the chemical engineering field, including modeling, optimization, process control, and fault detection and diagnosis. The aim of the chapter is to provide an overview of the field by presenting the capabilities and limitations of the AI approach, focusing on artificial neural network (ANN) and fuzzy logic methods.

It is shown that the complexities conventional approaches face when dealing with chemical processes, which are inherently highly nonlinear, can be tackled through the application of AI methods. Four illustrative examples are also presented.

After reading this chapter, the reader is expected to have a basic grounding in the application of AI methods in chemical engineering and understand their implementation issues.

## **2. Application of AI in chemical process modeling**

Chemical process models, which represent the system behavior, are useful in all phases of chemical engineering, from research and design to optimization and control, and even plant operations [1].

Generally, there are two major types of modeling approaches in chemical engineering, namely, the mechanistic (white-box, first-principles) approach and AI-based approaches such as ANN and fuzzy logic methods. In the mechanistic approach, fundamental physical and chemical laws, such as the conservation laws, form the basis of the model. This approach involves algebraic and differential equations built from mass, energy and momentum balances. Due to the large number of variables affecting the process behavior and the complex mathematical equations governing the system, many chemical processes are nonlinear and complicated. Consequently, it is hard and sometimes even impossible to represent them with mechanistic models. Even if such a model has been developed, it might be impractical to solve or to identify its parameters. Moreover, a mechanistic model needs detailed knowledge and a lot of skill and ingenuity to incorporate the basic phenomena of the process in the model, and difficulties can arise from poor knowledge [2]. In some cases, assumptions such as constant physical properties, an ideal gas phase and linearization of the nonlinear equations of the model are inevitable, all of which impose limitations on the model and reduce its robustness [3].
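A mechanistic model of the kind just described is typically a small system of balance equations. As a hedged sketch (the reactor, the first-order reaction A → B, and all parameter values are assumptions for illustration, not an example from the chapter), the code below integrates the component mass balance of a continuously stirred tank reactor, dCA/dt = (q/V)(CA,in − CA) − k·CA, with explicit Euler:

```python
# Hedged sketch of a mechanistic (first-principles) model: the component
# mass balance of a CSTR with an assumed first-order reaction A -> B,
#   dCA/dt = (q/V) * (CA_in - CA) - k * CA
# integrated with explicit Euler. All parameter values are illustrative.

def simulate_cstr(ca0=0.0, ca_in=1.0, q=1.0, v=10.0, k=0.5, dt=0.01, t_end=50.0):
    """Return the outlet concentration CA at time t_end."""
    ca = ca0
    for _ in range(int(t_end / dt)):
        dca_dt = (q / v) * (ca_in - ca) - k * ca  # accumulation = in - out - reaction
        ca += dt * dca_dt
    return ca

# Setting dCA/dt = 0 gives the steady state: CA* = (q/V)*CA_in / (q/V + k)
steady_state = (1.0 / 10.0) * 1.0 / (1.0 / 10.0 + 0.5)
print(round(simulate_cstr(), 4), round(steady_state, 4))
```

Even this toy example shows why the mechanistic route gets hard: every additional species, energy balance, or nonlinear rate law adds coupled equations and parameters that must be known or identified.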

On the contrary, AI-based techniques have demonstrated their superb ability and have received much attention for chemical process modeling. These techniques, for which detailed knowledge of the process is of less concern, may overcome the drawbacks of the mechanistic approach when dealing with complex and nonlinear systems. Using AI-based methods, inherently qualitative variables in chemical processes, like catalyst deactivation in reactors, can also be considered in the model, while these types of variables cannot be implemented in mechanistic models.

The most common methods of AI for modeling purposes in chemical engineering are ANN and fuzzy logic, which are sometimes hybridized with evolutionary algorithms [4–7]. In addition to ANN and fuzzy logic methods, their hybrid scheme, the adaptive-network-based fuzzy inference system (ANFIS), which is a fuzzy inference system implemented in the framework of adaptive networks, has also been applied for modeling purposes in chemical engineering.

The first step of developing an AI-based model is defining the input/output variables of the system which is to be modeled (**Figure 1**). Afterward, according to the experimental data or the knowledge of the governing phenomena, the model is developed. The parameters that characterize the AI-based model, such as the number of fuzzy sets (when using fuzzy logic) or the number and transfer functions of the hidden layers (when using the ANN method), depend on the complexity and nonlinearity of the system and the types of variables affecting the process.

**Figure 1.** Schematic of AI model.

Among the types of ANN structures, the multi-layer perceptron (MLP) neural network, which has a feed-forward scheme, is considered the most useful topology for system modeling [8]. Moreover, the recurrent ANN model, which maps past inputs and outputs to the future outputs, can be used for dynamic processes.

In the fuzzy model approach, the two types of which are Mamdani [9] and Takagi-Sugeno (TS) [10], all the uncertainties and model complications are treated as linguistic expressions in the form of "If-Then" rules based on the theory of fuzzy logic [11]. TS, like ANN, is commonly considered a data-driven AI-based modeling approach [12].

Mamdani fuzzy, which differs from TS in the way the information and rules are presented, has several superiorities over the TS approach for the modeling of chemical processes. First, the qualitative experience and knowledge of the experts who deal with the process are incorporated in the development of the model [12]. In addition, there is no need for data in order to build the Mamdani fuzzy model. Consequently, a Mamdani fuzzy model is more intuitive, transparent and interpretable [13]. In contrast, each TS-type model is a local approximator, and the predictability of the model is valid only for the specific operating condition of the process under which the model was developed and tested [14]. Accordingly, it can hardly be applied for analyzing the process behavior, cannot be scaled up or down, and is therefore less useful for industrial practice. Despite the capabilities of the Mamdani method, it is worth underlining that a Mamdani fuzzy model suffers from a large number of rules when dealing with processes with a large number of variables.

Genetic algorithm (GA) can be used to optimize the performance of a fuzzy model. The role of GA is optimal parameter estimation, such as estimating the parameters of the scaling functions and the universes of discourse [15, 16] or the membership functions (MFs) [17, 18]. GA is also applied as a method for rule reduction/selection, removing redundant, unnecessary or misleading rules [17] when dealing with high-dimensional problems in which the number of rules is so large that it cannot be managed efficiently.

After reading this chapter, the reader is expected to have a basic grounding in the application

Chemical process models which present the system behavior are useful in all phases of chemical engineering, from research and design to optimization and control and even plant opera-

Generally, There are two major types of modeling approaches in chemical engineering, namely, mechanistic (white box, first principle) and AI-based approach like ANN and fuzzy logic methods. In the mechanistic approach, fundamental physical and chemical laws, such as conservation laws, construct the basis of the model. This approach contains algebraic and differential equations which involve mass, energy and momentum balances. Due to the large number of variables affecting the process behavior and complex mathematical equations governing the system, many chemical processes are nonlinear and complicated. Consequently, it is hard and sometimes even impossible to present them by mechanistic models. Even if such a model has been developed, it might be impractical to solve or identify its parameters. Moreover, a mechanistic model needs detailed knowledge and a lot of skill and ingenuity to incorporate the basic phenomena of the process in the model. Difficulties can arise from poor knowledge [2]. In some cases, considering some assumptions such as physical properties' constancy, ideality of gas phase and linearization of the nonlinear equations of the model is inevitable, which all

impose limitations on the model leading to the reduction of the model's robustness [3].

On the contrary, AI-based techniques have demonstrated their superb ability and have received much attention for chemical process modeling. These techniques, for which developing detailed knowledge of the process is of less concern, may overcome the drawbacks of the mechanistic approach when dealing with complex and nonlinear systems. Using AI-based methods, inherently qualitative variables in chemical processes like catalyst deactivation in reactors can also be considered in the model, while these types of variables are not possible to

The most common methods of AI for modeling purposes in chemical engineering are ANN and fuzzy logic, which sometimes are hybridized with evolutionary algorithms [4–7]. In addition to ANN and fuzzy logic methods, their hybrid scheme named adaptive-network-based fuzzy inference system (ANFIS) which is actually a fuzzy inference system implemented in the framework of adaptive networks has also been applied for modeling purposes in chemical

The first step of developing an AI-based model is defining the input/output variables of the system which is to be modeled (**Figure 1**). Afterward, according to the experimental data or the knowledge of the governing phenomena, the model is developed. The parameters that characterize the AI-based model like the number of fuzzy sets (when using fuzzy logic), the number and the transfer functions of hidden layers (when using the ANN method) depend on the complexity and nonlinearity of the system and the types of variables affecting the process.

of AI methods in chemical engineering and understand their implementation issues.

**2. Application of AI in chemical process modeling**

400 Artificial Intelligence - Emerging Trends and Applications

tions [1].

implement in mechanistic models.

engineering.

Among the types of ANN structures, the multi-layer perceptron (MLP) neural network, which has a feed-forward scheme, is regarded as the most useful topology for system modeling [8]. Moreover, the recurrent ANN model, which maps past inputs and outputs to future outputs, can be used for dynamic processes.
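The feed-forward mapping of an MLP can be sketched in a few lines. The following is a minimal illustration (not from the chapter) of a one-hidden-layer MLP pass; the layer sizes and random weights are placeholders standing in for a trained network:

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    """One-hidden-layer feed-forward (MLP) pass: tanh hidden layer, linear output."""
    h = np.tanh(W1 @ x + b1)   # hidden-layer activations
    return W2 @ h + b2         # linear output neuron(s)

# Illustrative weights for a 3-input, 4-hidden-neuron, 1-output network
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)
y = mlp_forward(np.array([0.5, -1.0, 2.0]), W1, b1, W2, b2)
```

In practice, the weights would be fitted to process data (e.g., by backpropagation); a recurrent variant would additionally feed past outputs back into `x`.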

In the fuzzy model approach, the two types of which are Mamdani [9] and Takagi-Sugeno (TS) [10], all the uncertainties and model complications are treated through linguistic expressions in the form of "If-Then" rules based on the theory of fuzzy logic [11]. TS, like ANN, is commonly considered a data-driven AI-based modeling approach [12].

The Mamdani fuzzy approach, which differs in the way the information and rules are presented, has several advantages over the TS approach for the modeling of chemical processes. First, the qualitative experience and knowledge of the experts who deal with the process are incorporated in the development of the model [12]. In addition, no data are needed to build a Mamdani fuzzy model. Consequently, a Mamdani fuzzy model is more intuitive, transparent and interpretable [13]. In contrast, each TS-type model is a local approximator, and its predictions are valid only for the specific operating condition of the process under which the model was developed and tested [14]. Accordingly, it can hardly be applied for analyzing the process behavior, cannot be scaled up or down and is therefore less useful for industrial practice. Despite the capabilities of the Mamdani method, it is worth underlining that a Mamdani fuzzy model suffers from a large number of rules when dealing with processes with many variables.

The genetic algorithm (GA) can be used to optimize the performance of a fuzzy model. The role of the GA is the estimation of optimal parameters, such as the parameters of the scaling functions and the universes of discourse [15, 16] or the membership functions (MFs) [17, 18]. The GA is also applied as a method for rule reduction/selection, removing redundant, unnecessary or misleading rules [17] when dealing with high-dimensional problems in which the number of rules is so large that it cannot be managed efficiently.


Application of AI in Chemical Engineering http://dx.doi.org/10.5772/intechopen.76027 403


**Figure 2.** Schematic of hybrid Mamdani fuzzy and GA modeling method.

The hybrid Mamdani fuzzy and GA modeling method commonly consists of two main steps: (i) constructing a start-up version of the model using only heuristic knowledge and (ii) a tuning procedure using the GA. The schematic of this algorithm is shown in **Figure 2**. In the first step, the output variables determining the behavior of the system are defined; given these, the input variables which affect the selected output variables are determined. Afterward, a base fuzzy model is defined, characterized by the number and types of fuzzy sets of the variables and by the production rules presenting the behavior of the process, based on the knowledge and expertise of the experts who have been working with the system. This model is used as the start-up version of the model which has to be tuned. In the second step, the GA is formulated for optimization of the parameters that characterize the model, such as membership function parameters, membership function types and so on.

## **3. Application of AI in optimization of chemical processes**

Chemical process optimization has its origins in linear programming at the beginning of the 1960s [19]. The problem consists of finding the best solution from a variety of feasible alternatives of design or operating variables in order to minimize or maximize a desired objective function. In general, the objective function can be the minimization of the operating costs and undesired material production, or the maximization of energy efficiency, yields, operation productivity, profitability, safety and the reliability of the plant.

Most chemical processes are nonlinear and complex, so the optimization problems have many solutions (in some cases endlessly many). Such problems are often too complex to be solved through gradient-based optimization approaches. Evolutionary algorithms (EAs), such as GA [20], harmony search [21] and particle swarm optimization [22], are generic population-based metaheuristic optimization algorithms categorized as AI-based methods; they are capable of efficiently finding optimal solutions in complex problems such as the optimization of chemical processes.
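To make the population-based idea concrete, here is a toy real-coded GA (not taken from the references) minimizing a two-variable objective. The selection, crossover and mutation operators are deliberately simple stand-ins; the objective, bounds and GA settings are illustrative only:

```python
import random

random.seed(1)

def genetic_minimize(f, bounds, dim=2, pop_size=40, generations=60, mut_sigma=0.1):
    """Tiny real-coded GA: truncation selection, averaging crossover, Gaussian mutation."""
    lo, hi = bounds
    pop = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=f)                      # best (lowest objective) first
        parents = pop[: pop_size // 2]       # truncation selection keeps the elite half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            child = [(x + y) / 2 + random.gauss(0.0, mut_sigma) for x, y in zip(a, b)]
            children.append([min(hi, max(lo, x)) for x in child])  # clip to bounds
        pop = parents + children
    return min(pop, key=f)

# Toy objective with its minimum at (1, 2)
best = genetic_minimize(lambda v: (v[0] - 1.0) ** 2 + (v[1] - 2.0) ** 2, (-5.0, 5.0))
```

In a process-optimization setting, the decision vector would hold operating variables and the objective would be, for example, negative yield or operating cost evaluated through a process model.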

## **4. Application of neural networks in chemical process control**

Process control strategies have been developed to improve process performance, reduce energy consumption and ensure high safety and environmental goals. Conventional controllers cannot show satisfactory responses in many industrial chemical processes with highly nonlinear dynamics and parameter uncertainties, whereas AI approaches can effectively control a number of complex and nonlinear processes [23].

Because of their high potential for handling nonlinear relationships and their self-learning capabilities, there has been considerable interest in the use of neural networks for control in different fields of chemical processing, such as thermal processes [24], reaction processes [25] and separation and purification [26, 27].

One of the algorithms based on neural network control is inverse model control. In this approach, the input vector for the neural network is assumed to be the required future or reference output together with the past inputs and past output variables; the approach can help improve the performance of the controlled variables when unmeasured disturbances are present. The manipulated variable of the controlled plant is the output of the neural network controller [23]. In a system with time delay (τ), if the orders of the dynamic model for the output and the input are *n* and *m*, the inverse model can be expressed as a function of the input and output as shown below:

The hybrid Mamdani fuzzy and GA modeling methods commonly consist of two main steps: (i) constructing a start-up version of the model using only the heuristic knowledge and (ii) tuning procedure using the GA. The schematic of this algorithm is shown in **Figure 2**. In the first step, the output variables determining the behavior of the system are defined, given that, the input variables which affect the selected output variables are determined. Afterward, a base fuzzy model is defined, characterized by the number and types of fuzzy sets of variables and the production rules presenting the behavior of the process based on the knowledge and expertise of the experts who have been working with the system. This model is used as the start-up version of the model which has to be tuned. In the second step, GA is formulated for optimization of parameters that characterize model, such as membership function param-

Chemical process optimization has its origins in linear programming at the beginning of the 1960s [19]. This problem is finding the best solution from a variety of efficient alternatives of design or operating variables in order to minimize or maximize a desired objective function. In a general way, the objective function can be the minimization of the operating costs and the undesired material production or the maximization of energy efficiency, the yields and

Most chemical processes are nonlinear and complex, so there are many solutions (in some cases becoming endless) in the optimization problems. Such problems are often too complex to be solved through gradient-based optimization approaches. Evolutionary algorithms (EAs) like GA [20], harmony search [21], particle swarm optimization [22] and so on categorized in the AI-based method that is a generic population-based metaheuristic optimization algorithm are capable of efficiently finding an optimal solution in complex problems, such as optimiza-

eters, membership function types and so on.

**Figure 2.** Schematic of hybrid Mamdani fuzzy and GA modeling method.

402 Artificial Intelligence - Emerging Trends and Applications

tion of chemical processes.

**3. Application of AI in optimization of chemical processes**

operation productivity, the profitability, safety and the reliability of the plant.

$$M(k) = \varnothing\left(y_{sp},\, y(k-1), \dots, y(k-n),\, M(k-1-\tau), \dots, M(k-m-\tau)\right) \tag{1}$$

where ∅ represents the function of the inverse model, *k* is the discrete time and *M*, *y* and *ysp* are the control action, the output of the plant and the set point of the controller, respectively. Therefore, the controller predicts the control action, as shown in **Figure 3**, by having current and past values of the process model state variables and the past control action.
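The bookkeeping behind Eq. (1) can be sketched as follows. The trained inverse model ∅ is replaced here by a hypothetical stand-in function (`phi`), and the history lengths (*n* = *m* = 2) and numeric values are illustrative only:

```python
from collections import deque

def inverse_model_control(phi, y_sp, y_hist, m_hist):
    """Evaluate M(k) = phi(y_sp, y(k-1..k-n), M(k-1-tau..k-m-tau)) per Eq. (1).
    y_hist holds y(k-1)...y(k-n); m_hist holds M(k-1-tau)...M(k-m-tau)."""
    return phi(y_sp, list(y_hist), list(m_hist))

# Stand-in "trained" inverse model: a simple proportional-plus-memory rule
def phi(y_sp, y_past, m_past):
    return m_past[0] + 0.8 * (y_sp - y_past[0])

y_hist = deque([0.2, 0.1], maxlen=2)   # y(k-1), y(k-2);  n = 2
m_hist = deque([0.5, 0.4], maxlen=2)   # M(k-1-tau), M(k-2-tau);  m = 2
u = inverse_model_control(phi, y_sp=1.0, y_hist=y_hist, m_hist=m_hist)
```

At each sampling instant the controller would append the newest measurement and control action to the deques and re-evaluate, which is exactly the data flow depicted in **Figure 3**.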

Fuzzy systems have been used in different applications for controlling chemical processes [28–30]. Researchers have also used a fuzzy logic controller coupled with optimal control in an exothermic chemical reaction [31], a batch polymerization reactor [32] and polymerization processes [33]. In addition, since time delays can often be seen in industrial chemical processes, a possible alternative is fuzzy model predictive control (FMPC), which has been proposed in [34, 35]. In systems with model uncertainties, type-1 fuzzy control may not always be the appropriate solution [36]. In these cases, type-2 fuzzy logic control has been applied in many fields of chemical processing [37, 38].

A hybrid controller based on AI strategies combines two or more AI techniques in order to improve the control performance of the chemical process. One of the most popular strategies is the adaptive neuro-fuzzy inference system (ANFIS) controller. This approach is a hybrid intelligent system which uses the learning ability of the neural network together with the knowledge representation of fuzzy logic [39]. The schematic of the ANFIS model with two inputs (x1 and x2) and one output (φ) is shown in **Figure 4**. As shown in **Figure 4**, the ANFIS architecture contains five layers of feed-forward neural network, which are explained as follows:


**Figure 3.** Schematic of neural network inverse model control.

**Figure 4.** Schematic architecture of ANFIS model with two fuzzy rules for two inputs and one output.

#### **4.1. First layer**

This layer is the input layer. Each neuron in this layer saves the parameters of the membership function, and crisp inputs are converted to membership degree values which range between 0 and 1.

#### **4.2. Second layer**

Each neuron of this layer performs a connective operation (i.e., "AND") to calculate the firing strength of a rule.

#### **4.3. Third layer**

A normalization process is performed by the neurons of this layer.

#### **4.4. Fourth layer**

The normalized firing strength is multiplied by a linear combination of the inputs (i.e., a Takagi-Sugeno fuzzy rule) in order to obtain the output of each rule.

#### **4.5. Fifth layer**


The last layer of the network is the weighted average of the outputs of the fourth layer.
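The five layers above can be traced in a minimal sketch for the two-input, two-rule case of **Figure 4**. This is an illustration only: it assumes Gaussian membership functions, the product as the AND connective and first-order TS consequents, and all parameter values are made up:

```python
import math

def gauss(x, c, s):
    """Gaussian membership function with center c and spread s."""
    return math.exp(-((x - c) / s) ** 2)

def anfis_forward(x1, x2, mf, tsk):
    """Five-layer ANFIS pass for two inputs and two TS rules."""
    w = []
    for (c1, s1), (c2, s2) in mf:
        # Layer 1: fuzzification (membership degrees in [0, 1])
        # Layer 2: AND connective (product) -> firing strength of each rule
        w.append(gauss(x1, c1, s1) * gauss(x2, c2, s2))
    # Layer 3: normalization of firing strengths
    wn = [wi / sum(w) for wi in w]
    # Layer 4: normalized strength times linear TS consequent (p*x1 + q*x2 + r)
    f = [wni * (p * x1 + q * x2 + r) for wni, (p, q, r) in zip(wn, tsk)]
    # Layer 5: weighted sum -> crisp output
    return sum(f)

mf = [((0.0, 1.0), (0.0, 1.0)), ((1.0, 1.0), (1.0, 1.0))]   # (center, spread) per input, per rule
tsk = [(1.0, 1.0, 0.0), (2.0, -1.0, 0.5)]                   # consequent coefficients per rule
phi_out = anfis_forward(0.5, 0.5, mf, tsk)
```

During ANFIS training, the membership parameters in `mf` and the consequent coefficients in `tsk` are the quantities adapted by the network's learning rule.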

The application of ANFIS in the process control of chemical plants has been reported for the distillation column [40] and the biodiesel reactor [41].

## **5. Application of AI techniques in fault detection and diagnosis of chemical engineering**

A fault is defined as a deviation from an acceptable range of an observable variable or a calculated parameter. A failure can be described as any of a variety of malfunctions in the real plant, which can be caused by instrument failures, disturbances and plant parameter uncertainties. Abnormal conditions in a plant can result in financial losses. Therefore, in chemical processes, fault detection and diagnosis have been the focal point of much research, and various fault detection and diagnosis strategies have been presented in the literature. Fault diagnostic systems should possess desirable characteristics such as quick detection, isolability, robustness and multiple-fault identifiability [42].

One of the intelligent fault diagnosing techniques is the neural network system. Because of their high potential for capturing nonlinear relationships, neural networks represent a powerful tool for fault diagnosis [43–47]. In fault detection based on neural networks, the numbers of neurons in the input and output layers are equal to the number of measured variables and the number of potential faults in the process, respectively. The outputs of the neural diagnoser are binary variables representing the occurrence of a fault (if the corresponding value is 1) or the lack of fault occurrence (if the corresponding value is 0) [47]. Another fault diagnosis approach among AI techniques is fuzzy logic, which has been applied in chemical processes [48–51]. In fault diagnosis based on fuzzy logic, the fuzzy relations between faults and symptoms are assumed to be one-to-many (i.e., one fault may cause several symptoms). For example:
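The binary output coding of the neural diagnoser described above can be sketched as a simple decoding step. The fault names, the raw network outputs and the 0.5 threshold below are all hypothetical:

```python
def decode_faults(nn_outputs, fault_names, threshold=0.5):
    """Map neural-diagnoser outputs (one per potential fault) to binary flags:
    1 -> fault occurred, 0 -> no fault, following the coding in the text."""
    return {name: int(o >= threshold) for name, o in zip(fault_names, nn_outputs)}

# Hypothetical raw outputs from a trained diagnoser network
flags = decode_faults([0.93, 0.08, 0.61], ["sensor_drift", "valve_stiction", "leak"])
```

In practice the threshold (and whether simultaneous flags are allowed) depends on how the diagnoser was trained against the fault history of the plant.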


```
If Sym1 is S1,j AND Sym2 is S2,j AND … AND Symm is Sm,j Then Fj is Hj
```
where *Symi* (*i* = 1 … *m*) is the vector of fuzzy input variables (symptoms) and *Fj* (*j* = 1 … *n*) is the vector of fuzzy output variables (faults). *Si,j* is the input linguistic value relevant to the jth output and *Hj* is the output linguistic value. The schematic of a fuzzy fault detection system is shown in **Figure 5** [51].
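A minimal sketch of how such symptom-to-fault rules could be evaluated, assuming the common min operator for AND and max aggregation over rules pointing at the same fault; the symptom names, degrees and rules are hypothetical:

```python
def rule_strength(symptom_degrees):
    """AND connective (min) over the symptom membership degrees of one rule."""
    return min(symptom_degrees)

def diagnose(rules, symptoms):
    """For each fault, aggregate (max) the strengths of all rules implying it."""
    faults = {}
    for antecedent_keys, fault in rules:
        strength = rule_strength([symptoms[k] for k in antecedent_keys])
        faults[fault] = max(faults.get(fault, 0.0), strength)
    return faults

# Hypothetical symptoms, already fuzzified to membership degrees in [0, 1]
symptoms = {"high_T": 0.9, "low_flow": 0.7, "high_dP": 0.2}
rules = [(("high_T", "low_flow"), "pump_fault"),
         (("high_dP",), "fouling")]
result = diagnose(rules, symptoms)
```

The resulting degrees per fault would then be defuzzified or thresholded to decide which fault to report, as in the scheme of **Figure 5**.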

Although the neural network is a powerful tool for fault diagnosis due to its ability to capture nonlinear relationships with no heuristic reasoning about the process, it requires a large amount of data covering the various operating conditions in which the effects of the various faults exist. On the other hand, the fuzzy diagnoser expresses the heuristic knowledge linking the symptoms to their corresponding faults as linguistic rules and does not require any quantitative data sets corresponding to the history and trends of the system under any operating conditions [52, 53]. The disadvantage of the fuzzy diagnoser is that managing heuristic, knowledge-based rules is more difficult and time demanding, and sometimes even impossible, for plant-wide integrated processes [54, 55]. Therefore, neuro-fuzzy diagnoser applications in chemical plants have been proposed in the literature [56–58].

In the following, four illustrative case studies of AI techniques are presented for various purposes (i.e., modeling, optimization, process control, and fault detection and diagnosis) in chemical processes.

**Figure 5.** Schematic diagram of a fuzzy fault detection scheme.

Illustration case study one: Prediction of virus removal from water using microfiltration membrane based on hybrid Mamdani fuzzy and GA.


One of the separation technologies for virus removal from water for municipal effluent reuse is the application of membranes. Conventional modeling approaches for predicting membrane performance suffer from various limitations, such as the lack of predictive fouling models or the complexity of estimating the properties of the membrane surface and the membrane interactions. In this case study, an optimum Mamdani fuzzy model is developed for predicting the removal of two types of viruses from water [2]. The GA is employed for optimal estimation of the parameters characterizing the membership functions of the input/output variables of the model. The first step is defining the input/output variables of the model. The amount of virus rejection (R%), which is considered as the output variable of the model, is determined as follows:

$$R\,\% = 100\left(1 - \frac{C_p}{C_f}\right) \tag{2}$$

where *Cp* and *Cf* are the virus concentrations in permeate and feed, respectively. The input variables of the model are the concentration of FMD virus (*CFMD*), the concentration of IBR virus (*CIBR*), operating pressure (*P*), volume (*V*) and stirring speed (rpm). The experimental data can be found in the work of Madaeni and Kurdian [2]. All the variables of the system are discretized by Gaussian-type membership functions.

The next steps are setting up a fuzzy inference system using the initial fuzzy sets and defining the fitness function. The two parameters of the Gaussian membership function, *x̄* and *σ* (Eq. (3)), are obtained via GA.

$$f(x) = \exp\left(-\left(\frac{x - \overline{x}}{\sigma}\right)^2\right) \tag{3}$$

The mean square error (MSE) is selected as the fitness function as follows:

$$MSE = \left\langle \left(y\_m - y\_e\right)^2 \right\rangle \tag{4}$$

where y<sub>m</sub> and y<sub>e</sub> denote the vectors of fuzzy-model outputs and experimental data, respectively, and ⟨·⟩ denotes averaging over the data points.

There are two methods for the genetically tuning procedure. For cases with a low number of rules, the parameters of membership functions in all rules (for both input and output variables) can be considered as decision variables through optimization formulation. In this way, a variable at each rule can have different optimized shape membership functions. This approach increases the prediction capability yet at the cost of reducing the interpretability of the model. The second method is used when there are a large number of rules in the model. In this method, each variable of the model in all rules has the same optimized shape of the membership function. This method has a lower number of decision variables in optimization formulation for the same case when compared with the first method.

According to the possible combinations of the input variables, 10 rules can be defined for this model, and owing to the low number of rules, the first method for parameter optimization is applied. The population size and maximum generation number of the GA are set to 500 and 100, respectively. After the optimization procedure, the fuzzy model with optimized parameters is obtained. This model is developed based on qualitative rules, bypassing the complexities and drawbacks of the white-box modeling method.
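The genetic tuning procedure described above can be sketched as follows: a toy GA searches the two Gaussian parameters of Eq. (3) so as to minimize the MSE of Eq. (4). The synthetic rejection data, parameter ranges, and GA settings below are illustrative assumptions, not the actual data or settings of [2].

```python
import math
import random

random.seed(0)

def gaussian(x, mean, sigma):
    """Gaussian membership function of Eq. (3)."""
    return math.exp(-((x - mean) / sigma) ** 2)

# Synthetic "experimental" rejection data (illustrative, not from [2]):
# R% generated from a known Gaussian so the optimum is recoverable.
data = [(p, 90.0 * gaussian(p, 2.0, 0.8)) for p in (0.5, 1.0, 1.5, 2.0, 2.5, 3.0)]

def mse(params):
    """Eq. (4): mean squared mismatch between model and data."""
    mean, sigma = params
    return sum((90.0 * gaussian(p, mean, sigma) - r) ** 2 for p, r in data) / len(data)

def ga(pop_size=50, generations=200):
    """Simple elitist GA over the (mean, sigma) membership parameters."""
    pop = [(random.uniform(0.0, 4.0), random.uniform(0.1, 2.0)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=mse)                      # lower MSE = fitter
        survivors = pop[: pop_size // 2]       # elitist selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            child = ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)   # crossover
            if random.random() < 0.5:                            # mutation
                child = (child[0] + random.gauss(0.0, 0.05),
                         max(0.05, child[1] + random.gauss(0.0, 0.05)))
            children.append(child)
        pop = survivors + children
    return min(pop, key=mse)

best = ga()
print(round(best[0], 2), round(best[1], 2))  # should approach mean 2.0, sigma 0.8
```

In the full model of [2], one such parameter pair exists per membership function of every input/output variable, so the GA chromosome is correspondingly longer.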


Application of AI in Chemical Engineering http://dx.doi.org/10.5772/intechopen.76027


The comparison between model predictions and experimental data shows an accuracy of nearly 90% for the developed fuzzy model [2].

Illustrative case study two: Optimization of fluidized bed reactor of oxidative coupling of methane based on GA.

In this case study, optimization of the C<sub>2</sub> (ethane + ethylene) yield in the oxidative coupling of methane (OCM) over the Mn/Na<sub>2</sub>WO<sub>4</sub>/SiO<sub>2</sub> catalyst in a fluidized bed reactor is carried out [20].

OCM is a series of chemical reactions, first presented in the 1980s by Keller and Bhasin [59] for the direct conversion of natural gas into the desired product of ethylene and other value-added chemicals. One of the barriers to the commercialization of this process is the low yield of the reactions. Various solutions have been proposed for yield improvement in the literature [60, 61]. One possibility to improve the C<sub>2</sub> yield is a stage-wise feeding configuration along the reactor, as shown in **Figure 6** [62].

In this scheme, it is assumed that the gas injected at each stage contains only oxygen and that all the methane is introduced to the reactor at the entrance of the bed.

**Figure 6.** Stage-wise feeding configuration in OCM reactor.

In this case study, the main process variables which are optimized to enhance the C<sub>2</sub> yield are listed in **Table 1**.

The kinetic model presented by Daneshpayeh et al. [63] is used as the reaction sub-model. The reactor is first modeled and then solved [62]. Afterward, using the GA, the C<sub>2</sub> yield is optimized for one, two and three secondary injections of oxygen.

The C<sub>2</sub> yield, considered as the fitness function, is defined as follows:

$$Y\_{C\_2} = \frac{2 \times N\_{C\_2}}{N\_{CH\_4}} \times 100 \tag{5}$$

The main GA parameters are presented in **Table 2**.
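The optimization loop can be sketched as follows, using the Table 1 variable bounds and the Table 2 operator settings (population 50, survival probability 0.5, linear crossover, mutation probability 0.167). The fluidized-bed reactor model itself is beyond the scope of a sketch, so a smooth dummy yield surface stands in for the solved model of [62], and the generation count is cut from 10,000 to 200 to keep the example fast.

```python
import random

random.seed(1)

# Decision-variable bounds from Table 1:
# 4 oxygen injection flows (m3/s), 4 section lengths (m), temperature (C)
BOUNDS = [(0.03, 0.5)] * 4 + [(0.5, 4.0)] * 4 + [(700.0, 850.0)]

def c2_yield(x):
    """Stand-in for the solved reactor model of [62], which would return
    Y_C2 = 2*N_C2/N_CH4 * 100 (Eq. (5)). A smooth dummy surface is used
    here purely to exercise the GA, not to reproduce reactor physics."""
    o2, lengths, temp = x[:4], x[4:8], x[8]
    return (20.0
            - 100.0 * sum((f - 0.07) ** 2 for f in o2)
            - 0.1 * sum((l - 1.5) ** 2 for l in lengths)
            - ((temp - 750.0) / 100.0) ** 2)

def ga(pop_size=50, generations=200, p_mut=0.167):
    pop = [[random.uniform(lo, hi) for lo, hi in BOUNDS] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=c2_yield, reverse=True)
        survivors = pop[: pop_size // 2]          # survival probability 0.5
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            w = random.random()                   # linear crossover
            child = [w * xa + (1 - w) * xb for xa, xb in zip(a, b)]
            if random.random() < p_mut:           # mutation: resample one gene
                i = random.randrange(len(child))
                child[i] = random.uniform(*BOUNDS[i])
            children.append(child)
        pop = survivors + children
    return max(pop, key=c2_yield)

best = ga()
print(round(c2_yield(best), 2))
```

With the real reactor model plugged into `c2_yield`, the same loop yields the Table 3 optimum reported in the text.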


The best results are achieved for three injections of oxygen along the reactor. The optimum values of the decision variables are presented in **Table 3**. The maximum C<sub>2</sub> yield of 22.87% is achieved for three secondary oxygen injections at an operating temperature of 746.05°C. This optimized C<sub>2</sub> yield, achieved by an AI-based method, is approximately 4% higher than that of the original model [61].

Illustrative case study three: Online genetic-ANFIS control for advanced microwave biodiesel reactor.

The microchem reactor, based on microwave process technology, is used to produce biodiesel, which is environmentally beneficial. The reactor temperature should be controlled to ensure an optimal yield of biodiesel and to minimize the generation of unwanted byproducts. To this end, Wali et al. implemented an artificial-intelligence controller design based on online genetic-ANFIS temperature control for an advanced biodiesel microwave reactor [41]. The microwave power supply as the manipulated variable, the reactor temperature as the controlled variable and the feed flow rate as the disturbance variable have been considered in this process.

The online genetic-ANFIS controller has been evaluated under different operating conditions (set-point tracking and disturbance rejection). The genetic-ANFIS controller successfully tracks reactor temperature set-point demands faster than adaptive control, without any oscillations [41].

Illustrative case study four: Neuromorphic multiple-fault diagnosis of a plant-wide system.


| Decision variables | Constraints |
|---|---|
| Oxygen flow rate at each injection part | 0.03–0.5 m<sup>3</sup>/s (×0.21)* |
| Methane flow rate (U<sub>c</sub>) | 0.03–0.5 m<sup>3</sup>/s |
| Length of each section of reactor (L<sub>i</sub>) | 0.5–4 m |
| Operating temperature (T) | 700–850°C |

\*Note that actually it is the air which entered the bed.

**Table 1.** Decision variables and their constraints.



| Parameter | Value |
|---|---|
| Population size | 50 |
| Generations | 10,000 |
| Survival probability | 0.5 |
| Linear crossover probability | 0.5 |
| Mutation probability | 0.167 |

**Table 2.** GA parameters.


| Decision variable | Optimum value |
|---|---|
| Oxygen flow at the beginning of the reactor (m<sup>3</sup>/s) | 0.0833 |
| Oxygen flow at second section of the reactor (m<sup>3</sup>/s) | 0.0500 |
| Oxygen flow at third section of the reactor (m<sup>3</sup>/s) | 0.0536 |
| Oxygen flow at fourth section of the reactor (m<sup>3</sup>/s) | 0.0755 |
| Methane flow at the beginning of the reactor (m<sup>3</sup>/s) | 0.3998 |
| Length of first section (m) | 0.6345 |
| Length of second section (m) | 3.9389 |
| Length of third section (m) | 0.5882 |
| Length of fourth section (m) | 1.0116 |
| Temperature (°C) | 746.05 |
| Yield (%) | 22.87 |

**Table 3.** Optimum values of decision variables at maximum yield.

In plant-wide systems, because of system complexity and overlapping symptoms, conventional neural networks operating based on steady-state characteristic data are not usually capable of diagnosing multiple concurrent faults. To overcome this problem, Tayyebi et al. proposed a new neuromorphic diagnosis framework based on augmented input containing steady-state characteristic data along with newly defined dynamic characteristic data [47]. In this approach, the input vector of the neural network diagnosis has been selected in such a way that different faults cause distinctive symptoms. Therefore, information related to both the history of the process and the steady state has been utilized to achieve distinctive symptoms. Accordingly, one can use characteristic points of the dynamic trend of each measured variable to uniquely distinguish and detect various faults. To evaluate the proposed approach based on the hybrid parameter, the Tennessee Eastman process (TE) that contains large numbers of measurements and manipulated variables and overlapping faults was used as the plant-wide benchmark. The performance of the neuromorphic diagnoser based on the augmented inputs has been compared with that of the conventional neuromorphic diagnostic system whose inputs are steady-state characteristic data. The comparison showed that the proposed method outperformed the conventional neuromorphic diagnoser for the detection of multiple concurrent faults. It was also shown that the proposed scheme can correctly diagnose various combinations of six concurrent faults of the TE process (from two to six simultaneous faults). This achievement reflects the major advantage of the proposed approach, which is its ability to perform fault diagnosis in situations where multiple concurrent faults with overlapping symptoms have occurred [47].
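The augmented diagnoser input of [47] can be sketched as follows: for each measured variable, the steady-state characteristic value is concatenated with characteristic points of its dynamic trend. The particular dynamic features chosen here (peak deviation, time of peak, final slope) are illustrative assumptions, not the exact definitions used in [47].

```python
# Sketch of an augmented neural-network input: steady-state characteristic
# data concatenated with dynamic characteristics of each measured trend.

def dynamic_features(trend, dt=1.0):
    """Characteristic points of one measured variable's dynamic response."""
    baseline = trend[0]
    deviations = [v - baseline for v in trend]
    peak = max(deviations, key=abs)            # largest deviation from baseline
    t_peak = deviations.index(peak) * dt       # time at which it occurred
    final_slope = (trend[-1] - trend[-2]) / dt # residual drift at the end
    return [peak, t_peak, final_slope]

def augmented_input(trends):
    """Diagnoser input: steady state of each variable plus its dynamics."""
    features = []
    for trend in trends:
        features.append(trend[-1])                # steady-state characteristic
        features.extend(dynamic_features(trend))  # dynamic characteristics
    return features

# Two measured variables after a fault: one overshoots, one keeps drifting.
x = augmented_input([[0.0, 0.8, 0.5, 0.4, 0.4],
                     [1.0, 1.1, 1.2, 1.3, 1.4]])
print(len(x))  # 4 features per variable -> 8
```

Two faults with identical steady states but different transients now map to distinct feature vectors, which is what lets the diagnoser separate overlapping symptoms.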

## **6. Conclusion**

AI techniques provide tools to tackle complex problems, and challenging and useful applications of AI techniques have been introduced in chemical engineering processes. Four illustrative case studies were investigated in the fields of process modeling, optimization, process control, and fault detection and diagnosis. From the description of these various applications, the ability of AI techniques has been revealed across a wide range of fields in chemical processes.

## **Author details**

Zeinab Hajjar\*, Shokoufe Tayyebi and Mohammad Hosein Eghbal Ahmadi

\*Address all correspondence to: hajjarz1@gmail.com

Research Institute of Petroleum Industry (RIPI), Tehran, Iran

## **References**



[1] Luyben W. Process modeling, simulation and control for chemical engineers. Petroleum Refinery Engineering. 1996;**2**:289-290

[2] Madaeni SS, Kurdian AR. Fuzzy modeling and hybrid genetic algorithm optimization of virus removal from water using microfiltration membrane. Chemical Engineering Research and Design. 2011;**89**:456-470

[3] Araromi DO, Sonibare JA, Emuoyibofarhe JO. Fuzzy identification of reactive distillation for acetic acid recovery from waste water. Journal of Environmental Chemical Engineering. 2014;**2**:1394-1403

[4] Hajjar Z, Kazemeini M, Rashidi A, Tayyebi S. Artificial intelligence techniques for modeling and optimization of the HDS process over a new graphene based catalyst. Phosphorus, Sulfur, and Silicon and the Related Elements. 2016;**191**:1256-1261

[5] Soltanali S, Halladj R, Tayyebi S, Rashidi A. Neural network and genetic algorithm for modeling and optimization of effective parameters on synthesized ZSM-5 particle size. Materials Letters. 2014;**136**:138-140

[6] Soltanali S, Halladj R, Tayyebi S, Rashidi A. Application of genetic-fuzzy approach for estimation of nano ZSM-5 crystallinity. Materials Letters. 2015;**150**:39-43

[7] Hajjar Z, Khodadadi A, Mortazavi Y, Tayyebi S, Soltanali S. Artificial intelligence modeling of DME conversion to gasoline and light olefins over modified nano ZSM-5 catalysts. Fuel. 2016;**179**:79-86

[8] Gibbs MS, Morgan N, Maier HR, Dandy GC, Nixon JB, Holmes M. Investigation into the relationship between chlorine decay and water distribution parameters using data driven methods. Mathematical and Computer Modelling. 2006;**44**:485-498

[9] Mamdani E, Assilian S. An experiment in linguistic synthesis with a fuzzy logic controller. International Journal of Man-Machine Studies. 1975;**7**:1-13

[10] Takagi T, Sugeno M. Fuzzy identification of systems and its applications to modeling and control. IEEE Transactions on Systems, Man, and Cybernetics. 1985;**15**:116-132

[11] Zadeh LA. Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets and Systems. 1978;**1**:3-28

[12] Adoko AC, Gokceoglu C, Wu L, Zuo QJ. Knowledge-based and data-driven fuzzy modeling for rockburst prediction. International Journal of Rock Mechanics and Mining Sciences. 2013;**61**:86-95

[13] Sala A, Guerra TM, Babuška R. Perspectives of fuzzy systems and control. Fuzzy Sets and Systems. 2005;**156**:432-444

[14] Habbi H, Zelmat M, Bouamama BO. A dynamic fuzzy model for a drum-boiler-turbine system. Automatica. 2003;**39**:1213-1219

[15] Gudwin R, Gomide F, Pedrycz W. Context adaptation in fuzzy processing and genetic algorithms. International Journal of Intelligence Systems. 1998;**13**:929-948

[16] Cordon O, Herrera F, del Jesus MJ, Villar P. A multiobjective genetic algorithm for feature selection and granularity learning in fuzzy-rule based classification systems. In: Joint 9th IFSA World Congress and 20th NAFIPS International Conference. Vol. 3; 2001. pp. 1253-1258

[17] Cordon O, del Jesus MJ, Herrera F. Genetic learning of fuzzy rule-based classification systems cooperating with fuzzy reasoning methods. International Journal of Intelligence Systems. 1998;**13**:1025-1053

[18] Pulkkinen P, Koivisto H. A dynamically constrained multiobjective genetic fuzzy system for regression problems. IEEE Transactions on Fuzzy Systems. 2010;**18**:161-177

[19] Chaves I, López J, Zapata J, Robayo A. Process Analysis and Simulation in Chemical Engineering. 2016

[20] Eghbal-Ahmadi M-H, Zaerpour M, Daneshpayeh M, Mostoufi N. Optimization of fluidized bed reactor of oxidative coupling of methane. International Journal of Chemical Reactor Engineering. 2012;**10**:1-21

[21] Yousefi M, Enayatifar R, Darus AN, Abdullah AH. Optimization of plate-fin heat exchangers by an improved harmony search algorithm. Applied Thermal Engineering. 2013;**50**:877-885

[22] Mahmood HA, Adam N, Sahari BB, Masuri SU. Development of a particle swarm optimisation model for estimating the homogeneity of a mixture inside a newly designed CNG-H2-AIR mixer for a dual fuel engine: An experimental and theoretic study. Fuel. 2017;**218**:131-150

[23] Tayyebi S, Alishiri M. The control of MSF desalination plants based on inverse model control by neural network. Desalination. 2014;**333**:92-100

[24] Nowak G, Rusin A. Using the artificial neural network to control the steam turbine heating process. Applied Thermal Engineering. 2016;**108**:204-210

[25] Li S, Li Y. Neural network based nonlinear model predictive control for an intensified continuous reactor. Chemical Engineering and Processing. 2015;**96**:14-27

[26] Fernandez de Canete J, Gonzalez S, del Saz-Orozco P, Garcia I. A harmonic balance approach to robust neural control of MIMO nonlinear processes applied to a distillation column. Journal of Process Control. 2010;**20**:1270-1277

[27] Damour C, Benne M, Grondin-Perez B, Chabriat J. Nonlinear predictive control based on artificial neural network model for industrial crystallization. Journal of Food Engineering. 2010;**99**:225-231

[28] Hojjati H, Sheikhzadeh M, Rohani S. Control of supersaturation in a semibatch antisolvent crystallization process using a fuzzy logic controller. Industrial and Engineering Chemistry Research. 2007;**46**:1232-1240

[29] Underwood CP. Fuzzy multivariable control of domestic heat pumps. Applied Thermal Engineering. 2015;**90**:957-969

[30] Baroud Z, Benmiloud M, Benalia A, Ocampo-Martinez C. Novel hybrid fuzzy-PID control scheme for air supply in PEM fuel-cell-based systems. International Journal of Hydrogen Energy. 2017;**42**:10435-10447

[31] Karr CL, Sharma SK, Hatcher WJ, Harper TR. Fuzzy control of an exothermic chemical reaction using genetic algorithms. Engineering Applications of Artificial Intelligence. 1993;**6**:575-582

[32] Etinkaya SC, Zeybek Z, Hapoglu H, Alpbaz M. Optimal temperature control in a batch polymerization reactor using fuzzy-relational models-dynamics matrix control. Computers and Chemical Engineering. 2006;**30**:1315-1323

[33] Lima NMN, Linan LZ, Filho RM, Wolf Maciel MR, Embiruçu M, Grácio F. Modeling and predictive control using fuzzy logic: Application for a polymerization system. AIChE Journal. 2010;**56**:965-978

[34] Chang X-H, Yang G-H. Fuzzy robust constrained model predictive control for nonlinear systems. Asian Journal of Control. 2011;**13**(6):947-955

[35] Teng L, Wang Y, Cai W, Li H. Robust model predictive control of discrete nonlinear systems with time delays and disturbances via T–S fuzzy approach. Journal of Process Control. 2017;**53**:70-79

[36] Miccio M, Cosenza B. Control of a distillation column by type-2 and type-1 fuzzy logic PID controllers. Journal of Process Control. 2014;**24**:475-484

[37] Galluzzo M, Cosenza B. Nonlinear fuzzy control of fed-batch reactor for the penicillin production. Computers and Chemical Engineering. 2012;**36**:273-281

[38] Galluzzo M, Cosenza B. Control of a non-isothermal continuous stirred tank reactor by a feedback-feedforward structure using type-2 fuzzy logic controllers. Information Sciences. 2011;**181**:3535-3550

[39] Perendeci A, Arslan S, Celebi SS, Tanyolac A. Prediction of effluent quality of an anaerobic treatment plant under unsteady state through ANFIS modeling with on-line input variables. Chemical Engineering Journal. 2008;**145**:78-85

[40] Fernandez de Canete J, Garcia-Cerezo A, Garcia-Moral I, Del Saz P, Ochoa E. Object-oriented approach applied to ANFIS modeling and control of a distillation column. Expert Systems with Applications. 2013;**40**:5648-5660

[41] Wali WA, Al-Shamma AI, Hassan KH, Cullen JD. Online genetic-ANFIS temperature control for advanced microwave biodiesel reactor. Journal of Process Control. 2012;**22**:1256-1272

[42] Venkatasubramanian V, Rengaswamy R, Yin K, Kavuri SN. A review of process fault detection and diagnosis part I: Quantitative model-based methods. Computers and Chemical Engineering. 2003;**27**:293-311

[43] Tayarani-Bathaie SS, Khorasani K. Fault detection and isolation of gas turbine engines using a bank of neural networks. Journal of Process Control. 2015;**36**:22-41

[44] Tan WL, Nor NM, Abu Bakar MZ, Ahmed Z, Sata SA. Optimum parameters for fault detection and diagnosis system of batch reaction using multiple neural networks. Journal of Loss Prevention in the Process Industries. 2012;**25**:138-141

[45] Behbahani RM, Jazayeri-Rad H, Hajmirzaee S. Fault detection and diagnosis in a sour gas absorption column using neural networks. Chemical Engineering and Technology. 2009;**32**(5):840-845

[46] Zhang Z, Zhao J. A deep belief network based fault diagnosis model for complex chemical processes. Computers & Chemical Engineering. 2017;**107**:395-407

[47] Tayyebi S, Boozarjomehry RB, Shahrokhi M. Neuromorphic multiple-fault diagnosing system based on plant dynamic characteristics. Industrial and Engineering Chemistry Research. 2013;**52**:12927-12936

[48] Musulin E, Yélamos I, Puigjaner L. Integration of principal component analysis and fuzzy logic systems for comprehensive process fault detection and diagnosis. Industrial and Engineering Chemistry Research. 2006;**45**:1739-1750

[49] Tarifa EE, Scenna NJ. Fault diagnosis, direct graphs, and fuzzy logic. Computers & Chemical Engineering. 1997;**21**:S649-S654

[50] Tarifa EE, Scenna NJ. Fault diagnosis for MSF dynamic states using a SDG and fuzzy logic. Desalination. 2004;**166**:93-101

[51] Tayyebi S, Shahrokhi M, Boozarjomehry RB. Fault diagnosis in a yeast fermentation bioreactor by genetic fuzzy system. Iranian Journal of Chemistry and Chemical Engineering. 2010;**29**:61-72

[52] Hang J, Zhang J, Cheng M. Application of multi-class fuzzy support vector machine classifier for fault diagnosis of wind turbine. Fuzzy Sets and Systems. 2016;**297**:128-140

[53] Jahromi AT, Er MJ, Li X, Lim BS. Sequential fuzzy clustering based dynamic fuzzy neural network for fault diagnosis and prognosis. Neurocomputing. 2016;**196**:31-41

[54] Vachtsevanos G, Lewis F, Roemer M. Intelligent Fault Diagnosis and Prognosis for Engineering Systems. Hoboken, New Jersey: John Wiley & Sons, Inc.; 2006

[55] Dou D, Zhou S. Comparison of four direct classification methods for intelligent fault diagnosis of rotating machinery. Applied Soft Computing. 2016;**46**:459-468

[56] Lau CK, Heng YS, Hussain MA, Mohamad Nor MI. Fault diagnosis of the polypropylene production process (UNIPOL PP) using ANFIS. ISA Transactions. 2010;**49**:559-566

[57] Bonsignore L, Davarifar M, Rabhi A, Tina GM, Elhajjaji A. Neuro-fuzzy fault detection method for photovoltaic systems. Energy Procedia. 2014;**62**:431-441

[58] Shabanian M, Montazeri M. A neuro-fuzzy online fault detection and diagnosis algorithm for nonlinear and dynamic systems. International Journal of Control, Automation and Systems. 2011;**9**:665-670

[59] Keller GE, Bhasin MM. Synthesis of ethylene via oxidative coupling of methane. I. Determination of active catalysts. Journal of Catalysis. 1982;**73**:9-19

[60] Kao YK, Lei L, Lin YS. A comparative simulation study on oxidative coupling of methane in fixed-bed and membrane reactors. Industrial and Engineering Chemistry Research. 1997;**36**:3583-3593

[61] Lu Y, Dixon AG, Moser WR, Ma YH, Balachandran U. Oxygen-permeable dense membrane reactor for the oxidative coupling of methane. Journal of Membrane Science. 2000;**170**:27-34

[62] Daneshpayeh M, Mostoufi N, Khodadadi A, Sotudeh-Gharebagh R, Mortazavi Y. Modeling of stagewise feeding in fluidized bed reactor of oxidative coupling of methane. Energy and Fuels. 2009;**23**:3745-3752

[63] Daneshpayeh M, Khodadadi A, Mostoufi N, Mortazavi Y, Sotudeh-Gharebagh R, Talebizadeh A. Kinetic modeling of oxidative coupling of methane over Mn/Na<sub>2</sub>WO<sub>4</sub>/SiO<sub>2</sub> catalyst. Fuel Processing Technology. 2009;**90**:403-410


**Chapter 21**


**Application of Biomedical Text Mining**

Lejun Gong

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/intechopen.75924

**Abstract**

With the enormous volume of biological literature, the increasing growth due to the high rate of new publications is one of the most common motivations for biomedical text mining. Processing this massive literature can extract more biological information for mining biomedical knowledge. Using this information will help understand the mechanisms of disease generation, promote the development of disease diagnosis technology, and promote the development of new drugs in the field of biomedical research. Against this background, this chapter introduces the rise of biomedical text mining. It then describes the biomedical text-mining technology, namely natural language processing, including its several components. This chapter emphasizes two aspects of biomedical text mining, static biomedical information recognition and dynamic biomedical information extraction, using instance analysis from our previous works. The aim is to provide a way to quickly understand biomedical text mining for researchers.

DOI: 10.5772/intechopen.75924

**Keywords:** bioinformatics, text mining, natural language processing, information extraction

## **1. Introduction**

With the rapid growth of high-throughput biological technology, biomedical science is entering the omics era. It brings several kinds of omics data, including genomics and transcriptomics, and vast amounts of biological data continue to emerge from life-science research. New phenomena, new discoveries, and new experimental data in biomedical research are mostly published in scientific journals in electronic text form, so a large amount of biological information lies scattered across many studies. Processing this literature can extract biological information and reveal new biomedical knowledge, but doing so manually is like looking for a needle in a haystack. Biomedical literature can be seen as a large unstructured data repository, which is where text mining comes into play. Text mining has emerged as a way to bridge free text and structured representations of biomedical information, using artificial intelligence techniques including natural language processing (NLP), machine learning (ML), and data mining to process large text collections. It is therefore a powerful tool for extracting valuable information from the biomedical literature. Biomedical text mining can be defined broadly: any work that extracts information from text can be considered text mining, ranging from static information recognition and dynamic information extraction to applications of biomedical text mining. The following sections describe each of these aspects.


## **2. Natural language processing**

Natural language processing is a field of artificial intelligence concerned with the interaction between computers and natural languages. With the rapid growth of machine learning, much NLP research is closely tied to it, and many machine-learning algorithms have been applied to NLP tasks. Extracting structured data from heterogeneous narrative medical reports is a significant challenge; one study [1] obtained F1 scores greater than 95% using machine learning, for example a hidden Markov model (HMM). Weng et al. [2] applied machine learning to classify clinical notes into medical subdomains and reported that a convolutional recurrent neural network with word embeddings yielded the best performance on the iDASH and MGH datasets, with F1 scores of 0.845 and 0.870, respectively. Basaldella et al. [3] proposed a hybrid two-stage pipeline combining a dictionary approach with a machine-learning classifier and achieved an overall precision of 86% at a recall of 60% on a named entity recognition task. To help capture biomedical knowledge, a flourishing set of ontologies attempts to represent the complexity of biomedical concepts in the text-mining area. These ontologies describe a wide variety of biological concepts spanning biology and medicine. They not only attempt to capture the meaning of a particular domain for the biomedical community, but are also key elements for knowledge management and data integration [4]. Ontologies and controlled vocabularies can improve the efficiency and consistency of biomedical data curation, which has led to a growing interest in developing them.
For example, the Gene Ontology (GO) [5] is a community-based bioinformatics resource that represents knowledge about gene function along three aspects: molecular function, cellular component, and biological process. Molecular function describes the molecular activities of gene products, such as binding or catalysis. Cellular component indicates where gene products are active, that is, a component of a cell such as an anatomical structure. Biological process represents pathways and larger processes made up of the activities of multiple gene products. Another resource is the GENIA corpus [6], developed to provide reference materials that allow natural language-processing techniques to extract information. GENIA is a semantically annotated corpus of biological literature, compiled and annotated within the scope of the GENIA project to provide high-quality reference materials for bioinformatics. Natale et al. [7] proposed the Protein Ontology (PRO), represented in OWL and queryable with SPARQL, to enhance and scale up the representation of protein entities. Ontologies provide machine-readable descriptions of biomedical concepts linked to domain-specific vocabulary, and ontology-based mining systems attempt to map terminology in text to concepts in an ontology. Kim et al. [8] applied syntactic parsing to sentences with annotated GO concepts to exploit similarities of sentential syntactic dependencies for mapping to concepts.

Natural language processing (NLP) comprises several general tasks, including tokenization, morphological analysis, part-of-speech (POS) tagging, and syntactic parsing.

Tokenization is the process of breaking text into sentences and words. When text enters a text-mining system, a paper, for example, can be viewed as a continuous stream: it is first broken into chapters and paragraphs, and the paragraphs are then split into sentences, words, and smaller units for more sophisticated processing. As part of this task, the tokenizer can also extract token features such as capitalization, digits, punctuation, and special characters.
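As a minimal sketch of this step, the following Python tokenizer splits a sentence into word and punctuation tokens and extracts orthographic token features of the kind listed above (the splitting pattern and feature names are our own illustration, not a specific system's):

```python
import re

def tokenize(sentence):
    # Words (keeping internal hyphens, e.g. "IL-2") or single punctuation marks.
    return re.findall(r"\w+(?:-\w+)*|[^\w\s]", sentence)

def token_features(token):
    # Orthographic features: capitalization, digits, punctuation, special characters.
    return {
        "is_capitalized": token[0].isupper(),
        "all_caps": token.isupper(),
        "has_digit": any(c.isdigit() for c in token),
        "is_punct": bool(re.fullmatch(r"[^\w\s]+", token)),
        "has_hyphen": "-" in token,
    }

tokens = tokenize("IL-2 activates T cells.")
```

Keeping hyphenated forms such as IL-2 as single tokens is one of the design choices that makes biomedical tokenization harder than general-domain tokenization.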

POS tagging annotates words with tags based on their context in the text. POS tags divide words into categories according to their role in the sentence and provide information about a word's semantic content: nouns usually denote entities, whereas prepositions express relationships between entities.
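The idea can be illustrated with a deliberately tiny lookup tagger (the lexicon, fallback rules, and tag guesses are invented for illustration; practical taggers are trained statistical models):

```python
# Toy tagger: known words get a tag from a lexicon, unknown words fall
# back on a crude suffix heuristic. Entirely illustrative.
LEXICON = {"the": "DT", "of": "IN", "binds": "VBZ", "to": "TO", "receptor": "NN"}

def pos_tag(tokens):
    tagged = []
    for tok in tokens:
        if tok.lower() in LEXICON:
            tagged.append((tok, LEXICON[tok.lower()]))
        elif tok.endswith("s"):
            tagged.append((tok, "NNS"))  # crude plural-noun guess
        else:
            tagged.append((tok, "NN"))   # default: treat as a noun
    return tagged
```

For example, `pos_tag(["the", "protein", "binds", "receptors"])` tags "binds" as a verb via the lexicon but "receptors" only via the suffix guess, which is exactly the ambiguity that trained taggers resolve from context.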

Syntactic parsing performs a full syntactic analysis of sentences according to a grammar theory, typically a constituency or a dependency grammar. Constituency grammars describe the syntactic structure of sentences in terms of phrases, that is, sequences of elements. Most constituency grammars contain noun phrases, verb phrases, prepositional phrases, adjective phrases, and so on; each phrase may consist of smaller phrases or words according to the rules of the grammar, and the role of each phrase (for example, a noun phrase marked as the subject or object of the sentence) is contained in the syntactic structure. Dependency grammars instead focus on the direct relations between words, without considering constituents. Dependency analysis uses a directed acyclic graph (DAG) whose nodes are words and whose edges are dependencies: for example, a subject depends on the predicate verb, while an adjective depends on its noun.
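A dependency analysis of this kind can be sketched as a small edge list (the sentence, relation labels, and helper function are illustrative only):

```python
# A dependency analysis stored as a DAG: edges run from head to
# dependent with a relation label, for "IL-2 activates target genes".
edges = [
    ("activates", "IL-2", "nsubj"),   # the subject depends on the verb
    ("activates", "genes", "dobj"),   # the object depends on the verb
    ("genes", "target", "amod"),      # the modifier depends on its noun
]

def dependents(head):
    # All words that depend directly on `head`, with their relation labels.
    return {dep: label for h, dep, label in edges if h == head}
```

Walking such edges (e.g. from a verb to its `nsubj` and `dobj`) is the basic operation behind dependency-based relation extraction.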

## **3. Biomedical text mining**


418 Artificial Intelligence - Emerging Trends and Applications

The enormous and rapidly growing volume of biological literature is one of the most common motivations for biomedical text mining. The growth of the PubMed/Medline literature is reported to be exponential, so it is very difficult for researchers to keep up with the relevant publications in their own discipline, let alone in related disciplines.

This large-scale, rapidly growing body of biomedical literature carries a great deal of biological information: new phenomena, biomedical discoveries, and new experimental data are often first published there. Processing this massive literature can extract biological information and mine hidden biomedical knowledge. Even domain experts cannot rely on manual reading to fully grasp the state and trends of research or to obtain the information of interest, so text mining and information extraction from the biomedical literature are urgently needed in molecular biology and biomedical knowledge extraction. Biomedical text mining [9] is a frontier research field that combines computational linguistics, bioinformatics, medical information science, and related fields. It has developed over less than 25 years [10] and belongs to a branch of bioinformatics. Bioinformatics is the application of information science and technology to understand, organize, and manage biomolecular data; it aims to provide tools and resources that help biological researchers obtain and analyze data so as to discover new knowledge about the biological world [11]. As a subfield of bioinformatics, biomedical text mining uses text-mining technology to process the biomedical literature, acquire biological information, organize and manage the acquired information, and provide it to researchers. It can therefore extract various kinds of biological information [12], such as gene and protein information, gene expression regulation, gene polymorphism and epigenetic information, and gene-disease relationships.
This biological information helps people understand life phenomena and the rules of life activities. Using it helps explain the mechanisms of disease, promotes the development of diagnostic technology, and supports the development of new drugs in biomedical research. A large number of text-mining methods have been established to assist the extraction of biological information. These methods vary in their degree of reliance on dictionaries, statistical and knowledge-based approaches, automatic rule generation applying a part-of-speech (POS) tagger, and machine-learning algorithms such as Hidden Markov Models (HMMs) and decision trees. Cronin et al. [13] classified patient portal messages, comparing rule-based and machine-learning approaches using bag-of-words and natural language-processing (NLP) features; the best classifier for individual communication subtypes was a random forest for logistical-contact information, with an area of 0.963 under the receiver-operator curve.


In addition, some institutions focus on advancing biomedical text-mining technology. Against the background of the rapidly developing omics era, BioCreative (Critical Assessment of Information Extraction systems in Biology) is a community-wide effort to evaluate text-mining and information-extraction systems applied to the biomedical domain using natural language processing [14, 15]. Research on biomedical text mining is also presented at several conferences, including the Pacific Symposium on Biocomputing, BioNLP, and Practical Applications of Computational Biology and Bioinformatics [10].

## **4. Static biomedical information recognition**

In the era of systems biology, molecular-biology research viewed from a system perspective involves biomedical entities such as genes, proteins, gene products, drugs, and diseases; these basic, static entities, which reflect what exists, are called static biomedical information. The biological terminologies that describe domain objects in the medical literature are called entities. Genes carry the essential information of life and proteins are the executors of gene function, so identifying these entities plays an important role in revealing the phenomena of life: it is a necessary step toward further exploring these important biological entities and a core task of biomedical text mining. Entity representation in the biomedical literature is extremely complex. Single-word entities vary in length and mix uppercase and lowercase letters, for example urokinase, Cactus, and IkapaBalpha. Multi-word phrases such as bradykinin B (1) receptor and protein phosphatase 2A make it difficult to establish entity boundaries. The same word or phrase can denote different categories of biomedical entity; for example, c-myc or IL-2 can refer to a protein or, depending on the context, to a gene. Some entities also have different written forms, such as the variants of protein phosphatase 2A, which all refer to the same biological entity in biomedical text. This complexity and diversity makes biomedical named entity recognition a challenging problem. Traditional recognition approaches comprise three methods: dictionary-based, rule-based, and statistical machine learning.


The dictionary-based approach needs a detailed term dictionary for recognizing entities in documents. Generally, the pre-given term dictionary is compiled from specific biomolecular databases. The approach is limited by the dictionary's coverage but has relatively high precision, so it is widely applied in practical systems, for example Whatizit [16] and FRENDA [17].
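A minimal sketch of dictionary-based recognition, assuming a hand-made term dictionary (the entries below are illustrative, not drawn from Whatizit or FRENDA), is a longest-match scan over the token sequence:

```python
# Longest-match dictionary lookup over a token list. Multi-word terms
# are stored as token tuples; entries are invented examples.
DICTIONARY = {
    ("protein", "phosphatase", "2A"),
    ("IL-2",),
    ("bradykinin", "B", "(1)", "receptor"),
}
MAX_LEN = max(len(term) for term in DICTIONARY)

def find_entities(tokens):
    hits, i = [], 0
    while i < len(tokens):
        # Try the longest candidate span starting at position i first,
        # so "protein phosphatase 2A" beats any shorter sub-span.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            if tuple(tokens[i:i + n]) in DICTIONARY:
                hits.append(" ".join(tokens[i:i + n]))
                i += n
                break
        else:
            i += 1
    return hits
```

The longest-match rule is what keeps the lookup from fragmenting multi-word terms, but recall is still bounded by dictionary coverage, exactly as noted above.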

The rule-based approach studies the naming features and regularities of entities in order to formulate recognition rules, which requires the developers to have biomedical background knowledge. For example, biomedical entities are often noun phrases, and the first character of a human gene symbol is a letter while the rest is a combination of letters and numbers. Based on their successful rule system AbGene [18], Tanabe and Wilbur produced a large, high-quality gene/protein dictionary from biomedical texts; AbGene has also been used as a component of relationship extraction [19].
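The gene-symbol rule stated above can be written directly as a regular expression; this sketch is a simplification of real nomenclature guidelines and is not the AbGene rule set:

```python
import re

# The stated rule: a letter first, then a combination of letters and digits.
GENE_SYMBOL = re.compile(r"^[A-Za-z][A-Za-z0-9]*$")

def looks_like_gene_symbol(token):
    return bool(GENE_SYMBOL.match(token))
```

Note that this single rule already fails on hyphenated forms such as IL-2, which is one reason rule-based systems accumulate many such patterns.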

The statistical machine-learning approach has developed rapidly along with the construction of biomedical corpora, which serve as training data: with their help, entities can be identified in biomedical text. Several studies take this direction. ABNER, developed by Settles [20], uses Conditional Random Fields (CRFs) with morphological and semantic features and achieved an average recall of 72.0%, precision of 69.1%, and F-score of 70.5% on JNLPBA. Mitsumori et al. [21] used a Support Vector Machine (SVM) with internal and external resource features, showing that external biological-dictionary features improve recognition performance. Saha et al. [22] used a maximum entropy model combined with word-clustering features and feature selection, achieving good performance without domain knowledge. Li et al. [23] used two-step CRFs, the first model to detect named entities and the second to assign their types, obtaining a 74.3% F-score on the JNLPBA corpus. Each of the three approaches has its own advantages, and hybrid approaches have also been used to identify biomedical entities. Our research group has done work on identifying biomedical entities involving all three methods; we give several examples to illustrate static biomedical information recognition.
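Feature extraction of the morphological kind used by CRF taggers such as ABNER can be sketched as follows (the exact feature set is our own illustration; a real system would feed these features, per token, into a trained CRF):

```python
# Per-token feature dictionary combining orthographic features with a
# one-token context window, as commonly used for CRF sequence labeling.
def crf_features(tokens, i):
    tok = tokens[i]
    return {
        "word": tok.lower(),
        "is_upper": tok.isupper(),
        "is_title": tok.istitle(),
        "has_digit": any(c.isdigit() for c in tok),
        "suffix3": tok[-3:].lower(),                      # morphological cue
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "</s>",
    }
```

The context features (`prev`, `next`) are what let a sequence model tag "IL-2" differently in "IL-2 gene" versus "IL-2 protein", mirroring the ambiguity discussed in Section 4.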

biomedical literature, the contents are achieved based on pattern mapping, and the crawled

Application of Biomedical Text Mining http://dx.doi.org/10.5772/intechopen.75924 423


## **4.1. Dictionary-based approach to identify biomedical entities**

In this section, we introduce our previous work [24–26], published in the International Journal of Pattern Recognition and Artificial Intelligence. We first obtain experimental literature and build a concept dictionary based on authoritative corpora, then use part-of-speech (POS) tagging, phrase-block formation, and the designed VWIA algorithm to identify entities by matching biomedical concepts. **Figure 1** [24] describes the pipeline.

#### *4.1.1. Obtain experimental data*

The experimental data of this study come from PubMed/Medline, retrieved with the e-utilities API tool, which is also used in the works [27–29] to automatically download literature from the website (http://www.ncbi.nlm.nih.gov/). Like a web spider, it follows a series of hyperlinks given a Uniform Resource Locator (URL). By obtaining URLs related to biomedical literature through e-utilities, the contents are retrieved by pattern matching, and the crawled contents are stored in a local database for further processing.

**Figure 1.** Pipeline of our approach to recognize biomedical concepts.
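As a rough sketch of this retrieval step (assuming Python; the endpoint names `esearch.fcgi`/`efetch.fcgi` and the `db`, `term`, `retmax`, `id`, `rettype`, and `retmode` parameters are the standard public E-utilities ones, not necessarily the exact options our crawler used):

```python
from urllib.parse import urlencode

# Base endpoints of the NCBI E-utilities service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

def esearch_url(query: str, retmax: int = 20) -> str:
    """Build an ESearch URL that returns PubMed IDs for a query."""
    return ESEARCH + "?" + urlencode({"db": "pubmed", "term": query, "retmax": retmax})

def efetch_url(pmids) -> str:
    """Build an EFetch URL that downloads abstracts for the given PMIDs."""
    return EFETCH + "?" + urlencode(
        {"db": "pubmed", "id": ",".join(pmids), "rettype": "abstract", "retmode": "text"}
    )

# The crawler would issue these URLs and store the returned contents in
# a local database for further processing.
print(esearch_url("breast cancer gene", retmax=5))
```

The URLs can then be fetched with any HTTP client and the responses written to the local database.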

#### *4.1.2. Preprocess, POS tagging, and phrase block*


422 Artificial Intelligence - Emerging Trends and Applications


Each piece of collected biomedical literature may contain several paragraphs, and each paragraph may include one or more sentences. The basic textual units produced by tokenizing each sentence are tagged by the Stanford POS tagger [30], a tool that assigns a part of speech to every word (for instance, verb, adjective, noun, and so on) by reading text in a given language, implemented with log-linear part-of-speech taggers. The results for an example sentence are shown in **Figure 2**. POS-word pairs are important features of biomedical concepts: nouns, adjectives, participles, and so on can be components of biomedical concepts, while words such as indefinite articles and verbs can be omitted when identifying them. We therefore extract phrase blocks composed of nouns, adjectives, and participles. These phrase blocks are normalized to a unified base form; for example, a word tagged "NNS" (plural noun) is transformed to its singular ("NN") form. **Figure 2** shows an example of the POS tagging [25].
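The phrase-block step can be sketched as follows (a minimal sketch: the tag set is Penn Treebank as used by the Stanford tagger, but the exact kept tags and the naive strip-final-"s" normalization are illustrative assumptions, not the chapter's exact rules):

```python
# Keep nouns, adjectives, and participles; group adjacent kept tokens
# into phrase blocks; normalize plural nouns (NNS) to a base form.
KEEP_TAGS = {"NN", "NNS", "NNP", "JJ", "VBN"}  # nouns, adjectives, participles

def normalize(word: str, tag: str) -> str:
    # Naive NNS -> NN normalization; a real system would use a lemmatizer.
    if tag == "NNS" and word.endswith("s"):
        return word[:-1]
    return word

def phrase_blocks(tagged):
    """tagged: list of (word, pos_tag) pairs from the POS tagger."""
    blocks, current = [], []
    for word, tag in tagged:
        if tag in KEEP_TAGS:
            current.append(normalize(word, tag))
        elif current:                      # a non-kept tag closes the block
            blocks.append(" ".join(current))
            current = []
    if current:
        blocks.append(" ".join(current))
    return blocks

tagged = [("IL-2", "NN"), ("genes", "NNS"), ("are", "VBP"),
          ("expressed", "VBN"), ("in", "IN"), ("activated", "VBN"),
          ("T", "NN"), ("cells", "NNS")]
print(phrase_blocks(tagged))  # ['IL-2 gene', 'expressed', 'activated T cell']
```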

#### *4.1.3. Biomedical dictionaries' construction*

Given the high precision of dictionary-based approaches, we use one for the identification of biomedical concepts. A concept dictionary is built from authoritative biomedical ontologies, for example, the Disease Ontology (DO), the Gene Ontology (GO), and the GENIA ontology. The GENIA corpus provides reference material for biomedical text mining. It is a semantically annotated corpus containing 2000 MEDLINE articles with almost 100,000 annotations and more than 507,325 words in 18,546 sentences for biomedical terms in version 3.02. Each article in the corpus is encoded in an XML-based mark-up scheme with an ID, a title, and an abstract. Moreover, both abstracts and titles have been marked up with biologically and semantically meaningful annotated terms. The corpus provides semantically annotated biological terms identifiable with the terminal concepts of the GENIA ontology. For instance, in "<cons lex = "IL-2\_gene" sem = "G#DNA\_domain\_or\_region" > IL-2 gene</cons>", the label "lex" describes the concept, and the label "sem" represents the type of concept. Because of the structured scheme of the GENIA ontology, terms can be extracted with regular expressions. These extracted biologically meaningful terms build the concept dictionary for biomedical text mining. For example, the concept taxonomy of GENIA version 3.02 contains the biologically relevant nominal categories shown in **Table 1** [24].

**Figure 2.** Sentence with part-of-speech tags generated by the Stanford maximum entropy part-of-speech tagger. Tags are NN: normal noun; IN: preposition or conjunction; VBZ: verb in present/past tense; JJ: adjective; CC: conjunction; NNP: proper noun.
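The regular-expression harvesting of `<cons>` annotations can be sketched like this (a minimal sketch; the pattern mirrors the markup quoted above and tolerates the whitespace variants shown there, but it is not the chapter's exact expression):

```python
import re

# Pull the lexical form ("lex"), the semantic type ("sem"), and the
# surface string out of each GENIA-style <cons> element.
CONS_RE = re.compile(
    r'<cons\s+lex\s*=\s*"([^"]+)"\s+sem\s*=\s*"([^"]+)"\s*>\s*(.*?)</cons>'
)

def build_dictionary(annotated_text):
    """Map each surface term to its lex/sem annotation."""
    dictionary = {}
    for lex, sem, surface in CONS_RE.findall(annotated_text):
        dictionary[surface] = {"lex": lex, "sem": sem}
    return dictionary

sample = '<cons lex = "IL-2_gene" sem = "G#DNA_domain_or_region" > IL-2 gene</cons>'
print(build_dictionary(sample))
```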

#### *4.1.4. Results*

Biomedical texts in the literature are divided into sentences, and for every sentence some phrase blocks are parsed. We developed the Variable-step Window Identification Algorithm (VWIA) for identifying biomedical concepts. The approach obtained an overall F-measure of 95.0% on the GENIA corpus. The implementation of the approach is shown in **Figure 3** [24].
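The VWIA algorithm itself is given in Figure 3 of the source; as a rough, hypothetical sketch of the idea only, the matcher below slides a window over the tokens of a phrase block and shrinks it step by step until the window matches a dictionary concept (longest match first):

```python
def match_concepts(tokens, dictionary, max_window=5):
    """Greedy longest-match over variable-width token windows (a sketch,
    not the published VWIA)."""
    found, i = [], 0
    while i < len(tokens):
        matched = False
        # Try the widest window first, then shrink (the "variable step").
        for w in range(min(max_window, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + w])
            if candidate in dictionary:
                found.append(candidate)
                i += w          # jump past the matched concept
                matched = True
                break
        if not matched:
            i += 1              # no match: advance one token
    return found

dictionary = {"IL-2 gene", "T cell"}
print(match_concepts(["the", "IL-2", "gene", "in", "T", "cell"], dictionary))
# ['IL-2 gene', 'T cell']
```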

#### **4.2. Machine-learning approach**

In this work [26, 27], we apply machine learning to recognize biomedical named entities. The pipeline architecture of our approach is shown in **Figure 4** [26, 27].

The pipeline of our system contains four main modules: a preprocessing module, a training module, a tagging module, and a testing module.



**Figure 4.** The pipeline architecture of machine-learning approach.

**Figure 3.** Visualization of the dictionary-based approach.


| Categories | Numbers | Categories | Numbers |
|---|---|---|---|
| protein\_molecule | 21,632 | peptide | 524 |
| protein\_family\_or\_group | 8372 | body\_part | 449 |
| DNA\_domain\_or\_region | 8054 | atom | 341 |
| cell\_type | 7233 | RNA\_family\_or\_group | 334 |
| other\_organic\_compound | 4096 | polynucleotide | 260 |
| cell\_line | 3974 | inorganic | 256 |
| protein\_complex | 2417 | nucleotide | 239 |
| lipid | 2359 | mono\_cell | 222 |
| virus | 2126 | other\_artificial\_source | 209 |
| multi\_cell | 1766 | protein\_substructure | 129 |
| DNA\_family\_or\_group | 1558 | DNA\_substructure | 107 |
| protein\_domain\_or\_region | 1017 | protein\_N/A | 99 |
| protein\_subunit | 920 | carbohydrate | 98 |
| amino\_acid\_monomer | 785 | DNA\_N/A | 48 |
| tissue | 692 | RNA\_domain\_or\_region | 39 |
| cell\_component | 669 | RNA\_N/A | 15 |
| RNA\_molecule | 602 | RNA\_substructure | 2 |
| DNA\_molecule | 542 | Total | 72,185 |


**Table 1.** Biomedical concept categories and numbers in GENIA.


#### *4.2.1. Preprocessing module*

When applying machine-learning methods, the original biomedical texts cannot be processed directly by the CRF model; they must first be preprocessed into the required formats. The original training, testing, and prediction texts are processed into files in the specified format together with the feature dataset.
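This preprocessing can be sketched as follows (a minimal sketch: CRF toolkits such as CRF++ consume column-oriented text, one token per line with its features and a BIO label in the last column, sentences separated by blank lines; the particular feature columns here are illustrative, not our system's exact feature set):

```python
def surface_features(word):
    # Word-surface clues of the kind mentioned in Section 4.2.5.
    return [
        "CAP" if word[0].isupper() else "low",
        "NUM" if any(c.isdigit() for c in word) else "nonum",
    ]

def to_crf_lines(sentence):
    """sentence: list of (word, pos_tag, bio_label) triples."""
    lines = []
    for word, pos, label in sentence:
        lines.append("\t".join([word, pos] + surface_features(word) + [label]))
    lines.append("")  # blank line ends the sentence
    return "\n".join(lines)

sent = [("IL-2", "NN", "B-DNA"), ("gene", "NN", "I-DNA"), ("is", "VBZ", "O")]
print(to_crf_lines(sent))
```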

#### *4.2.2. Training module*

Training texts are passed through the preprocessing module, where the selected and extracted features form the specified training files; these are trained under a set of parameters to obtain trained model files. The input of the training module is the preprocessed training texts, and its output is the model files, including the feature function set and weight parameters.

## *4.2.3. Tagging module*

After the preprocessing module processes the test texts, the resulting specified test files are put into the tagging module together with the model files, which produces the tagged files. The input of the tagging module is the preprocessed test texts, and its output is the tagged result files.

## *4.2.4. Testing module*

The testing module measures the performance of our system. After the test texts are processed in the preprocessing module, the test standard files are put into the testing module together with the tagging result files. The input of the testing module is the preprocessed test files, and its output is the measured performance of our system, including precision, recall, and F-measure.
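The metrics computed by the testing module are the standard ones; as a small worked example:

```python
def prf(tp, fp, fn):
    """Precision, recall, and F-measure from true-positive,
    false-positive, and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return precision, recall, f

p, r, f = prf(80, 20, 40)
print(round(p, 2), round(r, 2), round(f, 3))  # 0.8 0.67 0.727
```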

#### *4.2.5. Results*

The approach considered features including POS features and word-surface clue features (uppercase, lowercase, numbers, specific characters, initials), using version 3.02 of the GENIA corpus to train and test the system with 10-fold cross-validation. The system's performance on the six classes is shown in **Table 2**. Based on this method, we developed the Biomedical entity recognition Miner system (BerMiner) in Java on a Linux OS. **Figure 5** shows the results of the identified biomedical entities.



**Figure 5.** BerMiner extracted results with six categories using machine learning [26, 27].

## **5. Extraction of dynamic biomedical information**


| Project | Precision (%) | Recall (%) | F-measure (%) |
|---|---|---|---|
| DNA | 61.31 | 45.79 | 52.4 |
| RNA | 56.48 | 44.63 | 49.69 |
| Protein | 76.48 | 72.64 | 74.50 |
| Cell-type | 71.11 | 58.51 | 64.16 |
| Cell-line | 71.80 | 53.58 | 61.29 |
| Virus | 78.56 | 66.37 | 71.83 |

**Table 2.** Identified biomedical entity's performance based on machine learning [26, 27].

Biomedical entities produce a series of information interactions in the process of genetic information transfer and expression, such as gene–gene interactions, gene–disease relationships, gene–gene-product relationships, molecular signaling pathways, and so on. This information takes a dynamic form: dynamic biomedical information represents the process of activities of biomedical entities. It is extracted as associations between biomedical entities, often based on entity co-occurrence analysis grounded in statistical theory. Glenisson et al. [28] proposed an approach to extract relationships between genes using a vector-space method and the k-medoids algorithm for gene clustering. Wren [29] explored a method to measure relationships between biological entities using a mutual-information model. Wu et al. [30] studied the interactions between genes and drugs with text-mining technology in three steps: first, gene and drug entities are identified in Medline abstracts; second, gene-drug pairs are extracted at different levels; third, the gene-drug pairs are ranked with a mathematical statistical model. Our research group has done similar work [31–33]. Here, we take the works [31, 33] as examples to describe the extraction of dynamic biomedical information in detail.

#### **5.1. Relationship extraction based on statistic model**

Dynamic information represents the process of activities of biomedical entities. In this study, we focus on this dynamic process and extract dynamic biomedical information, namely associations between biomedical entities, based on entity co-occurrence analysis grounded in statistical theory. Entity co-occurrence analysis assumes that if two entities occur together at some level of a paper (e.g., the full text, a paragraph, a sentence, or a phrase), the two entities may be related. Different levels imply different strengths of association: by syntactic analysis of biomedical texts, the association weight at the phrase level is the highest of the levels, and the sentence level is weighted higher than the full text. Because multiple classes of biomedical entities are used in this study, the extracted associations are also of multiple types. We build a data model of entity association based on co-occurrence statistics. The data model can formally be represented as the triple shown in Eq. (1)

$$D = \left(E, C, R\right) \tag{1}$$


Let *E* be a set of biomedical entities, *C* a set of association types, and *R* a set of correlations of associations between biomedical entities.

Suppose *ϕ*(*e*) represents the entity category, where *e* ∈ *E*, and let *W* be the set of weights of the association levels, where *w_k* ∈ *W*. For two entities *e_i*, *e_j* ∈ *E* at the *k*-th level, their co-occurrence frequency is denoted *f_k*(*e_i*, *e_j*), and the association between the two biomedical entities is represented by the tuple in Eq. (2)

$$T = \left(e_i, e_j, C\left(\varphi(e_i), \varphi(e_j)\right), R\left(e_i, e_j\right)\right) \tag{2}$$

Let *C*(*ϕ*(*e_i*), *ϕ*(*e_j*)) be the association category determined by the entity categories *ϕ*(*e_i*) and *ϕ*(*e_j*). For instance, *C*(*ϕ*(*e_i*), *ϕ*(*e_j*)) could represent an association between a gene and a disease, or between a gene and a microRNA, and so on. Let *R*(*e_i*, *e_j*) be the correlation factor between entities *e_i* and *e_j*, as shown in Eq. (3)

$$R(e_i, e_j) = \sum_{k} w_k\, f_k(e_i, e_j) \tag{3}$$

After building the data model of entity association, we further consider extracting this dynamic biomedical information from the biomedical literature. Given the identified entities, we design an algorithm for Mining Multiclass Entity Association, named MMEA, on top of the data model based on co-occurrence statistical analysis, as shown in **Figure 6**.

The input of the MMEA algorithm is the set of entities identified in the entity-recognition step. The MMEA algorithm first gets the category of each entity (lines 4–6). For an entity *e_i*, the algorithm determines the types of association between it and the other entities (line 7) and computes the correlation factor *R*(*e_i*, *e_j*) by Eq. (3) (line 8). The results are stored in the four-tuple *T* = (*e_i*, *e_j*, *C*(*ϕ*(*e_i*), *ϕ*(*e_j*)), *R*(*e_i*, *e_j*)) for further processing (line 9). The process proceeds by incrementing *i* until all entity associations are obtained (lines 3–11).

**Figure 6.** MMEA algorithm.
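The scoring step of MMEA can be sketched under the data model above (a sketch only: the level weights *w_k* follow the phrase > sentence > full-text ordering stated in the text, but the numeric values are illustrative assumptions):

```python
# Level weights w_k (illustrative values; only the ordering is from the text).
WEIGHTS = {"phrase": 3.0, "sentence": 2.0, "fulltext": 1.0}

def correlation(freqs):
    """Eq. (3): R(e_i, e_j) = sum_k w_k * f_k(e_i, e_j).
    freqs: dict level -> co-occurrence count f_k for one entity pair."""
    return sum(WEIGHTS[k] * f for k, f in freqs.items())

def mmea(cooccurrences, categories):
    """Build the four-tuples T = (e_i, e_j, C(phi(e_i), phi(e_j)), R(e_i, e_j)).
    cooccurrences: dict (e_i, e_j) -> {level: count}."""
    tuples = []
    for (ei, ej), freqs in cooccurrences.items():
        assoc_type = (categories[ei], categories[ej])  # C(phi(e_i), phi(e_j))
        tuples.append((ei, ej, assoc_type, correlation(freqs)))
    return tuples

cats = {"BRCA1": "gene", "breast cancer": "disease"}
cooc = {("BRCA1", "breast cancer"): {"sentence": 2, "fulltext": 5}}
print(mmea(cooc, cats))
# [('BRCA1', 'breast cancer', ('gene', 'disease'), 9.0)]
```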


#### **5.2. Relationship visualization**

Biomedical text mining centers on user interactivity, and it needs to offer users ways to interact with the resulting data. Text-mining visualization supports this interactivity through graphical approaches. For example, we developed a circle network graph that allows disease researchers to explore the relationships between genes related to breast cancer. To capture the susceptibility genes related to breast cancer, our research group designed a fan-like network visualization. The nodes represent biomedical entities, and the link lines indicate the associations between entities (as shown in **Figures 7** and **8**).

**Figure 7.** Relationship visualization between genes.


**Figure 8.** Relationship visualization between disease and gene.


## **6. Conclusions**

This chapter first introduces the rise of biomedical text mining. It then describes biomedical text-mining technology, namely natural language processing and its several components. The following sections emphasize two aspects of biomedical text mining, static biomedical information recognition and dynamic biomedical information extraction, using instance analyses from our previous works.

## **Acknowledgements**

This research is supported by the National Natural Science Foundation of China (Grant Nos: 61502243, 61502247, 61272084, 61300240, 61572263, 61502251, and 61503195), Natural Science Foundation of the Jiangsu Province (Grant Nos: BK20130417, BK20150863, BK20140895, and BK20140875), China Postdoctoral Science Foundation (Grant No. 2016M590483), Jiangsu Province postdoctoral Science Foundation (Grant No. 1501072B), Scientific and Technological Support Project (Society) of Jiangsu Province (Grant No. BE2016776), Nanjing University of Posts and Telecommunications' Science Foundation (Grant Nos: NY214068 and NY213088). This work is also supported in part by Zhejiang Engineering Research Center of Intelligent Medicine (2016E10011).

## **Conflict of interest**

In this chapter, the instances are from our previous works [24–27, 31–33].

## **Author details**

Lejun Gong

Address all correspondence to: glj98226@163.com

Jiangsu Key Laboratory of Big Data Security and Intelligent Processing, School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, China

## **References**

[1] Zheng S, Lu JJ, Ghasemzadeh N, Hayek SS, Quyyumi AA, Wang F. Effective information extraction framework for heterogeneous clinical reports using online machine learning and controlled vocabularies. JMIR Medical Informatics. 2017;**5**(2):e12. DOI: 10.2196/medinform.7235

[2] Weng WH, Wagholikar KB, McCray AT, Szolovits P, Chueh HC. Medical subdomain classification of clinical notes using a machine learning-based natural language processing approach. BMC Medical Informatics and Decision Making. 2017;**17**(1):155. DOI: 10.1186/s12911-017-0556-8

[3] Basaldella M, Furrer L, Tasso C, Rinaldi F. Entity recognition in the biomedical domain using a hybrid approach. Journal of Biomedical Semantics. 2017;**8**(1):51. DOI: 10.1186/s13326-017-0157-6

[4] Krallinger M, Leitner F, Vazquez M, Salgado D, Marcelle C, Tyers M, Valencia A, Chatr-aryamontri A. How to link ontologies and protein-protein interactions to literature: Text-mining approaches and the BioCreative experience. Database: The Journal of Biological Databases and Curation. 2012;**2012**:bas017

[5] Gene Ontology Consortium. Gene ontology consortium: Going forward. Nucleic Acids Research. 2015;**43**(Database issue):D1049-D1056

[6] Kim JD, Ohta T, Tateisi Y, Tsujii J. GENIA corpus – semantically annotated corpus for bio-textmining. Bioinformatics. 2003;**19**(Suppl 1):i180-i182

[7] Natale DA, Arighi CN, Blake JA, et al. Protein ontology (PRO): Enhancing and scaling up the representation of protein entities. Nucleic Acids Research. 2017;**45**(D1):D339-D346

[8] Kim JJ, Park JC. Bioie: Retargetable information extraction and ontological annotation of biological interactions from the literature. Journal of Bioinformatics and Computational Biology. 2004;**2**(3):551-568

[9] Cohen AM, Hersh WR. A survey of current work in biomedical text mining. Briefings in Bioinformatics. 2005;**6**(1):57-71. Review


[10] Zweigenbaum P, Demner-Fushman D, Yu H, Cohen KB. Frontiers of biomedical text mining: Current progress. Briefings in Bioinformatics. 2007;**8**(5):358-375

[11] Luscombe NM, Greenbaum D, Gerstein M. What is bioinformatics? A proposed definition and overview of the field. Methods of Information in Medicine. 2001;**40**(4):346-358

[12] Krallinger M, Erhardt RA, Valencia A. Text-mining approaches in molecular biology and biomedicine. Drug Discovery Today. 2005;**10**(6):439-445

[13] Cronin RM, Fabbri D, Denny JC, Rosenbloom ST, Jackson GP. A comparison of rule-based and machine learning approaches for classifying patient portal messages. International Journal of Medical Informatics. 2017;**105**:110-120

[14] Leitner F, Mardis SA, Krallinger M, Cesareni G, Hirschman LA, Valencia A. An overview of BioCreative II.5. IEEE/ACM Transactions on Computational Biology and Bioinformatics. 2010;**7**(3):385-399

[15] Hirschman L, Yeh A, Blaschke C, Valencia A. Overview of BioCreAtIvE: Critical assessment of information extraction for biology. BMC Bioinformatics. 2005;**6**(Suppl 1):S1

[16] Rebholz-Schuhmann D, Kirsch H, Couto F. Facts from text–is text mining ready to deliver? PLoS Biology. 2005;**3**(2):e65

[17] Barthelmes J, Ebeling C, Chang A, et al. BRENDA, AMENDA and FRENDA: The enzyme information system in 2007. Nucleic Acids Research. 2007;**35**(Database issue):D511-D514

[18] Tanabe L, Wilbur WJ. Tagging gene and protein names in biomedical text. Bioinformatics. 2002;**18**(8):1124-1132

[19] Tanabe L, Wilbur WJ. Generation of a large gene/protein lexicon by morphological pattern analysis. Journal of Bioinformatics and Computational Biology. 2004;**1**(4):611-626

[20] Settles B. ABNER: An open source tool for automatically tagging genes, proteins and other entity names in text. Bioinformatics. 2005;**21**(14):3191-3192

[21] Mitsumori T, Fation S, Murata M, et al. Gene/protein name recognition based on support vector machine using dictionary as features. BMC Bioinformatics. 2005;**6**(Suppl 1):S8

[22] Saha SK, Sarkar S, Mitra P. Feature selection techniques for maximum entropy based biomedical named entity recognition. Journal of Biomedical Informatics. 2009;**42**(5):905-911

[23] Li L, Zhou R, Huang D. Two-phase biomedical named entity recognition using CRFs. Computational Biology and Chemistry. 2009;**33**(4):334-338

[24] Gong L, Yang R, Liu Q, Dong Z, Chen H, Yang G. A dictionary-based approach for identifying biomedical concepts. International Journal of Pattern Recognition and Artificial Intelligence. 2017;**31**(9):1757004. http://dx.doi.org/10.1142/S021800141757004X

[25] Gong L, Sun X. ATRMiner: A system for automatic biomedical named entities recognition. Yantai: ICNC 2010; 2010. pp. 3842-3845

[26] Gong LJ, Yang RG, Yang HY, Jiang KY, Yang G. BerMiner: A machine learning system for identifying bio-entity. 2015 International Conference on Software Engineering and Information System (SEIS 2015); 2015:447-450

[27] Yang RG, Wu ZX, Yang Z, Yang G, Gong LJ. Identifying biomedical entity based on deep learning. 2015 International Conference on Software Engineering and Information System (SEIS 2015); 2015:713-718

[28] Glenisson P, Coessens B, Van Vooren S, et al. TXTGate: Profiling gene groups with text-based information. Genome Biology. 2004;**5**(6):R43

[29] Wren JD. Extending the mutual information measure to rank inferred literature relationships. BMC Bioinformatics. 2004;**5**:145

[30] Wu Y, Liu M, Zheng WJ, et al. Ranking gene-drug relationships in biomedical literature using latent dirichlet allocation. Pacific Symposium on Biocomputing. 2012:422-433

[31] Gong LJ, Yang RG, Dong ZJ, Chen H, Yang G. Extraction of disease-centred dynamic biomedical information from literature. Journal of Computational and Theoretical Nanoscience. 2015;**13**(1):722-727

[32] Gong LJ, Yang RG, Sun X. Prioritization of disease susceptibility genes using LSM/SVD. IEEE Transactions on Biomedical Engineering. 2013;**60**(12):3410-3417

[33] Gong LJ, Wei YB, Xie JM, Yuan ZD, Sun X. Text mining approach for relationships between genes and diseases. Dongnan Daxue Xuebao (Ziran Kexue Ban)/Journal of Southeast University (Natural Science Edition). 2010;**40**(3):486-490


**Chapter 22**

**Static/Dynamic Zoometry Concept to Design Cattle Facilities Using Back Propagation Neural Network (BPNN)**

Sugiono Sugiono, Rudy Soenoko and Rio Prasetyo Lukodono

DOI: 10.5772/intechopen.75136

Additional information is available at the end of the chapter

**Abstract**

Dairy cattle productivity depends largely on facility quality and environmental conditions. Various researchers have studied this field, but existing work does not connect knowledge of animal dimensions and behaviors with facility design. Because the complexities of dynamic zoometry depend on cow behaviors, a neural network (NN) approach is required. Hence, the purpose of this chapter is to create the concept of static and dynamic zoometry to guide ergonomic facility design. The research started with a literature study on anthropometry, dairy cattle, facility design, and neural networks. The following step was to collect static zoometry data in 16 dimensions and dynamic zoometry data in 7 dimensions. The static data are used as the input factors, while the dynamic data are used as the desired outputs of a back propagation neural network (BPNN) model. The result of BPNN training is used to design the dairy cattle facilities, e.g., a cage with minimal length = 357.67 cm, width = 132.03 cm (per animal), and height = 205.28 cm. The chapter successfully developed the zoometry concept and a BPNN model as a pioneering implementation of comfort knowledge.

**Keywords:** animal comfort, cattle cage, facility design, neural network, zoometry, ergonomics, milk productivity

© 2018 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

## **1. Introduction**

Milk is an important food commodity in the world, as it provides calcium, phosphorous, magnesium, and protein, which are all essential for human health. Adequate consumption of milk
from early childhood and throughout life strengthens bones and protects against diseases. Consuming dairy products, especially milk, can improve children's bone health and development, raise brain concentrations of glutathione (GSH), increase the body's resistance against many infectious diseases and diseases caused by malnutrition, and reduce cancer risk [1–4]. Milk can be consumed directly or manufactured into other dairy products such as cheese, ice cream, butter, ghee, cream, and yogurt. Milk provides the following beneficial nutrients in varying quantities [5–8]:

**1.** Calcium—for healthy bones and teeth

**2.** Phosphorous—for energy release

**3.** Magnesium—for muscle function

**4.** Protein—for growth and repair

**5.** Vitamin B12—for production of healthy cells

**6.** Vitamin A—for good eyesight and immune function

**7.** Zinc—for immune function

**8.** Riboflavin—for healthy skin

**9.** Folate—for production of healthy cells

**10.** Vitamin C—for the formation of healthy connective tissues

**11.** Iodine—for regulation of the body's rate of metabolism (how quickly the body burns energy) and the rate of growth

As communities realize the importance of milk products in their lives, the demand for milk production has been boosted. As a consequence, the number of dairy cattle farmers and their productivity should be increased, and governments need approaches to increase fresh milk production. One method of increasing milk productivity is maintaining cow facilities and developing them to be more comfortable for the cattle (cage, free stall, floor, etc.). Several researchers have investigated dairy cattle comfort, including the cow house and its respective facilities. Cook et al. reported that the physical accommodation of dairy cattle should provide a relatively dry area for the cattle to lie down and be comfortable [9]. Dairy cow overcrowding can reduce rest time, increase idle standing in alleys, alter feeding behavior, and, in general, reduce cow comfort [10]. A study conducted on 47 farms in northeastern Spain showed the significant effect of stall design on the productivity of dairy cows and its impact on milk production and cow health [11]. Cow facilities should be constructed to minimize the time needed to reach food and water. A free stall facility is usually selected to minimize the effect of weather changes and to improve cleanliness and cow comfort. Dairy facilities should be designed to keep the cows and calves comfortable in order to maintain dry matter intake (DMI) and thus maximize economic production. Other studies have reported similar results. Based on Jim Reynolds' research, cattle comfort can be grouped into five factors: heat stress, sanitation, free stall design, walking surface, and walking distance [12]. In short, dairy cattle facilities are a key success factor in the milk business, and they should be designed to keep the cows and calves comfortable in order to maintain milk quality and thus maximize economic production.

## **2. Zoometry concept**

In order to provide a solution for the comfortable design of dairy cow facilities, physical ergonomics knowledge is needed. The important body of physical ergonomics knowledge for the human body is anthropometry, which studies the variation of human body dimensions in static and dynamic conditions [13]. Using the same analogy, this study converts the concept of anthropometry into the concept of zoometry to study the impact of variation in animal body dimensions. The analogy assumes that an animal (cow) requires comfort in order to reach optimal milk production, just as a human requires comfort at work in order to reach high productivity. Zoometry is derived from the Greek ζώο (zó̱o), which means animal, and μέτρον (métron), which means measurement. The zoometry concept is defined as the static and dynamic measurement of animal dimensions in order to determine the physical variation of a specific animal population. This research utilized the dairy cattle zoometry concept to design cattle facilities (house, free stall, floor, etc.) in order to increase dairy cattle comfort and milk production. The study considers both static and dynamic measurements because they cover the cattle's main activities: three main activities are affected by the static and dynamic conditions, namely feeding, lying, and standing, together with the transition events between lying and standing. The zoometry concept can protect cattle from annoying or uncomfortable facilities. The research promotes a pioneering strategy to improve cow comfort using the zoometry concept.

Directly and manually measuring dairy cattle dimensions based on the zoometry concept is time-consuming and costly. Analysis of dairy cattle dimension data can be employed to eliminate these problems. Unfortunately, the complexity of the relationships between the dimensions of dairy cattle is hard to describe in a mathematical formula or a regression model. An artificial intelligence technique, the back propagation neural network (BPNN), was therefore used to determine the best solution. It can eliminate some cost and improve efficiency. The BPNN model can be utilized to predict pattern-making-related body dimensions by inputting a few key dairy cattle body dimensions.

### **2.1. Anthropometry**

The first step in understanding the zoometry concept is knowing the anthropometry concept. The cattle zoometry concept adapts the principles of anthropometry from human dimensions. Anthropometry is the measurement of the human body in a population and the analysis of those measurements for various purposes. Such measurements can be utilized in architecture, product design, clothing design, child nutrition, workplace design, etc. ISO 7250-1:2008 is intended as a guide for engineers who are required to determine the dimensions of the human physique for their job [14]. Knowledge of the human body, technical measurement, and statistics is very important to obtain a higher quality of anthropometry implementation. Anthropometry will improve work facilities so that they support a person working physically, through ease of reach and access and through the process of cognition of a job [15, 16].

Anthropometry will set up an economical work facility design, which means that, from a marketing point of view, these facilities can be built cheaply; although the design is made cheaply, it can still support human performance according to the technical needs. Considerations in anthropometry include using data that indicate the characteristics of the target population within the limitations of the field conditions; choosing design criteria that are appropriate for the facility users; and collecting data that represent conditions broad enough to be utilized for a wider population [17]. The posture adopted by every individual at work is influenced by the body and tool dimensions as well as the facilities used, and the strength of the relationship between a work facility and posture is influenced by the characteristics and frequency of interaction between the two. Generally, anthropometry is divided into two branches: static anthropometry and dynamic anthropometry [18, 19]. Static anthropometry deals with the measurement process when the human body is in a stable position or in a static condition. On the other hand, dynamic anthropometry deals with the measurement of the range of human body movement, for example, arm movement, walking position, and head movement to reach an object.
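The BPNN mapping described earlier in this section (16 static zoometry dimensions as inputs, 7 dynamic zoometry dimensions as desired outputs) can be sketched as a small back propagation network. The code below is an illustrative reconstruction, not the authors' implementation: the hidden-layer size, learning rate, and synthetic training data are all assumptions made for the example.

```python
import numpy as np

# Minimal back propagation neural network (BPNN) sketch: 16 static zoometry
# dimensions in, 7 dynamic zoometry dimensions out. Hidden size, learning
# rate, and the synthetic data below are assumptions for illustration only.

rng = np.random.default_rng(0)
N_STATIC, N_HIDDEN, N_DYNAMIC = 16, 10, 7

# Synthetic stand-in data: 200 "cows", measurements normalized to [0, 1].
X = rng.random((200, N_STATIC))
true_W = rng.random((N_STATIC, N_DYNAMIC)) / N_STATIC
Y = X @ true_W + 0.01 * rng.standard_normal((200, N_DYNAMIC))

# Small random initial weights, zero biases.
W1 = rng.standard_normal((N_STATIC, N_HIDDEN)) * 0.1
b1 = np.zeros(N_HIDDEN)
W2 = rng.standard_normal((N_HIDDEN, N_DYNAMIC)) * 0.1
b2 = np.zeros(N_DYNAMIC)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
losses = []
for epoch in range(500):
    # Forward pass: sigmoid hidden layer, linear output layer.
    h = sigmoid(X @ W1 + b1)
    pred = h @ W2 + b2
    err = pred - Y  # gradient of 0.5 * squared error w.r.t. pred
    losses.append(float(np.mean(err ** 2)))
    # Backward pass: propagate the error and take a gradient step.
    dW2 = h.T @ err / len(X)
    db2 = err.mean(axis=0)
    dh = (err @ W2.T) * h * (1.0 - h)  # sigmoid derivative
    dW1 = X.T @ dh / len(X)
    db1 = dh.mean(axis=0)
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

print(f"MSE: {losses[0]:.4f} -> {losses[-1]:.4f}")  # error should shrink
```

Once trained on real measurements, such a network would return the 7 predicted dynamic dimensions for a new animal's 16 static measurements, which is the role the chapter assigns to the BPNN in facility design.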

Dairy cattle psychology respects to some conditions, e.g., conditions of the cage, SUI as the comfort index, cow's desire to always be in a lying position, cattle density in the room, comfortable and clean sitting position, high-quality feed and drinks, minimum competition to obtain food and drink, cow gets enough room to conduct activity, anti-slip floor, comfortable

Static/Dynamic Zoometry Concept to Design Cattle Facilities Using Back Propagation Neural…

http://dx.doi.org/10.5772/intechopen.75136

439

Dairy cattle behaviors are well developed in feeding, environment/microclimate condition, facility design, house, and social communication. The dairy cattle are able to distinguish red, yellow, green, and blue colors. However capability in differentiating between green and blue is poor [24]. Moreover, cattle are able to distinguish simple shapes such as triangles, circles, and line. Color information is important in cow facility design to increase dairy cattle comfort. In the free area, the cow automatically moves from the dark to light area. They tend to avoid strong contrast between sun and shadow. Comparing to human hearing capability, cow possesses almost similar frequency range and are able to listen to high tones that human cannot hear. Cattle hearing is important in inter- and intra-species communication [25]. Cattle sense of touch is important in determining which herbage is rejected or accepted. The secondary/special olfactory system can detect pheromones, volatile chemicals that are important in

Cattle communicate by sending out a different signal such as poses, sound, and smells [24]. A high density of cattle inside the house limits the freedom movement and can increase social stress. Cattle possess a distinct circadian rhythm, in which the main rest, feed, and rumination activities vary according to a fixed pattern. Grazing occupies a large amount of time for dairy cows about 8 hours/day. Grazing behavior is affected by many factors, including environmental conditions and plant species. In a dairy herd of Friesian cows, it was found that there was a consistent order for lying down and standing up [27]. The natural lying down behavior begins when the animal sniffs at the ground while it slowly moves forward. The head and body of the fully developed cow are thrust 0.60–0.70 m forward during the lying down process. When a cow wants to get up in a natural way, it firstly rises to its knees, and afterward

the hind part of its body is swung up via the knees, which function as a rocking point.

There are many kinds of dairy cattle, e.g., fries Holland (the Netherlands), Shorthorn (UK), Holstein Friesian (the Netherlands), Jersey (the UK, France), Brown Swiss (Switzerland), Red Danish (Denmark), Drought Master (Australia), etc. The dairy cattle in Indonesia is dominated by Holstein Friesian possessing white and black spot or red spot. The female cow has average weight = 560 Kg to 725 Kg, and the male cow has average weight = 820 Kg to 1000 Kg. The dairy cattle have to grow up from calf, adult cow, mature cow, and old cow. A new-born calf weighs from 90 to 100 pound and has height from 32 to 36 inches. The 6-month-old heifer starts to graze (eat grass) in the pasture. Heifer usually has weight about 400 pounds with height from 38 to 42 inch. Yearling cow has to weight about 700 pounds and height from 47 to 52 inch and still has quite a bit of growing to do before it joins milking herd in another year. Two-year-old dairy cattle start to produce milk and keep on growing for next few years to be a mature cow. It weighs 1200 pound and has height from 53.5 to 55 inch. The last stage is mature cow which has more than 1500 pound and produces optimal milk. Holstein Friesian can produce milk around 57.000 Kg per year with low fat content at approximately 3.5 to 3.7% [28].

air cycle, comfortable lighting, and a shady spot [22, 23].

reproduction and feed selection [26].

### **2.2. Cattle psychology and physiology**

Similar to humans, cows are also able to perform a process of cognitive response to the symptoms of the surrounding environment. Cows perform various activities inside an environment which is described in **Table 1** [20].

Dairy cattle cognitive process is exhibited in performing activities by responding to surrounding environment, which is described as follows [21]:



**Table 1.** Cow activity.

Dairy cattle psychology respects to some conditions, e.g., conditions of the cage, SUI as the comfort index, cow's desire to always be in a lying position, cattle density in the room, comfortable and clean sitting position, high-quality feed and drinks, minimum competition to obtain food and drink, cow gets enough room to conduct activity, anti-slip floor, comfortable air cycle, comfortable lighting, and a shady spot [22, 23].

The anthropometry will set up an economical design work facility which means from a marketing point of view, these facilities could be built cheaply. On the other hand, although the design is made with cheaply but can support the performance of human according to the technical needs. Some considerations that can be utilized in anthropometry include data utilized to indicate the characteristics of the target population with limitations contained in the field conditions. The criteria utilized for the design are appropriate for user facilities. The data collected are expected to represent broad conditions that can be utilized for a wider population [17]. The posture utilized by every individual work will be influenced by the body and tool dimension as well as facilities used. The level of relationship between work facility and posture is influenced by the characteristics and frequency of interaction between the two. Generally, anthropometry is divided in into two branches: static anthropometry and dynamic anthropometry [18, 19]. The static anthropometry deals with the measurement process when the human body is in a stable position or in a static condition. On the other hand, dynamic anthropometry deals with measurement process that relates to the measurement of range of human body movement, for

example, of arm movement, walking position, and head movement to reach an object.

Similar to humans, cows are also able to perform a process of cognitive response to the symptoms of the surrounding environment. Cows perform various activities inside an environ-

Dairy cattle cognitive process is exhibited in performing activities by responding to surround-

**2.** have a certain emotional level that can be formed because of the interaction with the sur-

**3.** ability to show an emotional reaction which is a reflection of the process of cognition;

**2.2. Cattle psychology and physiology**

438 Artificial Intelligence - Emerging Trends and Applications

ment which is described in **Table 1** [20].

**1.** ability to distinguish objects around;

**5.** ability to conduct social learning.

 Eating 2–5 Lying 12–14 Interaction 2–3 Ruminating 7–10 Drinking 0.5 Outside pen 2.5–3.5

rounding environment;

**Table 1.** Cow activity.

ing environment, which is described as follows [21]:

**4.** have a different personality from one to another; and

**No Activities Daily allocation (hour)**

Dairy cattle behaviors are well developed in feeding, environment/microclimate condition, facility design, house, and social communication. The dairy cattle are able to distinguish red, yellow, green, and blue colors. However capability in differentiating between green and blue is poor [24]. Moreover, cattle are able to distinguish simple shapes such as triangles, circles, and line. Color information is important in cow facility design to increase dairy cattle comfort. In the free area, the cow automatically moves from the dark to light area. They tend to avoid strong contrast between sun and shadow. Comparing to human hearing capability, cow possesses almost similar frequency range and are able to listen to high tones that human cannot hear. Cattle hearing is important in inter- and intra-species communication [25]. Cattle sense of touch is important in determining which herbage is rejected or accepted. The secondary/special olfactory system can detect pheromones, volatile chemicals that are important in reproduction and feed selection [26].

Cattle communicate by sending out different signals such as poses, sounds, and smells [24]. A high density of cattle inside the house limits freedom of movement and can increase social stress. Cattle possess a distinct circadian rhythm, in which the main resting, feeding, and rumination activities vary according to a fixed pattern. Grazing occupies a large amount of a dairy cow's time, about 8 hours/day. Grazing behavior is affected by many factors, including environmental conditions and plant species. In a dairy herd of Friesian cows, it was found that there was a consistent order for lying down and standing up [27]. The natural lying down behavior begins when the animal sniffs at the ground while it slowly moves forward. The head and body of the fully developed cow are thrust 0.60–0.70 m forward during the lying down process. When a cow wants to get up in a natural way, it first rises to its knees, and afterward the hind part of its body is swung up via the knees, which function as a rocking point.

There are many breeds of dairy cattle, e.g., Fries Holland (the Netherlands), Shorthorn (UK), Holstein Friesian (the Netherlands), Jersey (the UK, France), Brown Swiss (Switzerland), Red Danish (Denmark), Drought Master (Australia), etc. The dairy cattle in Indonesia are dominated by Holstein Friesian, possessing white coats with black or red spots. The female cow has an average weight of 560–725 kg, and the male has an average weight of 820–1000 kg. Dairy cattle grow from calf through adult cow and mature cow to old cow. A newborn calf weighs 90 to 100 pounds and is 32 to 36 inches tall. The 6-month-old heifer starts to graze (eat grass) in the pasture; a heifer usually weighs about 400 pounds and stands 38 to 42 inches. A yearling weighs about 700 pounds, stands 47 to 52 inches, and still has quite a bit of growing to do before it joins the milking herd in another year. Two-year-old dairy cattle start to produce milk and keep growing for the next few years to become mature cows, weighing 1200 pounds at a height of 53.5 to 55 inches. The last stage is the mature cow, which weighs more than 1500 pounds and produces optimal milk. Holstein Friesian can produce around 57,000 kg of milk per year with a low fat content of approximately 3.5 to 3.7% [28].

The cow body dimensions influence the horizontal movement of the cow when it gets up or lies down; it uses a space of around 3 m. The forward motion is 0.6 m, and the minimum distance from the head or neck of the cow to the bedding is approximately 0.2 m [24]. The reach of dairy cattle during feed intake depends on the type of tether and the feed alley height. The body length of the cow, from the shoulder area to the tail head and spine, is not flexible, which makes it difficult for the cow to make sharp changes of direction while walking. Therefore, much space is required when a cow turns. The range of vision of cattle covers 330–360°, and the field of vision covered by both eyes at the same time is 25–30°.


Static/Dynamic Zoometry Concept to Design Cattle Facilities Using Back Propagation Neural…

http://dx.doi.org/10.5772/intechopen.75136

441


Similar to human dimensions, several factors can affect dairy cattle dimensions as defined in the measurements, as follows:

• *Species*

Different taxonomic levels tend to have different dimensions; the greater the taxonomic difference, the greater the difference in dimensions. Zoometry measurements should therefore be taken within the same animal species (type).

• *Phase development*

Animals that undergo metamorphosis or change in phase of development will have different dimensions in every phase of its path.

• *Age*

Animal body size will vary in each period of growth.

• *Gender*

Male generally has larger body dimensions than females.

• *Clumps*

Diversity of clumps in animals leads to a tendency for differences in size in any zoometry dimension.

#### **2.3. Zoometry concept**

Similar to anthropometry for human body dimensions, the zoometry concept is concerned with the comparative measurement of the animal body and its parts, as well as the variables that impact these measurements. The main goal of zoometry is to increase cattle comfort, which influences physiological and psychological condition. Good physiological and psychological conditions increase daily milk production. Zoometry can be utilized to define the best size of cattle facilities, such as cage size, stall, floor, etc.

The first step in creating the zoometry concept is developing a database of cattle dimensions. The zoometry data of dairy cattle were collected from dairy farmers in Indonesia, with an average cattle age between 3.5 and 6 years (the age of optimal daily milk production). In total, around 500 dairy cattle were taken as samples. Generally, the measurement is divided into two sections, for static dimension data (referred to below as static zoometry) and for cow movement (dynamic zoometry) data. The equipment used in this research was a paper sheet, pen, ruler, stopwatch, and handy cam. The static measurement focused on the cow body dimensions, e.g., length of the body, length of the leg, body width, neck length, etc. The dynamic zoometry is taken while the cow is engaged in physical activity; the measurement focused on cow movement (walking), moving the tail, moving the head during drinking or eating, and standing up to lying down or vice versa. To obtain detailed measurements, videos were taken during the measurement process. Briefly, statistical tests for normality and data validity are presented in the zoometry data analysis.


**Figure 1** exhibits one example of measuring the static zoometry of a dairy cow. All measurers received training before starting work on the dairy farm. Training included how to measure, understanding the zoometry concept, knowing dairy cattle behaviors, and implementing animal research ethics. Measurements are made under tight supervision. The quality control procedures during and after the survey are explained afterward.

The research investigated both static and dynamic zoometry measurements for dairy cattle. The zoometry concept is important for understanding how dairy cattle live. Zoometry is defined by both static and dynamic measurements of cattle body dimensions and cattle behaviors. There are 16 dimensions of static zoometry for dairy cattle. **Figure 2a** and **b** define all the dimensions (D1, D2, D3, ..., D16) of static zoometry in 2D pictures. Dimensions D1 to D10 describe the front view of the dairy cattle (lengthwise direction), and dimensions D11 to D13 describe the lateral direction. D1 is the height of the head, D2 is the height of the body, D3 is the length of the neck + head, D14 is the head width, D15 is the length of the tail, and D16 is the length of the horns.

**Table 2** shows some of the results of the static zoometry measurement in the 16 dimensions mentioned above, with 25 data points per dimension used for the statistical tests. Homogeneity tests are used to ensure that the cow body size data were collected in a uniform condition without any specific arrangement. From the data, the average and the standard deviation are calculated for all dimensions (D1–D16); as an example, D1 has average D̄ = 39.64 and standard deviation σ1 = 2.68. Moreover, D1 has a lower control limit (LCL) = 34.17 and an upper control limit (UCL) = 44.47 and is categorized as homogeneous data. In the same way, the other static cattle dimensions (D2–D16) are also categorized as homogeneous. As a result, the data measurements are ready for the next step, the sufficient data test.

**Figure 1.** Dairy cattle zoometry measurement for static and dynamic data.

**Figure 2.** Static zoometry dimensions: (a) front view of cow dimensions (D1 to D10) and (b) the backside of cow dimensions (D11 to D13).

| No | Explanation | 1 | 2 | — | 499 | 500 |
|-----|---------------------------|-----|-----|---|-----|-----|
| D1 | Height of the head | 36 | 38 | — | 38 | 40 |
| D2 | Height of cow body | 77 | 78 | — | 78 | 79 |
| D3 | Length of the head + neck | 104 | 106 | — | 104 | 105 |
| — | — | — | — | — | — | — |
| D15 | Length of the tail | 114 | 113 | — | 115 | 120 |
| D16 | Length of the horn | 9 | 14 | — | 10 | 8 |

**Table 2.** Data measurement results from 500 source cattle for the static dairy cattle dimensions.

The sufficient data test for the static cattle dimensions is determined from Formula (1). To calculate the number of data required (N′), the research selected a confidence level of 95% (k = 2) and an error of 5% (s = 0.05), which gives N′ = 7. As a result, the D1 data can be categorized as sufficient (N′ < N). In the same way, the data of the other dimensions D2–D16 are all categorized as sufficient:

$$N' = \left[ \frac{\dfrac{k}{s}\sqrt{N\sum_{i=1}^{N} x_i^2 - \left( \sum_{i=1}^{N} x_i \right)^2}}{\sum_{i=1}^{N} x_i} \right]^2 \tag{1}$$

where N′ = number of data that should be taken,

N = number of data that have been collected,

k = level of confidence,

s = level of error, and

x<sub>i</sub> = observation data.

For D1:

$$N' = \left[ \frac{(2/0.05)\sqrt{25(966289) - (983)^2}}{983} \right]^2 = 7$$

Human dynamic anthropometry is concerned with the measurement of human work or motion, e.g., hand movement, sitting down, turning, etc. By analogy, dynamic zoometry is defined by measuring the animal (dairy cattle) in motion together with its behaviors, e.g., vertical head movement to reach food, vertical movement to lie down or get up, tail movement, etc. These data are very important in designing comfortable cattle facilities, e.g., free stall, watering system, floor, house, and feeding rack. **Figure 3** explains the dimensions of dairy cattle dynamic zoometry. There are seven dimensions, D17 to D23, described as follows:

• D17 dimension is the angle scope for vertical movement of the cow head.

• D18 dimension is the angle scope for horizontal movement of the cow head.

• D19 dimension is the leg reach in the walking movement.

• D20 dimension is the angle scope for horizontal movement of the cow tail.

• D21 dimension is the length of the lying down movement.

• D22 dimension is the length of the raising movement.

• D23 dimension is the width of the lying down or getting up movement.

**Figure 3.** Dairy cattle dynamic zoometry dimensions (D17, D19, D20).
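The homogeneity (control-limit) and data-sufficiency checks described above can be sketched in code. This is an illustrative sketch, not the authors' implementation: the sample values are hypothetical, and the control limits here are approximated as mean ± kσ.

```python
import math

def control_limits(xs, k=2.0):
    """Homogeneity check: lower/upper control limits taken as mean -/+ k standard deviations."""
    n = len(xs)
    mean = sum(xs) / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return mean - k * sigma, mean + k * sigma

def sufficient_n(xs, k=2.0, s=0.05):
    """Formula (1): required number of observations N' at confidence k and error s."""
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    return ((k / s) * math.sqrt(n * sxx - sx * sx) / sx) ** 2

# Hypothetical D1-like sample (cm); the data are sufficient when N' < N.
d1 = [38.0, 40.0, 39.0, 41.0, 38.5] * 5
lcl, ucl = control_limits(d1)
n_required = sufficient_n(d1)
print(n_required < len(d1))  # sufficiency criterion N' < N
```

A dimension passes when every observation lies within [LCL, UCL] and the computed N′ is smaller than the number of observations already collected.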

To increase comfort during the rising or lying down movement, the resting area must provide cattle with easy vertical, forward, and lateral movement without obstruction, injury, or fear. A rising motion includes the freedom to lunge forward, bob the head up or down, and stride forward. The resting motion also includes the freedom to lunge forward and bob the head. Each time a cow lies down, it puts about two-thirds of its body weight on its front knees, which then drop freely to the floor from a height of 20 to 30 centimeters. Therefore it is very important to provide best-quality bedding; as a consequence, the cow can painlessly lie down at any time. An easy method to gauge the comfort level is to look at and check how fast a cow lies down in a cubicle.

## **3. Back propagation neural network (BPNN)**

A neural network can be described as a *black box* that knows how to process inputs to create useful outputs. A neural network is defined as an "interconnected assembly of simple processing elements, units or nodes, whose functionality is loosely based on the animal neuron [29]. The processing ability of the network is stored in the inter-unit connection strengths, or weights, obtained by a process of adaptation to, or learning from, a set of training patterns." The NN calculation is complex and difficult to capture in a closed-form mathematical model. Neural networks copy the working of the biological nervous system, with the brain as the example of how information is processed. Other experts define the neural network (NN) as a powerful data modeling tool capable of capturing and representing complex input/output relationships involving many factors [30].


**Figure 4.** Description of BPNN training the data sets [29].

The most common neural network model is the multilayer perceptron (MLP) which contains three layers: input layer, hidden layer, and an output layer. This type of neural network is known as a supervised network because it requires the desired output in order to learn. The goal of this type of network is to create a model that correctly maps the input to the output using historical data; therefore, the model can then be utilized to produce an output when the desired output is unknown. The MLP and many other neural networks learn using an algorithm called "back propagation" [31]. The goal of a back propagation neural network (BPNN) is to minimize the error which in the project is shown as a mean square error (MSE). With each presentation, the output of the neural network is compared to the desired output, and an error is computed. This error is then fed back (back propagated) to the neural network and utilized to adjust the weights such that the error decreases with each iteration and the neural model gets increasingly closer to producing the desired output. This process is known as "training." **Figure 4** describes how the BPNN system works to minimize the gap between target and output by adjusting network weight.
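The loop described above (forward pass, error computation, back propagation of the MSE gradient, weight update) can be sketched minimally. This is an illustrative one-hidden-layer network on toy data, not the authors' model; only the tanh hidden layer with a linear output mirrors the structure discussed in the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: map 2 inputs to 1 output (target function y = x1 * x2).
X = rng.uniform(-1, 1, (64, 2))
y = X[:, :1] * X[:, 1:2]

# One hidden layer with tanh activation and a linear output.
W1 = rng.normal(0, 0.5, (2, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 0.5, (8, 1)); b2 = np.zeros(1)
lr = 0.1

for epoch in range(2000):
    # Forward pass.
    h = np.tanh(X @ W1 + b1)
    out = h @ W2 + b2
    err = out - y                        # error between output and desired output
    # Back propagation: gradients of the MSE with respect to each weight.
    g_out = 2 * err / len(X)
    gW2 = h.T @ g_out; gb2 = g_out.sum(0)
    g_h = (g_out @ W2.T) * (1 - h ** 2)  # tanh derivative
    gW1 = X.T @ g_h; gb1 = g_h.sum(0)
    # Weight update: the error decreases as training iterates.
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1
```

After training, the network's MSE on the toy data is far below the variance of the targets, which is exactly the "gap between target and output" that Figure 4 depicts shrinking.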

BPNN works with a database split into three parts: the training data set, the cross validation data set, and the testing data set. *Training data* are used by the neural network to learn the correlations in the database and serve as its input. *Cross validation data* are used to evaluate the performance of the learning process and avoid over-training. *Testing data* are used to evaluate the performance of the training once it is complete. Production input data are then fed into the trained neural network to produce an output. There is no evidence of references


which explain how to divide the data among training, cross validation, and testing. Training data are required for a neural network to predict, for example, aerodynamic coefficients [32]. That paper shows manual NN training comparisons based on different transfer functions and training datasets; the dataset is an important factor in obtaining a better MSE performance. Commonly, the training data are >60%, the cross validation data ≈15%, and the testing data ≈10%.
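A split along these lines can be sketched as follows. The proportions and the 500-record count come from the text; the helper name and exact ratios chosen (70/15/15) are our illustrative assumptions.

```python
import random

def split_dataset(data, train=0.7, cv=0.15, seed=0):
    """Shuffle and split records into training / cross-validation / testing sets
    (the text suggests >60% training, ~15% cross validation, ~10% testing)."""
    items = list(data)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train = int(n * train)
    n_cv = int(n * cv)
    return (items[:n_train],
            items[n_train:n_train + n_cv],
            items[n_train + n_cv:])

# e.g., indices of the 500 cattle records
train, cv, test = split_dataset(range(500))
```

Shuffling before splitting avoids any ordering effect from the way the records were collected, which matters here because the cattle were measured farm by farm.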

The structure of the BPNN model is key to accelerating the reduction of the gap between NN output and target during training. The BPNN structure comprises the number of neurons, the transfer function, and the number of hidden layers. The common way to create the BPNN structure is trial and error, which is time-consuming during the training procedure. The Taguchi method can be used for the optimal design of a neural network [33]; moreover, the Taguchi method can optimize the structure of a BPNN for a limited amount of data [34]. A genetic algorithm (GA) is utilized to optimize the adjustment of the weight values during the training process.
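The GA weight-tuning idea can be shown on a toy problem. This is a hedged, minimal sketch: the toy linear model, the search bounds, and the helper names are ours; only the fitness = inverse of MSE rule, roulette selection, blend (heuristic) crossover, a population of 50, 100 generations, and a mutation probability of 0.01 follow the parameters given in the text.

```python
import random

random.seed(1)

# Toy regression: find weights w so that w[0]*x + w[1] fits the data.
data = [(x, 3.0 * x + 1.0) for x in range(-5, 6)]

def mse(w):
    return sum((w[0] * x + w[1] - y) ** 2 for x, y in data) / len(data)

def fitness(w):
    return 1.0 / (mse(w) + 1e-9)  # inverse of MSE, as in the text

def roulette(pop, fits):
    """Roulette-wheel selection proportional to fitness."""
    r = random.uniform(0, sum(fits))
    acc = 0.0
    for w, f in zip(pop, fits):
        acc += f
        if acc >= r:
            return w
    return pop[-1]

def crossover(a, b):
    t = random.random()  # heuristic blend of two parents
    return [t * x + (1 - t) * y for x, y in zip(a, b)]

def mutate(w, p=0.01):
    return [x + random.gauss(0, 0.5) if random.random() < p else x for x in w]

pop = [[random.uniform(-5, 5) for _ in range(2)] for _ in range(50)]
initial_best_mse = min(mse(w) for w in pop)

for gen in range(100):
    fits = [fitness(w) for w in pop]
    best = max(pop, key=fitness)  # elitism: keep the best chromosome
    pop = [best] + [mutate(crossover(roulette(pop, fits), roulette(pop, fits)))
                    for _ in range(len(pop) - 1)]

best = max(pop, key=fitness)
```

Elitism guarantees the best chromosome is never lost, so the best MSE is non-increasing across generations; in the chapter's setup, the surviving genes become the initial weights for BPNN training.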

**Figure 5** chronologically explains the interconnection between the BPNN and GA in adjusting the weight parameters under the quick propagation learning rule. The activities in **Figure 5** comprise collecting and preparing the data, defining robust (optimum) NN architectures, initializing the population (connection weights and thresholds), assigning input and output values to the NN, computing hidden layer values, computing output values, and computing fitness using the MSE formula. To find the best NN weights, an initial population of chromosomes is created at the start of the genetic algorithm. The gene values are assigned to the initial weights of the network, and the network is trained with the back propagation neural network (BPNN) algorithm. In the next step, the fitness values of all the chromosomes of the population are evaluated; the inverse of the MSE is taken as the fitness function of the GA. The individuals' genes are modified by the crossover and mutation processes. These operations result in a new generation of chromosomes. The generational process is repeated until the convergence condition is achieved. The weights of the BPNN are thus created via a global optimization using the GA, which increases the quality and the performance of the BPNN model. In the end, the neural network is trained with the selected weight connections. According to reference [36], the best BPNN structure is described as follows: one hidden layer, initial neurons = 17, a tanh transfer function between the input and hidden layers, a linear sigmoid transfer function between the hidden and output layers, the quick propagation learning algorithm,

important for cattle to provide comfortable space for rest. Nigel B. Cook [9] research on free stall design for maximum cow comfort reported that 11.3 hours is needed for lying down in the stall and 2.9 hours for standing in the stall a day, in total 14.2 hours per day contact with free stall (around = 59.17%). The cow must have free stall design correctly sized as it correlates with milk production. A lot of farmers reduce stall length and width in order to save construction cost. It will reduce the level of cow comfort and milk production. Free stall should be designed correctly and maintained and should be sloped from front to back and provide a

Static/Dynamic Zoometry Concept to Design Cattle Facilities Using Back Propagation Neural…

http://dx.doi.org/10.5772/intechopen.75136

447

To control the steps of the research which are logically right, flowchart of developing and implementing the zoometry concept is presented as can be seen in **Figure 6**. According to the graph, the next step after collecting the static and dynamic data is doing the statistic test for checking the data quality. Manual measuring for dynamics data is time-consuming and costly and requires more energy, e.g., measuring the length for raising movement. The database construction is developed based on **Tables 2** and **3**, **Table 2** as input data and **Table 3** as desired data. As a result, BPNN can predict easily the dynamics zoometry dimension from any inputs of static zoometry dimension. To look for the best neural network structure, at the same time, the GA method is employed during NN training. BPNN module is ready to use to

**Figure 7** describes BPNN training result using genetic algorithm (GA) optimization in one replication. According to the graph, the training process will stop in 49 generations with mean square error (MSE) = 0.0287. BPNN model is ready to predict any input data to determine output data (dynamics data). The BPNN model is very useful for the user conduct test to determine cow behavior correlated with cow dimensions in design process. The user can put any

(D17 to D23). As example, the input values are D1 = 38 cm, D2 = 78 cm, D3 = 109 cm, D<sup>4</sup> = 157 cm, D5 = 137 cm, D6 = 51 cm, D7 = 58 cm, D8 = 41 cm, D<sup>9</sup> = 61 cm, D10 = 118 cm, D11 = 63 cm, D12 = 47 cm, D13 = 31 cm, D14 = 19 cm, D15 = 122 cm, and D16 = 14 cm, and produced output values are D17 = 52 cm, D18 = 214 cm, D19 = 54 cm, D20 = 121 cm, D21 = 312 cm, D22 = 301 cm, and

to D16) to predict dynamics cattle dimension

**1 2 — 499 500**

) 50 55 — 43 45

) 120 110 — 112 103

) 200 220 — 225 240

be part of designing the cattle comfort facilities, e.g., free stall, cattle house, etc.

**No Explanation Dairy cattle dimension (cm)**

**Table 3.** Data measurement results from 500 dairy cattle for dynamic dairy cattle dimension.

D19 Step walking (cm) 53 72 — 77 64

D21 Space for lying down (cm) 313 309 — 312 318 D22 Space for getting up (cm) 300 296 — 299 205 D23 Width space (cm) 132 130 — 120 118

"normal value" of static cattle dimensions (D1

D17 Vertical head movement (0

D20 Cow tail movement (0

D18 Horizontal head movement (0

comfortable surface.

**Figure 5.** Neuro-genetic algorithm in BPNN development [35].

and 5000 epochs. GA parameters are selected as follows: the Roulette rule is employed to select the best chromosome based on proportionality to its rank, the initial values for learning rate and momentum are 0.5000 and 0.0166, number of population is 50 chromosomes and epoch number was 100 at maximum, initial network weight factor is 0.1074, mutation probability is 0.01, and heuristic crossover was utilized.
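The neuro-genetic loop described above (initialize a population of weight vectors, score each by the inverse of the MSE, apply roulette-wheel selection, crossover, and mutation, then hand the best weights to BPNN fine-tuning) can be sketched as follows. This is a minimal illustration only: the data are synthetic stand-ins for the 16 static inputs and 7 dynamic outputs, biases and the quick propagation fine-tuning step are omitted, and the blend crossover is a simplified assumption rather than the authors' exact implementation. The hidden-layer size (17), population size (50), generation cap (100), initial weight scale (0.1074), and mutation probability (0.01) are the values quoted in the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the zoometry data: 16 static inputs -> 7 dynamic outputs.
X = rng.normal(size=(100, 16))
Y = np.tanh(X @ rng.normal(size=(16, 7)) * 0.1) * 50 + 100  # smooth toy targets

N_HID = 17            # hidden neurons, as in the text
POP, GENS = 50, 100   # 50 chromosomes, at most 100 generations
P_MUT = 0.01          # mutation probability, as in the text
n_w = 16 * N_HID + N_HID * 7  # flattened weight count (biases omitted)

def unpack(w):
    return w[:16 * N_HID].reshape(16, N_HID), w[16 * N_HID:].reshape(N_HID, 7)

def mse(w):
    w1, w2 = unpack(w)
    pred = np.tanh(X @ w1) @ w2        # tanh hidden layer, linear output
    return np.mean((pred - Y) ** 2)

def fitness(w):
    return 1.0 / (mse(w) + 1e-12)      # inverse of MSE, as stated in the text

pop = rng.normal(scale=0.1074, size=(POP, n_w))   # initial weight factor 0.1074
for gen in range(GENS):
    fit = np.array([fitness(w) for w in pop])
    parents = pop[rng.choice(POP, size=POP, p=fit / fit.sum())]  # roulette wheel
    a, b = parents[::2], parents[1::2]
    r = rng.random((POP // 2, 1))
    children = np.vstack([a + r * (b - a), b + r * (a - b)])     # blend crossover
    mut = rng.random(children.shape) < P_MUT                     # pointwise mutation
    children[mut] += rng.normal(size=mut.sum())
    pop = children

best = pop[np.argmin([mse(w) for w in pop])]
print(f"best GA MSE before BPNN fine-tuning: {mse(best):.4f}")
```

In the chapter's scheme, `best` would then seed the quick propagation BPNN training rather than random initialization.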

## **4. Construction of prediction model for cattle facilities using BPNN**

The dairy cattle facilities should support cattle activities such as resting, drinking, eating, and milking. The facilities must protect the cattle from getting stuck, from injuries, and from stress behaviors. The best facilities are indicated by the cattle comfort level and increased milk production. Moreover, the floor should be kept cool and dry to reduce possible skin injuries and to add thermal comfort. Another facility is the watering system; a watering cup should have an opening of at least 0.06 m², approximately 30 cm in diameter or a similar opening size [13]. It is recommended that the main water supply is ring connected and that the water is under constant pressure. The best method of watering is supplying via a service pipe; it ensures that fresh water is always supplied with a minimum amount of dirt. The free stall is very important for cattle, providing a comfortable space for rest. Nigel B. Cook's [9] research on free stall design for maximum cow comfort reported that a cow needs 11.3 hours per day for lying down in the stall and 2.9 hours for standing in the stall, in total 14.2 hours per day in contact with the free stall (around 59.17% of the day). The free stall must be correctly sized, as stall dimensions correlate with milk production. Many farmers reduce stall length and width to save construction cost, which reduces cow comfort and milk production. The free stall should be designed correctly, maintained, sloped from front to back, and provided with a comfortable surface.

To verify that the research steps are logically ordered, the flowchart of developing and implementing the zoometry concept is presented in **Figure 6**. According to the flowchart, the next step after collecting the static and dynamic data is a statistical test to check data quality. Manual measurement of dynamic data is time-consuming, costly, and labor-intensive, e.g., measuring the length of the raising movement. The database is constructed from **Tables 2** and **3**, with **Table 2** as the input data and **Table 3** as the desired output data. As a result, the BPNN can easily predict the dynamic zoometry dimensions from any input of static zoometry dimensions. To find the best neural network structure, the GA method is employed during NN training. The BPNN module is then ready to be used in designing cattle comfort facilities, e.g., the free stall and cattle house.
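Building the database amounts to pairing each animal's 16 static measurements (the input vector) with its 7 dynamic measurements (the desired output), scaling both sides, and holding out a test split before BPNN training. A minimal sketch, with randomly generated placeholder values standing in for the real 500-cattle measurements of Tables 2 and 3:

```python
import numpy as np

rng = np.random.default_rng(1)

# Placeholders for the real measurements: 500 cattle, 16 static dimensions
# (Table 2, D1..D16) and 7 dynamic dimensions (Table 3, D17..D23).
static_dims = rng.uniform(10, 160, size=(500, 16))   # inputs
dynamic_dims = rng.uniform(40, 320, size=(500, 7))   # desired outputs

# Min-max scale both sides to [0, 1] so tanh/sigmoid units train well;
# keep lo/hi so predictions can be mapped back to cm or degrees.
def minmax(a):
    lo, hi = a.min(axis=0), a.max(axis=0)
    return (a - lo) / (hi - lo), lo, hi

X, x_lo, x_hi = minmax(static_dims)
Y, y_lo, y_hi = minmax(dynamic_dims)

# Simple hold-out split before BPNN training.
idx = rng.permutation(500)
train, test = idx[:400], idx[400:]
X_train, Y_train = X[train], Y[train]
X_test, Y_test = X[test], Y[test]
print(X_train.shape, Y_train.shape)  # (400, 16) (400, 7)
```

The exact split ratio and scaling scheme are illustrative assumptions; the chapter does not specify them.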

**Figure 7** shows the BPNN training result using genetic algorithm (GA) optimization in one replication. According to the graph, the training process stops after 49 generations with a mean square error (MSE) of 0.0287. The BPNN model is then ready to predict output data (dynamic dimensions) from any input data. The model lets the user test how cow behavior correlates with cow dimensions during the design process: the user can enter any "normal value" of the static cattle dimensions (D1 to D16) to predict the dynamic cattle dimensions (D17 to D23). As an example, the input values D1 = 38 cm, D2 = 78 cm, D3 = 109 cm, D4 = 157 cm, D5 = 137 cm, D6 = 51 cm, D7 = 58 cm, D8 = 41 cm, D9 = 61 cm, D10 = 118 cm, D11 = 63 cm, D12 = 47 cm, D13 = 31 cm, D14 = 19 cm, D15 = 122 cm, and D16 = 14 cm produce the output values D17 = 52 cm, D18 = 214 cm, D19 = 54 cm, D20 = 121 cm, D21 = 312 cm, D22 = 301 cm, and



**Figure 6.** Flowchart of developing and implementing the zoometry concept using BPNN module.

**Figure 7.** Static and dynamic zoometry training in BPNN-GA application.

D23 = 130 cm. The following stage involved implementing the zoometry concept and BPNN model to evaluate and redesign the cattle facilities. The first step of designing the cattle house is to define the house parameters, which are described as follows:

• Length of cattle house (L)

The length of the cattle house is defined as the sum of the length for lying down and the length for getting up minus the cattle length, or L = D̄21 + D̄22 − (D̄3 + D̄4). Based on the data in **Table 1** for D̄3 + D̄4 and the BPNN test for D̄21 + D̄22, L has a mean of 346.67 cm and a standard deviation σ = 6.71 cm; using the 95th percentile gives L_zoometry = 346.67 cm + 1.64 × 6.71 cm = 357.67 cm.

• Width of cattle house (W)

The width of the cattle house is defined as the space for lying down or getting up easily, i.e., D23. Based on the BPNN test data, D̄23 has a mean of 125.96 cm and a standard deviation σ = 3.70 cm; using the 95th percentile gives W_zoometry = 125.96 cm + 1.64 × 3.70 cm = 132.03 cm.

• Height of cattle house (H)

The height of the cattle house is defined as the sum D̄2 + D̄9 + (D̄3 × tan(0.5 × D̄17)). Based on the data in **Table 1** and the BPNN test, H has a mean of 194.64 cm and a standard deviation σ = 6.49 cm; using the 95th percentile gives H_zoometry = 194.64 cm + 1.64 × 6.49 cm = 205.28 cm.

The cattle house design is recommended based on the zoometry calculations of length, width, and height: the design should have minimum dimensions of length = 357.67 cm, width = 132.03 cm, and height = 205.28 cm. The height must also take other factors into account, such as air circulation, lighting, and the other facilities, especially in a tropical climate with high temperature and relative humidity, which can increase the heat stress index and ultimately reduce milk production. In the United States, climate change bringing higher-than-normal temperature and humidity is likely to affect milk production, because dairy cows are sensitive to excessive temperature and humidity.
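The three design rules above share one pattern: take the mean of the relevant zoometry combination and add 1.64 standard deviations, the one-sided 95th-percentile z-score. Using the means and deviations quoted in the text, the recommended minimum dimensions can be reproduced as follows:

```python
# 95th-percentile design rule: dimension = mean + z95 * sigma, with z95 = 1.64
Z95 = 1.64

# (mean, sigma) pairs in cm, as quoted in the text
house = {
    "length L": (346.67, 6.71),   # D21 + D22 - (D3 + D4)
    "width  W": (125.96, 3.70),   # D23
    "height H": (194.64, 6.49),   # D2 + D9 + D3 * tan(0.5 * D17)
}

for name, (mean, sigma) in house.items():
    print(f"{name}: {mean + Z95 * sigma:.2f} cm")
# length L: 357.67 cm
# width  W: 132.03 cm
# height H: 205.28 cm
```

The outputs match the L_zoometry, W_zoometry, and H_zoometry values derived above.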

## **5. Conclusions**


The chapter has successfully developed the zoometry concept to describe the dimensions of dairy cattle for designing facilities. There are two zoometries, for the static and the dynamic condition, with 16 and 7 dimensions, respectively. The chapter presented BPNN training to capture the complexity of the dynamic data (cattle motion behavior) correlated with the cattle dimensions, and described how to implement the zoometry concept to develop a cattle house design. The method could also be used to design other facilities such as the free stall, watering system, floor, and feeding rack.

With the 500-cattle data source, the zoometry values still fluctuate despite passing the homogeneity test and the data-sufficiency test. Consequently, a larger amount of dimension data is required to obtain stable zoometry values. The zoometry concept will be an important topic of future research correlated with cattle comfort and cattle productivity.

## **Author details**

Sugiono Sugiono1\*, Rudy Soenoko2 and Rio Prasetyo Lukodono1

\*Address all correspondence to: Sugiono\_ub@ub.ac.id

1 Industrial Engineering, Universitas Brawijaya, Malang, Indonesia

2 Mechanical Engineering, Universitas Brawijaya, Malang, Indonesia

## **References**

[1] Rozenberg S, Body JJ, Bruyère O, Bergmann P, Brandi ML, Cooper C, Reginster JY, et al. Effects of dairy products consumption on health: Benefits and beliefs—A commentary from the Belgian Bone Club and the European Society for Clinical and Economic Aspects of Osteoporosis, Osteoarthritis and Musculoskeletal Diseases. Calcified Tissue International. 2016. https://doi.org/10.1007/s00223-015-0062-x

[2] Choi IY, Lee P, Denney DR, Spaeth K, Nast O, Ptomey L, Sullivan DK, et al. Dairy intake is associated with brain glutathione concentration in older adults. American Journal of Clinical Nutrition. 2015;**101**(2):287-293. https://doi.org/10.3945/ajcn.114.096701

[3] Faghih A, Anoosheh M, Ahmadi F, Ghofranipoor F. The effect of boy students' participation on consumption of milk and dairy. Hormozgan Medical Journal. 2007;**10**(4):349-356

[4] Davoodi H, Esmaeili S, Mortazavian AM. Effects of milk and milk products consumption on cancer: A review. Comprehensive Reviews in Food Science and Food Safety. 2013;**12**(3):249-264. https://doi.org/10.1111/1541-4337.12011

[5] Flynn A, Cashman K. Nutritional aspects of minerals in bovine and human milks. In: Fox PF, editor. Advanced Dairy Chemistry, Vol. 3: Lactose, Water, Salts and Vitamins. 2nd ed. London: Chapman & Hall; 1997

[6] Fox PF, McSweeney PLH. Dairy Chemistry and Biochemistry. 1998;**1542**(9):478. https://doi.org/10.1007/978-3-319-14892-2

[7] Holt C. Effect of heating and cooling on the milk salts and their interaction with casein. In: Fox PF, editor. Heat Induced Changes in Milk. 2nd ed. Brussels: International Dairy Federation; 1995. pp. 159-178

[8] Öste R, Jägerstad M, Anderson I. Vitamins in milk and milk products. In: Fox PF, editor. Advanced Dairy Chemistry, Vol. 3: Lactose, Water, Salts and Vitamins. 2nd ed. London: Chapman & Hall; 1997

[9] Cook NB, Bennett TB, Nordlund KV. Monitoring indices of cow comfort in free-stall-housed dairy herds. Journal of Dairy Science. 2005;**88**(11):3876-3885. https://doi.org/10.3168/jds.S0022-0302(05)73073-3

[10] Krawczel PD, Klaiber LB, Butzler RE, Klaiber LM, Dann HM, Mooney CS, Grant RJ. Short-term increases in stocking density affect the lying and social behavior, but not the productivity, of lactating Holstein dairy cows. Journal of Dairy Science. 2012;**95**(8):4298-4308. https://doi.org/10.3168/jds.2011-4687

[11] Vazquez OP, Smith TR. Factors affecting pasture intake and total dry matter intake in grazing dairy cows. Journal of Dairy Science. 2000;**83**(10):2301-2309. https://doi.org/10.3168/jds.S0022-0302(00)75117-4

[12] Jim R. Dairy Facilities and Cow Comfort. Tulare, CA: Veterinary Medicine Teaching and Research Center; 2011

[13] Bridger R. Introduction to Ergonomics. Engineering. Vol. 8. 2003. https://doi.org/10.4324/9780203426135

[14] International Organization for Standardization (ISO). Basic human body measurements for technological design (ISO 7250-1:2008); 2008

[15] Taifa IW, Desai DA. Anthropometric measurements for ergonomic design of students' furniture in India. Engineering Science and Technology: An International Journal. 2017;**20**(1):232-239. https://doi.org/10.1016/j.jestch.2016.08.004

[16] Masson AE, Hignett S, Gyi DE. Anthropometric study to understand body size and shape for plus size people at work. Procedia Manufacturing. 2015;**3**:5647-5654. https://doi.org/10.1016/j.promfg.2015.07.776

[17] Pheasant S. Bodyspace: Anthropometry, Ergonomics and the Design of Work. 2nd ed.; 2003

[18] Zhang P, Qint S, Wright DK. Novel method of capturing static and dynamic anthropometric data for home design. IEEE; 2005. https://doi.org/10.1109/EURCON.2005.1629990

[19] Kroemer KHE. Engineering anthropometry. Ergonomics. 1989;**32**(7):767-784. https://doi.org/10.1080/00140138908966841

[20] Grant R. Taking advantage of natural behavior improves dairy cow performance. In: Western Dairy Management Conference. 2007. pp. 1-13

[21] Marino L, Allen K. The psychology of cows. Animal Behavior and Cognition. 2017;**4**(44):474-498. https://doi.org/10.26451/abc.04.04.06.2017

[22] Krawczel P, Grant R. Effects of cow comfort on milk quality, productivity, and behavior. 2013. http://articles.extension.org:80/pages/70107/effects-of-cow-comfort-on-milk-quality-productivity-and-behavior

[23] Dairy NZ. Dairy NZ Economic Survey 2007-08. In Report; 2009

[24] Anonymous, Interdisciplinary report. Housing Design for Cattle – Danish Recommendation. 3rd ed. The Danish Agricultural Advisory Center; 2001. Translated into English and issued in 2002. 122 pp

[25] Phillips C. Cattle Behaviour and Welfare. 2nd ed. 2007. https://doi.org/10.1002/

[26] Currie WB. Structure and Function of Domestic Animals. CRC Press; 1995

[27] Benham PFJ. Synchronization of behaviour in grazing cattle. Applied Animal Ethology. 1982;**8**(4):403-404. https://doi.org/10.1016/0304-3762(82)90075-X

[28] Michael BJ, John SF, Joseph HP. Facility and climate effect on dry matter intake of dairy cattle. In: 5th Western Dairy Management Conference, April 4-6. Las Vegas, Nevada; 2001

[29] Gurney K. An Introduction to Neural Networks. Neurocomputing (Vol. 14). 1997. https://doi.org/10.1016/S0925-2312(96)00046-X

[30] Yu X, Efe MO, Kaynak O. A general backpropagation algorithm for feedforward neural networks learning. IEEE Transactions on Neural Networks. 2002;**13**(1):251-254. https://doi.org/10.1109/72.977323

[31] Laurene F. Fundamentals of Neural Networks. Prentice Hall International, Inc, USA: Florida Institute of Technology; 1994

[32] Rajkumar T, Bardina J. Training Data Requirement for a Neural Network to Predict Aerodynamic Coefficient. NASA Ames Research Center, Moffett Field, California, USA

[33] Khaw JFC, Lim BS, Lim LEN. Optimal design of neural networks using the Taguchi method. Neurocomputing. 1995;**7**(3):225-245. https://doi.org/10.1016/0925-2312(94)00013-I

[34] Sugiono, Wu MH, Oraifige I. Applying the design of experiment (DoE) to optimise the NN architecture in the car body design system. In: Proceedings of 2011 17th International Conference on Automation and Computing, ICAC 2011

[35] Sugiono, Wu MH, Oraifige I. Employ the Taguchi method to optimize BPNN's architectures in car body design system. American Journal of Computational and Applied Mathematics. 2012;**2**(4):140-151. https://doi.org/10.5923/j.ajcam.20120204.02

[36] Sugiono S, Soenoko R, Riawati L. Investigating the impact of physiological aspect on cow milk production using artificial intelligence. International Review of Mechanical Engineering. 2017;**11**(1). https://doi.org/10.15866/ireme.v11i1.9873

Artificial Intelligence - Emerging Trends and Applications

*Edited by Marco Antonio Aceves-Fernandez*

Artificial intelligence (AI) is taking an increasingly important role in our society. From cars, smartphones, airplanes, consumer applications, and even medical equipment, the impact of AI is changing the world around us. The ability of machines to demonstrate advanced cognitive skills in taking decisions, learning and perceiving the environment, predicting certain behavior, and processing written or spoken languages, among other skills, makes this discipline of paramount importance in today's world. Although AI is changing the world for the better in many applications, it also comes with its challenges. This book encompasses many applications as well as new techniques, challenges, and opportunities in this fascinating area.

Published in London, UK © 2018 IntechOpen © vchal / iStock