**2. Application of Neural Networks in Clinical Decision Support Systems**

These days, Artificial Neural Networks (ANNs) are widely used as tools for solving many decision-modeling problems. The capabilities and properties of ANNs, such as their non-parametric nature, non-linearity, input-output mapping and adaptivity, make them a better alternative than statistical techniques, where rigid assumptions are made about the model, for modeling massively parallel distributed structures and complex tasks. Being non-parametric, an ANN makes no assumption about the distribution of the data and is thus capable of "letting the data speak for itself". As a result, ANNs are a natural choice for modeling complex medical problems where large databases of relevant medical information are available [15].

In biomedicine, the assessment of vital functions of the body often requires noninvasive measurements, processing and analysis of physiological signals. Examples of physiological signals found in biomedicine include the electrical activity of the brain (the electroencephalogram, EEG), the electrical activity of the heart (the electrocardiogram, ECG), the electrical activity of the eye (e.g. the PERG and EOG), respiratory signals, blood pressure and temperature signals [16].

Often, biomedical data are not well behaved. They vary from person to person, and are affected by factors such as medication, environmental conditions, age, weight, and mental and physical state. Consequently, clinical expertise is often required for a proper analysis and interpretation of medical data. This has led to the integration of signal processing with intelligent techniques such as artificial neural networks (ANNs), expert systems and fuzzy logic to improve performance [16].

ANNs have been proposed as reasoning tools to support clinical decision-making since 1959 [17]. Some of the problems encountered have led to significant developments in computer science, but it was only during the last decade of the twentieth century that decision support systems came to be routinely used in clinical practice on a significant scale [16].

The literature reports several applications of ANNs to the recognition of particular pathologies, for example cancer diagnosis [18,19], automatic recognition of alertness and drowsiness from electroencephalography [20], prediction of coronary artery stenosis [21], analysis of Doppler shift signals [22,23], classification and prediction of the progression of thyroid-associated ophthalmopathy [24], diabetic retinopathy classification [25], saccade detection in EOG recordings [26] and PERG classification [22].

In this research we apply a hybrid of a neural network and a decision tree to classify eye diseases according to patient complaints, symptoms and physical eye examination. The aim is to help the ophthalmologist interpret the output of the examination systems easily and diagnose the problem accurately [27-29].

#### **2.1. Artificial Neural Networks**

Artificial neural networks learn by training on past experience, using an algorithm which modifies the interconnection weights as directed by a learning objective for a particular application. A *neuron* is a single processing unit which computes the weighted sum of its inputs. The output of the network relies on the cooperation of the individual neurons, and the learnt knowledge is distributed over the trained network's weights. Neural networks are categorized into feedforward and recurrent neural networks. They are capable of performing tasks that include pattern classification, function approximation, prediction or forecasting, clustering or categorization, time series prediction, optimization, and control. Feedforward networks contain an input layer, one or more hidden layers and an output layer. Fig. 1 shows the architecture of a feedforward network, and Equation (1) shows its dynamics.

$$S_j^l = g\Big(\sum_{i=1}^{m} S_i^{l-1} W_{ji}^l - \theta_j^l\Big) \tag{1}$$

Neural Networks and Decision Trees For Eye Diseases Diagnosis

http://dx.doi.org/10.5772/51380

where $S_j^l$ is the output of neuron $j$ in layer $l$, $S_i^{l-1}$ is the output of neuron $i$ in layer $l-1$ (which contains $m$ neurons), $W_{ji}^l$ is the weight of the connection from neuron $i$ to neuron $j$, $\theta_j^l$ is the internal threshold/bias of the neuron, and $g$ is the sigmoidal discriminant function.
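
As a concrete illustration, Equation (1) applied to a whole layer can be sketched as follows. This is a minimal NumPy sketch; the layer sizes and weight values are arbitrary illustrative choices, not taken from the chapter.

```python
import numpy as np

def sigmoid(x):
    """The sigmoidal discriminant function g."""
    return 1.0 / (1.0 + np.exp(-x))

def layer_output(s_prev, W, theta):
    """Eq. (1) for every neuron j of layer l at once:
    S_j^l = g(sum_i S_i^{l-1} W_ji^l - theta_j^l).

    s_prev : outputs of layer l-1, shape (m,)
    W      : weights, shape (n, m); W[j, i] connects neuron i to neuron j
    theta  : internal thresholds/biases of layer l, shape (n,)
    """
    return sigmoid(W @ s_prev - theta)

# Toy forward pass: 3 inputs -> 4 hidden neurons -> 2 outputs
rng = np.random.default_rng(0)
x = np.array([0.2, 0.7, 0.1])                       # input layer outputs
W1, th1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, th2 = rng.normal(size=(2, 4)), rng.normal(size=2)
hidden = layer_output(x, W1, th1)
output = layer_output(hidden, W2, th2)
```

Because $g$ is a sigmoid, every neuron's output lies strictly between 0 and 1.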

**Figure 1.** The architecture of the feedforward neural network with one hidden layer.

Backpropagation is the most widely applied learning algorithm for neural networks. It learns the weights of a multilayer network with a fixed architecture of units and interconnections. Backpropagation employs gradient descent to minimize the squared error between the network's *output values* and the *desired values* for those outputs. The goal of gradient descent learning is to minimize the sum of squared errors by propagating error signals backward through the network architecture upon the presentation of samples from the training set. These error signals are used to calculate the *weight* updates which represent the knowledge learnt in the network. The performance of backpropagation can be improved by adding a momentum term and by training multiple networks on the same data but with different small random initializations. In its gradient descent search for a solution, the network searches through a weight space of errors. A limitation of gradient descent is that it may easily get trapped in a local minimum, which may prove costly in terms of training and generalization performance.
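
The training procedure described above (gradient descent with a momentum term, plus multiple restarts from different small random initializations) can be sketched on a toy XOR problem. This is an illustrative sketch, not the chapter's actual experimental setup; the learning rate, momentum value and network size are assumed values.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_xor(hidden=4, epochs=5000, lr=0.5, momentum=0.9, seed=0):
    """One-hidden-layer network trained by backpropagation with momentum."""
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    Y = np.array([[0], [1], [1], [0]], dtype=float)          # XOR targets
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(2, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.5, size=(hidden, 1)); b2 = np.zeros(1)
    vW1 = np.zeros_like(W1); vb1 = np.zeros_like(b1)
    vW2 = np.zeros_like(W2); vb2 = np.zeros_like(b2)
    for _ in range(epochs):
        # forward pass
        H = sigmoid(X @ W1 + b1)
        O = sigmoid(H @ W2 + b2)
        # backward pass: error signals propagated through the architecture
        dO = (O - Y) * O * (1 - O)
        dH = (dO @ W2.T) * H * (1 - H)
        # momentum term smooths the gradient descent updates
        vW2 = momentum * vW2 - lr * (H.T @ dO); vb2 = momentum * vb2 - lr * dO.sum(0)
        vW1 = momentum * vW1 - lr * (X.T @ dH); vb1 = momentum * vb1 - lr * dH.sum(0)
        W2 += vW2; b2 += vb2; W1 += vW1; b1 += vb1
    return np.sum((O - Y) ** 2)                              # final sum of squared errors

# Train several networks with different small random initializations and
# keep the best one, reducing the risk of a poor local minimum.
best_error = min(train_xor(seed=s) for s in range(5))
```

Restarting from several initializations is exactly the remedy for the local-minimum limitation mentioned above: only the restarts that escape the flat "always answer 0.5" minimum reach a low error.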

66 Advances in Expert Systems

*<sup>j</sup>* = *gi*(∑ *i*=1 *m Si l*-1 *W ji <sup>l</sup>* - *<sup>θ</sup> <sup>j</sup> l* In the past, research has been done to improve the training performance of neural networks which has significance on its generalization. Symbolic or expert knowledge is inserted into neural networks prior to training for better training and generalization performance as dem‐ onstrated in [13]. The generalization ability of neural networks is an important measure of its performance as it indicates the accuracy of the trained network when presented with data not present in the training set. A poor choice of the network architecture i.e. the number of neurons in the hidden layer will result in poor generalization even with optimal values of its weights after training. Until recently neural networks were viewed as black boxes because they could not explain the knowledge learnt in the training process. The extraction of rules from neural networks shows how they arrived to a particular solution after training.

#### **2.2. Knowledge Extraction from Neural Networks: Combining Neural Networks with Decision Trees**

In applications like credit approval and medical diagnosis, explaining the reasoning of the neural network is important. The major criticism of neural networks in such domains is that their decision-making process is difficult to understand: the knowledge in the network is stored as real-valued parameters (weights and biases), it is encoded in a distributed fashion, and the mapping learnt by the network may be non-linear as well as non-monotonic. One may wonder why neural networks should be used at all when comprehensibility is an important issue. The reason is that predictive accuracy is also very important, and neural networks have an appropriate inductive bias for many machine learning domains. The predictive accuracies obtained with neural networks are often significantly higher than those obtained with other learning paradigms, particularly decision trees.

Decision trees have been preferred when a good understanding of the decision process is essential, such as in medical diagnosis. Decision tree algorithms execute fast, are able to handle large numbers of records with many fields with predictable response times, handle both symbolic and numerical data well, are better understood, and can easily be translated into if-then-else rules.

The goal of knowledge extraction is to find the knowledge stored in the network's weights in symbolic form. One main concern is the fidelity of the extraction process, i.e. how accurately the extracted knowledge corresponds to the knowledge stored in the network. There are two main approaches for knowledge extraction from trained neural networks: (1) extraction of 'if-then' rules by clustering the activation values of hidden-state neurons, and (2) the application of machine learning methods such as decision trees to the observed input-output mappings of the trained network when presented with data. We will use decision trees for the extraction of rules from trained neural networks. The extracted rules will explain the classification and categorization of different eye diseases according to symptoms.

In knowledge extraction using decision trees, the network is initially trained with the training data set. After successful training and testing, the network is presented with another data set which contains only input samples. The generalisation made by the network is then recorded for each corresponding input sample in this data set. In this way, we obtain a data set of input-output mappings made by the trained network. The generalisation made at the output of the network is an indirect measure of the knowledge acquired by the network in the training process. Finally, the decision tree algorithm is applied to the input-output mappings to extract rules in the form of trees.
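
This extraction procedure can be sketched with scikit-learn. The symptom features, the toy labelling rule and the data below are hypothetical, purely for illustration; the chapter's real data set is not reproduced here.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(42)

# Step 1: train the network on a labelled training set.
X_train = rng.integers(0, 2, size=(200, 4))            # binary symptom indicators
y_train = (X_train[:, 0] & X_train[:, 1]).astype(int)  # toy diagnostic rule
net = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
net.fit(X_train, y_train)

# Step 2: present a data set of input samples only, and record the
# network's generalisation for each sample (the input-output mappings).
X_unlabelled = rng.integers(0, 2, size=(500, 4))
y_net = net.predict(X_unlabelled)

# Step 3: apply the decision tree algorithm to the input-output mappings
# and read the tree off as symbolic rules.
tree = DecisionTreeClassifier(random_state=0)
tree.fit(X_unlabelled, y_net)
rules = export_text(tree, feature_names=["redness", "pain", "blurred_vision", "itching"])
print(rules)
```

The fidelity concern mentioned above can be checked directly: the fraction of samples on which the tree reproduces the network's outputs measures how faithfully the extracted rules describe the network.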

Decision trees are machine learning tools for building a tree structure from a training data set of instances, which can then predict a classification for unseen instances. A decision tree learns by starting at the root node and selecting the attribute that best splits the training data. The root node then grows child nodes, using an entropy function to measure the information gained from the training data. This process continues until the tree structure is able to describe the given data set. In contrast to neural networks, decision trees can explain how they arrive at a particular solution. We will use decision trees to extract rules from the trained neural networks.

#### **2.3. Decision Tree**

A decision tree (DT) is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm. Decision trees are commonly used in operations research, specifically in decision analysis, to help identify the strategy most likely to reach a goal. Another use of decision trees is as a descriptive means for calculating conditional probabilities.

Decision tree learning is a method commonly used in data mining. The goal is to create a model that predicts the value of a target variable based on several input variables. Each interior node corresponds to one of the input variables; there are edges to children for each of the possible values of that input variable. Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf.

A tree can be "learned" by splitting the source set into subsets based on an attribute value test. This process is repeated on each derived subset in a recursive manner called recursive partitioning. The recursion is complete when all instances in the subset at a node have the same value of the target variable, or when splitting no longer adds value to the predictions. In data mining, trees can also be described as the combination of mathematical and computational techniques to aid the description, categorisation and generalisation of a given set of data. Data comes in records of the form:

$$(\mathbf{x}, Y) = (x_1, x_2, x_3, \ldots, x_k, Y) \tag{2}$$

The dependent variable, $Y$, is the target variable that we are trying to understand, classify or generalise. The vector $\mathbf{x}$ is composed of the input variables, $x_1, x_2, x_3$ etc., that are used for that task.
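
The recursive-partitioning process can be sketched in a few lines. The attribute names and labels below are hypothetical toy values; a real learner such as C4.5 would additionally select the split attribute by information gain, prune the tree, and handle numeric attributes.

```python
def build_tree(rows, labels, attributes):
    """Recursive partitioning over records of the form (x, Y).

    rows       : list of dicts mapping attribute name -> value (the x part)
    labels     : target value Y for each row
    attributes : attribute names still available for splitting
    """
    if len(set(labels)) == 1:                 # node is pure: emit a leaf
        return labels[0]
    if not attributes:                        # nothing left to split on: majority class
        return max(set(labels), key=labels.count)
    attr = attributes[0]                      # (C4.5 would pick the best-gain attribute)
    branches = {}
    for value in set(row[attr] for row in rows):
        sub = [(r, l) for r, l in zip(rows, labels) if r[attr] == value]
        sub_rows, sub_labels = zip(*sub)
        branches[value] = build_tree(list(sub_rows), list(sub_labels), attributes[1:])
    return (attr, branches)

rows = [{"redness": 1, "pain": 1}, {"redness": 1, "pain": 0}, {"redness": 0, "pain": 0}]
labels = ["uveitis", "conjunctivitis", "healthy"]
tree = build_tree(rows, labels, ["redness", "pain"])
# tree == ("redness", {1: ("pain", {1: "uveitis", 0: "conjunctivitis"}), 0: "healthy"})
```

Each internal node of the returned structure is a pair (attribute, branches), and each leaf is a target value, which maps directly onto the if-then rules discussed above.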

A DT offers a structured way of decision making [29,30]. A DT is characterized by an ordered set of nodes; each internal node is associated with a decision function of one or more features. The DT approach can generate *if-then* rules. Specific DT methods include Classification and Regression Trees (CART), Chi-Square Automatic Interaction Detection (CHAID), ID3 and C4.5. C4.5, which is the extension of ID3 [31,32], is very useful in this work. The C4.5 decision tree is based on information theory; that is, it uses information theory to select the features which give the greatest information gain, or decrease of entropy [31]. Information gain is the informational value of creating a branch in the decision tree on a given attribute, computed using entropy.
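
The entropy-based gain criterion can be illustrated with a short sketch; the symptom data here are hypothetical, chosen so that one attribute predicts the class perfectly and the other carries no information.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(records, attribute, labels):
    """Gain = H(labels) minus the weighted entropy of each branch's subset."""
    n = len(labels)
    split = {}
    for value, label in zip(records[attribute], labels):
        split.setdefault(value, []).append(label)
    remainder = sum(len(subset) / n * entropy(subset) for subset in split.values())
    return entropy(labels) - remainder

data = {"redness": [1, 1, 0, 0], "itching": [1, 0, 1, 0]}
diagnosis = ["conjunctivitis", "conjunctivitis", "healthy", "healthy"]
print(information_gain(data, "redness", diagnosis))  # -> 1.0 (perfect split)
print(information_gain(data, "itching", diagnosis))  # -> 0.0 (uninformative)
```

C4.5 would branch on "redness" here, since it yields the greatest decrease of entropy.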

#### **2.4. Anatomy of the Eye**

The eye is made up of numerous components. Figure 2 shows the anatomy of the eye.

**Figure 2.** Anatomy of the eye (Source: http://www.medicinenet.com/eye\_diseases\_pictur\_slideshow/article.htm#).

**•** Cornea: clear front window of the eye that transmits and focuses light into the eye

**•** Pupil: dark aperture in the iris that allows light to go through into the back of the eye

**•** Retina: nerve layer that lines the back of the eye, senses light, undergoes complex chemical changes, and creates electrical impulses that travel through the optic nerve to the brain

**•** Iris: colored part of the eye that helps regulate the amount of light that enters

**•** Lens: transparent structure inside the eye that focuses light rays onto the retina


**•** Macula: small central area in the retina that contains special light-sensitive cells and allows us to see fine details clearly

**•** Optic nerve: connects the eye to the brain and carries the electrical impulses formed by the retina to the visual cortex of the brain

**•** Vitreous: clear, jelly-like substance that fills the middle of the eye

#### **2.5. Some eye disease conditions**

Some eye disease conditions are shown in Figure 3 to Figure 10 below:

**Figure 3.** Glaucoma (Source: http://www.medicinenet.com/eye\_diseases\_pictur\_slideshow/article.htm#).

**Figure 4.** Cataract (Source: http://www.medicinenet.com/eye\_diseases\_pictur\_slideshow/article.htm#).

**Figure 5.** Macular degeneration (Source: http://www.medicinenet.com/eye\_diseases\_pictur\_slideshow/article.htm#).

**Figure 6.** Conjunctivitis (Source: http://www.medicinenet.com/eye\_diseases\_pictur\_slideshow/article.htm#).

**Figure 7.** Uveitis (Source: http://www.medicinenet.com/eye\_diseases\_pictur\_slideshow/article.htm#).

**Figure 8.** Keratoconus (Source: http://www.medicinenet.com/eye\_diseases\_pictur\_slideshow/article.htm#).

**Figure 9.** Blepharitis (Source: http://www.medicinenet.com/eye\_diseases\_pictur\_slideshow/article.htm#).

**Figure 10.** Corneal ulcer (Source: http://www.medicinenet.com/eye\_diseases\_pictur\_slideshow/article.htm#).

| S/N | Eye Disease | Signs (patient complaints, symptoms and eye condition or cause) |
|---|---|---|
| 1 | Cataracts | Loss of visual acuity, loss of contrast sensitivity, contours, shadows and color vision are less vivid, advanced age, bright light or antiglare sunglasses may improve vision, poor night vision. |
| 2 | Glaucoma | Painless, decrease in peripheral field of view, halos around light, redness of eye, hereditary, aging may also cause it. |
| 3 | Macular degeneration | Blurred vision, distorted images, missing letters in words, difficulty in reading, trouble discerning colors, slow recovery of visual function after exposure to bright light, loss in contrast sensitivity, advanced age (66-74), hereditary. |
| 4 | Pink eye (conjunctivitis) | Red or pink color of eye, itching, blurred image, gritty feeling, irritation, watering of eye. |
| 5 | Uveitis | Redness of eye, blurred vision, sensitivity to light (photophobia), dark floating spots in visual field, eye pain, blurred vision improves with blinking, discomfort after long periods of concentrated use of the eye (watching television, using a computer or reading). |
| 6 | Retinal detachment | Experience of flashes of light and floaters in visual view, feeling of heaviness in the eye, central visual loss, blind spot in view. |
| 7 | Corneal ulcer | Redness of eye, pain as of foreign bodies in the eye, pus/thick discharge from the eye, blurred vision, sensitivity to bright light, swollen eyelid, white or grey round spot on the cornea. |
| 8 | Keratoconus | Distorted vision, loss of vision focus, contact lenses could not improve vision. |
| 9 | Blepharitis | Burning or foreign-body sensation, itching, sensitivity to light, redness of eye, red and swollen eyelid, blurred vision, dry eye. |
| 10 | Color blindness | Problem discerning colors, hereditary, aging. |
| 11 | Farsightedness (hyperopia) | Blurred vision for close objects, aging, contact lenses may improve vision. |
| 12 | Nearsightedness (myopia) | Blurred vision at distance, good vision for close objects. |
| 13 | Astigmatism | Blurred vision, steamy-appearing cornea, hereditary, may be corrected with contact lenses. |

**Table 1.** Eye diseases and their signs (patient complaints, symptoms and eye condition).
