Data Processing Using Artificial Neural Networks

*Wesam Salah Alaloul and Abdul Hannan Qureshi*

### **Abstract**

The artificial neural network (ANN) is a machine learning (ML) methodology that evolved from the idea of imitating the human brain. The artificial intelligence (AI) pyramid illustrates the evolution of the ML approach to ANN, leading to deep learning (DL). Researchers are currently attracted to DL processes because of their ability to overcome the selectivity-invariance problem. In this chapter, ANN is explained by discussing the network topology and development parameters (number of nodes, number of hidden layers, learning rules and activation function). The basic concepts of the node and the neuron are explained with the help of diagrams, leading to the ANN model and its operation. The topics are arranged to give the reader the basic concepts in sequence, from the ANN perceptron model to deep learning models and their underlying types.

**Keywords:** ANN, artificial neural network, node, network training, gradient descent, deep learning

#### **1. Introduction**

Artificial intelligence (AI) is the knowledge domain that targets the development of computer systems that solve problems by giving them cognitive powers to perform tasks that usually require human intelligence. Hence, the simulation of human intelligence with computer programming and technologies is the main objective of AI. Machine learning (ML) is one of the branches of AI, in which computer systems are programmed based on the data and the type of input; it gives AI the capability to solve problems based on available data. Likewise, the artificial neural network (ANN) is an evolved ML method, developed on the concept of imitating the human brain [1–3].

A single neuron is considered a cell that processes electrochemical signals or nerve impulses, and the human brain is a complicated network of interlinked neurons that transfers information. ANN models are considered among the most popular AI models because of their architecture, which is a collection of neurons linked with other neurons in various layers. An ANN is a complex, non-linear system of neurons, in which each neuron is a mathematical unit [4].

The literature shows that ML, ANN and deep learning (DL) fall under the pyramid of AI, as shown in **Figure 1**. Within ANN, DL has gained much importance among researchers. DL is a complex ANN with many layers of processing, which improves the results by developing high levels of insight. DL methodologies are popular due to their computational power and handling of large data sets, and this makes them more attractive than conventional methods.

**Figure 2.** *Comparison between DL and conventional ML.*


*Data Processing Using Artificial Neural Networks DOI: http://dx.doi.org/10.5772/intechopen.91935*


Past studies compared DL and conventional ML methods graphically, as shown in **Figure 2**, which plots the accuracy of the results (outputs) against the amount of data (input) for both approaches. The graph shows that the accuracy of conventional ML methods is better for limited data, but it decreases as the amount of data increases. In contrast, the accuracy of DL improves for large data sets because DL uses a much larger neural network than conventional ML, which has made DL more popular. DL is usually used for complicated tasks, such as image classification, image recognition, and handwriting identification [1, 3].

#### **2. History of ANN**

The origins of all work on ANN lie in neurobiological studies that date back about a century. A brief overview of the evolution of ANN and its significant milestones is shown in the timelines in **Figures 3** and **4**.

**Figure 3.** *ANN evolution timeline (1938–1988).*

The literature shows that, in the 1980s, very few researchers were working on deep NNs, which gained popularity in the early 1990s. Since then, a large number of research articles have been published on applications of ANN, and this journey is ongoing. A few significant milestones in ANN evolution after 1990 are shown in **Figure 4** [5–10].

## **3. Basic architecture of ANN**

The architecture of ANN is inspired by the framework of biological neurons, as found in the human brain. The human brain is composed of a vast number of interlinked neurons forming a network. A neuron is like a cell, and each neuron executes a simple task, i.e., responding to an input signal. Likewise, the ANN is a framework of interlinked nodes, similar to neurons, forming a network model. In an ANN, several artificial neurons are interlinked to form a robust computer-based tool that can handle large amounts of data and execute massively parallel calculations on the input data. ANN operations are not based on explicit rules; outputs are generated by trial-and-error procedures through sequential computations. The ANN approach is also classified as 'connectionism' because, unlike in traditional computers, the given data is not passed from neuron to neuron but is encoded in the complicated interconnected network of neurons [2, 11, 12].


*Dynamic Data Assimilation - Beating the Uncertainties*


**Figure 1.** *AI pyramid.*


**Figure 5.** *Basic node model.*

To comprehend the basic structure of ANN, an understanding of the 'node' is necessary first. The generic model for a node is shown in **Figure 5**.

Each node receives various inputs through connections and transfers them to adjacent nodes. **Figure 6** represents the general model of ANN, which is inspired by a biological neuron.

The nodes are arranged and organised into linear arrays known as layers. **Figure 6** shows that there are three layers in ANN, called the input layer, the hidden layer and the output layer.

In the input layer, X1, X2, X3, … Xn signify the inputs to the network, whereas W1, W2, W3, … Wn are the connection weights, which show the strength of a particular connection. In ANN, weights are considered the most significant factors, as these numerical parameters determine the effect of neurons on each other and also shape the output by transforming the input.

In the ANN, the processing is performed in the hidden layer. The hidden layer executes two operational functions, i.e., the summation function and the transfer function, also known as the activation function. The summation function is the first step: each input (Xi) to the ANN is multiplied by its respective weight (Wi), and the products Wi.Xi are accumulated into the summation function ξ = ΣWi.Xi. 'B' is a bias value; this parameter is used to regulate the output of the neuron in association with the weighted sum of the inputs. This process is denoted as Eq. (1):


**Figure 6.** *Generic ANN model.*

$$Output = \Sigma(Weight \times Input) + Bias \tag{1}$$

The activation function is the second step: it takes the signal received from the summation function and transforms it into the output of a node of the ANN model [1–3, 12, 13].
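The two steps described above, the summation of Eq. (1) followed by the activation function, can be sketched for a single node as follows. This is a minimal illustration rather than the authors' implementation: the function names, the choice of the sigmoid as the activation, and the example values are assumptions made only for this sketch.

```python
import math


def weighted_sum(inputs, weights, bias):
    """Summation function of Eq. (1): sum of W_i * X_i plus the bias B."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    return sum(w * x for w, x in zip(weights, inputs)) + bias


def sigmoid(xi):
    """Sigmoid activation, assumed here as one common transfer function."""
    return 1.0 / (1.0 + math.exp(-xi))


def node_output(inputs, weights, bias, activation=sigmoid):
    """Output of one node: the activation applied to the weighted sum."""
    return activation(weighted_sum(inputs, weights, bias))


# Hypothetical example values, chosen only for illustration.
x = [1.0, 2.0, 3.0]
w = [0.5, -0.25, 0.0]
b = 0.0

xi = weighted_sum(x, w, b)   # 0.5*1.0 - 0.25*2.0 + 0.0*3.0 + 0.0
y = node_output(x, w, b)     # sigmoid applied to xi
print(xi, y)
```

Passing a different function as `activation` changes the node's behaviour without touching the summation step, which mirrors the separation between the two steps in the text.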

Generally speaking, each ANN has three main components, i.e., the node character, the network topology and the learning rules. The node character controls the processing of signals by determining, for each node, the number of associated inputs and outputs, the weight associated with each input and output, and the activation function. The learning rules establish the initialisation and adjustment of weights, whereas the network topology defines the way the nodes are connected and organised (details are discussed in Section 3.2). The operation of the ANN model consists of computing the output of all the neurons, which is an entirely deterministic calculation [1, 2].

#### **3.1 The activation function**

An activation function is a mathematical function. In simple words, it receives the output of the summation function as its input and converts it into the final output of a node.

There are different types of activation functions, but non-linear functions are more popular than the linear function. A linear function is simply a polynomial of degree one, and an ANN that uses only linear activation functions is considered equivalent to a single-layer model, with less power and limited capacity to process complicated data. Therefore, non-linear activation functions are mostly used in the design of ANN models for solving complex problems, and this quality makes ANNs true universal function approximators.

The activation function takes the value ξ = ΣWi.Xi as its input and thereby controls how the inputs Xi activate the neuron. The most commonly known activation functions [1, 12–15] are shown in **Table 1**.
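The commonly listed activation functions (linear, unit step, ReLU, sigmoid, Gaussian and hyperbolic tangent) can be sketched in plain Python as below. The function names are chosen here for illustration, and the sigmoid is written in its standard form 1/(1 + e^(−ξ)); treat this as a sketch of the formulas rather than a reference implementation.

```python
import math


def linear(xi):
    """Linear: output equals input."""
    return xi


def unit_step(xi):
    """Unit step: 0 for negative input, 1 otherwise."""
    return 1.0 if xi >= 0 else 0.0


def relu(xi):
    """Rectified linear unit: 0 for negative input, the input itself otherwise."""
    return xi if xi >= 0 else 0.0


def sigmoid(xi):
    """Sigmoid: squashes any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-xi))


def gaussian(xi):
    """Gaussian: exp(-xi^2), equal to 1 at xi = 0 and decaying on both sides."""
    return math.exp(-(xi ** 2))


def hyperbolic_tangent(xi):
    """Hyperbolic tangent written as 2 / (1 + exp(-2*xi)) - 1."""
    return 2.0 / (1.0 + math.exp(-2.0 * xi)) - 1.0


ACTIVATIONS = {
    "linear": linear,
    "unit step": unit_step,
    "ReLU": relu,
    "sigmoid": sigmoid,
    "Gaussian": gaussian,
    "tanh": hyperbolic_tangent,
}

# Evaluate every function at a few sample points to compare their shapes.
for name, f in ACTIVATIONS.items():
    print(name, [round(f(v), 4) for v in (-2.0, 0.0, 2.0)])
```

Evaluating each function at a negative value, zero and a positive value is a quick way to see the differences in range and symmetry between them.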


**Figure 4.** *ANN evolution timeline (after 1988).*


| Transfer function | Numerical equation | Remarks |
| --- | --- | --- |
| Linear | $Y = f(\xi) = \xi$ | Output = Input. Range (−∞, +∞) |
| Unit step | $f(\xi) = 0$ if $\xi < 0$; $f(\xi) = 1$ if $\xi \geq 0$ | Useful for binary schemes. Range {0, 1} |
| Rectified linear unit (ReLU) | $f(\xi) = 0$ if $\xi < 0$; $f(\xi) = \xi$ if $\xi \geq 0$ | Most popular activation function since 2015. Range [0, ∞) |
| Sigmoid | $f(\xi) = \frac{1}{1 + e^{-\xi}}$ | Commonly used function. Range (0, 1) |
| Gaussian | $f(\xi) = e^{-\xi^{2}}$ | Named after the mathematician Carl Friedrich Gauss. Range (0, 1] |
| Hyperbolic tangent | $f(\xi) = \frac{2}{1 + e^{-2\xi}} - 1$ | Alternative to sigmoid function. Range (−1, 1) |

**Table 1.** *Activation functions.*

**Figure 7.** *Conceptual model for ANN topology.*

#### **3.2 Network topology**

The nodes are arranged and organised into linear arrays known as layers. The pattern of interconnections between the nodes of an ANN is called the topology (or architecture). An ANN is composed of input, hidden and output layers, as already shown in **Figure 6**. The number of hidden layers can range from none to many, depending on the complexity of the model. Each layer is a combination of many nodes, which are grouped into layers based on shared properties. A single-layer ANN with a single output is known as the perceptron. A conceptual model for layers and ANN topology is shown in **Figure 7**. **Figure 7** shows n data entries in the input layer, X1, X2, … Xn, and L hidden layers with i nodes in each hidden layer. The notations 1 × 1, 1 × i, L × 1 and L × i on the nodes identify each node by its (hidden) layer number, from 1 to L, and its node number, from 1 to i. Y is the output of the ANN model.

The design of a network topology is based on the following factors: (1) the number of nodes in each layer, (2) the number of layers in the network and (3) the connection paths among the nodes [1, 2, 12].
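To make these three design factors concrete, the sketch below describes a topology by its layer sizes and counts the connections between consecutive layers. It assumes a fully connected feedforward arrangement, which the text does not require, and the helper name `connection_counts` and the example sizes are hypothetical.

```python
def connection_counts(n_inputs, hidden_sizes, n_outputs):
    """Number of weights between each pair of consecutive layers.

    Assumes every node in one layer is connected to every node in the
    next layer (a fully connected feedforward topology).
    """
    layer_sizes = [n_inputs] + list(hidden_sizes) + [n_outputs]
    return [a * b for a, b in zip(layer_sizes, layer_sizes[1:])]


# Hypothetical topology: n = 4 inputs, L = 2 hidden layers with i = 3 nodes
# each, and a single output Y.
counts = connection_counts(4, [3, 3], 1)
total_weights = sum(counts)
print(counts, total_weights)

# With no hidden layer and one output, the topology reduces to a perceptron.
print(connection_counts(4, [], 1))
```

Changing any one of the three factors (nodes per layer, number of layers, or which layers are connected) changes the number of weights that training has to adjust.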

#### *3.2.1 Perceptron and multi-layer architectures*

A single-layered ANN with a single output is known as the perceptron. The perceptron mostly uses the step function: if the computed sum of the inputs exceeds a threshold point, the output is 1; otherwise, it is 0.
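The threshold rule can be sketched in a few lines. The weights and threshold below are hypothetical values picked so that the perceptron behaves like a logical AND; they are not taken from the chapter.

```python
def perceptron(inputs, weights, threshold):
    """Return 1 if the weighted sum of the inputs exceeds the threshold, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > threshold else 0


# Hypothetical parameters that make the perceptron act as a logical AND.
AND_WEIGHTS = [1.0, 1.0]
AND_THRESHOLD = 1.5

truth_table = {
    (a, b): perceptron([a, b], AND_WEIGHTS, AND_THRESHOLD)
    for a in (0, 1)
    for b in (0, 1)
}
print(truth_table)
```

Only the input pair (1, 1) produces a weighted sum above the threshold, so it is the only case in which the output is 1.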

Multi-layer perceptrons (MLPs) are the most commonly used architecture for ANN. An MLP is composed of layers of neurons: an input layer, an output layer, and at least one hidden layer. The layers are interlinked with each other to form a multi-layered architecture, which makes the model considerably more complex to process. The term MLP originates from perceptron neural networks, but the problem-solving capabilities of MLPs make them unique [1, 14].

#### **3.3 Connection types between nodes**

The connections between nodes of ANN are classified into two categories: (1) the feedforward network, and (2) the feedback network or recurrent network.


**Figure 8.** *Feedforward network connection.*

#### *3.3.1 Feedforward networks*

A feedforward network has one-way connections with no backward loops. Such networks are static in nature, as their signals travel in one direction only. **Figure 8** is a model example of a feedforward network.
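A one-way pass of this kind can be sketched as a loop over layers, where each layer only reads the output of the layer before it. The 2-2-1 layout, the weights and the use of the sigmoid in every node are assumptions made for the example, not values from the chapter.

```python
import math


def sigmoid(xi):
    return 1.0 / (1.0 + math.exp(-xi))


def layer_forward(inputs, weights, biases):
    """Compute the outputs of one layer; weights[j] holds the weights of node j."""
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + b)
        for node_weights, b in zip(weights, biases)
    ]


def feedforward(inputs, layers):
    """Pass the signal through the layers in order, with no backward loop."""
    signal = list(inputs)
    for weights, biases in layers:
        signal = layer_forward(signal, weights, biases)
    return signal


# Hypothetical 2-2-1 network: two inputs, one hidden layer of two nodes, one output.
layers = [
    ([[0.5, -0.5], [-0.5, 0.5]], [0.0, 0.0]),  # hidden layer
    ([[1.0, -1.0]], [0.0]),                    # output layer
]

balanced = feedforward([1.0, 1.0], layers)
skewed = feedforward([1.0, 0.0], layers)
print(balanced, skewed)
```

Because nothing is fed back, calling `feedforward` twice with the same input always gives the same output, which is the static behaviour described above.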

#### *3.3.2 Feedback networks*

**Figure 9.** *Feedback network connection.*

In a feedback network, nodes have backward-connected loops, and through these connections the output of a node can be the input to nodes at the same level or at previous levels. Unlike feedforward networks, feedback networks are dynamic: signals are transmitted in the forward as well as the backward direction [16]. Feedback occurs when the output (partial or full) is channelled back into the input of a network as part of a repeated cause-and-effect process [17]. In a feedback network, a single input generates a series of output cycles until the network reaches an equilibrium point. The equilibrium point refers to minimum error, i.e., if the error for a predicted output is large, the output is routed back and the parameters (weights and biases) are modified until the error becomes minimum [18]. **Figure 9** shows the ANN model for feedback network connections. It can be observed that node H2x1 sends information back to node H1x1, and the cycle goes on until the output reaches an equilibrium state, i.e., one with minimum error. In a feedback network, there exists at least one interconnected path that leads back to the starting neuron. It may cause a delay of specific time units, and this interconnected path is called a cycle [1, 2, 12]. This process will be better understood after going through the next section.
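The cycling behaviour can be illustrated with a deliberately small sketch of two nodes, where the second node feeds its output back to the first. This is a simplification: it only shows the signal cycling until it stops changing, with fixed weights, and it does not modify weights or biases as the error-driven process in the text does. The node names, weights, tolerance and the use of tanh are all assumptions for the example.

```python
import math


def run_feedback(x, w_in=1.0, w_forward=0.8, w_back=0.5,
                 tol=1e-9, max_cycles=1000):
    """Cycle a two-node feedback loop until the outputs stop changing.

    h1 receives the external input x plus the fed-back output of h2;
    h2 receives the output of h1. Returns (h2, number of cycles used).
    """
    h1, h2 = 0.0, 0.0
    for cycle in range(1, max_cycles + 1):
        new_h1 = math.tanh(w_in * x + w_back * h2)   # forward input plus feedback
        new_h2 = math.tanh(w_forward * new_h1)       # forward connection
        change = max(abs(new_h1 - h1), abs(new_h2 - h2))
        h1, h2 = new_h1, new_h2
        if change < tol:
            return h2, cycle
    raise RuntimeError("no equilibrium reached within max_cycles")


output, cycles = run_feedback(0.5)
print(output, cycles)
```

With these assumed weights each cycle shrinks the change in the outputs, so a single input produces a short series of outputs that settles at a fixed value; larger feedback weights may not settle at all, which is why the sketch caps the number of cycles.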

## **4. Training of ANN (learning process)**

The training of the ANN is accomplished through a learning process, in which the weights are modified to attain the required results. During training, sample data is fed to the network and the weights are adjusted to attain a better approximation of the desired output.

The learning process is mostly classified into two categories: (1) supervised learning, and (2) unsupervised learning.

#### **4.1 Supervised learning**


In supervised learning, a training set is presented to the model. The training set consists of input examples and the corresponding target outputs. The response of the network to the inputs is observed, and the weights within the network are adjusted to reduce the error and attain the desired output. The network follows successive iterations during this process until the computed result converges to the correct one. Construction of the training set requires special consideration: an ideal training set should be representative of the underlying model, since a reliable model with desirable results cannot be achieved with an unrepresentative training set.

In the supervised learning process, the network is trained before it is put into operation in a model for predictive outputs. Once the network computes the intended outputs for the series of inputs with fixed weights, the ANN model can be set for the required operation. Some well-known algorithms that use supervised learning are the Adaline (used for binary data), the Perceptron (used for continuous data), and the Madaline (developed from the Adaline).
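As an illustration of this train-then-operate cycle, the sketch below uses the classic perceptron learning rule on a small training set of input examples and target outputs. The text does not prescribe a specific update rule, so the rule, the learning rate, the zero initial weights and the AND training set are assumptions chosen for the example.

```python
def predict(inputs, weights, bias):
    """Step-function output of a single node with fixed weights."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


def train_perceptron(training_set, learning_rate=0.1, max_epochs=100):
    """Adjust weights after each example until one full pass has no errors."""
    n_inputs = len(training_set[0][0])
    weights = [0.0] * n_inputs
    bias = 0.0
    for epoch in range(1, max_epochs + 1):
        errors = 0
        for inputs, target in training_set:
            error = target - predict(inputs, weights, bias)
            if error != 0:
                errors += 1
                # Move each weight in the direction that reduces the error.
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, inputs)]
                bias += learning_rate * error
        if errors == 0:
            return weights, bias, epoch
    return weights, bias, max_epochs


# Hypothetical training set: inputs paired with target outputs (logical AND).
AND_SET = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias, epochs_used = train_perceptron(AND_SET)
predictions = [predict(x, weights, bias) for x, _ in AND_SET]
print(weights, bias, epochs_used, predictions)
```

Once a full pass over the training set produces no errors, the weights are frozen and `predict` can be used on its own, which corresponds to setting the trained model for operation.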

#### *4.1.1 Reinforcement learning*

Reinforcement learning is a particular case of supervised learning, in which the external environment only indicates whether the output is accepted or rejected, instead of indicating the correct output. In this process, the well-performing and most active neuron connections for the input are strengthened over successive iterations. Some well-known algorithms of reinforcement learning are the Boltzmann machine, learning vector quantisation, and Hopfield networks.

Supervised ANN models have many applications, including image classification, plant control, forecasting, prediction, robotics, ECG signal classification and many more [19–21].

#### **4.2 Unsupervised learning**

Unsupervised learning does not follow a training set or a targeted output approach. Instead, it follows the pattern of the input data of the underlying model. In this process, the ANN model adjusts its weights against the supplied inputs, thus producing outputs similar to the inputs. The model recognises the patterns and differences in the inputs without any external support. In this process, clusters are formed, each consisting of a group of several weights, in such a way that related inputs result in a similar output. If any new pattern is detected during the iteration process, it is classified as a new cluster.
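The cluster-forming idea can be sketched with a simple competitive scheme: each cluster is represented by a weight vector, an input is assigned to the nearest one, and an input that is too far from every existing cluster starts a new one. This is an illustrative simplification, not one of the named algorithms; the distance threshold, learning rate and sample points are assumptions.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_inputs(samples, new_cluster_distance=1.0, learning_rate=0.5):
    """Group inputs without target outputs.

    Returns the cluster weight vectors and the cluster index of each sample.
    """
    prototypes = []  # one weight vector per cluster
    labels = []
    for x in samples:
        best, best_dist = None, float("inf")
        for k, w in enumerate(prototypes):
            d = distance(x, w)
            if d < best_dist:
                best, best_dist = k, d
        if best is None or best_dist > new_cluster_distance:
            # A pattern unlike any existing cluster becomes a new cluster.
            prototypes.append(list(x))
            labels.append(len(prototypes) - 1)
        else:
            # Pull the winning cluster's weights towards the input.
            w = prototypes[best]
            prototypes[best] = [wi + learning_rate * (xi - wi)
                                for wi, xi in zip(w, x)]
            labels.append(best)
    return prototypes, labels


# Hypothetical inputs forming two visibly separate groups.
samples = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (0.1, 0.2)]
prototypes, labels = cluster_inputs(samples)
print(labels, prototypes)
```

No target outputs are supplied at any point; the grouping emerges only from how close the inputs are to the weight vectors that represent each cluster.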


*Data Processing Using Artificial Neural Networks DOI: http://dx.doi.org/10.5772/intechopen.91935*


Autoencoders, Hebbian Learning, Deep Belief Nets, Self-Organising Maps, Generative Adversarial Networks, and Algebraic Reconstruction Technique (ART) are among the best-known algorithms for unsupervised learning. Unsupervised ANN models are used for diagnosing diseases, image segmentation, and many other tasks. In particular, unsupervised algorithms have become powerful tools for segmenting magnetic resonance images to detect anomalies in body systems [1, 2, 4, 12, 14, 22–24].

#### **5. Mapping by ANNs**

ANNs owe much of their popularity to their ability to approximate the mapping between inputs and outputs. Function approximation with an ANN model involves five main steps, described below.

#### **5.1 Data pre-processing**

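In data pre-processing, appropriate predictors are selected as inputs before they are passed to the network for mapping. Data pre-processing involves three general processes:

a. *Standardising*: The input values are rescaled to a uniform scale.

b. *Normalising*: A vector is normalised to have zero mean and unit variance.

c. *Principal component analysis*: Groups of related variables are replaced by new, uncorrelated variables by detecting the linear dependencies between them.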
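Rescaling inputs to a uniform scale and normalising them to zero mean and unit variance can be sketched in a few lines of Python. This is a minimal illustration using only the standard library. The function names `rescale_min_max` and `normalise_zero_mean_unit_variance` and the sample values are assumptions made for this example, and min-max rescaling to [0, 1] is assumed as one common way of bringing inputs to a uniform scale. Principal component analysis is left out because it is normally delegated to a linear algebra library.

```python
import math


def rescale_min_max(values, low=0.0, high=1.0):
    """Rescale values linearly so that they span the interval [low, high]."""
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    if span == 0:
        # A constant input carries no information; map it to the lower bound.
        return [low for _ in values]
    return [low + (high - low) * (v - v_min) / span for v in values]


def normalise_zero_mean_unit_variance(values):
    """Shift values to zero mean and scale them to unit (population) variance."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


raw = [2.0, 4.0, 6.0, 8.0]          # hypothetical predictor values
scaled = rescale_min_max(raw)
normalised = normalise_zero_mean_unit_variance(raw)
print(scaled)
print(normalised)
```

Both functions return new lists and leave the raw data untouched, so the same raw predictors can be pre-processed in different ways and compared.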


#### **5.2 Selection of network architecture**

A network architecture is defined by the number of hidden neurons, the number of hidden layers, the flow of data, the way the neurons are interconnected, and the specific transfer functions. Recurrent neural networks, the multi-layer perceptron (MLP), probabilistic neural networks, radial basis function networks, generalised regression neural networks, and time-delay neural networks are among the best-known architectures.

#### **5.3 Network training**

In function mapping, training is the calibration of the network using input-output pairs. During training, an ANN may suffer from overfitting or underfitting, both of which reduce the overall performance of the network. Underfitting can be addressed by increasing the number of epochs, but too many epochs may in turn cause overfitting. An epoch is one pass of the input data through the network, together with the corresponding modification of the weights. The optimal number of epochs can be determined by comparing the training error with the error obtained in model testing.
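One way of using the error curves to choose the number of epochs is sketched below. The error values are hypothetical, and the function `select_epoch` and its `patience` parameter are assumptions introduced for illustration. Training is stopped once the held-out error has failed to improve for `patience` consecutive epochs, and the epoch with the lowest held-out error is kept.

```python
def select_epoch(heldout_errors, patience=2):
    """Return (epoch, error) for the lowest held-out error seen before stopping.

    Epochs are counted from 1. The scan stops early once the error has not
    improved for `patience` consecutive epochs.
    """
    best_epoch, best_error = 0, float("inf")
    epochs_without_improvement = 0
    for epoch, error in enumerate(heldout_errors, start=1):
        if error < best_error:
            best_epoch, best_error = epoch, error
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return best_epoch, best_error


# Hypothetical curves: the training error keeps falling, while the held-out
# error turns upward after epoch 4, which signals the onset of overfitting.
training_errors = [0.90, 0.60, 0.40, 0.30, 0.22, 0.15, 0.10]
heldout_errors = [0.95, 0.70, 0.52, 0.45, 0.47, 0.53, 0.60]

epoch, error = select_epoch(heldout_errors)
print(epoch, error)
```

In this made-up example the training error alone would suggest training for as long as possible. The held-out curve is what reveals that four epochs are enough.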

#### **5.4 Simulation**


*Dynamic Data Assimilation - Beating the Uncertainties*


Simulation is the ultimate goal of applying ANNs. It represents the predicted output data of an ANN model.

#### **5.5 Post-processing**

Sample data are distributed among three sets: (i) the training set, (ii) the validation set, and (iii) the testing set. The training set is used to train the ANN model, that is, to adjust the weights so that the network produces the desired outcome. The validation set tells the ANN when training should be terminated (when the minimum error point is reached). The test set is used to evaluate the ANN model and provides an entirely independent check of its precision. As a rule of thumb, the data are split randomly into 70% for training, 15% for validation, and 15% for testing [3, 12, 14].
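A random 70/15/15 split can be sketched as follows. The function name `split_samples`, its parameters, and the fixed seed are assumptions made for this example, and the 100 integer samples stand in for real data.

```python
import random


def split_samples(samples, train_frac=0.70, val_frac=0.15, seed=0):
    """Shuffle the samples and split them into training, validation and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)   # seeded so the split is repeatable
    n = len(shuffled)
    n_train = int(round(train_frac * n))
    n_val = int(round(val_frac * n))
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]        # whatever remains goes to the test set
    return train, val, test


train_set, val_set, test_set = split_samples(range(100))
print(len(train_set), len(val_set), len(test_set))
```

Shuffling before slicing matters when the samples are stored in a meaningful order, for example grouped by class. Without it, one of the three sets could end up with a single class only.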

Post-processing comprises all the tests applied to a trained network to validate its results and to analyse, describe, and improve its final performance. The results are compared using three statistics. The first is the root-mean-square error (RMSE), given in Eq. (2):

$$RMSE = \sqrt{\frac{\sum_{i=1}^{n} (obs_i - est_i)^2}{n}} \tag{2}$$

The second is the percentage volume error (%VE), which measures the absolute relative bias error of the estimated values and is given in Eq. (3):

$$\%VE = \frac{\sum_{i=1}^{n} \left(\frac{obs_i - est_i}{obs_i}\right)}{n} \tag{3}$$

where $est_i$ is the ith estimated value, $obs_i$ is the ith observed value, and $n$ is the number of observed values.

The third statistic is the correlation coefficient, which measures the strength of the linear relationship between the predicted and observed data.
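The three statistics can be computed as in the sketch below. It follows Eqs. (2) and (3) as printed, so `volume_error` returns a signed fraction rather than a value multiplied by 100, and it assumes that the correlation meant here is the Pearson coefficient. The function names and the sample values are hypothetical.

```python
import math


def rmse(obs, est):
    """Root-mean-square error, Eq. (2)."""
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(obs, est)) / len(obs))


def volume_error(obs, est):
    """Mean relative error, Eq. (3) as printed (observed values must be non-zero)."""
    return sum((o - e) / o for o, e in zip(obs, est)) / len(obs)


def correlation(obs, est):
    """Pearson linear correlation coefficient (assumed form of the third statistic)."""
    n = len(obs)
    mean_o, mean_e = sum(obs) / n, sum(est) / n
    cov = sum((o - mean_o) * (e - mean_e) for o, e in zip(obs, est))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_e = sum((e - mean_e) ** 2 for e in est)
    return cov / math.sqrt(var_o * var_e)


observed = [2.0, 4.0, 5.0]      # hypothetical observed data
estimated = [1.0, 4.0, 7.0]     # hypothetical network estimates
print(rmse(observed, estimated))
print(volume_error(observed, estimated))
print(correlation(observed, estimated))
```

Note that positive and negative relative errors cancel in `volume_error` as written, so a small value does not by itself mean that every individual estimate is close to its observation.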

If the post-processing results are unsatisfactory, modifications can be made to the following: (1) weights and biases, (2) number of hidden neurons, (3) transfer functions, and (4) number of hidden layers [4, 25].

#### **6. Gradient descent**

The term 'gradient descent' combines two words: 'gradient', which means a slope, and 'descent', which means moving downward. With gradient descent, therefore, the slope of the error surface is followed downward to find the lowest point, which has the smallest error. The process iterates until the error of the ANN learning model has been corrected. During backpropagation in the ANN model, each iteration updates the biases and weights by the error multiplied by the derivative of the activation function. The steepest descent step size is substituted by a similar size from the previous step.

A gradient is the derivative of the activation function, as shown in **Figure 10**.

The primary purpose of gradient descent is to reduce the overall cost at each step until the minimum, with the lowest error, is reached. At this point, the model predictions are more reliable because the model fits the data well. The slope can be evaluated with the help of **Figure 11**, from which Eq. (4) is derived.

$$
\Delta x_i = -\alpha \frac{dy}{dx_i} \tag{4}
$$

where $\alpha$ is the learning rate and $dy/dx_i$ is the partial derivative of $y$ with respect to $x_i$. In gradient descent, this equation is applied to each variable when δy < 0 (δ denotes a partial derivative).

**Figure 10.** *Gradient descent.*

**Figure 11.** *Slope computation.*

Gradient descent can be performed in either stochastic or full-batch mode. In stochastic gradient descent, the gradient is calculated from a single sample, whereas in full-batch gradient descent it is calculated over the full training dataset. One advantage of stochastic gradient descent is that the gradients are fast to calculate [1, 13, 23].
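The update in Eq. (4) can be tried out on a small function whose minimum is known. The sketch below assumes a hypothetical cost $y = (x_1 - 3)^2 + (x_2 + 1)^2$, with its minimum at (3, -1). The learning rate, the number of steps, and the function names are likewise assumptions chosen for illustration.

```python
def gradient(x):
    """Partial derivatives of the assumed cost y = (x1 - 3)^2 + (x2 + 1)^2."""
    return [2.0 * (x[0] - 3.0), 2.0 * (x[1] + 1.0)]


def gradient_descent(x0, alpha=0.1, steps=100):
    """Apply Eq. (4), delta x_i = -alpha * dy/dx_i, to every variable at each step."""
    x = list(x0)
    for _ in range(steps):
        g = gradient(x)
        x = [xi - alpha * gi for xi, gi in zip(x, g)]
    return x


x_min = gradient_descent([0.0, 0.0])
print(x_min)
```

With this cost, each step shrinks the distance to the minimum by the factor 1 - 2α, so a learning rate of 0.1 converges steadily, while any value above 1 would make the iterates diverge.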

#### **6.1 Training algorithm by delta rule**

The biases and weights are the parameters of the network that must be adjusted before an ANN is operated. For any ANN model, these parameters can be modified using either a supervised or an unsupervised approach. For training, supervised learning is generally used to determine the biases and weights of the network, and it can be carried out using the delta rule. The delta rule for the weight $W_{ij}$ is expressed by Eqs. (5)–(7):

$$W_{ij}^{new(L)} = W_{ij}^{old(L)} + \alpha \left( -\frac{\partial e_p}{\partial W_{ij}^{(L)}} \right) \tag{5}$$

$$E = \frac{1}{n} \sum_{p=1}^{n} e_p \tag{6}$$

$$e_p = \left(t_p - y_p\right) \tag{7}$$

where $n$ is the number of data pairs, $W_{ij}^{(L)}$ is the weight of the link between the ith neuron and the jth neuron in the Lth layer, $E$ is the average error of estimation, $t_p$ is the target output, $y_p$ is the simulated output, and $\alpha$ is the learning rate, whose value is selected experimentally between 0 and 1.
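The delta rule can be illustrated with a single linear neuron, y = w·x + b. The sketch below makes one assumption that departs from Eq. (7) as printed: the per-pattern error is taken as the squared difference ½(t_p − y_p)², a common choice, because its derivative with respect to the weight depends on the size of the error. The data, the learning rate, and the function name `train_delta_rule` are hypothetical.

```python
def train_delta_rule(samples, alpha=0.05, epochs=500):
    """Fit y = w * x + b by applying the delta rule after every pattern."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, t in samples:
            y = w * x + b
            error = t - y
            # For e_p = 0.5 * (t - y)^2 the negative gradients are
            # -de_p/dw = error * x and -de_p/db = error, so Eq. (5) becomes:
            w += alpha * error * x
            b += alpha * error
    return w, b


# Hypothetical noise-free data generated from t = 2x + 1.
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = train_delta_rule(samples)
print(w, b)
```

Because the data are noise-free, the weight and bias approach the generating values 2 and 1. With noisy targets they would instead settle near the least-squares fit.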

The backpropagation algorithm is the most widely used way of applying the delta rule to train an ANN. It turns the mathematical expression of the delta rule into a computational relation that can be applied iteratively. The gradient needed to locate the minimum of the error function is calculated efficiently using the chain rule of differentiation, which is why the procedure is also known as the generalised delta rule. In each iteration (epoch) of this algorithm, the network weights are shifted along the negative of the gradient, that is, in the steepest descent direction of the performance function. For a given weight in the Lth hidden layer, the chain rule gives Eq. (8):

$$\frac{\partial e_p}{\partial W_{ij}^{(L)}} = \frac{\partial e_p}{\partial I_{pj}^{(L)}} \cdot \frac{\partial I_{pj}^{(L)}}{\partial W_{ij}^{(L)}} \tag{8}$$

The algorithm continues iterating until the expected output of network training is achieved. Training may be stopped on the basis of a minimum target value of the performance function, the number of epochs, or the run time of the process; this is known as stopped training. The above equations lead to the weight update Eqs. (9) and (10):

For the last layer

$$W_{ij}^{new} = W_{ij}^{old} + \alpha \delta_{pj} y_{pi}^{(L-1)} \tag{9}$$

For the hidden layer

$$W_{ij}^{new} = W_{ij}^{old} + \alpha \delta_{pj}^{(L)} y_{pi}^{(L-1)} \tag{10}$$

After training, the ANN model is run on sample data, using the final weights and biases derived from the specific input vectors, to simulate the related outputs. ANN training can be carried out either as batch training or as incremental training. In batch training, the biases and weights are adjusted after all the inputs and targets have been presented. In incremental training, the biases and weights are adjusted immediately after each individual input is presented. The way training is carried out affects network performance. If the learning rate is low, learning the synaptic weights takes an extremely long time. If the learning rate is set too high, the algorithm tends to oscillate, and the performance of the trained network is reduced because the weight changes are too drastic. The learning rate therefore controls the convergence of the algorithm. The weight modifications can be applied after each pattern is presented, or the weight changes computed for all patterns can be summed and then applied to the network weights, as shown in Eq. (11):

$$
\Delta w_{ij}^{(L)} = \sum_{p=1}^{n} \Delta w_{pij}^{(L)} \tag{11}
$$
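Batch training with Eqs. (9)–(11) can be sketched for a small network with two inputs, two hidden neurons, and one output. Several choices in the sketch are assumptions rather than part of the text: sigmoid activations, a per-pattern error of ½(t − y)², the OR-gate data, the initial weights, and all function and variable names. The weight changes of all patterns are summed first and applied once per pass.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, net):
    """Return the hidden activations and the single output for input vector x."""
    hidden = [sigmoid(sum(net["W1"][i][j] * x[i] for i in range(len(x))) + net["b1"][j])
              for j in range(len(net["b1"]))]
    output = sigmoid(sum(net["W2"][j] * hidden[j] for j in range(len(hidden))) + net["b2"])
    return hidden, output


def mean_error(data, net):
    """Average of the assumed per-pattern error 0.5 * (t - y)^2."""
    return sum(0.5 * (t - forward(x, net)[1]) ** 2 for x, t in data) / len(data)


def batch_step(data, net, alpha):
    """One batch update: sum the per-pattern weight changes, then apply them."""
    n_in, n_hid = len(net["W1"]), len(net["b1"])
    dW1 = [[0.0] * n_hid for _ in range(n_in)]
    db1 = [0.0] * n_hid
    dW2 = [0.0] * n_hid
    db2 = 0.0
    for x, t in data:
        hidden, y = forward(x, net)
        delta_out = (t - y) * y * (1.0 - y)              # delta of the last layer
        db2 += alpha * delta_out
        for j in range(n_hid):
            dW2[j] += alpha * delta_out * hidden[j]      # Eq. (9), summed as in Eq. (11)
            delta_hid = hidden[j] * (1.0 - hidden[j]) * net["W2"][j] * delta_out
            db1[j] += alpha * delta_hid
            for i in range(n_in):
                dW1[i][j] += alpha * delta_hid * x[i]    # Eq. (10), summed as in Eq. (11)
    net["b2"] += db2
    for j in range(n_hid):
        net["W2"][j] += dW2[j]
        net["b1"][j] += db1[j]
        for i in range(n_in):
            net["W1"][i][j] += dW1[i][j]


# Hypothetical data (logical OR) and arbitrary small initial weights.
data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
net = {"W1": [[0.1, -0.2], [0.3, 0.4]], "b1": [0.0, 0.0], "W2": [0.2, -0.1], "b2": 0.0}

error_before = mean_error(data, net)
for _ in range(200):
    batch_step(data, net, alpha=0.1)
error_after = mean_error(data, net)
print(error_before, error_after)
```

Moving the weight updates inside the loop over patterns would turn this into incremental training, in which each pattern changes the weights before the next one is presented.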


In dynamic networks, the inputs and targets are usually presented in sequence. In adaptive learning, only the most recent data observed before the time of simulation are considered necessary, rather than the entire data set [4, 14, 26].

#### **7. Deep learning**

In the field of AI, deep learning (DL) has gained much popularity and has become a trending area of investigation. One of the foremost shortcomings of conventional machine learning methods is their inability to solve the selectivity-invariance problem, which limits their capability to process data in its raw form. Selectivity-invariance enables a model to select the parameters that carry more information and to disregard those that carry less. This ability of DL to overcome the selectivity-invariance dilemma makes it attractive to researchers and motivates them to advance machine learning using the DL approach.

The architecture of DL is composed of multiple layers of trainable parameters, which helps DL-based algorithms perform well in machine learning and AI applications. The principal DL algorithms are deep neural networks (DNNs), which usually use backpropagation-based optimisation algorithms for end-to-end training. DNNs achieve selectivity-invariance by extracting compound features through successive layers of neurons equipped with differentiable, non-linear activation functions, which provides a suitable platform for the backpropagation algorithm. A generic architectural model of DNNs is shown in **Figure 12**.


**Figure 12.** *DNNs generic model.*


**Figure 12** depicts a DNN model with numerous hidden layers. For most classification problems, the output layer of a DNN uses the softmax module. The softmax formula, also known as the normalised exponential, is given in Eq. (12):

$$Y_i = \frac{\exp\left(a_i\right)}{\sum_j \exp\left(a_j\right)}\tag{12}$$

where $j$ runs over the set of output nodes, $a_i$ is the net input to a particular output node, and $Y_i$ is the value of that output node, which lies in the range (0, 1).
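Eq. (12) translates directly into code. The sketch below subtracts the largest net input before exponentiating. This is a standard numerical safeguard that is not part of Eq. (12) but leaves the result unchanged, because the common factor cancels between numerator and denominator. The input values are hypothetical.

```python
import math


def softmax(net_inputs):
    """Normalised exponential of Eq. (12), computed in a numerically stable way."""
    shift = max(net_inputs)                         # avoids overflow for large inputs
    exps = [math.exp(a - shift) for a in net_inputs]
    total = sum(exps)
    return [e / total for e in exps]


outputs = softmax([2.0, 1.0, 0.1])                  # hypothetical net inputs a_i
print(outputs)
print(sum(outputs))
```

The outputs sum to one, so they can be read as class probabilities, and the node with the largest net input always receives the largest output.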

DNN models with non-linear behaviour can build up several levels of abstraction, which helps decision making by transforming the original data into progressively more abstract representations. This simplifies finding solutions for non-linear and complex functions. DL is based on the automated learning of features, which offers modularity and enables transfer learning. Unlike conventional machine learning, training DL networks requires a large amount of data. The convolutional neural network (CNN) and the recurrent neural network (RNN) are the best-known deep networks [27, 28].

#### **7.1 Convolutional neural network (CNN)**

The CNN is a popular DL methodology inspired by the animal visual cortex. CNNs are very similar to regular ANNs and can be viewed as an acyclic graph formed by a well-arranged collection of neurons. Unlike in a regular ANN model, however, the neurons in the hidden layers of a CNN are connected only to a subset of the neurons in the preceding layer. This sparse connectivity enables CNN models to learn the discrete features of an object. CNN models are used for face recognition, scene labelling, image classification, document analysis, and many other tasks.
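The local connectivity described above can be illustrated with a plain two-dimensional convolution, in which every output value depends only on a small window of the input. The sketch below is a simplified illustration, not a full CNN layer. The 4 × 4 image, the 2 × 2 kernel, and the function name `conv2d_valid` are assumptions, and the kernel is applied without flipping, as is usual in CNN implementations.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image; each output sees only one local window."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


# Hypothetical greyscale image: dark left half, bright right half.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Kernel that responds to a dark-to-bright transition from left to right.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = conv2d_valid(image, kernel)
for row in feature_map:
    print(row)
```

The feature map is non-zero only where the window straddles the vertical edge. The same four kernel weights are reused at every position, which is why a convolutional layer needs far fewer parameters than a fully connected one.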

The police department of Penang Island, Malaysia, installed more than 500 CCTV cameras around the island, many of them equipped with face recognition technology developed by IBM. The main objective was to control crime and capture wanted criminals [29]. Likewise, to monitor student attendance and class discipline, the management of China Pharmaceutical University installed a facial recognition system on campus, including the classrooms, labs, library and entrance gates. Overall, this improved the students' response towards academics [30]. Face recognition technology is based on deep CNN models. It can be performed using both supervised and unsupervised approaches, but supervised methodologies are mostly preferred. Face recognition takes its input from a video or an image, and detection is carried out after converting the input to greyscale. The features are applied to the greyscale image one by one and compared with the pixel values. CNN models give higher accuracy than earlier techniques because, trained on more samples, they overcome problems such as variations in light intensity and facial expression [31–33].


#### **7.2 Recurrent neural network (RNN)**

RNNs are used for tasks that involve sequential inputs. Initially, RNNs were trained using backpropagation. An RNN processes an input sequence one element at a time, maintaining in its hidden nodes a state vector that implicitly contains information about all the past elements of the sequence. RNNs are dynamic and fairly powerful systems, but training them is problematic because the backpropagated gradients either shrink or grow at every time step, so that over many time steps they may vanish or explode. When unfolded in time, an RNN can be seen as a deep feedforward network in which all the layers share the same weights. RNNs lack the capability to store information for a long time, a deficiency known as the problem of long-term dependencies. To address this shortcoming, an approach with explicit memory, known as long short-term memory (LSTM), has been introduced. In this method, special hidden nodes are used to store information from the input data for a much longer time. LSTM is well recognised for its high-quality performance in speech recognition systems [1, 27, 28].
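The shrinking and growing of gradients can be seen in a recurrent network reduced to a single hidden node. The sketch below assumes a scalar state updated as h_t = tanh(w_rec · h_{t−1} + w_in · x_t). The weights, the input sequences, and the function names are hypothetical. The gradient of the final state with respect to the initial state is the product of one factor per time step, so it shrinks or grows geometrically with the length of the sequence.

```python
import math


def rnn_forward(inputs, w_rec, w_in, h0=0.0):
    """Process the sequence one element at a time, carrying the state forward."""
    h = h0
    states = []
    for x in inputs:
        h = math.tanh(w_rec * h + w_in * x)
        states.append(h)
    return states


def state_gradient(states, w_rec):
    """d h_T / d h_0: the product of the factors w_rec * (1 - h_t^2) over all steps."""
    grad = 1.0
    for h in states:
        grad *= w_rec * (1.0 - h * h)
    return grad


# Case 1: recurrent weight below 1, so every factor is at most 0.5.
sequence = [0.5, -0.2, 0.1, 0.4] * 5                  # 20 hypothetical inputs
states = rnn_forward(sequence, w_rec=0.5, w_in=1.0)
shrinking = state_gradient(states, w_rec=0.5)

# Case 2: recurrent weight above 1 with zero inputs, so the state stays at 0
# and every factor equals 1.5.
quiet = rnn_forward([0.0] * 20, w_rec=1.5, w_in=1.0)
growing = state_gradient(quiet, w_rec=1.5)

print(shrinking, growing)
```

After only 20 time steps the two gradients differ by roughly nine orders of magnitude, which illustrates why plain RNNs struggle with long sequences and why architectures with explicit memory, such as LSTM, were introduced.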

Apple's Siri, Amazon's Alexa, Microsoft's Cortana, and Google's Assistant are the most popular voice recognition tools; they are used to make phone calls, set reminders and alarms, provide driving directions, and much more. These speech recognisers are built on RNNs with an LSTM-RNN architecture, which gives the models the ability to deal with long-distance patterns and makes them suitable for learning long-span relations. The models are trained end-to-end to produce the output [34, 35]. Other applications of RNN models include keyphrase recognition, meteorological data updating, and speech-to-text conversion [35–38]. The Massachusetts Institute of Technology (MIT) performed an interesting simulation study on self-driving cars, whose framework was developed on a deep reinforcement learning model [39].

#### **8. Examples of ANN model using Python**

#### **8.1 Supervised ANN model**

A simple ANN model was developed using Python. The model was designed using a supervised CNN methodology for image classification. Images of apples and oranges were collected for training and validating the model. For training, 20 images of each fruit were collected, making a total of 40 images. For validation, 10 more images of each fruit were collected, making a total of 20 images. The data for the supervised process were arranged with a separate folder for each stage, i.e., training and validation. In the folder named 'Training', the images of each fruit were placed in separate folders titled 'Apple' and 'Orange', and the same was done for the 'Validation' folder. In the classification and prediction process, the effectiveness of the model output was analysed against two parameters: (1) the number of epochs per run, and (2) the number of hidden layers.
