**3. Development and implementation of ANN**

Artificial neural networks originate from a learning technique that mimics the biological learning process occurring in the brain. Neural networks provide a robust way to predict an actual value after a learning activity on a supplied sample set [61]. ANNs are based on a concept that combines a set of computational procedures with a theoretical basis in order to predict the unknown output parameter in various processes. Generally, neural networks are adopted to subordinate knowledge to observations, or when the data or activity is so complex that an optimal solution cannot be identified in a reasonable time. It is difficult, in each field of application and even for each task, to compare the use of neural networks against other prediction techniques (e.g., statistical methods or a support vector machine) because, contrary to conventional computational techniques, they are able to solve nonlinear and ill-defined problems. Many factors underlie this trend; most of them are related to the reliability of the predictions, to the robustness and adaptability of the results, as well as to the learning ability of the neural process. In many cases the forecasts generated by ANNs, if correctly designed, improve significantly as the dataset used as the training subset grows. Consistently, in recent years the adoption of ANNs in many business areas has increased exponentially and the number of publications in high-level journals has grown [62] (**Figure 3**).

**Figure 3.** Distribution of ANN-papers by year.

From an engineering perspective, a 'good' ANN is based on models able to imitate the properties of natural systems, such as cognitive capabilities, flexibility, robustness, ability to learn and fault tolerance. To this end, the structure and the behaviour of the ANN require a study characterized by different hierarchical levels of organization, such as neurons, layers, synapses and cognition-behaviour functions. ANNs are applied in many different areas, among them astronomy, mathematics, physics, chemistry, earth and space sciences, life and medical sciences and engineering. In recent years the USA and EU countries have approved different initiatives for the study of the human brain; in these cases the ANN, in various forms and at different levels, has been included in the research projects. The interdisciplinarity of the systems adopted for dataset analysis, together with the computational complexity required by the elaboration of the data, allows designing and simulating systems capable of satisfying the needs and the challenges of the real world. In 2014 Japan launched a project named Brain Mapping by Integrated Neurotechnologies for Disease Studies (Brain/MINDS) [63], which will be integrated with new biomedical technologies and neural network systems. In Australia, a specific programme has also been set up, with preliminary funds of around \$250 million over 10 years, with the goal of developing the world's first bionic brain (AusBrain) [64] based on a multilayer perceptron (MLP) system. There is also another ambitious initiative in China (Brainnetome) [65], whose goals are to simulate the brain networks for perception, memory and emotion and their disorders, as well as to develop advanced technologies to achieve these goals.

## **3.1. Designing the ANN**


An ANN is a computational model that establishes a relationship between process factors and output variables. Artificial neurons are combined through weights, which work as adjustable coefficients. There are many programs and frameworks, either general-purpose or simulating specific functions or neural structures (e.g., IQR, NeuroSpaces, NNET, etc.), but there is no specific simulator currently used by the whole community, since different approaches are more suitable than others depending on the research task being addressed. Moreover, most simulators can take full advantage of their computational capabilities only in relation to the features of the computer hardware on which they are installed [66].

This correlation depends on the fundamental features of the network, which define the way input and output are connected to each other [67]. The network includes an input layer, an output layer and a certain number of hidden layers (**Figure 4**). The fundamental steps for the development of a network are:

• Dataset splitting, which identifies the subset data to be adopted for the training, the testing and the validation steps of the ANN development;

• Architecture, which determines the connections between layers and neurons;

• The learning algorithm, which determines the weights of the links between neurons.

In the following sections the data splitting strategy, the architecture design approach and the learning algorithm identification are described.

#### *3.1.1. Dataset splitting*

Appropriate data splitting can be handled as a statistical sampling problem. Therefore, various classical sampling techniques can be adopted in order to split the data into three subsets for the training, validation and testing of the ANN; the most common are simple random sampling (SRS), trial-and-error methods, systematic sampling and convenience sampling. The splitting strategy tries to overcome the high variance of SRS by repeating the random sampling several times in order to minimize the mean square error (MSE) of the ANN. This technique is highly time-consuming and requires significant computational costs. A subset (generally as big as 60% of the available experimental data, composed of input/output pairs) is used for the ANN training. In this phase the synaptic weights, which are attached to the links between neurons, are updated repeatedly in order to reduce the error between the experimental outputs and the associated forecasts. A subset (generally as big as 20% of the available experimental data) is adopted for the ANN validation; in particular, the validation set allows verifying that the network has identified the underlying trend of the training data subset. A subset (generally as big as 20% of the available experimental data) is adopted for testing the forecast reliability of the ANN trained in the learning phase. In order to deal with the overfitting problem, which occurs when the network has memorized the training examples but has not learned to generalize to new situations, different approaches are suggested: reducing the number of hidden layers, improving the 'quality' of the adopted training subset, introducing some noisy data into the training set, etc. In Ref. [67] an efficient method is proposed for model establishment by means of the identification of a low-dimension ANN learning matrix through principal component analysis (PCA).
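As a minimal sketch of the 60/20/20 repeated random splitting described above (the array names and the use of NumPy are illustrative assumptions, not prescriptions from this chapter):

```python
import numpy as np

def split_dataset(inputs, outputs, seed=0):
    """Randomly split input/output pairs into 60% training,
    20% validation and 20% testing subsets (simple random sampling)."""
    rng = np.random.default_rng(seed)
    n = len(inputs)
    idx = rng.permutation(n)                    # shuffle sample indices
    n_train, n_val = int(0.6 * n), int(0.2 * n)
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]
    return ((inputs[train], outputs[train]),
            (inputs[val], outputs[val]),
            (inputs[test], outputs[test]))

# Repeating the split with different seeds and keeping the split that
# minimizes the validation MSE mimics the repeated-SRS strategy above.
```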

#### *3.1.2. Network architectures*

Based on the connection pattern (architecture), ANNs can be grouped into two categories:

• Feed-forward networks (e.g., single-layer perceptron, multilayer perceptron, radial basis function nets), in which there are no loops in the network connections (as shown in **Figure 4**);

• Recurrent (or feedback) networks, in which loops occur in the network connections (e.g., competitive networks, Kohonen's SOM, Hopfield network, ART models).
**Figure 4.** Structure of an ANN.

Feed-forward networks are considered 'static': they produce only one set of output values, rather than a sequence of values, from a given input, and they work in a memory-less condition, meaning that their response to an input is independent of the previous network state. Recurrent networks, on the other hand, are dynamic systems: when a new input is presented, the network enters a new state. The most popular network architecture in use today is the multilayer perceptron (a feed-forward network), in which the output of a previous layer is the input to the next layer. In this case a biased sum of the weighted inputs identifies the activation level that, through a transfer function, produces the corresponding output. The network thus has a simple interpretation as a form of input-output model, with the weights and thresholds (biases) as the free parameters of the model [68]. The design of the ANN architecture consists of choosing the kind of structure (feed-forward or recurrent) and identifying the number of hidden layers and the number of neurons in each layer. On one hand, too many neurons can lead the network to memorize the training set, losing its capability to generalize; on the other hand, too few neurons can inhibit appropriate pattern classification. Many software packages identify the best number of hidden layers and neurons (for each layer) through a 'trial-and-error' approach, in which different architectures are iteratively tested and, for each of them, the software provides a 'fitness bar' based on the inverse of the mean absolute error (MAE) computed on the testing set; in most cases the highest 'fitness bar' identifies the best architecture.
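A hedged sketch of this trial-and-error search, using scikit-learn's MLPRegressor as one possible stand-in for the packages mentioned above (the candidate layer layouts and dataset names are illustrative assumptions):

```python
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_absolute_error

def best_architecture(X_train, y_train, X_test, y_test,
                      candidates=((5,), (10,), (10, 5), (20, 10))):
    """Iteratively test candidate hidden-layer layouts and score each one
    with a 'fitness bar': the inverse of the MAE on the testing set."""
    best = None
    for layout in candidates:
        net = MLPRegressor(hidden_layer_sizes=layout,
                           max_iter=2000, random_state=0)
        net.fit(X_train, y_train)
        fitness = 1.0 / mean_absolute_error(y_test, net.predict(X_test))
        if best is None or fitness > best[1]:
            best = (layout, fitness)
    return best  # (hidden-layer layout, fitness bar)
```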

#### *3.1.3. Learning algorithm*

The purpose of the learning algorithm is to train the network to predict the output parameter(s) given one or more input parameter(s). There are many types of neural network learning rules, belonging to three kinds of learning algorithm. The first is known as supervised learning; in this case the algorithm predicts the output parameter on the basis of a set of known input-output pairs [69, 70]. The second is unsupervised learning; in this case the output is not given, the aim being to infer a function that describes a hidden structure (e.g., clustering, anomaly detection, etc.). The output parameters are therefore considered 'unlabelled' (the observations are not classified) and no evaluation of the prediction reliability ensured by the ANN is provided [71]. The third is named reinforcement learning; in this case a continuous interaction between the learning system and the environment identifies the input-output mapping that minimizes a scalar performance index. The approach is very similar to unsupervised learning (in this case too, no input-output pairs are given), but reward or punishment signals are adopted for the prediction of output parameters [72]. In most cases, unsupervised learning ensures a lower cost function. Three methods, usually considered supervised learning methods, are described in this work: Quick Propagation (QP), Conjugate Gradient (CG) and the Levenberg-Marquardt algorithm (LM).
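To make the supervised/unsupervised distinction concrete, a minimal contrast, assuming scikit-learn and toy data invented for illustration (none of this comes from the original chapter):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans

X = np.array([[0.0], [1.0], [2.0], [3.0]])

# Supervised learning: known input-output pairs guide the fit.
y = np.array([0.1, 0.9, 2.1, 2.9])          # labelled outputs
model = LinearRegression().fit(X, y)
print(model.predict([[4.0]]))                # predict an unseen input

# Unsupervised learning: no outputs; infer hidden structure (clusters).
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(clusters)                              # cluster label per sample
```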

QP is a heuristic modification of standard back propagation; the output of the *m*th output node for the *p*th input pattern, $o_{pm}$, is given by Eq. (1).

$$o_{pm} = f\left(\sum_{k=1}^{K} \overline{\omega}_{km}\, o_{pk}\right) \tag{1}$$

where *f* is the sigmoidal activation function (Eq. (2)) and $\overline{\omega}_{km}$ is the weight between the *m*th output neuron and the *k*th hidden neuron. The value of $o_{pk}$ depends on two parameters: the first is the weight $\overline{\omega}_{nk}$ between the *k*th hidden neuron and the *n*th input neuron; the second is $x_{pn}$, the *p*th input pattern of the *n*th neuron.

$$f(x) = \frac{1}{1 + e^{-x}} \tag{2}$$

All network weights are updated after presenting each pattern from the learning data set.
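A small sketch of Eqs. (1) and (2), computing one output node of the perceptron with NumPy (the weight matrices and pattern values are illustrative assumptions):

```python
import numpy as np

def sigmoid(x):
    # Eq. (2): sigmoidal activation function
    return 1.0 / (1.0 + np.exp(-x))

def output_node(x_p, w_nk, w_km):
    """Eq. (1): response of the m-th output node to the p-th pattern.
    x_p  : inputs of the p-th pattern, shape (N,)
    w_nk : input-to-hidden weights, shape (N, K)
    w_km : hidden-to-output weights of node m, shape (K,)"""
    o_pk = sigmoid(x_p @ w_nk)        # hidden activations o_pk
    return sigmoid(o_pk @ w_km)       # output o_pm

rng = np.random.default_rng(0)
print(output_node(rng.random(3), rng.random((3, 4)), rng.random(4)))
```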

As far as the CG method is concerned, the learning algorithm starts with a random weight vector that is iteratively updated along the direction of the greatest rate of decrease of the error evaluated at *ω*(*τ*), as in Eq. (3).

$$\Delta\omega^{(\tau)} = -\eta\, \nabla E \big|_{\omega^{(\tau)}} \tag{3}$$

where *E* is the error function evaluated at *ω*(*τ*) and *η* is the arbitrary learning rate parameter. At each step (*τ*) the gradient is re-evaluated in order to reduce *E*. The performance of the gradient descent algorithm is very sensitive to the proper setting of the learning rate: if *η* is too high the algorithm can oscillate and become unstable, while if *η* is too small the algorithm takes too long to converge. In this case an adaptive learning rate allows keeping the learning step size as large as possible while keeping the learning stable. The LM algorithm minimizes the squares of the differences (*E*) between the desired output, identified as $y_d(t)$, and the predicted output $y_p(t)$ [73]. *E* is given by the following equation:


$$E = \frac{1}{2} \sum_{t} \left( y_p(t) - y_d(t) \right)^2 \tag{4}$$

The LM algorithm blends the 'steepest descent' method and the 'Gauss-Newton' method; therefore it can converge well even when the error surface is much more complex than the quadratic situation, ensuring, in many cases, both speed and stability. The LM update can be presented as:

$$w_{k+1} = w_k - \left(J_k^T J_k + \mu I\right)^{-1} J_k^T e_k \tag{5}$$

where *J* is the Jacobian matrix, *μ* is the 'combination coefficient' (always positive), *I* is the identity matrix and *e* represents the error vector. When *μ* is very small (nearly zero), the Gauss-Newton algorithm is used; on the other hand, when *μ* is very large, the steepest descent method is used.
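A minimal sketch of one LM iteration per Eq. (5), assuming a generic model function and a finite-difference Jacobian (the quadratic fit and toy data are illustrative, not taken from the chapter):

```python
import numpy as np

def lm_step(w, model, y_d, x, mu=1e-2, h=1e-6):
    """One Levenberg-Marquardt update, Eq. (5):
    w_new = w - (J^T J + mu*I)^(-1) J^T e.
    Small mu ~ Gauss-Newton; large mu ~ steepest descent."""
    e = model(w, x) - y_d                       # error vector e_k
    J = np.empty((len(x), len(w)))              # Jacobian d(y_p)/dw
    for j in range(len(w)):                     # finite differences
        dw = np.zeros_like(w)
        dw[j] = h
        J[:, j] = (model(w + dw, x) - model(w, x)) / h
    A = J.T @ J + mu * np.eye(len(w))
    return w - np.linalg.solve(A, J.T @ e)

# Illustrative use: fit y = w0 + w1*x to noisy data.
model = lambda w, x: w[0] + w[1] * x
x = np.linspace(0, 1, 20)
y_d = 2.0 + 3.0 * x + 0.01 * np.random.default_rng(0).normal(size=20)
w = np.zeros(2)
for _ in range(10):
    w = lm_step(w, model, y_d, x)
print(w)  # approaches [2.0, 3.0]
```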
