**7. The example of neural network modelling**

Suppose we are to model a bakeout process in a vacuum chamber. The presented example is intended for industrial circumstances, where one and the same production process is repeated in the vacuum chamber. Prior to the production process, the entire system must be degassed using an appropriate regime that depends on the materials used. The model of degassing will be used only to monitor possible departures of the vacuum chamber pressure temporal profile. Any difference in the manufacturing process, such as the introduction of different cleaning procedures, might result in a different degassing profile. A departure from the usual degassing profile should result in a warning signal indicating possible problems in further production stages.



For the sake of simplicity, only two parameters will be observed: time *t* and pressure in the vacuum chamber *p*. In practice, it would be appropriate to monitor the temperature, the heater current, and any critical components of the mass spectra as well.

First, it has to be determined which parameters are primary and which are their consequences. In the degassing problem, one of the basic parameters is the heater current, another being time. As a consequence of the heater current, the temperature in the vacuum chamber rises, the degassing process increases the pressure, critical gases may emerge, etc. Therefore, time and the heater current represent the input parameters for the neural network, while parameters such as temperature, pressure, and partial pressures are those to be modelled.

In the simplified model, we suppose that, during the bakeout process, the heater current remains constant: it is switched on at the start and off after the bakeout. We also suppose that there is no need to model the mass spectra and the temperature. Therefore, only two parameters remain to be modelled: time as the primary parameter and pressure as the consequence (of the constant heater current and time).

Since the model is built for industrial use, its only purpose is to detect possible anomalies in the bakeout stage. The vacuum chamber is used to process the same type of objects, and no major changes are expected in the vacuum system either. Therefore, the bakeout process is supposed to be repeatable to some extent.


Several measurements are made, resulting in the values gathered in Table 1.

| Time [min] | Pressure |
|-----------:|---------:|
| 30  | 0.51 |
| 60  | 0.95 |
| 90  | 0.96 |
| 120 | 1.40 |
| 180 | 2.40 |
| 240 | 2.13 |
| 360 | 1.00 |
| 480 | 0.56 |
| 600 | 0.30 |

Table 1. Measurements of the bakeout of the vacuum system. The time–pressure pairs form the training set for the neural network.

The model is built solely on the data gathered by measurements on the system. As mentioned before, neural networks do not need any further information about the modelled system.


Once the input (time t) and output (pressure p) parameters are determined, the neural network architecture must be chosen. The underlying theory suggests that a neural network with one hidden layer should be capable of producing the approximation. Nevertheless, from the practical point of view, it is a better choice to start with a neural network with more than one hidden layer. Let us start with two hidden layers and 10 neurons in each hidden layer. One neuron is used for the input layer and one for the output layer. This configuration can be denoted as **1 10 10 1** (see Fig. 2, right).
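
To make the configuration concrete, the sketch below builds a 1 10 10 1 network in Python with scikit-learn's `MLPRegressor`. The library, the `tanh` activation, and the L-BFGS optimiser are illustrative assumptions; the text does not prescribe a particular implementation.

```python
# A sketch of the 1 10 10 1 configuration using scikit-learn
# (an assumed tool; any feedforward network implementation works).
from sklearn.neural_network import MLPRegressor

# Two hidden layers of 10 neurons each; the single input and output
# neurons are implied by the shape of the training data.
net = MLPRegressor(
    hidden_layer_sizes=(10, 10),
    activation="tanh",   # assumed sigmoid-like hidden activation
    solver="lbfgs",      # assumed optimiser; works well on tiny sets
    max_iter=20_000,
)
```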

The measured values form the training set; the neural network is therefore trained to reproduce the time–pressure pairs and to approximate the pressure at any point in the space between them.
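
As a sketch, the nine time–pressure pairs from Table 1 can be arranged as follows. Scaling the time axis to [0, 1] is an assumption that generally helps sigmoid-type networks, not a step taken from the text.

```python
import numpy as np

# The nine measured time-pressure pairs from Table 1.
t = np.array([30, 60, 90, 120, 180, 240, 360, 480, 600], dtype=float)
p = np.array([0.51, 0.95, 0.96, 1.40, 2.40, 2.13, 1.00, 0.56, 0.30])

# One input neuron, one output neuron: column vector in, vector out.
# Scaling time to [0, 1] (assumed) keeps the inputs in a range where
# sigmoid-type activations are sensitive.
X = (t / t.max()).reshape(-1, 1)
y = p
```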

Once the configuration is set and the training set prepared, the training process can start. It completes successfully when all the points from the training set are reproduced within the set tolerance (1% or another appropriate value). A certain number of training epochs is needed for successful completion of the training.
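
One possible reading of this stopping rule in code, reusing the data and network sketched above: train, then check post hoc that every training point is reproduced within 1%, retrying from new random weights if not. `MLPRegressor` itself stops on a loss-based criterion, so the tolerance check here is explicit.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

t = np.array([30, 60, 90, 120, 180, 240, 360, 480, 600], dtype=float)
p = np.array([0.51, 0.95, 0.96, 1.40, 2.40, 2.13, 1.00, 0.56, 0.30])
X = (t / t.max()).reshape(-1, 1)

TOLERANCE = 0.01  # the 1 % tolerance mentioned in the text

for seed in range(100):  # retry from fresh random initial weights
    net = MLPRegressor(hidden_layer_sizes=(10, 10), activation="tanh",
                       solver="lbfgs", max_iter=20_000, random_state=seed)
    net.fit(X, p)
    # Every training point must be reproduced within the tolerance.
    if np.max(np.abs(net.predict(X) - p) / np.abs(p)) < TOLERANCE:
        print(f"converged after {net.n_iter_} iterations (seed {seed})")
        break
else:
    print("no run reached the tolerance")
```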

The result of training is a neural network model which is, of course, only one of many possible models. For any serious use of such a model, the training process should be repeated several times and the outcomes carefully studied. With several repetitions of the training process on the same training set, the training stability band can be determined. Fig. 7 shows the training stability band for the training set from Table 1 and for the neural network configuration 1 10 10 1; dots represent the measured points included in the training set.
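
A sketch of how the band can be obtained: train the same configuration several times from different random initial weights, evaluate each model on a dense time grid, and take the pointwise minimum and maximum as the band boundaries. The number of repetitions and the grid density are assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

t = np.array([30, 60, 90, 120, 180, 240, 360, 480, 600], dtype=float)
p = np.array([0.51, 0.95, 0.96, 1.40, 2.40, 2.13, 1.00, 0.56, 0.30])
X = (t / t.max()).reshape(-1, 1)

# Dense grid over 0-600 min on which the trained models are compared.
grid = (np.linspace(0.0, 600.0, 301) / t.max()).reshape(-1, 1)

predictions = []
for seed in range(10):  # 10 independent trainings (assumed number)
    net = MLPRegressor(hidden_layer_sizes=(10, 10), activation="tanh",
                       solver="lbfgs", max_iter=20_000, random_state=seed)
    predictions.append(net.fit(X, p).predict(grid))

predictions = np.asarray(predictions)   # shape: (runs, grid points)
band_lower = predictions.min(axis=0)    # stability band lower boundary
band_upper = predictions.max(axis=0)    # stability band upper boundary
avg_width = float(np.mean(band_upper - band_lower))
```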

Fig. 7. The training stability band for the presented example (pressure versus time [min]; the upper and lower boundaries of the stability band are shown). The measured points are depicted with dots, and their numeric values are given in Table 1.
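
Given the band, the monitoring idea from the beginning of the example (a warning when a bakeout departs from the usual profile) could be sketched as follows; the function and its margin parameter are hypothetical, not taken from the text.

```python
import numpy as np

def check_bakeout(p_measured, band_lower, band_upper, margin=0.0):
    """Hypothetical monitor: return the indices of measurements that
    fall outside the training stability band (plus an optional margin),
    i.e. the points that should raise a warning signal.

    All three arrays must be sampled at the same time points.
    """
    outside = ((p_measured < band_lower - margin) |
               (p_measured > band_upper + margin))
    return np.flatnonzero(outside)
```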


If the training stability band is narrow enough, there is no real need for further optimisation of the model; otherwise, the search for an optimal or, at least, a sufficient model should continue.

It is interesting to observe the width of the training stability band for different neural network configurations. Fig. 8 provides the results of such a study for the training set from Table 1. It is important to notice that the diagram in Fig. 8 shows the average stability band width for each neural network configuration. Compared to the values of the measured dependence, the average widths are relatively small, which is not always the case for the maximal values of the training stability band. For certain problems, it would be more appropriate to observe the maximal value of the training stability band width instead of the average value.

On the other hand, the width of the training stability band is only one parameter to be observed. From the practical point of view, it is also important to know how time-consuming the training procedure for different configurations really is. Fig. 9 provides some insight into this detail of neural network modelling. Even a brief comparison of Fig. 8 and Fig. 9 clearly shows that the neural networks that enable a quick training process are not necessarily those which also provide the narrowest training stability band.
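
Both studies can be reproduced in one loop: for each candidate configuration, repeat the training, record the number of iterations, and measure the width of the resulting band. The configuration list below names a few of the configurations discussed; the rest of the setup repeats the assumptions of the earlier sketches.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

t = np.array([30, 60, 90, 120, 180, 240, 360, 480, 600], dtype=float)
p = np.array([0.51, 0.95, 0.96, 1.40, 2.40, 2.13, 1.00, 0.56, 0.30])
X = (t / t.max()).reshape(-1, 1)
grid = (np.linspace(0.0, 600.0, 301) / t.max()).reshape(-1, 1)

# Hidden-layer layouts; e.g. (5, 15, 20) is the configuration 1 5 15 20 1.
configurations = [(20,), (10, 10), (5, 15, 20), (15, 20, 15), (20, 20, 20)]

for hidden in configurations:
    runs, iterations = [], []
    for seed in range(10):
        net = MLPRegressor(hidden_layer_sizes=hidden, activation="tanh",
                           solver="lbfgs", max_iter=20_000,
                           random_state=seed)
        runs.append(net.fit(X, p).predict(grid))
        iterations.append(net.n_iter_)
    band = np.asarray(runs)
    width = band.max(axis=0) - band.min(axis=0)
    print(f"1 {' '.join(map(str, hidden))} 1:",
          f"avg width {width.mean():.4f}, max width {width.max():.4f},",
          f"avg iterations {np.mean(iterations):.0f}")
```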

Fig. 8. Dependence of the average width of the training stability band on the neural network configuration.


Fig. 9. Dependence of the average number of epochs on the neural network configuration. A higher number of epochs means a longer training process.

Fig. 10 presents two properties together: the average number of epochs (x axis) and the average training stability band width (y axis). Each point in the graph represents one neural network configuration; interesting configurations are specially marked. It is notable that the configuration with only one hidden layer performs very poorly. Even an increased number of neurons in a neural network with only one hidden layer does not improve the situation significantly.

If we seek the configuration that provides the best results for the given training data set, we look for one that trains quickly (a low number of epochs) and features the lowest possible training stability band width. Such configurations are 1 20 20 20 1 and 1 15 20 15 1. If we need the lowest possible training stability band width, then we choose the configuration 1 5 15 20 1.

Fig. 10. The optimisation process seeks the neural network that uses the lowest possible number of epochs while producing an approximation with the narrowest training stability band. Each dot on the graph represents one neural network configuration. Interesting configurations are highlighted.
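
The chapter gives no formula for combining the two criteria; one hypothetical way to automate the choice illustrated in Fig. 10 is a weighted sum of the two normalised quantities:

```python
def pick_configuration(results, epoch_weight=1.0, width_weight=1.0):
    """Hypothetical ranking of configurations by the two properties of
    Fig. 10. `results` maps a configuration name to a pair
    (average epochs, average band width); smaller is better for both.
    """
    max_epochs = max(e for e, _ in results.values())
    max_width = max(w for _, w in results.values())
    return min(results, key=lambda name:
               epoch_weight * results[name][0] / max_epochs +
               width_weight * results[name][1] / max_width)
```

With `width_weight` dominant, such a rule would point at 1 5 15 20 1; with the two criteria balanced, at 1 20 20 20 1 or 1 15 20 15 1, matching the discussion above.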

**8. Conclusion**

The important conclusions from the presented example are that the nature of the modelled problem dictates which neural network configuration produces the most appropriate approximation, and that, for each data set to be modelled, a separate neural network training performance analysis should be performed.


