**5. Training stability analysis**

The training process of a neural network is a process which, if repeated, does not lead to identical results. Each training run starts from different, randomly chosen connection weights, so the starting point is always different, and several repetitions of the training process therefore lead to different outcomes. The stability of the training process refers to such a series of outcomes and to their maximal and minimal values, i.e., to the band within which any particular solution can be expected. The narrower the obtained band, the more stable the approximations that can be expected (Fig. 4).


[Figure 4 shows the modelled pressure (y axis, relative values) versus time in minutes (x axis), together with the stability band upper and lower boundary curves.]

Fig. 4. The different training processes produce different models. A possible outcome of an arbitrarily chosen training process falls within the maximal and minimal boundaries which define the training "stability band".

The procedure for the determination of the training stability band is the following:

1. Perform the first training process based on the training data set and randomly chosen neural network weights.
2. Perform the modelling with the trained neural network in the data points between the training data points.
3. The result of the modelling is the first so-called reference model. This means that the model is used to approximate the unknown function in the space between the measured points. The number of approximated points should be large enough so that the approximated function can be evaluated with sufficient density. Practically, this means the number of approximated points should be at least 10 times larger than the number of measured data points (in the case presented, the ratio between the number of measured and the number of approximated points was 1:15).
4. Perform a new training process with another set of randomly chosen weights.
5. Compare the outcomes of the new model with the previous one and set the minimal and maximal values for all approximated points. The first band is thereby obtained, bordered by the upper and lower boundaries.
6. Perform a new training process with another set of randomly chosen weights.
7. For each approximated point, test whether it lies within the established lower and upper boundaries. If it does not, extend the corresponding boundary value.
8. Repeat steps 6 and 7 until the prescribed number of repetitions is achieved.
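
The procedure above maps directly onto a short script. The following sketch is only an illustration of the idea: it uses scikit-learn's MLPRegressor as a stand-in for the neural network simulator actually used in this work, and the function name training_stability_band, the 1:15 density default and all parameter values are our own assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def training_stability_band(x_train, y_train, n_repetitions=100,
                            n_approx=None, hidden_layer_sizes=(10,)):
    """Estimate the training stability band by repeating the training with
    different randomly chosen initial weights (steps 1-8 above)."""
    x_train = np.asarray(x_train, dtype=float).reshape(-1, 1)
    y_train = np.asarray(y_train, dtype=float)

    # Step 3: the approximation grid should be much denser than the measured
    # points (the chapter uses a ratio of about 1:15).
    if n_approx is None:
        n_approx = 15 * len(x_train)
    x_approx = np.linspace(x_train.min(), x_train.max(), n_approx).reshape(-1, 1)

    lower = upper = None
    for seed in range(n_repetitions):
        # Steps 1, 4 and 6: train again with a new set of random weights.
        net = MLPRegressor(hidden_layer_sizes=hidden_layer_sizes,
                           max_iter=5000, tol=1e-6, random_state=seed)
        net.fit(x_train, y_train)

        # Step 2: model the function between the measured data points.
        y_approx = net.predict(x_approx)

        if lower is None:                      # step 3: first reference model
            lower, upper = y_approx.copy(), y_approx.copy()
        else:                                  # steps 5 and 7: extend the band
            lower = np.minimum(lower, y_approx)
            upper = np.maximum(upper, y_approx)

    centre = (lower + upper) / 2.0             # band centre line (see below)
    return x_approx.ravel(), lower, upper, centre
```

Here the band is simply the pointwise minimum and maximum over all repetitions, which matches steps 5 and 7 of the procedure.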


[Figure 5 shows the distance from the already established stability band (y axis) versus the number of performed tests (x axis).]

Fig. 5. The training stability upper boundary change vs. the number of separate training experiments. The training stability band does not change significantly after the 100th training experiment.

The practical question is how many times the training, each time with different starting values of the connection weights, should be performed to ensure that the upper and lower boundaries of the training stability band no longer change. Fig. 5 shows the change of the stability band upper boundary for each separate training outcome (the distance between the existing boundary and the new boundary). The results are as expected: the more separate trainings are performed, the less the stability band changes. Recording the lower boundary change gives very similar results. Fig. 5 shows clearly that repetitions beyond the 100th training do not bring significant changes in the training stability band. In our further experiments the number of repetitions was therefore fixed at 100. When dealing with any new modelling problem, the number of required training repetitions should be assessed anew.
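
The quantity plotted in Fig. 5 can be recorded with a small extension of the previous sketch. Again, this is only an assumed implementation: the function name upper_boundary_changes and all parameter values are ours, and scikit-learn's MLPRegressor stands in for the actual simulator.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def upper_boundary_changes(x_train, y_train, x_approx, n_repetitions=400):
    """Record how far each additional training run pushes the upper stability
    boundary outwards (the distance plotted in Fig. 5)."""
    x_train = np.asarray(x_train, dtype=float).reshape(-1, 1)
    y_train = np.asarray(y_train, dtype=float)
    x_approx = np.asarray(x_approx, dtype=float).reshape(-1, 1)

    changes, upper = [], None
    for seed in range(n_repetitions):
        net = MLPRegressor(hidden_layer_sizes=(10,), max_iter=5000,
                           tol=1e-6, random_state=seed)
        y_approx = net.fit(x_train, y_train).predict(x_approx)
        if upper is None:
            upper, change = y_approx.copy(), 0.0
        else:
            # Largest amount by which the new model exceeds the current boundary.
            change = float(np.max(np.clip(y_approx - upper, 0.0, None)))
            upper = np.maximum(upper, y_approx)
        changes.append(change)
    return changes  # expected to fall towards zero after roughly 100 runs
```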

A single run of the training process is not enough, because the obtained model shows only one possible solution, and nothing is really known about the behaviour of the particular neural network model in conjunction with the particular problem. In our work, modelling for spinning rotor gauge error extraction was performed and the results were very promising (Šetina et al., 2006); however, the results could not be repeated. When the training stability test is performed, the range of possible solutions becomes known, and a further evaluation of the adequacy of the neural network system used becomes possible. The training stability band is used instead of a worst-case approximation analysis. In the case where the complete training stability range is acceptable for the modelling purposes, any trained neural network can be used.

Following a very large number of performed experiments, it was shown that the middle curve (centre line) between the maximal and minimal curves (Fig. 6) is the most probable curve for the observed modelling problem. The training stability band middle-range curve can therefore be used as the best approximation, and the training repetition that produces the model closest to the middle curve should be taken as the best one.
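
Selecting that repetition is straightforward if the dense predictions of every repetition are kept. The helper below is a hypothetical illustration; the name model_closest_to_centre and the worst-case distance criterion are our own choices.

```python
import numpy as np

def model_closest_to_centre(predictions):
    """predictions: one row of dense model outputs per training repetition.
    Returns the index of the repetition lying closest to the band centre line."""
    predictions = np.asarray(predictions, dtype=float)
    centre = (predictions.min(axis=0) + predictions.max(axis=0)) / 2.0
    # Worst-case distance of each repetition's model from the centre line.
    distances = np.max(np.abs(predictions - centre), axis=1)
    return int(np.argmin(distances)), centre
```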

When the band is too wide, more emphasis has to be given to finding the training process that gives the best model. In this case, genetic algorithms, or any other kind of optimization, can be used to find the best solution (Goldberg, 1998).

The research work on the stability band has been conducted on 160 measured time profile samples. For each sample, 95 different neural network configurations have been systematically tested. Each test has included 100 repetitions of the training process, and each training process required several thousand epochs. From the vast database obtained, statistically firm conclusions can be drawn. This much work was necessary to prove the importance of the stability band; in practice, users need to perform only one or two tests to get the necessary information.

Some practitioners use the strategy of dividing the measured points into two sets: one is used for training purposes and the other for model evaluation purposes. This is a very straightforward strategy, but it works well only when the number of measured points is large enough (in accordance with the dynamics of the system and the sampling theorem). If the sampled data is too sparse, the results of the modelling (in fact, of any kind of modelling) will be poor, and the modelling tools cannot be blamed for the bad results. If the number of measured points is not large enough to be divided, all measured points should be used for training, and an analysis of the training stability should be performed instead.

**6. The synthesis of a neural network**

The synthesis of a neural network should be performed in several steps:

1. Select the training data set, which should be sampled adequately to represent the observed problem. Input as well as output (target) parameters should be defined.
2. Check the differences between the maximal and minimal values in the training set (for input and output values). If the differences are too large, the training set should be segmented (or transformed logarithmically), and for each segment a separate neural network should be designed.
3. Set up the first neural network, which should consist of three layers (one of them being the hidden layer). Set the training tolerance parameter that satisfies the modelling needs. Choose an arbitrary number of neural cells in the hidden layer (say, 10) and start the training procedure. Observe the way in which the neural network performs the training. If the training process shows a steady decrease of the produced training error, the chosen number of neural cells in the hidden layer is sufficient to build the model. If the output error shows unstable behaviour and does not decrease with the number of epochs, the number of cells in the hidden layer might be too small and should be increased. If the neural network output error decreases too slowly for practical needs, another hidden layer should be introduced. It is interesting that increasing the number of neural cells within one hidden layer does not significantly improve the speed of training convergence; only the addition of a new hidden layer can do so. The addition of too many hidden layers also cannot guarantee better training convergence. There is a certain limit to the training convergence speed that can be reached with an optimal configuration, and it depends on the problem at hand.
4. Perform the training stability test and decide whether further optimization is needed. It is interesting that the training stability band mostly depends on the problem being modelled and that the various neural network configurations do not alter it significantly.
5. After step 4, the design of the neural network is completed and it can be applied to model the problem. A minimal sketch of such a synthesis loop is given below.
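
As a rough illustration of steps 3 and 4, the sketch below grows the network configuration until the training error reaches the prescribed tolerance and then hands over to the stability test. It is only an assumed workflow built on scikit-learn's MLPRegressor; the function name synthesize_network, the stopping rule based on the final training loss and all default values are ours, and the error-curve inspection described in step 3 is simplified away.

```python
from sklearn.neural_network import MLPRegressor

def synthesize_network(x_train, y_train, tolerance=1e-3,
                       cells_per_layer=10, max_hidden_layers=3, max_iter=5000):
    """Step 3: start with one hidden layer of `cells_per_layer` cells and add
    hidden layers while the final training loss stays above the tolerance.
    x_train is expected as a 2-D array of shape (n_samples, n_features)."""
    net, layout = None, None
    for n_layers in range(1, max_hidden_layers + 1):
        layout = (cells_per_layer,) * n_layers
        net = MLPRegressor(hidden_layer_sizes=layout, max_iter=max_iter,
                           tol=1e-9, random_state=0)
        net.fit(x_train, y_train)
        if net.loss_ <= tolerance:   # training converged to the required error
            break
    return net, layout

# Step 4: once a configuration trains acceptably, run the stability test
# (e.g. with training_stability_band from the sketch in Section 5) and decide
# whether further optimization is needed.
```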

**7. The example of neural network modelling** 

Suppose we are to model a bakeout process in a vacuum chamber. The presented example is intended for industrial circumstances, where one and the same production process is repeated in the vacuum chamber. Prior to the production process, the entire system must be degassed using an appropriate regime depending on the materials used. The model of degassing is used only to monitor the eventual departure of the vacuum system from its expected degassing behaviour.

Usually, one of the modelled parameters is time. In this case, one of the inputs represents the time points at which the data samples were taken. Fig. 6 shows the model of degassing a vacuum system, where the x axis represents the time in minutes and the y axis represents the total pressure in the vacuum system (in relative numerical values). In this case, the only input to the neural network was time, and the target was to predict the pressure in the vacuum chamber.

Fig. 6. The upper curve represents the band centre line, which is the most probable model for the given data; the points depict the measured data points that were actually used for training. The lower curve represents the width of the training stability band, which is smallest at the measured points used for training.
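
For a setting like this, the stability-band sketch from Section 5 would be applied to the time/pressure samples directly. The data below are purely synthetic stand-ins (an assumed exponential decay), not the measured values behind Fig. 6.

```python
import numpy as np

# Synthetic stand-in for measured degassing data: relative pressure vs. time [min].
t = np.linspace(0.0, 400.0, 25)
p = 5.0 * np.exp(-t / 120.0) + 0.5

# training_stability_band is the sketch function defined in Section 5.
t_dense, lower, upper, centre = training_stability_band(t, p, n_repetitions=100)
band_width = upper - lower   # narrowest near the training points, as in Fig. 6
```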
