**2.1 Learning process**

An artificial neuron is composed of five main parts: inputs, weights, a sum function, an activation function and outputs. Inputs are the information that enters the cell from other cells or from the external world. Weights are values that express the effect of an input set, or of another processing element in the previous layer, on this processing element. The sum function calculates the total effect of the inputs and weights on this processing element, that is, the net input that comes to a cell (Topcu & Sardemir, 2007).

The information is propagated through the neural network layer by layer, always in the same direction. Besides the input and output layers there can be other intermediate layers of neurons, which are usually called hidden layers. Fig. 3 shows the structure of a typical neural network.

The inputs to the jth node are represented as an input vector a, with components ai (i = 1 to n), and the output by bj. The values w1j, w2j, …, wnj are weight factors associated with each input to the node, something like the varying synaptic strengths of biological neurons. Weights are adaptive coefficients within the network that determine the intensity of the input signal. Every input (a1, a2, …, an) is multiplied by its corresponding weight factor (w1j, w2j, …, wnj), and the node uses these weighted inputs (w1j a1, w2j a2, …, wnj an) to perform further calculations. If a weight factor is positive, (wij ai) tends to excite the node; if it is negative, (wij ai) inhibits the node. In the initial setup of a neural network, the weight factors are usually chosen at random.

Total activity:

$$x_j = \sum_{i=1}^{n} w_{ij}\, a_i - T_j \tag{1}$$

where $T_j$ is the internal threshold of node j. The node output $b_j$ is obtained by passing the total activity through a transfer (activation) function $f$:

$$b_j = f(x_j) = f\!\left(\sum_{i=1}^{n} w_{ij}\, a_i - T_j\right) \tag{2}$$

Pure linear:

$$f(x) = x \tag{3}$$

Log sigmoid:

$$f(x) = \frac{1}{1 + e^{-x}} \tag{4}$$

Hyperbolic tangent:

$$f(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \tag{5}$$
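As a concrete illustration of Eqs. (1), (2) and (4), a single node's output can be computed as follows (a minimal sketch; the function name and example values are illustrative, not from the text):

```python
import math

def node_output(a, w, T):
    """Output b_j of one node: log-sigmoid of the total activity x_j (Eqs. 1, 2, 4)."""
    x = sum(w_i * a_i for w_i, a_i in zip(w, a)) - T  # total activity, Eq. (1)
    return 1.0 / (1.0 + math.exp(-x))                 # log-sigmoid transfer, Eq. (4)

# Three inputs, their weight factors, and a threshold T_j (hypothetical values):
b = node_output(a=[1.0, 0.5, -1.0], w=[0.2, 0.4, 0.1], T=0.5)
print(b)  # a value in (0, 1), as guaranteed by the sigmoid
```

With other transfer functions the last line would change accordingly, e.g. `return x` for the pure linear function of Eq. (3).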










Fig. 6. Commonly used transfer functions: a) a pure linear transfer function, b) a sigmoid transfer function, c) a hyperbolic tangent transfer function.

#### **2.2 Generalization**

After the training is completed, the network error is usually minimized and the network output shows reasonable similarities with the target output. Before a neural network can be used with any degree of confidence, however, there is a need to establish the validity of the results it generates. A network could provide almost perfect answers to the set of problems with which it was trained, but fail to produce meaningful answers to other examples. Usually, validation involves evaluating network performance on a set of test problems that were not used for training. Generalization (testing) is so named because it measures how well the network can generalize what it has learned and form rules with which to make decisions about data it has not previously seen. The error between the actual and predicted outputs of testing and training converges upon the same point corresponding to the best set of weight factors for the network. If the network is learning an accurate generalized solution to the problem, the average error curve for the test patterns decreases at a rate approaching that of the training patterns. Generalization capability can therefore be used to evaluate the behavior of the neural network.
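The validation procedure described above amounts to comparing the error on training patterns with the error on unseen test patterns; a minimal sketch (the stand-in model and data are illustrative only):

```python
def mse(predict, patterns):
    """Mean squared error of a predictor over (input, target) patterns."""
    errors = [(predict(x) - t) ** 2 for x, t in patterns]
    return sum(errors) / len(errors)

model = lambda x: 2.0 * x                  # stand-in for a trained network
train_patterns = [(1.0, 2.0), (2.0, 4.0)]  # patterns used during training
test_patterns = [(3.0, 6.1), (4.0, 7.9)]   # patterns never seen in training

# Good generalization: the test error stays close to the training error.
print(mse(model, train_patterns), mse(model, test_patterns))
```

A large gap between the two printed errors would indicate that the network has memorized the training patterns rather than learned a generalized solution.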

#### **2.3 Selecting the number of hidden layers**

The number of hidden layers and the number of nodes in one hidden layer are not straightforward to ascertain. No rules are available to determine the exact number. The choice of the number of hidden layers and the nodes in the hidden layer(s) depends on the network application. Determining the number of hidden layers is a critical part of designing a network, and it is not straightforward as it is for the input and output layers (Rafiq et al., 2001).

To determine the optimal number of hidden layers and the optimal number of nodes in each layer, the network is trained using various configurations, and the configuration with the fewest layers and nodes that still yields the minimum mean squared error (MSE) quickly and efficiently is selected. Eberhard and Dobbins (1990) recommended that the number of hidden-layer nodes be at least greater than the square root of the sum of the numbers of components in the input and output vectors. Others (Carpenter & Barthelemy, 1994; Hajela & Berke, 1991) suggested that the number of nodes in the hidden layer lie between the sum and the average of the numbers of nodes in the input and output layers.
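The sizing heuristics cited above can be written out directly (a sketch; these are rules of thumb, not exact prescriptions):

```python
import math

def hidden_node_bounds(n_inputs, n_outputs):
    """Heuristic bounds on the number of hidden-layer nodes."""
    # Eberhard & Dobbins (1990): at least sqrt(inputs + outputs) nodes.
    minimum = math.ceil(math.sqrt(n_inputs + n_outputs))
    # Carpenter & Barthelemy (1994); Hajela & Berke (1991):
    # between the average and the sum of the input and output node counts.
    low, high = (n_inputs + n_outputs) / 2, n_inputs + n_outputs
    return minimum, low, high

print(hidden_node_bounds(8, 2))  # -> (4, 5.0, 10)
```

In practice these bounds only narrow the search; the final configuration is still chosen by trial training runs as described above.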

The number of nodes in the hidden layer will be selected according to the following rules:
1. The maximum error of the output network parameters should be as small as possible, for both training patterns and testing patterns.
2. The training epochs (number of iterations) should be as few as possible.


#### **2.4 Pre-process and post-process of the training patterns**

Neural networks require that their input and output data be normalized to have the same order of magnitude. Normalization is very critical; if the input and the output variables are not of the same order of magnitude, some variables may appear to have more significance than they actually do. The normalization used in the training algorithm compensates for the order-of-magnitude differences among variables by adjusting the network weights. To avoid such problems, normalizing all input and output variables is recommended. The training patterns should be normalized before they are applied to the neural network so as to limit the input and output values to a specified range, given the large differences in the values of the data provided to the network. Besides, the activation function used in the back-propagation neural network is a sigmoid or hyperbolic tangent function; the lower and upper limits of the function are 0 and 1, respectively, for the sigmoid function, and -1 and +1 for the hyperbolic tangent function. The following formula is used to pre-process the input data sets so that their values lie between -1 and 1 (Baughman & Liu, 1995).

$$x_{i,norm.} = 2\,\frac{x_i - x_{i,min.}}{x_{i,max.} - x_{i,min.}} - 1 \tag{6}$$

where:


$x_{i,norm.}$: the normalized value of variable $x_i$.
$x_{i,min.}$: the minimum value of variable $x_i$ (input).
$x_{i,max.}$: the maximum value of variable $x_i$ (input).

However, for the sigmoid function, the following formula might be used.

$$O_{i,norm.} = \frac{t_i - t_{i,min.}}{t_{i,max.} - t_{i,min.}} \tag{7}$$

where:

$O_{i,norm.}$: the normalized value of output variable $t_i$.
$t_{i,min.}$: the minimum value of variable $t_i$ (output).
$t_{i,max.}$: the maximum value of variable $t_i$ (output).
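Equations (6) and (7) translate directly into code (a sketch; the minima and maxima are taken from the training data, and the function names are illustrative):

```python
def normalize_tanh_range(x, x_min, x_max):
    """Eq. (6): map an input value to [-1, 1] for the hyperbolic tangent function."""
    return 2.0 * (x - x_min) / (x_max - x_min) - 1.0

def normalize_sigmoid_range(t, t_min, t_max):
    """Eq. (7): map an output value to [0, 1] for the log-sigmoid function."""
    return (t - t_min) / (t_max - t_min)

print(normalize_tanh_range(5.0, 0.0, 10.0))     # -> 0.0 (midpoint of the range)
print(normalize_sigmoid_range(2.5, 0.0, 10.0))  # -> 0.25
```

After training, the network outputs are post-processed by inverting the same mapping to recover values in the original units.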

#### **3. Adaptive neuro-fuzzy inference system (ANFIS)**

The fuzzy set theory developed by Zadeh (1965) provides a mathematical framework to deal with the vagueness associated with the description of a variable. The commonly used