**4. Tuning neural network controller using classical approach**

The architecture shown in **Figure 1** comprises two neural blocks: the weights of the neural model are adjusted using the identification error $e(k)$, whereas the weights of the neural controller are trained using the tracking error $e\_c(k)$.

The multi-layer perceptron is used in the neural model and in the neural controller. Each block consists of three layers. The sigmoid activation function *s* is used for all neurons.

Concerning the neural network model, the $j$-th output of the hidden layer is described as follows

$$h\_j = \sum\_{i=1}^{n\_1} w\_{ji} x\_i \quad j = 1, 2, \dots, n\_2 \tag{10}$$

where $n\_1$ is the number of nodes of the input layer, $w\_{ji}$ is the hidden weight, $x\_i$ is the $i$-th component of the input vector of the neural model, $x = \left[ u(k), u(k-1), u(k-2), \dots \right]^T$, $u(k)$ is the control input to the system, and $n\_2$ is the number of nodes of the hidden layer given in the expression (3).

The output of the neural network model is given by the following equation

$$yr(k+1) = \lambda s \left(\sum\_{j=1}^{n\_2} w\_{1j} s(h\_j)\right) \tag{11}$$


where $w\_{1j}$ is the weight from the hidden layer to the output layer and *λ* is a scaling coefficient. The compact form of the output is given by the following equation

$$yr(k+1) = \lambda \, s(h\_1) = \lambda \, s \left[ w\_1^T \mathbf{S}(\mathbf{W}\mathbf{x}) \right] \tag{12}$$

with

$$\mathbf{x} = \begin{bmatrix} x\_i \end{bmatrix}^T, \; i = 1, \dots, n\_1,$$

$$\mathbf{W} = \begin{bmatrix} w\_{ji} \end{bmatrix}, \; i = 1, \dots, n\_1, \; j = 1, \dots, n\_2,$$

$$\mathbf{S}(\mathbf{W}\mathbf{x}) = \begin{bmatrix} s(h\_j) \end{bmatrix}^T, \; j = 1, \dots, n\_2,$$

$$w\_1 = \begin{bmatrix} w\_{1j} \end{bmatrix}^T, \; j = 1, \dots, n\_2.$$
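
To make the data flow concrete, here is a minimal Python sketch of the neural model forward pass of Eqs. (10)–(12); the NumPy implementation and the names `W`, `w1`, and `lam` are illustrative assumptions, not the chapter's code.

```python
import numpy as np

def sigmoid(h):
    """Sigmoid activation s(h), used for all neurons."""
    return 1.0 / (1.0 + np.exp(-h))

def model_forward(x, W, w1, lam):
    """Neural model output yr(k+1), per Eqs. (10)-(12).

    x   : input vector [u(k), u(k-1), ...], shape (n1,)
    W   : hidden weights w_ji, shape (n2, n1)
    w1  : output weights w_1j, shape (n2,)
    lam : scaling coefficient lambda
    """
    h = W @ x                 # Eq. (10): h_j = sum_i w_ji x_i
    S_Wx = sigmoid(h)         # S(Wx) = [s(h_j)]^T
    h1 = w1 @ S_Wx            # output-layer pre-activation h_1
    return lam * sigmoid(h1)  # Eq. (12): yr(k+1) = lambda s(w1^T S(Wx))
```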

The incremental change of the hidden weights $\Delta w\_{ji}$, $i = 1, \dots, n\_1$ and $j = 1, \dots, n\_2$, is

$$
\Delta w\_{ji} = -\eta \frac{\partial E}{\partial w\_{ji}} = -\eta \frac{\partial E}{\partial e} \frac{\partial e}{\partial h\_1} \frac{\partial h\_1}{\partial h\_j} \frac{\partial h\_j}{\partial w\_{ji}} \tag{13}
$$

$$
\Delta w\_{ji} = \eta \lambda \, s'(h\_1) \, \mathbf{S}'(\mathbf{W}\mathbf{x}) \, w\_{1j} \, \mathbf{x}^T e(k) \tag{14}
$$

where *η* is the learning rate, $0 \le \eta \le 1$, $\mathbf{S}'(\mathbf{W}\mathbf{x}) = \mathrm{diag}\left[ s'(h\_j) \right]^T$, $j = 1, \dots, n\_2$, and $s'(h\_1)$ is the derivative of $s(h\_1)$, defined as follows

$$s'(h\_1) = s(h\_1)(1 - s(h\_1))\tag{15}$$

$e(k)$ is the identification error, which is given by

$$e(k) = y(k) - yr(k) \tag{16}$$

and the cost function is given by the following equation

$$E = \frac{1}{2} \sum\_{k=1}^{N} \left( e(k) \right)^2 = \frac{1}{2} \sum\_{k=1}^{N} \left( y(k) - yr(k) \right)^2 \tag{17}$$

where *N* is the number of observations.

The incremental change of the hidden weights $\Delta w\_{ji}$ is used in the following update equation

$$
w\_{ji}(k+1) = w\_{ji}(k) + \Delta w\_{ji}(k) \tag{18}
$$

The output weights, in turn, are updated by the following equation


$$
w\_{1j}(k+1) = w\_{1j}(k) + \Delta w\_{1j}(k) \tag{19}
$$

where $\Delta w\_{1j}$ is


$$
\Delta w\_{1j} = -\eta \frac{\partial E}{\partial w\_{1j}} = -\eta \frac{\partial E}{\partial e} \frac{\partial e}{\partial h\_1} \frac{\partial h\_1}{\partial w\_{1j}} \tag{20}
$$

$$
\Delta w\_{1j}(k) = \eta \lambda \, e(k) \, s'(h\_1) \, \mathbf{S}(\mathbf{W}\mathbf{x}) \tag{21}
$$
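
Putting Eqs. (14)–(21) together, one identification step can be sketched as follows; this reuses the helpers defined above, treats `eta` as the fixed learning rate, and is a sketch under those assumptions rather than the authors' implementation.

```python
def model_train_step(x, y, W, w1, lam, eta):
    """One training step of the neural model, per Eqs. (14)-(21).

    x : model input vector, shape (n1,); y : measured system output.
    Returns the updated weights and the identification error e(k).
    """
    h = W @ x
    S_Wx = sigmoid(h)
    h1 = w1 @ S_Wx
    yr = lam * sigmoid(h1)
    e = y - yr                               # Eq. (16)
    s1p = sigmoid(h1) * (1.0 - sigmoid(h1))  # Eq. (15): s'(h_1)
    Sp = S_Wx * (1.0 - S_Wx)                 # s'(h_j), diagonal of S'(Wx)
    dW = eta * lam * s1p * e * np.outer(w1 * Sp, x)  # Eq. (14)
    dw1 = eta * lam * e * s1p * S_Wx                 # Eq. (21)
    return W + dW, w1 + dw1, e               # Eqs. (18) and (19)
```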

Concerning the neural network controller, the $j$-th output of the hidden layer is

$$h\_{cj} = \sum\_{i=1}^{n\_3} v\_{ji} x\_{1i} \quad j = 1, \dots, n\_4 \tag{22}$$

where $n\_3$ is the number of nodes of the input layer, $v\_{ji}$ is the hidden weight and $x\_{1i}$ is the $i$-th component of the input vector of the neural network controller, $x\_1 = \left[ r(k), r(k-1), r(k-2), \dots \right]^T$, where $r(k)$ is the desired value.

The output of the neural network controller is given by the following equation

$$u(k) = \lambda\_c \, s\left( \sum\_{j=1}^{n\_4} v\_{1j} \, s(h\_{cj}) \right) = \lambda\_c \, s\left( \sum\_{j=1}^{n\_4} v\_{1j} \, s\left( \sum\_{i=1}^{n\_3} v\_{ji} x\_{1i} \right) \right) \tag{23}$$

where $n\_4$ is the number of nodes of the hidden layer, $\lambda\_c$ is a scaling coefficient and $v\_{1j}$ is the output weight.

The compact form of the output of the neural network controller is given by the following equation

$$u(k) = \lambda\_c \, s(h\_{c1}) = \lambda\_c \, s\left[ v\_1^T \mathbf{S}(\mathbf{V}\mathbf{x}\_1) \right] \tag{24}$$

with

$$\mathbf{x}\_1 = \begin{bmatrix} x\_{1i} \end{bmatrix}^T, \; i = 1, \dots, n\_3,$$

$$\mathbf{V} = \begin{bmatrix} v\_{ji} \end{bmatrix}, \; i = 1, \dots, n\_3, \; j = 1, \dots, n\_4,$$

$$\mathbf{S}(\mathbf{V}\mathbf{x}\_1) = \begin{bmatrix} s(h\_{cj}) \end{bmatrix}^T, \; j = 1, \dots, n\_4,$$

$$v\_1 = \begin{bmatrix} v\_{1j} \end{bmatrix}^T, \; j = 1, \dots, n\_4.$$
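
Mirroring the model block, a sketch of the control-law computation of Eqs. (22)–(24), with `V`, `v1`, and `lam_c` as assumed names:

```python
def controller_forward(x1, V, v1, lam_c):
    """Control law u(k), per Eqs. (22)-(24).

    x1    : reference vector [r(k), r(k-1), ...], shape (n3,)
    V     : hidden weights v_ji, shape (n4, n3)
    v1    : output weights v_1j, shape (n4,)
    lam_c : scaling coefficient lambda_c
    """
    hc = V @ x1                  # Eq. (22): h_cj = sum_i v_ji x1_i
    S_Vx1 = sigmoid(hc)          # S(Vx1) = [s(h_cj)]^T
    hc1 = v1 @ S_Vx1             # output-layer pre-activation h_c1
    return lam_c * sigmoid(hc1)  # Eq. (24): u(k) = lambda_c s(v1^T S(Vx1))
```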

Concerning the hidden synaptic weights, they are updated by

$$
v\_{ji}(k+1) = v\_{ji}(k) + \Delta v\_{ji}(k) \tag{25}
$$

where Δ*vji* is given by

$$
\Delta v\_{ji} = -\eta\_c \frac{\partial E\_c}{\partial e\_c} \frac{\partial e\_c}{\partial y} \frac{\partial yr}{\partial h\_1} \frac{\partial h\_1}{\partial s(h\_j)} \frac{\partial s(h\_j)}{\partial h\_j} \frac{\partial h\_j}{\partial u} \frac{\partial u}{\partial h\_{c1}} \frac{\partial h\_{c1}}{\partial h\_{cj}} \frac{\partial h\_{cj}}{\partial v\_{ji}} \tag{26}
$$

where $\eta\_c$ is the learning rate, $0 \le \eta\_c \le 1$, and the cost function is defined as follows


$$E\_c = \frac{1}{2} \sum\_{k=1}^{N} \left( e\_c(k) \right)^2 = \frac{1}{2} \sum\_{k=1}^{N} \left( y(k) - r(k) \right)^2 \tag{27}$$

where *N* is the number of observations and $e\_c(k)$ is the tracking error, which is given by the following equation

$$
e\_c(k) = y(k) - r(k) \tag{28}
$$



where $r(k)$ is the desired output. Thus, $\Delta v\_{ji}$ becomes

$$
\Delta v\_{ji} = \eta\_c \lambda\_c \, e\_c(k) \, s'(h\_1) \, w\_{1j} \, \mathbf{S}'(\mathbf{W}\mathbf{x}) \, w\_{j1} \, s'(h\_{c1}) \, v\_{1j} \, \mathbf{S}'(\mathbf{V}\mathbf{x}\_1) \, \mathbf{x}\_1^T \tag{29}
$$

with $\mathbf{S}'(\mathbf{V}\mathbf{x}\_1) = \mathrm{diag}\left[ s'(h\_{cj}) \right]^T$, $j = 1, \dots, n\_4$.

The output synaptic weights of the neural network controller are updated as

$$
v\_{1j}(k+1) = v\_{1j}(k) + \Delta v\_{1j}(k) \tag{30}
$$

where $\Delta v\_{1j}$ is given by

$$
\Delta v\_{1j} = -\eta\_c \frac{\partial E\_c}{\partial v\_{1j}} = -\eta\_c \frac{\partial E\_c}{\partial e\_c} \frac{\partial e\_c}{\partial h\_{c1}} \frac{\partial h\_{c1}}{\partial v\_{1j}} \tag{31}
$$

So

$$\frac{\partial e\_c(k)}{\partial h\_{c1}} = \frac{\partial (y(k) - r(k))}{\partial h\_{c1}} = \frac{\partial y(k)}{\partial h\_{c1}} = \frac{\partial y(k)}{\partial u(k)} \frac{\partial u(k)}{\partial h\_{c1}} \tag{32}$$

and Eq. (31) becomes

$$
\Delta v\_{1j} = -\eta\_c \frac{\partial E\_c}{\partial e\_c} \frac{\partial y(k)}{\partial u} \frac{\partial u(k)}{\partial h\_{c1}} \frac{\partial h\_{c1}}{\partial v\_{1j}} \tag{33}
$$

However, in Eq. (33) the system output $y(k)$ does not depend explicitly on $h\_1$; for this reason we use $yr(k)$ instead of $y(k)$, under the condition that the neural model matches the system behavior, which gives

$$\frac{\partial y(k)}{\partial u} = \frac{\partial yr(k)}{\partial u} = \frac{\partial yr(k)}{\partial h\_1} \frac{\partial h\_1}{\partial s(h\_j)} \frac{\partial s(h\_j)}{\partial h\_j} \frac{\partial h\_j}{\partial u} \tag{34}$$

from which, approximately,

$$
\Delta v\_{1j} = -\eta\_c \frac{\partial E\_c}{\partial e\_c(k)} \frac{\partial e\_c(k)}{\partial y(k)} \frac{\partial y(k)}{\partial h\_1} \frac{\partial h\_1}{\partial s(h\_j)} \frac{\partial s(h\_j)}{\partial h\_j} \frac{\partial h\_j}{\partial u(k)} \frac{\partial u(k)}{\partial h\_{c1}} \frac{\partial h\_{c1}}{\partial v\_{1j}} \tag{35}
$$

The obtained incremental change $\Delta v\_{1j}$ is rewritten as

$$
\Delta v\_{1j} = \eta\_c \lambda\_c \, e\_c(k) \, s'(h\_1) \, w\_{1j} \, \mathbf{S}'(\mathbf{W}\mathbf{x}) \, w\_{j1} \, s'(h\_{c1}) \, \mathbf{S}(\mathbf{V}\mathbf{x}\_1) \tag{36}
$$
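
Collecting Eqs. (29), (25), (36) and (30), one controller tuning step can be sketched as follows; the sketch reuses the helpers defined above, follows the factors as printed in those equations, and the names `eta_c` and the measured output sample `y` are assumptions for illustration.

```python
def controller_train_step(x, x1, y, r, W, w1, V, v1, lam_c, eta_c):
    """One tuning step of the controller, per Eqs. (25), (29), (30), (36).

    x  : model input vector (first entry u(k)); x1 : reference vector.
    y  : measured system output; r : desired value r(k).
    """
    # Forward passes through the model and controller internals
    h = W @ x
    S_Wx = sigmoid(h)
    h1 = w1 @ S_Wx
    hc = V @ x1
    S_Vx1 = sigmoid(hc)
    hc1 = v1 @ S_Vx1
    e_c = y - r                                 # Eq. (28)
    s1p = sigmoid(h1) * (1.0 - sigmoid(h1))     # s'(h_1)
    sc1p = sigmoid(hc1) * (1.0 - sigmoid(hc1))  # s'(h_c1)
    Sp_W = S_Wx * (1.0 - S_Wx)                  # s'(h_j)
    Sp_V = S_Vx1 * (1.0 - S_Vx1)                # s'(h_cj)
    # Model sensitivity term s'(h_1) w_1j S'(Wx) w_j1 of Eqs. (29)/(36);
    # W[:, 0] plays the role of w_j1 since u(k) is the first model input.
    sens = s1p * np.sum(w1 * Sp_W * W[:, 0])
    dV = eta_c * lam_c * e_c * sens * sc1p * np.outer(v1 * Sp_V, x1)  # Eq. (29)
    dv1 = eta_c * lam_c * e_c * sens * sc1p * S_Vx1                   # Eq. (36)
    return V + dV, v1 + dv1                     # Eqs. (25) and (30)
```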

In this section, we used a fixed learning rate $\eta(k)$ (respectively $\eta\_c(k)$) and the derivative of the sigmoid function $s'(h\_1) = s(h\_1)(1 - s(h\_1))$.


This approach has two drawbacks. First, finding a suitable fixed learning rate $\eta(k)$ (respectively $\eta\_c(k)$) requires several trials, which hampers on-line operation. Second, with this form of the sigmoid derivative, only a small amount of error is propagated back to the weights of the output layer, and the learning speed becomes very slow. In order to increase the learning speed, new approaches are proposed in the next section.

**5. Tuning neural network controller using particle swarm optimization**

An alternative technique is proposed in this section to optimize the neural network controller by implementing the Particle Swarm Optimization (PSO) algorithm. The algorithm mimics the behavior of animals searching for food and avoiding danger: the members of a swarm coordinate with each other to find the best position to settle. Likewise, PSO is driven by the movement of the best individual of the population, known as the social component, and by each particle's own experience, known as the cognitive component. The algorithm moves a set of candidate solutions to find the best solution among them.

In this study, the Particle Swarm Optimization Feedforward Neural Network (PSO NN) is applied to a multi-layer perceptron, where the position of each particle in the swarm represents the set of synaptic weights of the neural network at the current iteration. The dimensionality of each particle is the number of synaptic weights.

**5.1 Mathematical formulation**

Let us consider a search space of dimension *D*. A particle *i* of the swarm is modeled by a position vector

$$x\_i = \begin{bmatrix} x\_{i1}, x\_{i2}, \dots, x\_{iD} \end{bmatrix}^T \tag{37}$$

and a velocity vector denoted

$$v\_i = \begin{bmatrix} v\_{i1}, v\_{i2}, \dots, v\_{iD} \end{bmatrix}^T \tag{38}$$

There is no concept of backpropagation in PSO NN: the feedforward neural network produces the learning error (the objective function of each particle) from the set of synaptic weights and biases, that is, from the positions of the particles. Each particle moves in the weight space trying to minimize the learning error, and keeps in memory the best position through which it has passed, denoted

$$Pbest\_i = \begin{bmatrix} pbest\_{i1}, pbest\_{i2}, \dots, pbest\_{iD} \end{bmatrix}^T \tag{39}$$

whereas the best position reached by the swarm is denoted

$$Gbest\_i = \begin{bmatrix} gbest\_{i1}, gbest\_{i2}, \dots, gbest\_{iD} \end{bmatrix}^T \tag{40}$$

Changing the position means updating the synaptic weights of the neural network controller in order to generate the proper control law by reducing the tracking error. At each iteration *k*, the particles update their positions by calculating the new velocity and moving to the new position; at iteration $(k+1)$, the velocity vector is computed from the current velocity, $Pbest\_i$, and $Gbest\_i$.
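
As an illustration of how such a swarm can tune the controller weights, the following minimal sketch applies the standard PSO velocity and position updates with an inertia weight `w` and acceleration coefficients `c1` and `c2`; these names and the update form are generic PSO conventions assumed here, not notation taken from this chapter.

```python
import numpy as np

def pso_step(positions, velocities, pbest, pbest_err, gbest,
             cost, w=0.7, c1=1.5, c2=1.5):
    """One swarm iteration with the standard PSO update (illustrative).

    positions, velocities : (n_particles, D) arrays; each row is one
    particle, i.e. one candidate vector of controller synaptic weights.
    pbest, pbest_err : personal best positions and their learning errors.
    cost : callable mapping a weight vector to its learning error.
    """
    n, D = positions.shape
    r1 = np.random.rand(n, D)  # random factors for the cognitive term
    r2 = np.random.rand(n, D)  # random factors for the social term
    # Velocity update: inertia + cognitive (Pbest) + social (Gbest) terms
    velocities = (w * velocities
                  + c1 * r1 * (pbest - positions)
                  + c2 * r2 * (gbest - positions))
    positions = positions + velocities
    # Re-evaluate the learning error and refresh the best positions
    errors = np.array([cost(p) for p in positions])
    improved = errors < pbest_err
    pbest[improved] = positions[improved]
    pbest_err[improved] = errors[improved]
    gbest = pbest[np.argmin(pbest_err)]
    return positions, velocities, pbest, pbest_err, gbest
```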
