Advanced Path Planning for Mobile Entities

In this study, other experiments to probe the limitations of LEARCH were performed, in which we trained the system using nonrepresentative environment paths. That is, the paths taught by the expert traversed many cells of the environment that did not contain sufficiently representative features of the environment, or cells that did not differ significantly in cost. Figure 5 shows examples of these paths, which caused the LEARCH system to acquire nonrepresentative knowledge that was then generalized over the cost map. The cost map at the left of Figure 5 is less generalized compared with the more descriptive costs shown on the map at the right of Figure 5.

Figure 5. Left: cost map computed with five representative paths. Right: cost map computed with five nonrepresentative paths.

Therefore, in order to address these problems with the LEARCH system, we propose the use of an LSTM as part of the system. Inclusion of an LSTM allows the navigation agent to learn navigation policies and complex traversability cost functions and, furthermore, to retain the knowledge learned in past navigation episodes for reuse during new episodes. The latter capability avoids expensive retraining when the navigation environment is similar to those already explored by the agent, and allows hidden states of the extremely large state space represented by unstructured or rough terrain to be recognized.

We present the LSTM in the next section.

3. Long short-term memory neural network

LSTM is a recurrent neural network architecture originally designed for supervised time-series learning. It addresses the problem that errors propagated back in time tend to vanish in multilayer neural networks (MLPs). Enforcing a constant error flow in constant error carousels (CECs) is a solution for vanishing errors [7].

These CECs are processing units having linear activation functions that do not decay over time. CECs can become filled with useless information if access to them is not regulated; therefore, specialized multiplicative units called input gates regulate access to the CECs. Further, their access to the activations of other network units is regulated by multiplicative units called output gates. In addition, forget gates are added to CECs in order to reset information that is no longer useful. The combination of a CEC with its input, output and forget gates is called a memory cell (Figure 6).

Figure 6. Graphic representation of a memory cell.

The activation updates at each time step t in this type of neural network are computed as follows. For the hidden unit activation $y^h$, the output unit activation $y^k$, the input gate activation $y^{in}$, the output gate activation $y^{out}$ and the forget gate activation $y^{\varphi}$, we have

$$y^i(t) = f\_i\left(\sum\_m w\_{im} y^m(t-1)\right) \tag{10}$$

where $w\_{im}$ is the weight of the connection from unit m to unit i. For the activation function $f\_i$, the standard logistic sigmoid function is chosen for all units, except for the output units, for which it is the identity function [7]. The CEC activation, also known as the memory cell state, is calculated using

$$s\_{c\_j^v}(t) = y^{\varphi\_j}(t)\, s\_{c\_j^v}(t-1) + y^{in\_j}(t)\, g\!\left(\sum\_m w\_{c\_j^v m}\, y^m(t-1)\right) \tag{11}$$

where g is a logistic sigmoid function scaled to the $[-2, 2]$ range and $s\_{c\_j^v}(0) = 0$. Finally, the activation update for the memory cell output is calculated from

$$y^{c\_j^v}(t) = y^{out\_j}(t)\, h\!\left(s\_{c\_j^v}(t)\right) \tag{12}$$

where h is again a logistic sigmoid function, scaled to the $[-1, 1]$ range [7].
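Eqs. (10)-(12) can be sketched for a single memory cell as follows. This is a minimal NumPy sketch; the function and variable names are our own, and h is taken as a logistic sigmoid scaled to the $[-1, 1]$ range as in [7].

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def g(x):
    # logistic sigmoid scaled to the [-2, 2] range, as in Eq. (11)
    return 4.0 * sigmoid(x) - 2.0

def h(x):
    # logistic sigmoid scaled to the [-1, 1] range, as in [7]
    return 2.0 * sigmoid(x) - 1.0

def memory_cell_step(y_prev, s_prev, w_in, w_out, w_phi, w_c):
    """One time step of a single memory cell (Eqs. (10)-(12)).

    y_prev : activations y^m(t-1) of the source units (1-D array)
    s_prev : previous cell state s_c(t-1)
    w_*    : incoming weight vectors of the input gate, output gate,
             forget gate and the cell input, respectively
    """
    y_in = sigmoid(w_in @ y_prev)    # input gate activation,  Eq. (10)
    y_out = sigmoid(w_out @ y_prev)  # output gate activation, Eq. (10)
    y_phi = sigmoid(w_phi @ y_prev)  # forget gate activation, Eq. (10)
    # CEC state update, Eq. (11): forget-gated old state plus
    # input-gated squashed cell input
    s = y_phi * s_prev + y_in * g(w_c @ y_prev)
    # memory cell output, Eq. (12)
    y_c = y_out * h(s)
    return y_c, s
```

With all weights at zero, each gate outputs 0.5, so the state halves at every step: this is the regulated access to the CEC described above.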

associated with a larger error. The desired MLP output was obtained using Eq. (14), and backpropagation was employed to train the MLP:

$$d(t) = \left|E\_{TD}(t)\right| + \beta\, y^v(t+1) \tag{14}$$

The MLP output $y^v(t)$ is used as the temperature of the Boltzmann action selection rule, which has the form

$$\frac{e^{A(s,a)/y^v(t)}}{\sum\_{b=1}^{n} e^{A(s,b)/y^v(t)}} \tag{15}$$

where n is the number of actions available to the agent.

The complete learning process and the manner in which the LEARCH and RL-LSTM systems are connected are shown in Figure 7. The entire process occurs offline: first, the LEARCH algorithm iterates until the required cost map M is obtained; then, the RL-LSTM algorithm trains the LSTM using the costs converted into rewards r.

Figure 7. LEARCH-RL-LSTM system showing the manner in which the two systems are connected to train the LSTM. The entire process occurs offline. First, the LEARCH algorithm iterates until the required cost map M is obtained. Then, the RL-LSTM algorithm begins the process of training the LSTM using the costs converted into rewards r. The feature map F is obtained from the robotic agent and used by both systems.

Path Planning in Rough Terrain Using Neural Network Memory
http://dx.doi.org/10.5772/intechopen.71486
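The action-selection step and the MLP target can be sketched as follows. This is a minimal NumPy sketch with our own function names, assuming the target has the form $d(t) = |E\_{TD}(t)| + \beta\, y^v(t+1)$ and the Boltzmann rule of Eq. (15).

```python
import numpy as np

def boltzmann_policy(advantages, temperature):
    # Eq. (15): action probabilities from advantage values A(s, a),
    # with the MLP output y^v(t) supplying the temperature
    z = np.asarray(advantages, dtype=float) / temperature
    z -= z.max()              # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def mlp_target(td_error, y_v_next, beta):
    # Eq. (14) (assumed form): d(t) = |E_TD(t)| + beta * y^v(t+1),
    # so states with a larger value-prediction error get a larger target
    return abs(td_error) + beta * y_v_next
```

A large $y^v(t)$ flattens the distribution, so the agent explores more in regions where its value estimates have been unreliable.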

The learning process implemented for the LSTM in this paper is a variation of real-time recurrent learning (RTRL), as described in Ref. [7], which is itself a variation of [9]. In this variant, once an error signal arrives at a memory cell, it is not propagated further back in time; however, the error is used to update the incoming weights when it leaves the memory cell through the input gate.
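This truncation can be illustrated with a minimal sketch (the names are our own): the recurrent quantities are treated as constants, so the arriving error adjusts the cell's incoming weights without flowing further back in time.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def truncated_cell_weight_gradient(y_prev, w_in, w_c, error):
    """Gradient of the cell state with respect to the cell's incoming
    weights w_c under truncation: y_prev (and the previous state) are
    treated as constants, so the arriving error updates the incoming
    weights but is not propagated further back in time."""
    y_in = sigmoid(w_in @ y_prev)     # input gate activation
    net_c = w_c @ y_prev              # net cell input
    sig = sigmoid(net_c)
    dg = 4.0 * sig * (1.0 - sig)      # derivative of g (sigmoid scaled to [-2, 2])
    return error * y_in * dg * y_prev
```

Because the previous state enters Eq. (11) additively through the forget gate, holding it constant removes exactly the backward-in-time path while keeping the within-step weight update intact.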
