The LEARCH algorithm builds a cost function; however, as noted in Section 2, the cost function's capability for generalization is limited and decays as the number of training paths grows. Since the motivation for employing a cost function is to obtain the cost of traversing a patch of terrain, so that the path planning system can compute the optimal path with the minimal traversal cost, we propose extracting terrain patches having descriptive characteristics of rough terrain for navigation. Hence, the traversal costs for these environment features can be determined using LEARCH, and the costs can be transformed into rewards for an RL algorithm [10].

The LEARCH algorithm iterates until the required cost map M is obtained. Then, the RL-LSTM algorithm begins the process of training the LSTM using the costs converted into rewards r. The feature map F is obtained from the robotic agent and used by both systems.
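To make this step concrete, a minimal sketch of the cost-to-reward conversion is given below. The normalization and negation are illustrative assumptions; the text does not specify the exact mapping from the cost map M to the rewards r.

```python
import numpy as np

def costs_to_rewards(cost_map, eps=1e-6):
    """Sketch of the cost-to-reward step: low traversal cost -> high reward.

    The normalization and negation are illustrative assumptions; the
    exact mapping from the LEARCH cost map M to rewards r is not
    specified in the text.
    """
    cost_map = np.asarray(cost_map, dtype=float)
    # Normalize costs to [0, 1] so rewards are comparable across maps.
    c_min, c_max = cost_map.min(), cost_map.max()
    normalized = (cost_map - c_min) / max(c_max - c_min, eps)
    # High traversal cost => strongly negative reward for the RL agent.
    return -normalized

# Example: rewards for a 20 x 20 cost map with random traversal costs.
M = np.random.default_rng(0).uniform(0.0, 5.0, size=(20, 20))
r = costs_to_rewards(M)
```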

In order to prove the generalization and long-term memory capabilities of LSTM, training was performed using patches of terrain containing representative features of rough terrain; that is, an entire map is not used to train the LSTM (Figure 8). In this way, an efficient training phase is achieved by taking advantage of the above-mentioned capabilities. In the next section, we show the results of the experiments conducted to confirm these capabilities. In addition, we prove the efficacy of the LSTM for mapping tasks that require inference of hidden states, i.e. smoothing or noise recognition.
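As an illustration of this patch-based training set-up, the sketch below samples a few small patches from a full map. The selection criterion (highest feature variance, as a rough proxy for representative rough terrain) and the patch size are assumptions; the chapter only shows the chosen patches in Figure 8(b).

```python
import numpy as np

def extract_training_patches(terrain_map, patch_size=5, n_patches=4):
    """Sample small terrain patches for LSTM training instead of the
    whole map (cf. Figure 8b). Ranking patches by feature variance is
    an illustrative proxy for 'representative rough terrain'."""
    h, w, _ = terrain_map.shape
    scored = []
    for r in range(h - patch_size + 1):
        for c in range(w - patch_size + 1):
            patch = terrain_map[r:r + patch_size, c:c + patch_size]
            scored.append((patch.var(), patch))
    scored.sort(key=lambda item: item[0], reverse=True)  # roughest first
    return [patch for _, patch in scored[:n_patches]]

# Example: pick 4 training patches from a random 20 x 20 x 4 feature map.
features = np.random.default_rng(1).uniform(0.0, 1.0, size=(20, 20, 4))
patches = extract_training_patches(features)
```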

Figure 8. (a) Example of a real environment modelled as a grid map. (b) Patches of terrain used for training are marked with an orange box.

5. Results

In this section, the results of the experimental tests are presented. Five environments were designed for navigation policy learning using the LEARCH-RL-LSTM system shown in Figure 7. Each environment was modeled as a grid, and each model was referred to as a map. Each map was a grid of 20 × 20 cells, and each cell represented a patch of terrain described by a vector of dimension 4, whose components were scalar values representing the vegetation density, terrain slope, rock size and the presence of water. The lower right corner of Figure 9 shows the color code used to illustrate how these environments were designed. Each environment differed by 5% from the previous one, so that 20% of the states in map 5 differed from those of map 1. These maps are shown in Figure 9.
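For concreteness, such a map can be represented as a 20 × 20 × 4 array, as sketched below. The feature names are from the text, but the value ranges and encodings are assumptions made for illustration.

```python
import numpy as np

GRID_SIZE = 20   # each map is a 20 x 20 grid of terrain patches
N_FEATURES = 4   # vegetation density, terrain slope, rock size, water

rng = np.random.default_rng(seed=0)

# One map: each cell holds a 4-dimensional feature vector. The value
# ranges below are illustrative assumptions, not taken from the text.
terrain_map = np.stack(
    [
        rng.uniform(0.0, 1.0, (GRID_SIZE, GRID_SIZE)),   # vegetation density
        rng.uniform(0.0, 45.0, (GRID_SIZE, GRID_SIZE)),  # terrain slope (degrees)
        rng.uniform(0.0, 1.0, (GRID_SIZE, GRID_SIZE)),   # rock size
        rng.integers(0, 2, (GRID_SIZE, GRID_SIZE)).astype(float),  # water presence
    ],
    axis=-1,
)

state = terrain_map[4, 7]  # the 4-dimensional descriptor of one patch
```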

Figure 9. Maps used in experiments. Lower right corner of the second row: colour code used to represent environment features.

Table 1 lists the results of experiments conducted using the LEARCH system alone to learn the navigation policies and cost functions of the five maps. To test the capability of LEARCH to reuse knowledge learned in previous navigation episodes, the following process was employed. Once LEARCH had learned the navigation policy and cost function of map 1, this knowledge was used as the initial knowledge for navigation episodes involving the remaining maps. As the first row of Table 2 shows, it was not necessary to retrain the LEARCH system to learn the demonstrated behavior and cost function of map 2; in other words, the LEARCH system could apply the knowledge learned from map 1 to map 2. However, this behavior did not occur for the other maps. For maps 3, 4 and 5, when attempting to reuse the knowledge learned from map 1, it was necessary to retrain the LEARCH system, and these learning episodes required an increased number of iterations to acquire the new knowledge. Comparing Table 1 with the first row of Table 2, it can be concluded that the previous knowledge learned using map 1 is even detrimental to the system performance when new maps are processed, even if the new maps are very similar to map 1. Therefore, the LEARCH system was shown to have a very poor generalization capability.


| Environment | Map 1 | Map 2 | Map 3 | Map 4 | Map 5 |
|-------------|-------|-------|-------|-------|-------|
| Iterations  | 7     | 3     | 3     | 3     | 4     |

Table 1. Iterations needed to learn the demonstrated behavior using the LEARCH system.


| Environment                | Map 2 | Map 3 | Map 4 | Map 5 |
|----------------------------|-------|-------|-------|-------|
| Iterations, LEARCH         | 0     | 5     | 4     | 7     |
| Iterations, LEARCH-RL-LSTM | 0     | 0     | 0     | 0     |

Table 2. Iterations needed to learn the demonstrated behavior using the knowledge of map 1, for the LEARCH and LEARCH-RL-LSTM systems.


When RL-LSTM was integrated with LEARCH to improve the capability for reusing knowledge learned from previous navigation episodes, there was no need to retrain the system. This is apparent from the second row of Table 2, where all the demonstrated behavior for maps 2 to 5 could be learned using only the knowledge learned from map 1. It is important to note that, although the results were obtained from relatively small maps, the LEARCH system alone required retraining of the cost function in each of these cases, whereas LEARCH-RL-LSTM required no retraining when the patches of terrain were similar, because it can generalize knowledge from previous navigation episodes. When the agent navigates in real time, even small retraining episodes are computationally expensive, and the agent must stop navigating until the retraining episode ends. For LEARCH-RL-LSTM, however, retraining is unnecessary whenever the environment is similar to those already known from previous navigation episodes.

Another set of environment maps was also used to test both algorithms. These new environments included features that were not observed in the previous scenarios. In this experiment, map 6 was the base of knowledge, and two new features were included in maps 7, 8 and 9. The states differed in the same way as in the previous experiment, with 5% of the states in each map differing from those of the previous map. However, these differences included new features in order to simulate a dynamic environment; i.e. we simulated that the terrain of map 6 suddenly changed when the agent navigated this map again, with new features being introduced in some cells of the grid of map 6. The maps used for this experiment are shown in Figure 10.

Figure 10. Maps used in the second set of experiments to simulate dynamic environments. Lower right corner of the second row: sample of a map with the patch of terrain used for retraining marked by an orange box.

The LSTM used in these experiments was trained offline. During agent navigation, a training episode was executed only when necessary, i.e. only when the action that the LSTM had learned to take was dangerous for the agent. These training episodes were efficient because, each time the robot encountered a new state or required navigation assistance, only a fraction of the environment was used (such as the patch of terrain shown in the lower right corner of Figure 10).
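As a rough sketch of this on-demand retraining loop, the code below retrains only on a small patch when the learned action looks dangerous. The `predict_action`, `danger_score` and `retrain_on_patch` helpers are hypothetical stand-ins, since the chapter does not specify how the danger check is made.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

def predict_action(state):
    """Hypothetical stand-in for the LSTM policy's action selection."""
    return int(np.argmax(state))

def danger_score(state, action):
    """Hypothetical stand-in for the danger check on a proposed action."""
    return float(state[action])

def retrain_on_patch(patch):
    """Hypothetical stand-in for an efficient LSTM training episode."""
    pass  # a real implementation would run a few LSTM updates on `patch`

def navigate_step(terrain_map, cell, danger_threshold=0.8):
    state = terrain_map[cell]  # 4-dim feature vector of the current cell
    action = predict_action(state)
    if danger_score(state, action) > danger_threshold:
        # Retrain on a small patch around the current cell only,
        # never on the whole 20 x 20 map, to keep the episode cheap.
        r, c = cell
        patch = terrain_map[max(r - 2, 0):r + 3, max(c - 2, 0):c + 3]
        retrain_on_patch(patch)
        action = predict_action(state)
    return action

# Example step on a random map.
terrain_map = rng.uniform(0.0, 1.0, size=(20, 20, 4))
action = navigate_step(terrain_map, cell=(4, 7))
```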

5.1. Noise tests

In the previous experiments, we assumed that the agent could infer the current state of the environment model based on the features it observed. However, in a real scenario, the agent must infer the actual state via a perceptual system based on data obtained through noisy sensors such as cameras, a Global Positioning System (GPS) or LiDAR. In outdoor environments, two states (patches of terrain) can be very similar; however, the same action in these similar states could lead to different outcomes. In the case of noisy signals, one state could be interpreted as another similar state or, alternatively, as a new state that is not explicitly represented in the environment model, i.e. a hidden state.


To test these two systems in a more realistic environment, a noise signal was injected into the inputs of both systems. A real-valued uniform distribution bounded to a maximum of [−1, 1] (20% noise) was used. Several runs of each system were then conducted, starting with an initial limit of [−0.1, 0.1] (2% noise) and increasing the noise in increments of [−0.1, 0.1], until the maximum limits at which each system failed to infer the real state for the agent were determined. Tables 3 and 4 show the test results for both systems under noise; the noise range values are the maximum limits of the noise supported by the system using that map. Note that the results of the LEARCH system for maps 6–9 are omitted, because this system could not reproduce the desired behavior on these maps under the supplied noise levels.
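The noise sweep described above can be sketched as follows. Here `system_infers_state` is a hypothetical stand-in for each system's state-inference step, used only to show the structure of the sweep.

```python
import numpy as np

rng = np.random.default_rng(seed=2)

def add_uniform_noise(features, limit):
    """Perturb the input features with uniform noise drawn from [-limit, limit]."""
    return features + rng.uniform(-limit, limit, size=features.shape)

def system_infers_state(noisy_map, clean_map):
    """Hypothetical stand-in: does the system still recover the true state?"""
    return np.allclose(noisy_map, clean_map, atol=0.5)  # illustrative criterion

# Sweep the noise bound from 0.1 (2% noise) to 1.0 (20% noise) in
# increments of 0.1, recording the last level the system tolerates.
clean_map = rng.uniform(0.0, 1.0, size=(20, 20, 4))
max_supported = 0.0
for limit in np.arange(0.1, 1.01, 0.1):
    noisy_map = add_uniform_noise(clean_map, limit)
    if system_infers_state(noisy_map, clean_map):
        max_supported = limit
    else:
        break
print(f"maximum supported noise limit: [{-max_supported:.1f}, {max_supported:.1f}]")
```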

Table 3. Maximum noise supported by both systems in tests where the desired behavior could be reproduced with maps 1–5.

Table 4. Maximum noise supported by both systems in tests where the desired behavior could be reproduced with maps 6–9.



References

[1] Silver D, Bagnell JA, Stentz A. Learning from demonstration for autonomous navigation in complex unstructured terrain. The International Journal of Robotics Research. 2010;29(12):1565-1592. DOI: 10.1177/0278364910369715

[2] Suger B, Steder B, Burgard W. Traversability analysis for mobile robots in outdoor environments: A semi-supervised learning approach based on 3D-lidar data. In: 2015 IEEE International Conference on Robotics and Automation (ICRA); 26–30 May 2015; Seattle, WA, USA. IEEE; 2015. p. 3941-3946. DOI: 10.1109/ICRA.2015.7139749

[3] Häselich M, Jöbgen B, Neuhaus F, Lang D, Paulus D. Markov random field terrain classification of large-scale 3D maps. In: 2014 IEEE International Conference on Robotics and Biomimetics (ROBIO 2014); 5–10 Dec 2014; Bali, Indonesia. IEEE; 2014. p. 1970-1975. DOI: 10.1109/ROBIO.2014.7090625

[4] Kondo M, Sunaga K, Kobayashi Y, Kaneko T, Hiramatsu Y, Fuji H, Kamiya T. Path selection based on local terrain feature for unmanned ground vehicle in unknown rough terrain environment. In: 2013 IEEE International Conference on Robotics and Biomimetics (ROBIO); 12–14 Dec 2013; Shenzhen, China. IEEE; 2013. p. 1977-1982. DOI: 10.1109/ROBIO.2013.6739759

[5] Murphy L, Newman P. Risky planning on probabilistic costmaps for path planning in outdoor environments. IEEE Transactions on Robotics. 2013;29(2):445-457. DOI: 10.1109/TRO.2012.2227216

[6] Valencia-Murillo R, Arana-Daniel N, López-Franco C, Alanís A. Rough terrain perception through geometric entities for robot navigation. In: 2nd International Conference on Advances in Computer Science and Engineering (CSE 2013); 1–2 Jul 2013; Los Angeles, CA, USA. Atlantis Press; 2013. DOI: 10.2991/cse.2013.69

[7] Bakker B. Reinforcement learning with long short-term memory. In: Advances in Neural Information Processing Systems 14. Cambridge: MIT Press; 2002. p. 1475-1482

[8] Kalman R. When is a linear control system optimal? Journal of Basic Engineering. 1964;86(1):51-60

[9] Cortes C, Vapnik V. Support-vector networks. Machine Learning. 1995;20(3):273-297. DOI: 10.1007/BF00994018

[10] Sutton R, Barto A. Reinforcement Learning: An Introduction. Cambridge: MIT Press; 1998

