
*Assistive and Rehabilitation Engineering*

*Improvement of Cooperative Action for Multi-Agent System by Rewards Distribution*
*DOI: http://dx.doi.org/10.5772/intechopen.85109*

**Figure 5.**
*Results of experiment to compare three reward distribution timing patterns.*

As shown in **Figure 5**, Pattern 3 required a lower number of steps than did Patterns 1 and 2, which in turn indicates that efficient rescue and removal was learned by conferring rewards in two stages, thus leading the agent to regard the course from discovery to transport as one task. We therefore applied Pattern 3 in the subsequent experiments.

**4.3 Obtaining cooperative action that considers the condition of the injured and the effects of reward distribution timing**

We applied the three types of reward distributions in experiments for efficient rescue in accordance with the urgency of the condition of the injured. In the following descriptions of experimental results, the horizontal axis represents episodes, and the vertical axis represents the number of steps for task completion by all agents.

**Figure 6** shows the results of an experiment to investigate the effectiveness of reward distribution in accordance with condition (Method 1) compared to the conventional method [16]. As shown, the number of steps is higher throughout with Method 1 than with the conventional method, indicating that the agents learned to postpone rescue of the low-urgency injured and prioritize rescue of the high-urgency injured.

**Figure 6.**
*Results of the conventional method and proposed Method 1.*

Finally, we performed an experiment to investigate the effectiveness of reward distribution based on contribution degree (Method 2) in comparison to Method 1, and another experiment to investigate the effectiveness of reward distribution by contribution degree in accordance with injured condition (Method 3) in comparison with the other proposed methods (Methods 1 and 2). The results are shown in **Figure 7** and **Table 2**.

**Figure 7.**
*Results of the different proposed methods.*

| Triage of injured | Method 1 | Method 2 | Method 3 |
| --- | --- | --- | --- |
| Red | 1126.73 | 1140.93 | 1102.93 |
| Yellow | 1518.20 | 1614.40 | 1269.07 |
| Green | 2404.53 | 2499.33 | 1685.47 |
| Black | 3284.27 | 2999.47 | 2447.33 |

**Table 2.**
*Mean step numbers by different reward distribution.*

As shown, Method 2 tends to yield a higher number of steps than Method 1 from around episode 6000 onward. This indicates that, to rescue the injured efficiently with consideration for contribution degree, the agents learned a rescue order that first rescued the injured who were nearby, thus shortening the rescue time, and left for later the rescue of those who were farther away.

Method 3 is approximately 2.2% and 3.4% superior to Methods 1 and 2, respectively. The agents were apparently able to learn to rescue the injured in accordance with urgency because a reward differing according to injured condition was conferred on them. These results also show that the agents were able to learn efficient rescue actions because the reward distribution reflected contribution degree.
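The three reward distributions compared above can be summarized in a small sketch. The function below is illustrative only: the per-condition base rewards, the fixed reward, and the assumption of a contribution degree normalized to [0, 1] are ours, not values from the chapter.

```python
# Illustrative sketch of the three reward distributions (hypothetical values).
# BASE_REWARD maps triage condition to a reward reflecting urgency (Method 1);
# Method 2 scales a fixed reward by each agent's contribution degree;
# Method 3 combines both: a condition-dependent reward scaled by contribution.

BASE_REWARD = {"red": 1.0, "yellow": 0.7, "green": 0.4, "black": 0.1}  # assumed
FIXED_REWARD = 1.0  # assumed common reward used when condition is ignored

def reward(method: int, condition: str, contribution: float) -> float:
    """Reward conferred on one agent for a completed rescue.

    contribution is that agent's contribution degree, normalized to [0, 1].
    """
    if method == 1:    # by injured condition only
        return BASE_REWARD[condition]
    if method == 2:    # by contribution degree only
        return FIXED_REWARD * contribution
    if method == 3:    # by contribution degree in accordance with condition
        return BASE_REWARD[condition] * contribution
    raise ValueError("method must be 1, 2, or 3")
```

Under this sketch, an agent that contributed half the work of rescuing a red-tagged person would receive 1.0 × 0.5 = 0.5 under Method 3, whereas under Method 1 every participating agent would receive 1.0 regardless of contribution.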

**5. Conclusion**

In this chapter, we considered rescue robots as a multi-agent system and proposed three reward distributions for the agents to learn cooperative action, with consideration given to the condition of the injured and to obstacle removal, in responding to our disaster rescue problem. We also investigated the timing of reward conferral on the agents.

Comparative experiments showed that the reward distribution timing that enabled the agents to obtain the most efficient cooperative actions was reward conferral both at the stage in which the agent discovered the injured person or obstacle and at completion of their transport. The results also showed that the capability for the most efficient cooperative injured-rescue and obstacle-removal actions could be acquired through reward distribution by contribution degree in accordance with condition.
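The two-stage conferral described above can be sketched as follows; the event names and the split of the reward between the two stages are hypothetical, chosen only to show rewards arriving both at discovery and at transport completion rather than as a single lump sum.

```python
# Sketch of two-stage reward timing (the pattern found most efficient above):
# a partial reward at discovery of an injured person or obstacle, and the
# remainder at completion of its transport/removal. Values are illustrative.

DISCOVERY_REWARD = 0.3   # assumed share conferred at discovery
COMPLETION_REWARD = 0.7  # assumed share conferred at transport completion

def step_reward(event: str) -> float:
    """Reward for one environment event under two-stage conferral."""
    if event == "discovered":
        return DISCOVERY_REWARD
    if event == "transport_completed":
        return COMPLETION_REWARD
    return 0.0  # ordinary movement steps earn no reward

# An episode in which an agent discovers and later delivers one injured person
# accumulates the full task reward across the two stages:
episode = ["move", "discovered", "move", "move", "transport_completed"]
total = sum(step_reward(e) for e in episode)  # sums the two stages
```

Conferring part of the reward at discovery, rather than only at completion, is what lets the agent treat the course from discovery to transport as one task.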

In this chapter, for the multi-agent system corresponding to the disaster rescue problem, rescue simulations were performed with the condition of the injured determined in advance. In future studies, we plan to conduct simulations with dynamic changes over time in both the condition of the injured and the removable versus nonremovable states of the obstacles.
