**1. Introduction**

Natural disasters are becoming more frequent worldwide and are a major impediment to sustainable development. To minimize disaster damage, the United Nations Office for Disaster Risk Reduction (UNISDR) calls on local governments in each country to promote disaster prevention and mitigation. Reducing vulnerability to, and damage from, disasters is an important issue for the international community.

In a large-scale disaster, a large number of injuries occur simultaneously, and the condition of the injured changes over time. When resources are insufficient to treat all the injured immediately, efficient treatment therefore requires triage, the process of determining the priority of treatment based on the severity of each injured person's condition [1].

To date, many different remote-controlled disaster relief robots have been developed. A further complication, besides the need for triage, is that these robots must work in environments in which communication is not always secure. For these reasons, there is a need for autonomous disaster relief robots, that is, robots that can learn from the conditions they encounter and then take independent action [2]. Thus, even when disaster rescue robots are used, efficient rescue must take into account the condition of the injured, which changes over time.

Reinforcement learning is one way that robots can acquire information about appropriate behavior in new environments. Under this learning scheme, robots observe the environment, select and perform actions, and obtain rewards [3–6]. Each robot must learn by itself the best policy (i.e., the policy that obtains the largest amount of reward over time).
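The observe, act, and reward cycle described above can be illustrated with a minimal sketch. It is not the method evaluated in this chapter: the one-dimensional corridor environment, the function names (`step`, `greedy_action`, `train`), and the parameter values (`alpha`, `gamma`, `epsilon`) are all assumptions chosen for illustration, and tabular Q-learning (the algorithm named in Section 2.1) is used only as one concrete way of learning a policy from rewards.

```python
import random

# Hypothetical toy environment: a corridor of N_STATES cells.
# The agent starts in cell 0 and is rewarded on reaching the last cell.
N_STATES = 5
ACTIONS = (-1, +1)  # move left / move right
GOAL = N_STATES - 1


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), GOAL)
    done = next_state == GOAL
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def greedy_action(q, state):
    """Return the highest-valued action, breaking ties at random."""
    best = max(q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if q[(state, a)] == best])


def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Learn a Q-table by repeated observe/select/act/reward cycles."""
    random.seed(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Select: explore with probability epsilon, otherwise exploit.
            if random.random() < epsilon:
                action = random.choice(ACTIONS)
            else:
                action = greedy_action(q, state)
            # Perform the action; observe the reward and the next state.
            next_state, reward, done = step(state, action)
            # Tabular Q-learning update toward the discounted target.
            if done:
                target = reward
            else:
                target = reward + gamma * max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    return q


q_table = train()
# Greedy policy for every non-goal cell.
policy = [greedy_action(q_table, s) for s in range(GOAL)]
```

In this toy corridor the learned greedy policy moves right in every cell, which is the behavior that collects the reward soonest. A disaster relief setting would replace the corridor with a far richer state, action, and reward description.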

Recent research on disaster relief robots has included consideration of multi-agent systems, that is, systems that include two or more disaster relief robots. A multi-agent system in which multiple agents explore sections of a damaged building with the goal of updating a topological map of the building with safe routes is discussed in [7, 8]. John et al. constructed a multi-agent systems approach to disaster situation management, which is a complex multidimensional process involving a large number of mobile interoperating agents [9]. However, to interact successfully in the real world, agents must be able to reason about their interactions with heterogeneous agents of widely varying properties and capabilities. Agents must be able to learn from the environment and take independent actions by using perception and reasoning in order to carry out their tasks in the best possible way [10, 11].

Numerous studies regarding learning in multi-agent systems have been conducted. Spychalski and Arendt proposed a methodology for implementing machine learning capability in multi-agent systems for the aided design of selected control systems, which improved their performance by reducing the time spent processing requests that had previously been acknowledged and stored in the learning module [12]. In [13], a new kind of multi-agent reinforcement learning algorithm, called TM\_Qlearning, which combines traditional Q-learning with observation-based teammate modeling techniques, was proposed. Two multi-agent reinforcement learning methods, both of which promote the selection of actions so that the chosen action relies not only on present experience but also on an estimate of possible future experiences, have been proposed to better solve the coordination problem and the exploration/exploitation dilemma in nonstationary environments [14]. In [15], the construction of a multi-agent evacuation guidance simulation consisting of evacuee agents and instruction agents was reported, and the optimum evacuation guidance method for post-earthquake tsunami events was discussed through numerical simulations using the multi-agent system. A simulation of a disaster relief problem that included multiple autonomous robots working as a multi-agent system has also been reported [16].

In disaster relief problems, it is important to rescue the injured and remove obstacles according to conditions that are changing with the passage of time. However, conventional research on multiple agents targeted for disaster relief has not taken into consideration the condition of the injured, so it is insufficient for efficient rescue.

In this chapter, we discuss the acquisition of cooperative behavior for rescuing the injured and clearing obstacles according to triage of the injured in a multi-agent system. We propose three methods of reward distribution: (1) reward distribution responding to the condition of the injured, (2) reward distribution based on the contribution degree, and (3) reward distribution by the contribution degree responding to the condition of the injured. We investigated the effectiveness of these proposed methods for a disaster relief problem by an experiment. The results of the experiment showed that agents gained high rewards by rescuing those in most urgent need under the method in which the reward is distributed according to the contribution degree responding to the condition of the injured.

*Improvement of Cooperative Action for Multi-Agent System by Rewards Distribution*

*DOI: http://dx.doi.org/10.5772/intechopen.85109*

**2. Learning of multi-agent systems and representation of disaster relief problem**

**2.1 Learning of multi-agent systems**

An agent is a computational mechanism that exists in some complex environment, senses and performs actions in that environment, and by doing so realizes a set of tasks to which it is assigned. A multi-agent system consists of agents that interact with each other, situated in a common environment, which they perceive with sensors and upon which they act with actuators (**Figure 1**). The agents and the environment are thus related through interaction. When the environment is inaccessible, the information that can be perceived from it is limited, inaccurate, and subject to delay [2, 17, 18].

**Figure 1.** *Multi-agent systems.*

In [19], the following major characteristics of multi-agent systems were identified:

• Each agent has incomplete information and is restricted in its capabilities.

• The control of the system is distributed.

• The data are decentralized.

• The computation is asynchronous.

In multi-agent systems, individual agents are forced to engage with other agents that have varying goals, abilities, and composition. Reinforcement learning has been used to learn about other agents and adapt local behavior for the purpose of achieving coordination in multi-agent situations in which the individual agents are not aware of one another [20].

Various reinforcement learning strategies have been proposed that agents can use to develop a policy that maximizes the rewards accumulated over time. A prominent algorithm in reinforcement learning is the Q-learning algorithm. In Q-learning, the decision policy is represented by the *Q*-factors, which estimate the long-term discounted reward for each state-action pair. Let *Q*(*s*, *a*) denote the
