**2. Learning of multi-agent systems and representation of disaster relief problem**

## **2.1 Learning of multi-agent systems**



To date, many different remote-controlled disaster relief robots have been developed. A further complication, besides the need for triage, is that these robots must work in environments in which communication is not always secure. For these reasons, there is a need for autonomous disaster relief robots, that is, robots that can learn from the conditions they encounter and then take independent action [2]. Thus, even when disaster relief robots are used, efficient rescue needs to consider the condition of the injured, which changes over time.

Reinforcement learning is one way that robots can acquire information about appropriate behavior in new environments. Under this learning framework, robots observe the environment, select and perform actions, and obtain rewards [3–6]. Each robot must learn by itself what the best policy is, that is, the policy that obtains the largest amount of reward over time.

Recent research on disaster relief robots has included consideration of multi-agent systems, that is, systems that include two or more disaster relief robots. A multi-agent system in which multiple agents explore sections of a damaged building, with the goal of updating a topological map of the building with safe routes, has been discussed [7, 8]. John et al. constructed a multi-agent systems approach to disaster situation management, which is a complex multidimensional process involving a large number of mobile interoperating agents [9]. However, to successfully interact in the real world, agents must be able to reason about their interactions with heterogeneous agents of widely varying properties and capabilities. Agents must therefore be able to learn from the environment and take independent actions by using perception and reasoning in order to carry out their tasks in the best possible way [10, 11].

Numerous studies regarding learning in multi-agent systems have been conducted. Spychalski and Arendt proposed a methodology for implementing machine learning capability in multi-agent systems for the aided design of selected control systems, which improved performance by reducing the time spent processing requests that had previously been acknowledged and stored in the learning module [12]. In [13], a new kind of multi-agent reinforcement learning algorithm, called TM\_Qlearning, which combines traditional Q-learning with observation-based teammate modeling techniques, was proposed. Two multi-agent reinforcement learning methods, both of which promote the selection of actions so that the chosen action relies not only on present experience but also on an estimation of possible future ones, have been proposed to better solve the coordination problem and the exploration/exploitation dilemma in nonstationary environments [14]. In [15], the construction of a multi-agent evacuation guidance simulation consisting of evacuee agents and instruction agents was reported, and the optimum evacuation guidance method for post-earthquake tsunami events was discussed through numerical simulations using the multi-agent system. A simulation of a disaster relief problem that included multiple autonomous robots working as a multi-agent system has also been reported [16].

In disaster relief problems, it is important to rescue the injured and remove obstacles according to conditions that change with the passage of time. However, conventional research on multi-agent systems for disaster relief has not taken the condition of the injured into consideration, so it is insufficient for efficient rescue.

In this chapter, we discuss acquiring cooperative behavior, namely rescuing the injured and clearing obstacles according to the triage of the injured, in a multi-agent system. We propose three methods of reward distribution: (1) reward distribution responding to the condition of the injured, (2) reward distribution based on the contribution degree, and (3) reward distribution by the contribution degree responding to the condition of the injured. We investigated the effectiveness of these three methods.




An agent is a computational mechanism that exists in some complex environment, senses and performs actions in that environment, and by doing so realizes a set of tasks to which it is assigned. A multi-agent system consists of agents that interact with each other, situated in a common environment, which they perceive with sensors and upon which they act with actuators (**Figure 1**). Agents and their environment are related through this interaction. When the environment is inaccessible, the information that can be perceived from it is limited and inaccurate and entails delay [2, 17, 18].

**Figure 1.** *Multi-agent systems.*
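To make this sense-act loop concrete, the following minimal Python sketch shows one common way to express the interaction; the class and method names (`Agent`, `perceive`, `act`, `environment.apply`) are illustrative assumptions rather than an interface defined in this chapter.

```python
from abc import ABC, abstractmethod

class Agent(ABC):
    """An agent perceives its environment with sensors and acts on it with actuators."""

    @abstractmethod
    def perceive(self, environment):
        """Return an observation, which may be limited, inaccurate, or delayed."""

    @abstractmethod
    def act(self, observation):
        """Choose an action based on the current observation."""

def run_step(agents, environment):
    # Every agent situated in the common environment senses and then acts once.
    for agent in agents:
        observation = agent.perceive(environment)
        action = agent.act(observation)
        environment.apply(action)  # hypothetical environment interface
```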

The major characteristics of multi-agent systems were identified in [19].


In multi-agent systems, individual agents are forced to engage with other agents that have varying goals, abilities, and composition. Reinforcement learning has been used to learn about other agents and to adapt local behavior for the purpose of achieving coordination in multi-agent situations in which the individual agents are not aware of one another [20].

Various reinforcement learning strategies have been proposed that agents can use to develop a policy that maximizes the rewards accumulated over time. A prominent algorithm in reinforcement learning is the Q-learning algorithm.

In Q-learning, the decision policy is represented by the *Q*-factors, which estimate the long-term discounted reward for each state-action pair. Let *Q*(*s*, *a*) denote the *Q*-factor for state *s* and action *a*. If action $a_t$ taken in state $s_t$ produces a reinforcement $r$ and a transition to state $s_{t+1}$, then the corresponding *Q*-factor is modified as follows:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right] \tag{1}$$

where α is a small constant called the learning rate, which serves as a step-size parameter in the iteration, and γ denotes the discount factor. In theory, the algorithm may run for infinitely many iterations until the Q-factors converge.
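As an illustration, the following minimal Python sketch implements the update of Eq. (1) together with ε-greedy action selection; the parameter values, action set, and state encoding are assumptions made for the example, not values prescribed by this chapter.

```python
import random
from collections import defaultdict

ALPHA = 0.1    # learning rate α (assumed value)
GAMMA = 0.9    # discount factor γ (assumed value)
EPSILON = 0.1  # exploration rate for ε-greedy selection (assumed value)

ACTIONS = ["up", "down", "left", "right"]  # illustrative action set

# Q-factors Q(s, a); unseen state-action pairs start at zero.
Q = defaultdict(float)

def select_action(state):
    """ε-greedy: explore at random with probability ε, otherwise exploit."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state):
    """One application of the Q-learning update in Eq. (1)."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```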

## **2.2 The target problem**

In this chapter, we focus on a disaster relief problem as the target problem, similar to previous research [16]. In the disaster relief problem, agents must rescue the injured as quickly as possible, and injured persons with differing severity and urgency of condition are placed on a field of fixed size. Because there are multiple injured persons and obstacles, the disaster relief problem can be considered a multi-agent system. Each agent focuses on achieving its own target, and the task of the system is to efficiently rescue all of the injured and remove the obstacles (**Figure 2**).

**Figure 2.** *Disaster relief problem.*

Efficient rescue is performed at a disaster site by using triage to assign priority of transport or treatment based on the severity and urgency of the condition of the injured. In the disaster relief problem, it is thus necessary to reflect triage based on the condition of the injured. For this purpose, in this chapter, we designate the condition of the injured as red (requiring emergency treatment), yellow (requiring urgent treatment), green (light injury), or black (lifesaving is difficult), in descending order of urgency.

The disaster relief problem is represented as shown in **Figure 3**. The field is divided into an N × N lattice. Agents are indicated by circles, ◎; the injured are indicated by R, Y, G, and B; removable obstacles are indicated by white triangles, △; and nonremovable obstacles are indicated by black triangles, ▲. The destination for the injured is indicated by a white square, □, and the collection site for movable obstacles is indicated by a black square, ■. A single step is defined such that each of the agents on the field completes a single action, and the field is re-initialized once all of the injured have been moved.

**Figure 3.** *An example of representation for a disaster relief problem.*
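For concreteness, one possible encoding of such a field in Python is sketched below; the `Cell` names, the numeric triage priorities, and the lattice size `N` are illustrative assumptions, since the chapter specifies only the symbols themselves.

```python
from enum import Enum

class Cell(Enum):
    EMPTY = " "
    AGENT = "◎"             # agent
    INJURED_RED = "R"       # requiring emergency treatment
    INJURED_YELLOW = "Y"    # requiring urgent treatment
    INJURED_GREEN = "G"     # light injury
    INJURED_BLACK = "B"     # lifesaving is difficult
    MOVABLE_OBSTACLE = "△"  # removable obstacle
    FIXED_OBSTACLE = "▲"    # nonremovable obstacle
    DEST_INJURED = "□"      # destination of the injured
    DEST_OBSTACLE = "■"     # collection site of movable obstacles

# Triage priority in descending order of urgency: red > yellow > green > black.
TRIAGE_PRIORITY = {
    Cell.INJURED_RED: 3,
    Cell.INJURED_YELLOW: 2,
    Cell.INJURED_GREEN: 1,
    Cell.INJURED_BLACK: 0,
}

N = 10  # assumed lattice size
field = [[Cell.EMPTY] * N for _ in range(N)]  # the N × N lattice
```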

The agents considered in this chapter are depicted in **Figure 4** and include two types, in order to obtain cooperative actions. The rescue agents have the primary function of rescuing the injured, and the removal agents have the primary function of removing obstacles, although either type can perform both functions. The agents recognize the colors of the injured on the field and identify the condition of the injured in correspondence with those colors.

**Figure 4.** *Agents of different functions: (a) removing agent and (b) relief agent.*



Each agent considers the overall tasks in the given environment, carries out the tasks in accordance with the assigned roles, and learns appropriate actions that bring high rewards.

An agent can recognize its circumstances within a prescribed field of vision and can move one cell vertically or horizontally, but it will stay in place without moving if a nonremovable obstacle, the injured transport destination, the obstacle transport destination, or another agent occupies the movement destination, or if the movement destination is outside the field. Each agent has a constant but limited view of the field, and it can assess the surrounding environment.

The available actions of agents are (1) moving up, down, right, or left to an adjacent cell; (2) remaining in the present cell when the adjacent cell is occupied by an obstacle that cannot be removed or by another agent; and (3) finding an injured person or a movable obstacle and taking it to the appropriate location.

If an agent is processing an injured person and its next action is moving that person to the appropriate destination, then the task of the agent is completed, and the agent can begin a new task of rescuing or removing. When all of the injured on the field have been rescued, the overall task is completed.
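The following is a minimal sketch of the movement rule and the task-completion check described above, reusing the `Cell` encoding and lattice size `N` from the earlier field sketch; the helper names are again illustrative assumptions.

```python
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

# Cells whose occupant forces an agent to stay in place.
BLOCKED = {Cell.FIXED_OBSTACLE, Cell.DEST_INJURED, Cell.DEST_OBSTACLE, Cell.AGENT}

def try_move(field, row, col, action):
    """Move one cell vertically or horizontally; stay in place if the
    destination is blocked or lies outside the field."""
    dr, dc = MOVES[action]
    r, c = row + dr, col + dc
    if not (0 <= r < N and 0 <= c < N):  # destination outside the field
        return row, col
    if field[r][c] in BLOCKED:           # destination occupied
        return row, col
    return r, c

INJURED = {Cell.INJURED_RED, Cell.INJURED_YELLOW,
           Cell.INJURED_GREEN, Cell.INJURED_BLACK}

def overall_task_completed(field):
    """The overall task is completed when no injured remain on the field."""
    return not any(cell in INJURED for row in field for cell in row)
```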

