### **2.1 Background: Aging society in Japan**

Population aging is becoming a serious problem in Japan. As in many developed countries, the birth rate is falling and life expectancy is rising very steeply. Due to the post-war baby boom, the elderly population (those over 65 years old) has exceeded 20 percent of the whole population.

In the past, social welfare services focused on giving good medical care to the elderly. Recently, however, attention has been shifting toward minimizing the need for medical care itself.

Minimizing medical care not only has financial benefits, but is also important for increasing the individual's quality of life. By keeping in good health and not needing medical care, an elderly person can continue to make their own decisions. This is rarely possible if they are hospitalized. Health exercises are considered an effective way for the elderly to maintain their health and minimize medication.

Many activities are under way to spread health exercises. In Ibaraki Prefecture, the local government and a local university have co-developed a program for teaching health exercises. One characteristic of the system developed in Ibaraki Prefecture is that the trainer of the exercise is also an elderly person. The prefecture gives certificates to elderly people who have mastered the specified teaching program. This in turn qualifies them to participate as volunteers who teach the exercise to other elderly people. Thanks to this elderly-to-elderly teaching system, the number of health exercise demonstrators has increased to over 2300.

TAIZO was developed to help spread the health exercise.

### **2.2 Humanoid robot as an exercise demonstration medium**

We use a humanoid robot as a medium to demonstrate the exercise, because the following points make it an effective demonstrator.

**Similarity of the shape of body:** Because a humanoid robot is designed to imitate the shape of a human, a person can easily observe and imitate the demonstrative body motion expressed by the robot.

**Attraction:** Meeting face to face with a humanoid robot is not yet a common occurrence. It is easy to spark the curiosity of someone who is not familiar with humanoid robots and grab their attention with an artificial being which moves and talks.

**Embodiment in 3D space:** The robot has a real body which has an actual volume in 3D space. Compared to virtual agents, which appear only on a 2D display, it can be observed from a very wide range of viewpoints (Matsusaka, 2008) (Ganeshan, 2006). Because of this characteristic, even a user behind the robot can observe the robot's motions. A user to the side can also observe other users' interactions with the robot.

Similarity of the body shape assists precise communication of the body movement to the trainee and is useful for enhancing the effectiveness of the exercise. Attraction also becomes a good incentive for the trainee to engage in the exercise. Embodiment in 3D space assists the social communication between the robot and the trainees. A trainee can watch other trainees while training. By looking at other trainees, they can see how well the others communicate with the robot and how eagerly they engage in the exercise. In most of our demonstration experiments, this inter-audience peer-to-peer effect gives positive feedback that enhances the individual's eagerness to engage in the exercise (explained and discussed in Section 2.9).

### **2.3 Demonstration setup and scenario**







Health exercise is intended to strengthen the bodies of elderly people. It consists of several kinds of exercises that involve stretching and muscle training. Health exercise has recently grown in importance and is gaining attention as an effective way to reduce the number of elderly people who become bedridden and need nursing care, and to increase their quality of life. Despite these good points, there are still some difficulties in applying this illness prevention activity. One of the biggest problems is the difficulty of encouraging people to engage in health exercise in the first place.

People pay little attention to their health while they are healthy. They take notice only after they realize they have a serious illness. Although we have statistics on the percentage of people who become seriously ill, it is still difficult to estimate how healthy we ourselves are. Moreover, it is much more difficult to understand the effect of health exercise, since we cannot compare what would have happened if we had not (or had) engaged in the exercise.

For these reasons, we usually require a special incentive to engage in health exercise.

TAIZO (Figure 1) is designed to help demonstrate health exercise. It stands at the side of the human demonstrator and assists the trainer in demonstrating. A demonstration is usually given in front of 5 to 15 trainees. TAIZO is used to demonstrate at events with 40 to 80 trainees. The robot is used as an eye-catcher to capture the attention of people who do not know the exercise but could become regular trainees. By using the robot as a demonstrator, the human demonstrator can attract interest from a larger variety of people than a demonstration done by humans alone. This gives more people a chance to engage in health exercises.

### **2.4 Role of the human demonstrator and the robot**

Both the human demonstrator and the robot stand in front of the audience (Figure 2). The human demonstrator leads the training program and the robot follows. Both the human and the robot show the demonstration to the audience.

To follow the human demonstrator's lead, the robot has to accept commands given by the human demonstrator. In addition, although the human demonstrator takes the lead in most situations, the robot has to collaborate in the demonstration activity in order to make it more attractive. The collaboration activity itself can sometimes act as an attention catcher that increases attendance at the demonstration. This chapter focuses on this communication effect in the human-robot collaborative demonstration.

### **2.5 Difference of communication channels in robot-human demonstration**

### **2.5.1 In-demonstration conversation**

When demonstrating the exercise, there is a dialog between the trainer and the supporting robot. We call this dialog "in-demonstration conversation".

There is existing research that handles this in-demonstration conversation. Katagiri et al (2001) realized and evaluated the effect of demonstration through virtual agents that use in-demonstration conversation. In Japan, this kind of dialog has been developed into a two-man stand-up comedy called manzai. Several pairs of robots have been used for manzai to entertain a human audience (e.g. Hayashi et al (2008)).

However, in most previous research, the demonstration setup consists of only two robots. If one of the members is a human, communication issues arise between the human and the robot. In this chapter, we introduce our health exercise robot, which handles the communication issue specific to a human-robot setup.

### **2.5.2 Speech or keyboard : Discussion from both sides**

In human-robot collaborative demonstration, what is the most appropriate method to give commands to the robot? Here, we compare two input methods, key input and vocal input. When we compare the two input methods in terms of accuracy, key input is more precise than speech input.

Fig. 1. TAIZO robot

Fig. 2. Demonstrative setup (diagram labels: human and robot demonstrators, communication channel, audiences (trainees), observer(s))

When we compare the two input methods by their characteristics, key input uses a private channel, while vocal commands are public. In other words, vocal commands can be heard by the audience, but key input cannot be heard or observed by most of the audience.

From the demonstrator's point of view, a more precise input channel may be preferable, because they do not want to make errors. But when we consider the audience's point of view, a more transparent communication may be preferable. From this point of view, vocal commands may be preferred, because they allow the audience to observe the flow of the dialog between the human demonstrator and the robot. Interaction using an audio medium is publicly observable and, from a user's standpoint, could outperform keypad input despite its inaccuracy.

In this chapter, we not only evaluate the effectiveness of each input method from the demonstrator's point of view, but also focus on the demonstrative effect on the audience. The evaluation design is described in Section 2.7.

### **2.6 Architecture of TAIZO robot**

### **2.6.1 Speech input system**

Figure 3 shows the overall architecture of the TAIZO robot.

The speech input system consists of a speaker-phone device (Figure 4) to capture and emit sound. The speech recognition system Julius (Kawahara et al, 2000) takes the captured sound as input and outputs the recognized phrase. The phrase matcher then matches the recognized phrase against a phrase database. The phrase database contains a set of phrases written in the form of a script that associates phrases with commands. Details of the speech I/O system are given in Section 3.1.
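As a rough illustration of the phrase matcher described above, the following sketch looks up a recognized phrase in a small phrase database and returns the associated command. It is only a minimal sketch under stated assumptions: the function names, the command identifiers, and the whitespace and case normalization are hypothetical and are not taken from the TAIZO implementation, and the recognizer output is represented simply as a string.

```python
from typing import Dict, Optional

# Hypothetical phrase database: each entry associates a phrase with a command.
# The command names are placeholders, not TAIZO's actual command set.
PHRASE_DATABASE: Dict[str, str] = {
    "raise your arm behind your head": "MOTION_RAISE_ARM",
    "twist your waist": "MOTION_TWIST_WAIST",
    "hello": "SPEECH_GREETING",
}


def normalize(phrase: str) -> str:
    """Lower-case the phrase and collapse repeated whitespace."""
    return " ".join(phrase.lower().split())


def match_phrase(recognized: str,
                 database: Dict[str, str] = PHRASE_DATABASE) -> Optional[str]:
    """Return the command associated with a recognized phrase, or None if no entry matches."""
    return database.get(normalize(recognized))
```

Returning `None` for an unmatched phrase lets the caller decide how to react to a recognition result that is not in the database, for example by ignoring it.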

### **2.6.2 Key input system**

The key input system consists of a keypad device (Figure 4). The human demonstrator types a number using this keypad.

Fig. 3. System architecture

Fig. 4. Keypad (left) and speaker-phone (right)

The keypad is used for both key input mode and speech input mode. In speech input mode, the keypad is used to control the volume of the microphone device. Input to the microphone device switches on when the key is pressed, and turns off when the key is released (push-to-talk).
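The two roles of the keypad can be sketched as follows. This is a hypothetical illustration rather than the actual TAIZO code: the `Keypad` class, its method names, and the number-to-pattern table are assumptions introduced here. In key input mode a typed number selects a training pattern, and in speech input mode the microphone input is passed on only while a key is held down (push-to-talk).

```python
from typing import Dict, Optional

# Hypothetical mapping from keypad numbers to training patterns (key input mode).
PATTERN_BY_NUMBER: Dict[int, str] = {
    1: "raise_arm",
    2: "twist_waist",
}


class Keypad:
    """Selects patterns in key input mode and gates the microphone in speech input mode."""

    def __init__(self, mode: str) -> None:
        if mode not in ("key", "speech"):
            raise ValueError("mode must be 'key' or 'speech'")
        self.mode = mode
        self.mic_open = False

    def type_number(self, number: int) -> Optional[str]:
        """Key input mode: return the pattern for the typed number, or None if unknown."""
        if self.mode != "key":
            return None
        return PATTERN_BY_NUMBER.get(number)

    def press(self) -> None:
        """Speech input mode: open the microphone while the key is held."""
        if self.mode == "speech":
            self.mic_open = True

    def release(self) -> None:
        """Close the microphone when the key is released."""
        self.mic_open = False

    def gate(self, audio_frame: bytes) -> Optional[bytes]:
        """Pass the audio frame on only while the microphone is open."""
        return audio_frame if self.mic_open else None
```

Gating the audio before recognition means that speech uttered while the key is released never reaches the recognizer.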

Fig. 5. Motions designed for health exercise.


### **2.6.3 Motion and speech database**

The motion database contains sequences of values specified for each joint of the robot. These sequences are designed to express the exercise motions of the robot.

In this chapter, we have designed 17 motions (see Figure 5). When designing a motion, we first create an abstract design of the motion using a robot simulator (Hirukawa et al, 2003). Final adjustment is done by playing back the sequence on the real robot.



We play back prerecorded speech for the robot. The prerecorded speech consists of exercise-related phrases (e.g. "Raise your arm behind your head", "Twist your waist"), question-answer (QA) type phrases ("Yes", "Okay", "No") and greetings ("Hello", "Good Bye"). Exercise-related phrases are recorded by the script developer each time a new exercise is designed. The script developer can also write a script for interaction by using the prerecorded QA type phrases.

Fig. 6. Error rate of each input method.



Table 1. Items in question sheet (each question allows free commenting).

### **2.6.4 Motion and speech generation system**

During motion generation, motion data in the database is transmitted to the motor controller. The motion data is transmitted sequentially at a specific rate using the internal clock of the robot. Speech recordings are played in parallel with the motion, synchronized at specified timings.
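The timing relationship described above can be sketched as a timeline builder. The sketch assumes a hypothetical representation that is not taken from the TAIZO implementation: a motion is a list of frames (one joint-value dictionary per clock tick), speech cues are attached to frame indices, and the joint and clip names are placeholders. Instead of driving real hardware, the function returns the ordered events that a playback loop would send to the motor controller and the speech player.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Motion:
    # One dictionary of joint name -> target value per clock tick (names are hypothetical).
    frames: List[Dict[str, float]]
    # Frame index -> name of the prerecorded speech clip to start at that frame.
    speech_cues: Dict[int, str] = field(default_factory=dict)


def build_timeline(motion: Motion, rate_hz: float) -> List[Tuple[float, str, object]]:
    """Return (time in seconds, event kind, payload) tuples in playback order."""
    if rate_hz <= 0:
        raise ValueError("rate_hz must be positive")
    period = 1.0 / rate_hz
    timeline: List[Tuple[float, str, object]] = []
    for index, frame in enumerate(motion.frames):
        t = index * period
        # A speech clip cued on this frame starts together with the frame.
        if index in motion.speech_cues:
            timeline.append((t, "speech", motion.speech_cues[index]))
        timeline.append((t, "motor", frame))
    return timeline


twist = Motion(
    frames=[{"waist_yaw": 0.0}, {"waist_yaw": 0.3}, {"waist_yaw": 0.6}],
    speech_cues={1: "twist_your_waist"},
)
events = build_timeline(twist, rate_hz=4.0)
```

Tying speech cues to frame indices rather than wall-clock times keeps speech and motion aligned if the transmission rate is changed.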
