**2.6.5 Script engine and script database**

The scenario database contains scripts that define the speech phrases and the key numbers of the keypad commands. It also includes information that associates each command with its motion data.

The script engine is based on a state-transition model. For this specific experiment, we use a flat (single-state) model with 17 commands. Details are given in Section 3.2.
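To illustrate this flat structure, the sketch below models a script database in plain Python. It is a minimal sketch under assumptions: the field names, commands, and motion identifiers are invented for illustration and are not the actual script format used by the system.

```python
# Minimal sketch of a flat (single-state) script database and engine.
# The commands, field names, and motion ids are illustrative; the
# actual script format of the system is not shown in this chapter.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Command:
    key_number: int      # number typed on the keypad
    speech_phrase: str   # phrase spoken by the demonstrator
    motion_id: str       # motion data to play for this command

# Flat model: every command is reachable from the single state.
SCRIPT_DB = [
    Command(1, "neck stretch", "motion/neck_stretch"),
    Command(2, "shoulder roll", "motion/shoulder_roll"),
    # ... the experiment used 17 commands in total
]

def dispatch(event: str) -> Optional[str]:
    """Map a keypad digit or a recognized phrase to a motion id."""
    for cmd in SCRIPT_DB:
        if event in (str(cmd.key_number), cmd.speech_phrase):
            return cmd.motion_id
    return None  # unrecognized input is ignored

if __name__ == "__main__":
    print(dispatch("2"))             # motion/shoulder_roll
    print(dispatch("neck stretch"))  # motion/neck_stretch
```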

### **2.7 Evaluation of command input methods**

### **2.7.1 Experiment condition**

We designed four experiments controlled by two conditions. Condition 1 is the input method, which is either keypad input or vocal command. Condition 2 is the subject's role, which is either demonstrator or audience. The role condition switches when the same subject experiences the same demonstration as a demonstrator or as a passive participant. Each subject is asked to attend all four (2 × 2) experiments.

The experiment subjects are 60 to 80 years old and consist of one male and five females. All subjects have experience in the health exercise training program and are certified to instruct it. Although they are knowledgeable in the health exercise, when participating as the passive audience they were asked to answer the questions (described in the next section) as if they were novice trainees.

### **2.7.2 Experiment sequence**

The six subjects were divided into two groups of three, and each group went through six sessions altogether. The first three sessions used the keypad and the latter three used vocal commands. Within each set of three sessions, the three subjects took the role of demonstrator in turn.

Each subject was asked to fill in a questionnaire sheet after each session. Table 1 shows the questions used in this experiment. The questions ask how easy the robot was to use, whether the experience was enjoyable, and whether the demonstration was fulfilling.

The subjects were also asked to fill in a question sheet before and after the experiment. The questionnaire handed out before the experiment asks about the subject's expectations of the robot. The post-experiment questionnaire asks whether the subject prefers keypad or vocal commands and whether they are willing to continue using the robot in a real demonstration.

#### **2.8 Result**

Figure 6 shows the error rate of the key input and the speech input.

Fig. 6. Error rate of each input method.


| Type | Question (each completes "Do you ...") | When |
|------|----------------------------------------|------|
| Expectation | ... think the demonstration by the robot is effective? | Before |
| Enjoyableness | ... think the demonstration was enjoyable? | Each session |
| Easiness | ... think using the robot is easy? | Each session |
| Fulfillness | ... think the demonstration was fulfilling? | Each session |
| Preference | ... prefer speech input or key input? | After |
| Effectiveness | ... think the demonstration by the robot is effective? | After |
| Willingness | ... want to use the robot in a real demonstration? | After |

Table 1. Items in the question sheet (each question allows free commenting).


Although key input is a precise input method, some errors occur due to mistyping. Mistyping can be classified into two types. The first is the typing error, which happens when the demonstrator hurries to type on the keypad in the middle of the demonstration. The second is the memory error: the keypad only accepts numeric input, so the demonstrator has to remember the mapping between each number and its training pattern, and they often forget the mapping and type the wrong number. Because of these errors, the actual precision of key input is not as high as we expected.

Most of the errors in vocal commands were caused by speech recognition errors. Some demonstrators had problems with pronunciation, and others spoke superfluous words to enliven the demonstration. Vocal input caused fewer memory errors, because the demonstrator only has to pronounce the name of the training pattern; there is no mapping to numbers to remember.

Figure 7 shows the impressions of the demonstration collected after each session. Demonstrators feel that vocal commands are easier to use than key input, which is consistent with the memory errors discussed above. When vocal commands were used, the audience both enjoyed the demonstration and found it as fulfilling as the keypad-driven demonstration, despite the incidents caused by recognition inaccuracy. This could be one of the effects of the observability of speech in a human-robot collaborative demonstration.

Figure 8 shows the impressions of the robot-assisted demonstration collected before and after the demonstration. Almost all the subjects answered that they are willing to use the robot as an instructor again.

#### **2.9 Feasibility test in a real demonstration**

We have already started applying this robot in real demonstration events. Figure 10 shows photos from the "Nenrin-pic" event, which is hosted by Ibaraki prefecture to encourage sports among elderly people.

Fig. 9. Preference for speech input or key input, asked of each subject.

Fig. 10. Photos from a real demonstration at the Nenrin-pic event.

During the three-day event, we demonstrated ten times a day using the TAIZO robot, and more than 600 people joined the training experience. As the photos show, almost all of the audience eagerly followed the demonstration given by the human-robot demonstrators. One unexpected effect of using TAIZO was that it caught the attention of a wide range of ages. TAIZO was intended to attract the elderly, but during the Nenrin-pic event many young adults and children accompanying their parents or grandparents were also drawn to the demonstration. The attraction of the TAIZO robot was strong enough to reach those accompanying persons as well. This seems to benefit the elderly, because they can enjoy their exercise by participating together with their families.


Fig. 7. Impressions of the demonstration collected after each session. The error bars indicate the maximum and minimum ratings of the subjects. Easiness was asked of the demonstrator; enjoyableness and fulfillness were asked of the audience.

Fig. 8. Impressions of the robot before and after the demonstration. The error bars indicate the maximum and minimum ratings of the six subjects.



#### **2.10 Summary**

In Section 2.7 we evaluated the effect of using key and speech inputs, focusing on the human-robot collaborative demonstration setup.

As we discussed in Section 2.5, vocal communication increases the transparency of the human-robot communication to the audience. The experimental results show that the audience's enjoyment and sense of fulfillment were not diminished by the imprecision of speech recognition.

One unexpected effect of using speech is that, because it is very intuitive, it reduces the burden of remembering the commands. The error rate of key-based commands was unexpectedly high despite their precision. This also supports the use of speech input.

Despite the supportive evidence for speech input, about half of the demonstrators answered that they prefer key input. The question sheet allows subjects to leave free comments. Subjects who favored speech input commented that they actually enjoyed reacting to the mistakes made by speech recognition. This can be explained as follows: increasing the transparency of the communication channel between the human-robot demonstrators lets the subject observe what is happening in the situation (the human demonstrator said the right thing, the robot made a mistake) and find the robot's mistaken response funny. On the other hand, subjects who favored key input commented that they want to demonstrate the exercise in a more precise manner.

We are currently preparing an evaluation that focuses on measuring these effects and on finding an interface that suits both people who prefer enjoyment and people who prefer precision.

#### **2.11 Remaining problems**

As we have seen in this section, a humanoid robot has a different character from other artifacts. *It has a human shape*, which can attract people to join in the activity. This character also *raises the expectation that the robot can use a natural communication method (voice)* as humans do. *It has a physical body and exists in the same world*, which also affects the observers. These characteristics are especially useful for applications such as exercise demonstration.

However, this character sometimes has a negative effect on usefulness. Because humans expect so strongly that the robot can use a natural communication method, they tend to address the robot with colloquial expressions, which are difficult for the robot to understand. In Section 2.7, we saw the command acceptance rate of voice recognition as evaluated by elderly users. In that experiment the command acceptance rate was low, not because the voice recognition rate was low, but mostly because the conversation patterns programmed into the dialog manager could not cover all the varieties of colloquial expressions used by the subjects.
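The sketch below illustrates this coverage problem with an exact-match pattern table, the simplest form of a pattern-based dialog manager. The phrases and variants are invented for illustration; they are not the patterns used in the experiment.

```python
# Why fixed conversation patterns miss colloquial input: an exact
# matcher accepts only the phrasing that was programmed in advance.
# All phrases below are invented for illustration.

PATTERNS = {"start neck stretch": "motion/neck_stretch"}

utterances = [
    "start neck stretch",         # accepted: matches the pattern exactly
    "um, let's do the neck one",  # rejected: filler words and paraphrase
    "neck stretch please",        # rejected: extra politeness marker
]

def exact_match(utterance: str):
    """Return the mapped command, or None if no pattern matches."""
    return PATTERNS.get(utterance)

for u in utterances:
    print(f"{u!r} -> {exact_match(u)}")

# Covering such variants requires either many more hand-written
# patterns or a looser strategy such as keyword spotting on "neck".
```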

Moreover, because the robot has a physical body and exists in the physical world, its voice recognition system has to work under the noisy conditions of the real environment.

To obtain the ideal benefits of using humanoid robots, these practical problems need to be solved in order to enhance the robots' usefulness.

We are not only developing applications for humanoid robots but also developing support tools to assist the development of their communication functions. In the following sections, we introduce these development tools.

**3. Reduce the difficulties of building the communication system**

In this section, we introduce our efforts to reduce development difficulties. We are currently taking two approaches: one is component-based system design and the other is incremental script development.

At present, we are developing a set of software called the Open Source Software Suite for Human Robot Interaction (OpenHRI). Using OpenHRI, we aim to solve the above problems and enable the development of communication functions for robots. For this purpose, we employ the following approach.

**Introduce a uniform component model:** We construct our set of software on RT-Middleware, an Object Management Group (OMG)-compliant robot technology middleware specification (Ando et al., 2005). The RT-Middleware specification can be used to connect all the components without requiring implementation issues to be taken into account. Further, because it is a standard architecture for building robotic systems, individual components developed at different institutes can easily be connected.

**Provide the required functions in a reconfigurable manner:** We implement various functions, from audio signal processing to dialog management, in a uniform and reconfigurable manner. The developer can build the entire system at a comparatively low development cost. In addition, the system can easily be adapted to different environments to realize accurate recognition.

Figure 11 illustrates the overall architecture of the components provided in OpenHRI. The software covers all the functions needed to develop the communication system and also incorporates an interface for establishing connections with other components that can provide multi-modal information.

Fig. 11. Architecture of the OpenHRI software suite.

**3.1 Component based system design**

The component architecture of our software is based on RT-Middleware, a middleware architecture for robotic applications that has been standardized by the OMG. In RT-Middleware, each function of the robot is implemented as a "node." An application system can be developed by selecting the required components and connecting them to each other (the connections are called "links"). Figure 12 shows the "RT-SystemEditor" development tool used to edit the links between the components.

In the specification of RT-Middleware, a "data port" is defined as a connection point of a link that realizes the transmission of a data stream. A "service port" is defined as an entry point of a link that realizes remote method invocation.
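To make the node, data-port, and link idea concrete, here is a schematic sketch in plain Python. This is not the RT-Middleware or OpenHRI API: the component and port names are invented, and real components would communicate through the middleware rather than through direct calls.

```python
# Schematic illustration of nodes, data ports, and links. This is NOT
# the RT-Middleware API; it only mimics the idea of composing a system
# by connecting reusable components through their ports.
from typing import Callable, List

class OutPort:
    """Pushes a data stream to every linked in-port."""
    def __init__(self) -> None:
        self._links: List[Callable[[str], None]] = []

    def link_to(self, in_port: Callable[[str], None]) -> None:
        self._links.append(in_port)  # a "link" between two ports

    def write(self, data: str) -> None:
        for in_port in self._links:
            in_port(data)

class SpeechRecognizer:
    """Example node: emits recognized text on its out-port."""
    def __init__(self) -> None:
        self.result = OutPort()

    def on_audio(self, utterance: str) -> None:
        self.result.write(utterance)  # actual recognition omitted

class DialogManager:
    """Example node: consumes recognized text through its in-port."""
    def on_text(self, text: str) -> None:
        print(f"dialog manager received: {text}")

# Build the application by selecting components and linking their
# ports, as RT-SystemEditor does graphically.
asr = SpeechRecognizer()
dm = DialogManager()
asr.result.link_to(dm.on_text)
asr.on_audio("neck stretch")  # -> dialog manager received: neck stretch
```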
