**Speech Communication with Humanoids: How People React and How We Can Build the System**

Yosuke Matsusaka

*National Institute of Advanced Industrial Science and Technology (AIST) Japan* 

### **1. Introduction**


Robots are expected to help improve our quality of life. They are already widely used in industry to free humans from repetitive labour. In recent years, entertainment has been gaining momentum as an application in which robots can be used to improve people's quality of life (Moon, 2001; Wada et al., 2002).

We have been developing the robot TAIZO as a demonstrator of health exercises. TAIZO encourages the human audience to engage in the health exercise by demonstrating it (Matsusaka et al., 2009). During a demonstration, the robot and the human demonstrator stand in front of the audience and demonstrate together. For this to work, the human demonstrator has to control the robot while they themselves are demonstrating the exercise to the audience.

A quick and easy method for controlling the robot is therefore required. Furthermore, in a human-robot collaborative demonstration, the method of communication used between the human and the robot can itself be used to affect the audience.

In this chapter, we introduce the robot TAIZO and its various functions. We also evaluate the effect of using voice commands compared to keypad input during a human-robot collaborative demonstration. In Section 2 we explain the social background behind the development of TAIZO. In Section 2.5 we discuss the effects of using voice commands compared to key input in human-robot collaborative demonstration. In Section 2.6 we present an overview of the system used in TAIZO. Section 2.7 contains the evaluation and discussion of the results from experiments in which we measured the effect of using voice commands through simulated demonstrations. Finally, in Sections 2.10 and 2.11 we discuss the benefits and problems of applying a humanoid robot to this task.

In the latter part of the chapter, we discuss how to develop the communication function for the humanoid robot.

Recently, the "behavior-based" scripting method has been applied in many practical robotic systems. The application presented by Brooks (1991) used a hierarchical structure model, while more recent applications (Kaelbling, 1991; Yartsev et al., 2005) use a state transition model (a finite state automaton) to model the situation of the system. The developer incrementally builds the script by adding behaviors, each of which fits a small situation. A diverse situation-understanding ability can be realized as a result of long-term incremental development.
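As a minimal illustration of this style (with hypothetical state and action names, not the actual TAIZO implementation), a behavior-based script can be modeled as a finite state automaton whose transition rules are registered one behavior at a time:

```python
# Sketch of a behavior-based script as a finite state automaton.
# Each behavior is one rule: (state, event) -> (action, next state).
# Developers extend the script incrementally by adding new rules.

class BehaviorScript:
    def __init__(self, initial_state):
        self.state = initial_state
        self.rules = {}  # (state, event) -> (action, next_state)

    def add_behavior(self, state, event, action, next_state):
        """Incrementally add one behavior that fits one small situation."""
        self.rules[(state, event)] = (action, next_state)

    def handle(self, event):
        rule = self.rules.get((self.state, event))
        if rule is None:
            return "ignore"  # no behavior fits this situation yet
        action, self.state = rule
        return action

# Build the script incrementally, one situation at a time.
script = BehaviorScript("idle")
script.add_behavior("idle", "start", "greet_audience", "exercising")
script.add_behavior("exercising", "stretch", "demo_stretch", "exercising")
script.add_behavior("exercising", "stop", "bow", "idle")

print(script.handle("start"))    # -> greet_audience
print(script.handle("stretch"))  # -> demo_stretch
print(script.handle("stop"))     # -> bow
```

Unhandled situations fall through harmlessly, which is what makes long-term incremental extension practical: each newly observed situation can be covered by registering one more rule without touching the existing ones.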

The behavior-based scripting method can also be applied to communication robots by incorporating speech input into the situation model. Application of the behavior-based scripting method to a communication robot was first presented by Kanda et al. (2002). In their work, they not only proposed an incremental development framework, but also implemented an on-line development environment that realizes automated control of the robot. Through a 25-day field study, they confirmed that with the help of the development environment the conversation ability of the robot could be incremented on-line, which succeeded in decreasing the operation time of the human operator (Kanda et al., 2009).

However, the existing behavior-based scripting methods for communication robots are inefficient when it comes to reusing a script to develop different types of robots (this problem is described in Section 3.2.2). In this chapter, we present the extension-by-unification method in order to push the behavior-based scripting approach for developing communication robots further.

In Section 3.1, we show a component-based architecture we have developed with which a robotic dialog system can be built simply by connecting components.

In Section 3.2, a formal discussion of incremental development methods for the state-transition model is presented. Here, we introduce a formalization of the proposed incremental development method and clarify its characteristics by comparing it to the previous method.

Health exercise is intended to strengthen the body of elderly people. It consists of several kinds of exercise that involve stretching and muscle training. Health exercise has recently been increasing in importance and is gaining attention as an effective way to reduce the number of elderly people who become bedridden and need nursing care, and to increase their quality of life. Despite these good points, there are still some difficulties in applying this illness-prevention activity. One of the biggest problems is the difficulty of encouraging people to engage in these health exercises to begin with.

People are not wary of their health while they are healthy. They take notice only after they realize they have a serious illness. Although we have statistics on the percentage of people who become seriously ill, it is still difficult to estimate how healthy we are ourselves. Moreover, it is much more difficult to understand the effect of health exercise, since we cannot compare what would have happened if we didn't (or did) engage in the exercise.

Because of the above, we usually require a special incentive to engage in health exercise. TAIZO (Figure 1) is designed to help demonstrate health exercise. It stands to the side of the demonstrator and assists the trainer in demonstrating. Demonstration is usually done in front of 5 to 15 trainees; TAIZO is also used to demonstrate at events with 40 to 80 trainees. The robot is used as an eye-catcher to capture the attention of people who don't know the exercise but could become regular trainees. By using the robot as a demonstrator, the human demonstrator can draw interest from a larger variety of people compared to a demonstration done by humans alone. This leads to more people having a chance to engage in health exercises.

**Attraction:** Meeting face to face with a humanoid robot is not yet a common occurrence. It is easy to spark the curiosity of someone who is not familiar with humanoid robots and grab their attention with an artificial being which moves and talks.

**Embodiment in 3D space:** The robot has a real body which occupies an actual volume in 3D space. Compared to virtual agents, which only appear on a 2D display, it can be observed from a very wide range of viewpoints (Matsusaka, 2008; Ganeshan, 2006). Thanks to this characteristic, even a user behind the robot can observe the robot's motions, and a user to the side can also observe other users' interactions with the robot.

Similarity of the body shape assists precise communication of body movements to the trainee and is useful for enhancing the effectiveness of the exercise. Attraction also becomes a good incentive for the trainee to engage in the exercise. Embodiment in 3D space assists the social communication between the robot and the trainees. A trainee can watch other trainees while training. By looking at other trainees, they can see how well those trainees communicate with the robot and how eagerly they engage in the exercise. In most of our demonstration experiments, this inter-audience peer-to-peer effect gives positive feedback that enhances each individual's eagerness to engage in the exercise (explained and discussed in Section 2.9).

**2.3 Demonstration setup and scenario**

Both the human demonstrator and the robot stand in front of the audience (Figure 2). The human demonstrator leads the training program and the robot follows. Both the human and the robot show the demonstration to the audience.

**2.4 Role of the human demonstrator and the robot**

To follow the human demonstrator's lead, the robot has to accept commands given by the human demonstrator. In addition, although the human demonstrator takes the lead in most situations, the robot has to collaborate with the demonstration activity in order to make it more effective.
