**1. Introduction**


Human-robot interaction has been an emerging research topic in recent years because robots are playing important roles in today's society, from factory automation to service applications to medical care and entertainment. The goal of human-robot interaction (HRI) research is to define a general human model that could lead to principles and algorithms allowing more natural and effective interaction between humans and robots. Ueno [Ueno, 2002] proposed the concept of Symbiotic Information Systems (SIS), as well as a symbiotic robotics system as one application of SIS, in which humans and robots can communicate with each other in human-friendly ways using speech and gesture. A Symbiotic Information System is an information system that includes human beings as an element, blends into human daily life, and is designed on the concept of symbiosis [Ueno, 2001]. Research on SIS covers a broad area, including intelligent human-machine interaction with gesture, gaze, speech, text commands, etc. The objective of SIS is to allow non-expert users, who might not even be able to operate a computer keyboard, to control robots. It is therefore necessary that these robots be equipped with natural interfaces using speech and gesture.

Several studies on human-robot interaction in recent years have focused especially on assistance to humans. Severinson-Eklundh et al. developed a fetch-and-carry robot (Cero) for motion-impaired users in the office environment [Severinson-Eklundh, 2003]. King et al. [King, 1990] developed the 'Helpmate' robot, which has already been deployed at numerous hospitals as a caregiver. Endres et al. [Endres, 1998] developed a cleaning robot that has successfully served in a supermarket during opening hours. Siegwart et al. described the 'Robox' robot, which worked as a tour guide during the Swiss National Exposition in 2002 [Siegwart, 2003]. Pineau et al. described a mobile robot, 'Pearl', that assists elderly people in daily living [Pineau, 2003]. Fong and Nourbakhsh [Fong, 2003] have summarized applications of socially interactive robots. The use of intelligent robots encourages the view of the machine as a partner in communication rather than as a tool. In the near future, robots will interact closely with groups of humans in their everyday environment in fields such as entertainment, recreation, health care and nursing.

Although there is no doubt that the fusion of gesture and speech allows more natural human-robot interaction, as a single modality gesture recognition can be considered more reliable than speech recognition. The human voice varies from person to person, and the system must handle a large amount of data to recognize speech. Human speech contains three types of information: who the speaker is, what the speaker said, and how the speaker said it [Fong, 2003]. Depending on what information the robot requires, it may need to perform speaker tracking, dialogue management or even emotion analysis. Most systems are also sensitive to mis-recognition caused by environmental noise. Gestures, on the other hand, are expressive, meaningful body motions, such as physical movements of the head, face, fingers, hands or body, made with the intention to convey information or interact with the environment. Hand and face poses are more rigid, although they too vary slightly from person to person. Humans feel more comfortable pointing at an object than verbally describing its exact location, and gestures are an easy way to give geometrical information to a robot. However, gestures vary among individuals, and from instance to instance for a given individual; hand shape and skin color differ between persons, and gesture meanings differ between cultures. In human-human communication, people can adapt to or learn new gestures and new users using their own intelligence and contextual information, and they can change each other's behaviour based on conversation or situation. To achieve natural gesture-based interaction between humans and robots, the system should therefore be adaptable to new users, gestures and robot behaviours. This chapter addresses the recognition of, and adaptation to, new users, poses, gestures and behaviours for implementing human-robot interaction in real time.

Adaptivity is the biological property that allows creatures to survive in the natural world. It is the capability of self-modification that some agents have, which allows them to maintain a level of performance in the face of environmental changes, or to improve it when confronted repeatedly with the same situation [Torras, 1995]. A gesture-based natural human-robot interaction system should be designed so that it can understand different users, their gestures, the meanings of those gestures and the corresponding robot behaviours. Torras proposed a robot adaptivity technique using a neural learning algorithm; the method is computationally inexpensive, but it provides no way to encode prior knowledge about the environment to gain efficiency. It is essential for the system to cope with different users, so a new user should be added through an on-line registration process. A newly registered user may also want to perform a gesture that has never been used before, by other persons or by himself/herself; in that case, the system should incorporate the new hand poses or gestures with minimum user interaction.

In the proposed method, a frame-based knowledge model is defined for gesture interpretation and human-robot interaction. In this knowledge model, the necessary frames are defined for the known users, robots, poses, gestures and robot behaviours. The system first detects a human face using a combination of template-based and feature-invariant pattern-matching approaches and identifies the user with the eigenface method [Hasanuzzaman, 2007]. Then, using the skin-color information of the identified user, the three largest skin-like regions are segmented in the YIQ color space, after which the face and hand poses are classified by the PCA method. The system is capable of recognizing static gestures composed of face and hand poses.
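
The exact implementation of this stage is not reproduced in the chapter, so the following Python sketch is only illustrative: it converts a frame to the YIQ color space, thresholds the I (in-phase) channel with assumed skin bounds (the real system uses the identified user's own skin-color information), keeps the largest skin-like regions, and classifies a cropped, fixed-size grayscale region by nearest neighbour in a PCA (eigenface-style) subspace built from registered pose images. All thresholds, names and training data here are hypothetical.

```python
import numpy as np
import cv2

# RGB -> YIQ conversion matrix (NTSC); OpenCV has no built-in YIQ conversion.
RGB2YIQ = np.array([[0.299,  0.587,  0.114],
                    [0.596, -0.274, -0.322],
                    [0.211, -0.523,  0.312]])

def segment_skin_regions(bgr, i_lo=20.0, i_hi=90.0, n_regions=3, min_area=400):
    """Return bounding boxes of the largest skin-like regions found in YIQ space.
    The I-channel bounds are illustrative; user-specific values would be used."""
    rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB).astype(np.float32)
    yiq = rgb @ RGB2YIQ.T                                   # per-pixel YIQ values
    mask = ((yiq[..., 1] >= i_lo) & (yiq[..., 1] <= i_hi)).astype(np.uint8)
    n, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
    # Sort connected components (skipping background label 0) by area, largest first.
    order = np.argsort(stats[1:, cv2.CC_STAT_AREA])[::-1] + 1
    boxes = []
    for lbl in order[:n_regions]:
        x, y, w, h, area = stats[lbl]
        if area >= min_area:
            boxes.append((x, y, w, h))
    return boxes

class PCAPoseClassifier:
    """Nearest-neighbour classification of fixed-size pose images in a PCA subspace."""
    def __init__(self, n_components=20):
        self.n_components = n_components

    def fit(self, images, labels):
        X = np.stack([im.reshape(-1).astype(np.float32) for im in images])
        self.mean = X.mean(axis=0)
        # Principal components via SVD of the mean-centred training set.
        _, _, vt = np.linalg.svd(X - self.mean, full_matrices=False)
        self.components = vt[:self.n_components]
        self.train_proj = (X - self.mean) @ self.components.T
        self.labels = list(labels)

    def predict(self, image):
        v = image.reshape(-1).astype(np.float32) - self.mean
        proj = self.components @ v
        dists = np.linalg.norm(self.train_proj - proj, axis=1)
        return self.labels[int(np.argmin(dists))]
```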

The system is implemented using the frame-based Software Platform for Agent and Knowledge Management (SPAK) [Ampornaramveth, 2001]. Known gestures are defined as frames in the SPAK knowledge base as combinations of face and hand pose frames; if the required combination of pose components is found, the corresponding gesture frame is activated.
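
The SPAK frame syntax itself is not shown in this section, so the fragment below is only a schematic Python analogue of the activation rule it describes: a gesture frame lists the face and hand pose frames it requires and is considered activated once all of those pose components have been recognized in the current image. The gesture names, pose names and associated behaviours are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class GestureFrame:
    """Schematic analogue of a SPAK gesture frame: a gesture is defined by the
    set of face/hand pose frames that must be present together."""
    name: str
    required_poses: frozenset
    behaviour: str = ""          # robot behaviour associated with the gesture

# Hypothetical gesture definitions built from pose-frame names.
GESTURES = [
    GestureFrame("TwoHandsUp", frozenset({"FaceFront", "LeftPalmOpen", "RightPalmOpen"}),
                 "raise_both_arms"),
    GestureFrame("PointRight", frozenset({"FaceFront", "RightIndexPoint"}), "look_right"),
]

def activated_gestures(recognized_poses):
    """Return the gesture frames whose required pose combination is fully
    contained in the pose frames recognized in the current image."""
    observed = set(recognized_poses)
    return [g for g in GESTURES if g.required_poses <= observed]

if __name__ == "__main__":
    poses = {"FaceFront", "RightIndexPoint"}      # poses reported by the classifiers
    for g in activated_gestures(poses):
        print(f"gesture frame activated: {g.name} -> behaviour: {g.behaviour}")
```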

The system learns new users and new poses using a multi-clustering approach and combines computer vision and knowledge-based approaches in order to adapt to different users, different gestures and robot behaviours. New robot behaviour can be learned by generalizing over multiple occurrences of the same gesture with minimum user interaction. The algorithm is tested by implementing a human-robot interaction scenario with the humanoid robot "Robovie" [Kanda, 2002].
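
The multi-clustering procedure is only named here, not specified, so the sketch below merely illustrates one plausible on-line clustering scheme consistent with that description: each incoming pose feature vector (for example, its PCA projection) joins the nearest existing cluster if it is close enough, otherwise it starts a new unlabelled cluster, and clusters that recur often enough are offered to the user for labelling as a new pose. The distance threshold and minimum-support count are hypothetical parameters, not values from the chapter.

```python
import numpy as np

class OnlinePoseClusters:
    """Minimal on-line clustering sketch for registering unseen poses.
    Each cluster keeps a running mean of the feature vectors assigned to it."""
    def __init__(self, dist_threshold=8.0, min_support=5):
        self.dist_threshold = dist_threshold   # max distance to join an existing cluster
        self.min_support = min_support         # occurrences before asking the user
        self.centres, self.counts, self.labels = [], [], []

    def observe(self, feature):
        feature = np.asarray(feature, dtype=np.float32)
        if self.centres:
            dists = [np.linalg.norm(feature - c) for c in self.centres]
            k = int(np.argmin(dists))
            if dists[k] <= self.dist_threshold:
                # Update the running mean of the matched cluster.
                self.counts[k] += 1
                self.centres[k] += (feature - self.centres[k]) / self.counts[k]
                return self.labels[k]
        # No cluster is close enough: start a new, unlabelled candidate cluster.
        self.centres.append(feature.copy())
        self.counts.append(1)
        self.labels.append(None)
        return None

    def candidates_for_labelling(self):
        """Unlabelled clusters seen often enough to ask the user to name them."""
        return [i for i, (c, l) in enumerate(zip(self.counts, self.labels))
                if l is None and c >= self.min_support]

    def label(self, index, name):
        self.labels[index] = name
```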

**2. Related research** 

In this chapter we describe a vision- and knowledge-based user and gesture recognition and adaptation system for human-robot interaction. The following subsections summarize the related work.

**2.1 Overview of human-machine interaction systems**

Both machines and humans measure their environment through sensing or input interfaces and modify it through expression or output interfaces. The most popular mode of human-computer or human-intelligent-machine interaction is still based simply on keyboards and mice. These devices are familiar, but they lack naturalness and do not support remote control or telerobotic interfaces. In recent years researchers have therefore put considerable effort into finding attractive and natural user interface devices. The term natural user interface is not an exact expression, but it usually means an interface that is simple, easy to use and as seamless as possible. Multimodal user interfaces are a strong candidate for building natural user interfaces: in multimodal approaches the user can combine a simple keyboard and mouse with advanced perception techniques such as speech recognition and computer vision (gestures, gaze, etc.) as user-machine interface tools.

Weimer et al. [Weimer, 1989] described a multimodal environment that uses gesture and speech input to control a CAD system; they used a 'Dataglove' to track hand gestures and presented objects in three dimensions through polarizing glasses. Yang et al. [Yang, 1998] implemented a camera-based tracker for the face and facial features (eyes, lips and nostrils); the system can also estimate the user's gaze direction and head pose. They implemented two multimodal applications: a lip-reading system and a panoramic image viewer. The lip-reading system improves speech recognition accuracy by using visual input to disambiguate acoustically confusing speech elements, while the panoramic image viewer uses gaze to control panning and tilting, and speech to control zooming. Perzanowski et al. [Perzanowski, 2001] proposed a multimodal human-robot interface for mobile robots, incorporating both natural language understanding and gesture recognition as communication modes. They implemented their method on a team of 'Nomad 200' and 'RWI ATRV-Jr' robots; these robots understand speech, hand gestures and input from a handheld Palm Pilot or other Personal Digital Assistant (PDA).

To use gestures in HCI or HRI, the computer or robot must be able to interpret them. The interpretation of human gestures requires static or dynamic modelling of the human hand, arm, face and other parts of the body that can be measured by computers or intelligent machines. The first attempts to measure gesture features (hand pose and/or arm joint angles and spatial positions) used so-called glove-based devices [Sturman, 1994]. The problems with gloves and other contact interface devices can be avoided by using vision-based, non-contact and nonverbal communication techniques, and a number of approaches have been applied to the visual interpretation of gestures for human-machine interaction [Pavlovic, 1997]. Torrance [Torrance, 1994] proposed a natural-language-based interface for teaching mobile robots the names of places in an indoor environment. But due to the
