**An Emotional Talking Head for a Humoristic Chatbot**

Agnese Augello<sup>1</sup>, Orazio Gambino<sup>1</sup>, Vincenzo Cannella<sup>1</sup>, Roberto Pirrone<sup>1</sup>, Salvatore Gaglio<sup>1</sup> and Giovanni Pilato<sup>2</sup>

> <sup>1</sup>*DICGIM - University of Palermo, Palermo, Italy* <sup>2</sup>*ICAR - Italian National Research Council, Palermo, Italy*

#### **1. Introduction**


Interest in enhancing the interface usability of applications and entertainment platforms has increased in recent years. Research in human-computer interaction on conversational agents, also known as chatbots, and on natural language dialogue systems equipped with audio-video interfaces has grown as well. One of the most pursued goals is to enhance the realism of interaction with such systems. For this reason they are provided with catchy interfaces using humanlike avatars capable of adapting their behavior according to the conversation content. Such agents can interact vocally with users by means of Automatic Speech Recognition (ASR) and Text To Speech (TTS) systems; moreover, they can change their "emotions" according to the sentences entered by the user. In this framework, the visual aspect of interaction also plays a key role, leading to systems capable of synchronizing speech with an animated face model. Such systems are called Talking Heads.
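
The interaction loop just described can be sketched in a toy form. All names below are illustrative, not part of any real ASR/TTS API: the point is only that an agent can return a reply together with an emotion label, which a downstream TTS engine and avatar could use to adapt prosody and facial expression.

```python
# Toy sketch of an emotion-aware chatbot turn (illustrative names only):
# the agent maps a user utterance to a reply plus an "emotion" label that
# a TTS engine and a talking head could consume downstream.

def classify_emotion(text: str) -> str:
    """Toy stand-in for the emotional reasoning of a real agent."""
    lowered = text.lower()
    if any(w in lowered for w in ("joke", "funny", "laugh")):
        return "happy"
    if any(w in lowered for w in ("hate", "stupid")):
        return "angry"
    return "neutral"

def reply(text: str) -> dict:
    # In a deployed system the answer would come from the dialogue engine
    # (e.g. an AIML interpreter), and the emotion label would drive TTS
    # prosody and the talking head's facial expression.
    return {"answer": "Tell me more.", "emotion": classify_emotion(text)}

print(reply("Do you want to hear a funny joke?"))
```

A real system would replace the keyword test with the agent's reasoning modules; the dictionary shape merely illustrates the reply-plus-emotion contract between brain and talking head.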

Several implementations of talking heads are reported in the literature. Facial movements are simulated by rational free form deformation in the 3D talking head developed in Kalra et al. (2006). A Cyberware scanner is used to acquire the surface of a human face in Lee et al. (1995); the surface is then converted to a triangle mesh by means of image analysis techniques aimed at finding local reflectance minima and maxima.
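
The reflectance-extrema idea can be illustrated on a single scan line: candidate mesh vertices are placed where the reflectance profile has a local minimum or maximum. This is a simplified, hypothetical sketch, not the actual algorithm of Lee et al., which operates on full range and reflectance images.

```python
# Simplified illustration of placing candidate mesh vertices at local
# extrema of a reflectance profile (here a single 1-D scan line).

def local_extrema(profile):
    """Return indices where the profile has a local minimum or maximum."""
    extrema = []
    for i in range(1, len(profile) - 1):
        left, mid, right = profile[i - 1], profile[i], profile[i + 1]
        if (mid > left and mid > right) or (mid < left and mid < right):
            extrema.append(i)
    return extrema

scan_line = [10, 12, 15, 11, 9, 14, 18, 13]
print(local_extrema(scan_line))  # indices of candidate vertices
```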

In Waters et al. (1994) the DECface system is presented. In this work, the animation of a wireframe face model is synchronized with an audio stream provided by a TTS system. An input ASCII text is converted into a phonetic transcription, and a speech synthesizer generates an audio stream. The audio server is queried to determine the phoneme currently playing, and the shape of the mouth is computed from the trajectories of its main vertices; in this way, the audio samples are synchronized with the graphics. A nonlinear function controls the translation of the polygonal vertices so as to simulate the mouth movements. Synchronization is achieved by calculating the deformation length of the mouth, based on the duration of a group of audio samples.
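
The phoneme-driven synchronization described above can be sketched as follows. The phoneme names, the phoneme-to-viseme table, and the durations are illustrative assumptions, not DECface's actual data: the sketch only shows how per-phoneme durations reported by a synthesizer translate into a timeline of mouth shapes for the renderer.

```python
# Hedged sketch of phoneme-to-viseme synchronization: given phonemes with
# durations (as a TTS engine might report them), build a timeline telling
# the renderer which mouth shape (viseme) to show from each instant on.
# The table and names below are illustrative, not DECface's.

PHONEME_TO_VISEME = {"AA": "open", "M": "closed", "IY": "spread", "SIL": "rest"}

def viseme_timeline(phonemes):
    """phonemes: list of (phoneme, duration_ms) -> list of (start_ms, viseme)."""
    timeline, t = [], 0
    for phoneme, duration in phonemes:
        timeline.append((t, PHONEME_TO_VISEME.get(phoneme, "rest")))
        t += duration
    return timeline

print(viseme_timeline([("M", 80), ("AA", 120), ("M", 80), ("AA", 150)]))
# e.g. "ma-ma": closed at 0 ms, open at 80 ms, closed at 200 ms, open at 280 ms
```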

BEAT (Behavior Expression Animation Toolkit), an intelligent agent with human characteristics controlled by an input text, is presented in Cassell et al. (2001). A talking head for the Web with a client-server architecture is described in Ostermann et al. (2000). The client application comprises the browser, the TTS engine, and the animation renderer. A coarticulation model determines the synchronization between the mouth movements and the synthesized voice. The 3D head is created with a Virtual Reality Modeling Language (VRML) model.

LUCIA Tisato et al. (2005) is an MPEG-4 talking head based on the INTERFACE platform Cosi et al. (2003). Like the previous work, LUCIA consists of a VRML model of a female head. It speaks Italian thanks to the FESTIVAL Speech Synthesis System Cosi et al. (2001), and its animation engine is a modified Cohen-Massaro coarticulation model. A 3D MPEG-4 model representing a human head is used to build an intelligent agent called SAMIR (Scenographic Agents Mimic Intelligent Reasoning) Abbattista et al. (2004), which serves as a support system for web users. In Liu et al. (2008) a talking head is used to create a man-car-entertainment interaction system, whose facial animation is based on a mouth gesture database.

One of the most important features of conversation between human beings is the capability to generate and understand humor: "Humor is part of everyday social interaction between humans" Dirk (2003). Since having a conversation means having a kind of social interaction, conversational agents should also be capable of understanding and generating humor. This leads to the concept of *computational humor*, which deals with the automatic generation and recognition of humor.

Verbally expressed humor has been analyzed in the literature, with particular attention to very short expressions (jokes) Ritchie (1998): a one-liner is a short sentence with comic effect, simple syntax, intentional use of rhetoric devices (e.g., alliteration, rhyme), and frequent use of creative language constructions Stock & Strapparava (2003). Since during a conversation the user utters short sentences, one-liners, jokes, and gags are good candidates for the generation of humorous sentences. As a consequence, techniques from the computational humor literature on one-liners can be adapted to the design of a humorous conversational agent.

In recent years the interest in creating humorous conversational agents has grown. As an example, in Sjobergh & Araki (2009) a humorous Japanese chatbot is presented, implementing different humor modules, such as a database of jokes and modules for conversation-based joke generation and recognition. Other works Rzepka et al. (2009) focus on the detection of emotions in user utterances and on pun generation.

In this chapter we illustrate a humorous conversational agent, called *EHeBby*, equipped with a realistic talking head. The agent is capable of generating humorous expressions, proposing riddles to the user, telling jokes, and answering the user ironically. Besides, the chatbot is capable of detecting humorous expressions during the conversation, listening to and judging jokes, and reacting by adapting the expression of its talking head to the perceived level of humor. The talking head offers a realistic presentation layer that mixes emotions and speech capabilities during the conversation: it shows a smiling expression if it considers the user's sentence funny, an indifferent one if it does not perceive any humor in the joke, and an angry one if it considers the joke in poor taste. In the following paragraphs we illustrate both the talking head features and the humorous agent brain.

#### **2. EHeBby architecture**

The EHeBby reasoner is composed of a humoristic area, divided in turn into a humoristic recognition area and a humoristic evocation area, and an emotional area. The first area allows the chatbot to search for the presence of humoristic features in the user sentences and to produce an appropriate answer. The emotional area then allows the chatbot to elaborate information related to the produced answer and a corresponding humor level, in order to produce the information needed for the talking head animation. In particular, prosody and emotional information, necessary to animate the chatbot and express emotions during speech, are communicated to the Talking Head (TH) component. The TH system relies on a web application where a servlet selects the basis facial meshes to be animated, and integrates with the reasoner to process the emotion information, expressed using ad hoc AIML (Artificial Intelligence Markup Language) tags, and to obtain the prosody needed to control the animation. On the client side, all these data are used to actually animate the head. The presented animation procedure allows for considerable computational savings, so both plain web and mobile clients have been implemented.

Fig. 1. EHeBby Architecture

#### **3. EHeBby reasoner**

The chatbot brain has been implemented using an extended version of the ALICE architecture ALICE (2011), one of the most widespread conversational agent technologies.

The ALICE dialogue engine is based on a pattern matching algorithm which looks for a match between the user's sentences and the information stored in the chatbot knowledge base. The ALICE knowledge base is structured with an XML-like language called AIML. Standard AIML tags make it possible for the chatbot to understand user questions, give appropriate answers, save and retrieve the values of variables, and store the context of the conversation. The basic item of knowledge in ALICE is the *category*, which represents a question-answer module composed of a *pattern* section, representing a possible user question, and a *template* section, which identifies the associated chatbot answer. The AIML
