## 4. Applications to virtual and embodied robotics

As already mentioned, our intention in this work is to elaborate the elements that will empower an artificial intelligent entity, either virtual or physically embodied, with the capacity to recognize and express (R&E) emotional content using emoji. To this end, we can collect massive amounts of human-human (and perhaps human-machine) interactions from multiple Internet sources, such as social media or forums, to train ML algorithms that can R&E emotions, utterances, and beat gestures, and even assess the personality of the interlocutor. Furthermore, we may reconstruct text phrases from speech and embed emoji into them to obtain a bigger picture of the semantic meaning. For instance, if we asked the robot "are you sure?" while raising our eyebrows to emphasize our incredulity, we may obtain an equivalent expression such as "are you sure? " Once the models are defined and trained, they will be embedded into the artificial entity, which will then interact with humans. This conceptual framework is displayed in Figure 3. While in a virtual entity such as a chatbot the inference of emotional states or personality, as well as the expression of emotions or beat gestures using emoji, is straightforward, an embodied entity such as a physical robot requires a bit more elaboration. In the latter, inferring an interlocutor's emotional state or personality first requires the human's facial expressions and gestures to be transformed into emoji from video streams or speech, as shown in [22]. Then, the same pipeline as the one used for a chatbot may be employed, identifying the corresponding emotional state using a pretrained sentiment detection algorithm such as the one in [20]. Since both embodied and virtual artificial entities can employ the same pipeline, we focus on applications to the former.


Figure 3.

Emoji emotional communication conceptual framework.

In particular, we discuss some works that delve in this direction and how the cognitive interaction between humans and artificial entities may be improved by modeling the emotional exchange as shaped by emoji usage.
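As a minimal illustration of the recognition side of this pipeline, the sketch below trains a toy emoji-aware sentiment classifier on emoji-embedded utterances. The dataset, labels, and model choice are invented for illustration and are not the method of [20]; character n-grams are used so that emoji survive tokenization.

```python
# A toy sketch of an emoji-aware sentiment classifier; the data, labels,
# and model choice are illustrative assumptions, not the method of [20].
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Emoji-embedded utterances, such as the reconstructed "are you sure? <emoji>".
texts = [
    "are you sure? 🤨", "that was great! 😄", "I am so tired 😞",
    "thanks a lot 🙏😊", "this is awful 😠", "see you tomorrow 🙂",
]
labels = ["skeptical", "positive", "negative",
          "positive", "negative", "positive"]

# Character n-grams keep emoji as features; word-level tokenizers often drop them.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 3)),
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)
print(model.predict(["are you serious? 🤨"]))  # likely -> ['skeptical']
```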

### 4.1 Embodied service robot case studies

Service robots are a type of embodied artificial intelligent entity (EAIE) meant to enhance and support human social activities such as health and elder care, education, and domestic chores, among others [32–34]. A very important element for EAIE is improving the naturalness of human-robot interaction (HRI), which can be achieved by providing EAIE with the capacity to R&E emotions to/from their human interlocutors [32, 33].

Regarding the emotional mechanisms of an embodied robot per se, a relevant example is the work of [33], which consists of an architecture for imbuing an EAIE with emotions that are displayed on an LED screen using emoticons. Such architecture establishes that a robot's emotions derive from long-, medium-, and short-term affective states, namely its personality (i.e., sociability and mood changes), the surrounding environment (i.e., temperature, brightness, and sound levels), and human interaction (i.e., hit, pat, and stroke sensors), respectively. All of these sensory inputs are employed to determine the EAIE's emotional state through ad hoc rules coded into a fuzzy logic algorithm, and the result is displayed on an LED face, where facial gestures corresponding to Ekman's basic emotion expressions are shown in the form of emoticons.
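To make the rule-based idea concrete, below is a minimal sketch of how such a mapping from sensory inputs to an emoticon face could look. The membership functions, thresholds, and rules are our own illustrative assumptions, not the actual rule base of [33].

```python
# A hedged sketch of the rule-based idea in [33]: sensory inputs are mapped
# to an emotional state through hand-written (fuzzy-like) rules, and the
# winning emotion is rendered as an emoticon face. All membership functions
# and thresholds below are invented for illustration.
def triangular(x, lo, peak, hi):
    """Simple triangular membership function returning a value in [0, 1]."""
    if x <= lo or x >= hi:
        return 0.0
    return (x - lo) / (peak - lo) if x < peak else (hi - x) / (hi - peak)

def infer_emotion(temp_c, sound_db, touch):
    # touch: "pat", "stroke", "hit", or None (the short-term interaction input)
    activations = {
        "happy":   max(triangular(temp_c, 18, 23, 28), 1.0 if touch == "pat" else 0.0),
        "calm":    triangular(sound_db, 0, 30, 50),
        "annoyed": max(triangular(sound_db, 60, 85, 110), 1.0 if touch == "hit" else 0.0),
    }
    # Pick the emotion with the strongest rule activation.
    return max(activations, key=activations.get)

EMOTICONS = {"happy": ":-)", "calm": ":-|", "annoyed": ">:-("}
print(EMOTICONS[infer_emotion(temp_c=22, sound_db=40, touch="pat")])  # -> :-)
```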

An important application of embodied service robots is supporting elders' daily activities to promote a healthy lifestyle and provide them with enriching companionship. For this case, more advanced interaction models for EAIE, based on an emotional model, gestures, facial expressions, and R&E utterances, have been proposed [32, 35–37]. The authors of these works put forward several cost-efficient EAIE based on mobile device technologies, namely iPhonoid, iPhonoid-C, and iPadrone. These robotic companions are based on an architecture that, among other features, is built upon the informationally structured spaces (ISS) concept. The latter allows gathering, storing, and transforming multimodal data from the surrounding ambiance into a unified framework for perception, reasoning, and decision making. This is a very interesting concept since EAIE behavior may be improved not only by the robot's own perceptions and HRI but also by remote users' information, such as an elder's Internet activities or grocery shopping. Likewise, all this multimodal information can be exploited by family members to improve the quality of their relationship with their elders [36]. Regarding the emotional model, the perception and action modules are the most relevant. Among the perceptions considered in these frameworks stand the number of people in the room, gestures, utterances, colors, etc.

In the same fashion as [33], these EAIE implement a time-varying emotional framework, which considers emotion, feeling, and mood (from shorter- to longer-duration emotional states, respectively). First, perceptions are transformed into emotions using expert-defined parameters; then, emotions and long-term traits (i.e., mood) serve as the input to feelings, whose activation follows a spiking neural network model, as sketched below [32, 35]. In particular, mood and feelings form a feedback loop, which emphasizes the time-varying emotional approach. Once perceptions are turned into their corresponding emotional state, the latter is sent to the action module to determine the robot's behavior (i.e., conversation content, gestures, and facial expression). As mentioned earlier, these EAIE also R&E utterances, which provide feedback to the robot's emotional state. Another interesting feature of the architecture of these EAIE is its conversational framework: the usage of certain utterances, gestures, or facial expressions depends on conversation modes, which in turn depend on NLP-based syntactic and semantic analyses [32, 37]. Nevertheless, these works take facial and gesture expressions for granted and barely discuss them. In particular, how facial expressions are designed and expressed can only be guessed from the figures in these papers, which closely resemble emoji-like facial expressions.
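The following simplified sketch illustrates this time-varying scheme. It is not the authors' spiking neural network from [32, 35]: a plain leaky integrator stands in for the feeling dynamics, with mood slowly tracking feelings to close the feedback loop, and the time constants and weights are invented for illustration.

```python
# A simplified stand-in for the time-varying scheme of [32, 35]: perceptions
# drive short-lived emotions, feelings integrate emotions and mood, and mood
# slowly tracks feelings (the feedback loop). A leaky integrator replaces the
# original spiking neural network, purely for illustration.
class EmotionalState:
    def __init__(self):
        self.feeling = 0.0   # medium-term valence, in [-1, 1]
        self.mood = 0.0      # long-term valence, in [-1, 1]

    def step(self, emotion, dt=1.0, tau_feeling=5.0, tau_mood=60.0):
        """emotion: instantaneous valence in [-1, 1] derived from perception."""
        # Feelings integrate the current emotion, biased by mood (leaky integrator).
        drive = 0.7 * emotion + 0.3 * self.mood
        self.feeling += (drive - self.feeling) * dt / tau_feeling
        # Mood slowly tracks feelings, closing the feedback loop.
        self.mood += (self.feeling - self.mood) * dt / tau_mood
        return self.feeling

state = EmotionalState()
for t in range(10):                  # a short, friendly interaction episode
    state.step(emotion=+0.8)
print(round(state.feeling, 2), round(state.mood, 2))  # feeling rises faster than mood
```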

Embodied service robots are also beneficial in the pedagogical area as educational agents [38, 39]. Under this setting, robots are employed in a learning-by-teaching approach, where students (ranging from kindergarten to preadolescence) read and prepare educational material beforehand, which is then taught to the robotic peer. This has been shown to improve students' understanding and knowledge retention of the studied subject, increasing their motivation and concentration [38, 40]. Likewise, robots may enhance their classroom presence and the elaboration of affective strategies by recognizing and expressing emotional content. For instance, one may desire to elicit an affective state that engages students in an activity or to identify boredom in students. The robot's reaction then has to be an optimized combination of gestures, intonation, and other nonverbal cues that maximizes learning gains while minimizing distraction [41]. Humanoid robots are preferred in education due to their anthropomorphic emotional expression, which is readily available through body and head posture, arms, speech intonation, and so on. Among the most popular humanoid robotic frameworks stand the Nao® and Pepper® robots [38–40]. In particular, Pepper is a small humanoid robot equipped with microphones, 3D sensors, touch sensors, a gyroscope, an RGB camera, and a touch screen placed on its chest, among other sensors. Through the ALMood module, Pepper is able to process perceptions from its sensors (e.g., the interlocutor's gaze, voice intonation, or the linguistic semantics of speech) to estimate the instantaneous emotional state of the speaker, of surrounding people, and of the ambiance [42, 43]. However, Pepper's communication and emotional expression are mainly carried out through speech, as a consequence of limitations such as a static face, unrefined gestures, and other nonverbal cues that are not as flexible as human standards [44]. Consider, for instance, Figure 4, which displays a sad Pepper: from the picture alone, it is unclear whether the robot is sad, looking at its wheels, or simply turned off.
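As a generic illustration of this kind of multimodal mood estimation, and explicitly not the NAOqi/ALMood API, a weighted fusion of perception channels could look as follows; the channels, weights, and score ranges are all assumptions made for illustration.

```python
# A generic sketch in the spirit of ALMood [42, 43]; this is NOT the NAOqi
# API. Channels, weights, and score ranges are illustrative assumptions.
def estimate_mood(gaze_engagement, voice_valence, text_valence,
                  weights=(0.3, 0.3, 0.4)):
    """Each input is a score in [-1, 1]; returns a weighted valence estimate."""
    signals = (gaze_engagement, voice_valence, text_valence)
    return sum(w * s for w, s in zip(weights, signals))

# E.g., an engaged gaze, neutral intonation, and positive wording:
print(estimate_mood(0.6, 0.0, 0.5))  # -> 0.38, a mildly positive mood
```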

Figure 4.

Is Pepper sad or just shut down?

### 4.2 Case studies through the emoji communication lens

In summary, in the EAIE cases reviewed above (emoticon-based expression, iPadrone/iPhonoid, and Pepper), emotions are generated through ad hoc architectures that consider emotions and moods determined by multimodal data. A cartoon of these works is presented in Figure 5, displaying in (a) the work of [33], in (b) the work of [32, 35–37], and in (c) the Pepper robot as described in [42–44].

In these cases, we can integrate emoji-based models to enhance the emotional communication with humans, for some tasks more directly than for others. Take, for instance, facial expressions by themselves: in cases (a) and (b), replacing the emoticon-based emotional expression with its emoji counterpart is straightforward. This will not only visually improve the robot's facial expression but also allow more complex facial expressions to be displayed, such as sarcasm ( ) or co-speech gestures, as after making a joke. Another important benefit of replacing emoticon-based faces with emoji is that the latter are used mostly to convey positive emotions, even when criticizing or giving negative feedback [2]. Therefore, this feature could be very useful for maintaining a perpetually friendly tone in an elder's robotic partner (b) or in an educational agent (c).
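A minimal sketch of this replacement is shown below; the particular emotion-to-emoji mapping is an illustrative assumption, not the one used in [33].

```python
# A minimal sketch of swapping emoticon faces for emoji: the robot's
# recognized emotional state (Ekman-style labels plus richer states such as
# sarcasm) is mapped to an emoji for the face display. The mapping itself is
# an illustrative assumption.
EMOJI_FACE = {
    "happiness": "😄", "sadness": "😢", "anger": "😠",
    "fear": "😨", "surprise": "😲", "disgust": "🤢",
    # Expressions that plain emoticons struggle to convey:
    "sarcasm": "🙄", "joke": "😜",
}

def display_face(emotion: str) -> str:
    """Return the emoji to render on the LED/tablet face."""
    return EMOJI_FACE.get(emotion, "🙂")  # default to a friendly face, cf. [2]

print(display_face("sarcasm"))  # -> 🙄
```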

Regarding the emotional expression of the discussed EAIE, this is contingent on the emotional model, which in cases (a) and (b) consists of expert-designed knowledge coded into fuzzy logic behavior rules and more complex neural networks, respectively. In both cases, this will not only bias the EAIE toward specific emotional states but also require vast human effort to maintain. In contrast, Pepper's framework is more robust: it includes a developer kit that allows modifying the robot's behaviors and integrating third-party chatbots that perform semantic and utterance analyses, and it is maintained and improved by a robotics enterprise.

Figure 5.

Case studies using emoji-based modules to improve their emotional R&E models.

Yet, Pepper's emotional communication is constrained by a static face; while it can express emotions by changing the color of its LED eyes and adopting the corresponding body posture, its emotional communication is mainly carried out through verbal expressions. Nevertheless, in a pragmatic sense, do we really need to emulate emotions for a robot to achieve emotional communication, or is it enough to R&E emotions so that a human interlocutor cannot distinguish between man and machine? In this sense, NLP and ML can be used to leverage the emotional communication of a robot by first mapping multimodal data into discourse-like text in which emoji are embedded, and then using emoji-based models to recognize sentiments, utterances, and gestures so that the decision-making module can determine the corresponding message along with its corresponding emoji. In case (a), the microphone, and in case (b), the microphone, camera, and ambient sensors, will be responsible for capturing the speech and facial expressions to be converted into discourse-like text. Once the emotional content of the message is identified, the corresponding emoji shall be displayed. In the case of Pepper, F2F communication can be improved directly by displaying emoji on its front tablet. For instance, when Pepper starts waving to a potential speaker, a friendly emoji such as a waving hand or a greeting smile shall be portrayed on the tablet. Likewise, emoji as utterances and beat gestures can be employed by Pepper to avoid silences in a goofy manner ( ), to indicate a lack of knowledge about a particular topic ( ), or to emphasize politeness when asking an interlocutor for an action ( ).
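To tie the pieces together, the following sketch traces the proposed recognize-and-express loop end to end. The three helper functions are hypothetical stubs standing in for off-the-shelf speech recognition, a facial-expression-to-emoji mapper (as in [22]), and an emoji-aware sentiment model (as in [20]); their hard-coded outputs exist only to make the demo runnable.

```python
# An end-to-end sketch of the proposed recognize-and-express loop. The three
# helpers are hypothetical placeholders, not real APIs; their outputs are
# hard-coded for the demo.
def transcribe_speech(audio):
    return "are you sure?"             # placeholder ASR output

def face_to_emoji(video_frame):
    return "🤨"                         # placeholder raised-eyebrow detection, cf. [22]

def classify_sentiment(discourse):
    return "skeptical" if "🤨" in discourse else "positive"  # placeholder model, cf. [20]

def respond(audio, video_frame):
    # 1. Map multimodal data into discourse-like text with embedded emoji.
    discourse = f"{transcribe_speech(audio)} {face_to_emoji(video_frame)}"
    # 2. Recognize the emotional content of the discourse.
    sentiment = classify_sentiment(discourse)
    # 3. Decide on a reply and the emoji to show on the robot's face or tablet.
    replies = {
        "skeptical": ("Yes, I double-checked.", "🙂"),
        "positive": ("Glad to hear it!", "😄"),
    }
    return replies.get(sentiment, ("Okay.", "🙂"))

print(respond(audio=None, video_frame=None))  # -> ('Yes, I double-checked.', '🙂')
```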
