**2. Data and methodology**

*Types of Nonverbal Communication*
*Can Turn-Taking Highlight the Nature of Non-Verbal Behavior: A Case Study*
*DOI: http://dx.doi.org/10.5772/intechopen.95516*

unconsciously, automatically, such as nodding during a telephone call. Foreground gestures are, therefore, also in the foreground of the interaction. Among their characteristics, he lists co-occurrence with demonstratives, absence of speech, and a significant effort in their production, i.e., gestures that are bigger and more precise. In contrast to these stand background gestures, which are both smaller in size and precision and occur while the sender is speaking. Despite this clear division, Cooperrider [27] emphasizes that the line between foreground and background gestures is anything but straightforward, as some gestures can break the foreground-background barrier. He demonstrates this with pointing gestures, which are generally in the foreground but, when pointing to oneself, occur in the background. Furthermore, even symbolic gestures can recede into the background if performed automatically and voided of their communicative message. Beats, on the other hand, occur only as background gestures. One can, therefore, roughly consider illustrators, symbols, and partially deictics as NCI occurring in the foreground, while regulators, beats, and partially deictics can be considered as NCI occurring in the background, bearing in mind that the dividing line can always be crossed.

Hence, Cooperrider [27] differentiates between gestures with a semantic or propositional content, i.e., a message that provides some kind of information, and those that are void of it. The same distinction can be made for DAs. There are DAs that primarily convey information that is indispensable for communication, such as the task dimension, and DAs that primarily do not contain propositional content (rather, they contain metadiscursive content) yet are vital for successful natural communication, such as the turn and time management dimensions. Nevertheless, we must apply the same caveat as in the background-foreground distinction for gestures, as some DAs can occur either in the foreground or the background. For example, the dimension of managing social obligations can generally be considered part of the foreground, as with greeting someone upon the first encounter. Still, if a social convention is performed routinely, unconsciously, and is deprived of its semantic content, such as thanking someone for the floor, such a DA can be considered as occurring in the background. The nine DA dimensions can, therefore, roughly be divided into those occurring in the foreground, such as the task and the social obligations management dimensions, and those occurring in the background, such as the feedback dimensions, the time and the turn management dimensions, the discourse structuring dimension, and the own- and the partner communication management dimensions.

In light of this foreground-background link between DAs and the NCI of gestures, we set out to explore whether the theory of DAs can help predict the nature of the NCI of the corresponding unit. Specifically, we hypothesize that turn management DAs correlate with background gestures. Therefore, we propose the following hypothesis:

*Turn management DAs correlate with background gestures.*

For successful communication, the message must be as clear as possible. An utterance with a mismatching underlying nature is potentially confusing. For example, to take a turn, which is a typical background DA, one sometimes begins one's utterance with "look". The NCI accompanying this "look" is usually a subtle hand gesture (e.g., a referential deictic), completely void of meaning and therefore a background gesture. When "look" is used in the propositional sense, by contrast, it is accompanied by a pointing gesture; both the DA and the NCI are, in this case, of foreground nature. Using a pointing (foreground) gesture with the mentioned turn-taking (background) DA in the "look" example would therefore be confusing, steering the collocutor to search for an object in sight that does not exist. Therefore, to ensure cohesion and make communication more effective, it seems plausible that non-propositional episodes should require a background DA as well as a background NCI.

In order to perform research into authentic non-verbal behavior during turn-taking, we utilized a 57-minute-long video recording from the EVA Corpus [18]. Our annotation scheme, adapted from Mlakar et al. [28] and outlined in **Figure 1**, was applied to the dataset to perform conversational analysis. For this research, dialog acts were added as a linguistic branch.

The main objective of the scheme is to identify the inferred meanings of co-verbal expressions as a function of linguistic, paralinguistic, and social signals (e.g., where and when to gesture) on a symbolic level, and to identify the physical nature (e.g., articulation of body language) and use of the available "imaginary forms" (e.g., how to gesture, how to vocalize), i.e., the level of the interpretation of non-verbal forms. The first layer in **Figure 1**, the symbolic interpretation, is the focus of this research. It is used to analyze the interplay between various conversational signals, verbal and non-verbal (i.e., DAs, gestures, syntax, discourse markers), at a symbolic level. The second layer, the interpretation of form, is concerned with how information is expressed beyond language, through prosody and embodied expressions, as an abstract concept of a non-verbal conversational expression with a specific communicative intent, i.e., how it is physically realized, for example, the 'form' of a gesture or the 'accentuation' of speech. Its primary goal is to provide a description as detailed and as close as possible to the physical reality and to the entity that will realize it (e.g., an embodied conversational agent). As already mentioned, in this chapter we focus on the first layer, which aims to find patterns and tendencies in how people communicate through the joint use of language, prosody, gaze, gesture, facial expressions, and other articulations of the body, specifically focusing on turn-taking and the analysis of DAs and NCIs overlapping in conversational expressions (episodes).

#### **Figure 1.**

*The topology of annotation in the EVA Corpus: The levels of annotation describing verbal and non-verbal contexts of conversational episodes.*

### **2.1 The EVA Corpus**

The EVA Corpus consists of 228 minutes in total and includes four video and audio recordings, each 57 minutes long, with corresponding orthographic transcriptions. The discourse in all four recordings is part of the entertaining evening TV talk show *'A si ti tut not padu'*, broadcast on Slovene commercial television in 2010. In this research, we utilize one of the videos.

In total, five different collocutors are engaged in each episode in multiparty discourse. The conversational setting is relaxed and unrestricted. The hosts are skilled interlocutors who engage in witty, humorous, and sarcastic dialog with the guest. The discourse is therefore highly spontaneous and authentic and, since all the participants know each other privately, also relaxed and full of emotional responses. Overall, the video contains 1,516 utterances, with an average of 303 utterances per speaker. The episode contains 1,999 sentences, with an average of 399.8 per participant. The average sentence duration is 2.8 seconds; the longest is 18.1 seconds and the shortest 0.19 seconds. Overall, there are 10,471 words in the episode; on average, a speaker uttered 2,094 of them, with a mean value of 7.9 words per sentence. While the total length of the recording is just under one hour, the total duration of all utterances without overlapping is 1 hour, 33 minutes, and 26.3 seconds, which indicates a substantial amount of overlapping speech. Consequently, the dialog is characterized by a vivid and rapid exchange of speaker roles, which makes it ideal for the study of non-verbal behavior that accompanies turn-taking.
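As a quick arithmetic check, the quoted per-speaker averages follow directly from the reported totals and the five participants (a minimal sketch, not part of the original analysis):

```python
# Reported totals for the 57-minute episode, shared among five collocutors.
utterances, sentences, words, speakers = 1516, 1999, 10471, 5

print(utterances / speakers)  # → 303.2 utterances per speaker
print(sentences / speakers)   # → 399.8 sentences per speaker
print(words / speakers)       # → 2094.2 words per speaker
```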

#### **2.2 DA annotation**

The entertainment show was segmented and transcribed with the transcription tool Transcriber 1.5.1 and annotated in the annotation tool ELAN. The annotation of DAs was performed with the web-based annotation tool WebAnno. For the classification of DAs, we applied the ISO 24617-2 scheme; however, we partially consolidated it in accordance with the aim of our research. In the dimension of information-providing functions, we specified the function Correction, as it does not clarify whether the sender corrects themselves or the interlocutor. We therefore added the function CorrectionPartner, which denotes the action of a sender who is correcting the interlocutor. Alongside the functions Inform and Agreement, we also filled the need for argumentative acts and added the function Argument. For occasions where the sender quotes someone, the function ReportedSpeech was added. Among the directive functions, the Instruct function did not suffice for acts where the sender provides support to the interlocutor or warns the interlocutor; therefore, the functions Encouragement and Warning were added. With regard to feedback-specific functions, we merged the AutoPositive and AutoNegative functions into the OwnComprehensionFeedback function. Similarly, we merged the AlloPositive and AlloNegative functions into the PartnerComprehensionFeedback function. The dimension of discourse structuring provided the function of opening but lacked the closing action, which we added. As regards the dimension that manages social obligations, we merged the InitGreeting and the ReturnGreeting functions into Greeting. The dimension, however, lacked the function of providing and accepting praise or flattery, which is why the functions Praise and AcceptPraise were included. The annotation of sentiment included



the qualifiers Disappointment, Disgust, Emphasis, Hurt, Negative, Positive, Satisfaction, and Surprise.
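For reference, the adjustments to the ISO 24617-2 function inventory described above can be summarized in a small data structure (a sketch; the variable names are ours, the function labels follow the text and are not an official part of the standard):

```python
# Functions added to, and merges applied within, the ISO 24617-2 scheme
# for this research (as described in the text).
ADDED_FUNCTIONS = [
    "CorrectionPartner",  # sender corrects the interlocutor
    "Argument",           # argumentative acts alongside Inform/Agreement
    "ReportedSpeech",     # sender quotes someone
    "Encouragement",      # sender supports the interlocutor
    "Warning",            # sender warns the interlocutor
    "Closing",            # counterpart to the discourse-structuring opening
    "Praise",             # providing praise or flattery
    "AcceptPraise",       # accepting praise or flattery
]
MERGED_FUNCTIONS = {
    ("AutoPositive", "AutoNegative"): "OwnComprehensionFeedback",
    ("AlloPositive", "AlloNegative"): "PartnerComprehensionFeedback",
    ("InitGreeting", "ReturnGreeting"): "Greeting",
}
```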

In line with Cooperrider's [27] foreground-background distinction, we divided DAs according to whether or not they convey a vital part of the message, without which the encounter would be void of propositional content. Since task-oriented DAs include the functions of information-seeking and -providing, as well as commissive and directive functions, they are part of the foreground. Similarly, the social obligations management DAs perform functions such as greetings, introductions, apologies, thanking, and valedictions. They contain propositional content and can, therefore, be considered part of the foreground. On the other hand, the feedback DAs, turn management DAs, time management DAs, discourse structuring DAs, and own- and partner communication management DAs perform background functions, as their main purpose is not to convey information but to steer the dialog or to provide active listenership. For example, when correcting oneself after misspeaking, the act of correction is not in the foreground; the underlying information-related DA is. Similarly, when helping the interlocutor find the correct ending to a word, the act of completion is in the background, while the interlocutor's primary utterance, which is being completed by the partner, is in the foreground. As emphasized in the Introduction, some functions can cross this distinction. Consider the function of completion: when people try to demonstrate their connection by finishing each other's sentences, the partner's act of completing the interlocutor's primary utterance is in the foreground, since both interlocutors' purpose of communication was to demonstrate their connection by completing each other's sentences. Nevertheless, in the majority of cases, the distinction of DAs can be applied as proposed.

In terms of the background-foreground distribution of observed DA episodes, we can conclude that the material is well balanced: it consists of 1,897 instances where the primary role of the DA was recognized as of foreground nature and 2,020 instances where it was of background nature.
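The nine-dimension split and the reported episode balance can be captured in a small lookup (a sketch; `DA_NATURE` is a hypothetical name, the dimension labels follow the text):

```python
# Provisional foreground/background assignment of the nine DA dimensions.
DA_NATURE = {
    "task": "foreground",
    "social obligations management": "foreground",
    "auto-feedback": "background",
    "allo-feedback": "background",
    "turn management": "background",
    "time management": "background",
    "discourse structuring": "background",
    "own communication management": "background",
    "partner communication management": "background",
}

# Reported episode counts: the material is close to an even split.
foreground, background = 1897, 2020
print(round(100 * foreground / (foreground + background), 1))  # → 48.4
```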

#### **2.3 NCI annotation**


The annotation of non-verbal expressions, focusing on gestures and facial expressions, was carried out in ELAN. The annotation of each phenomenon highlighted in **Figure 1** (e.g., gesture unit, phrase, NCI) was conducted individually, but by two or three annotators at a time. In cases of annotation disagreement, diverging values were elaborated and argued until consensus was reached. Moreover, before the annotation process began, all annotators were familiarized with the nature of the signal to be annotated and notified of the possible values from which they could choose.

In terms of NCI annotation, we used the following classification:

• **Illustrators (I)** define body movement (embodiment) that illustrates what a speaker is saying. Regarded as foreground behavior, they accompany or reinforce verbal cues and are accompanied by an actual word referent in the speech. Illustrators are further classified into outlines, ideographs, and dimensional illustrators. The outlines (IO) subclass encompasses embodiments that reproduce a concrete aspect of the accompanying verbal content (explicit referents in speech). The ideographic/metaphoric illustrators (Ii) subclass refers to a concretization of the abstract through a specific shape. The spatial/dimensional (Id) subclass refers to the spatial movements outlining or depicting dimensional relations. They are used to 'paint' characteristics of entities and actions to further highlight their physical properties.


In terms of the background-foreground distribution of observed NCIs, the material contains predominantly non-verbal behavior "functioning" in the background. Overall, we observed roughly 1,684 non-verbal expressions, of which 1,274 (75.65 percent) belonged to regulators and 136 (8.08 percent) to illustrators and symbols. The rest, 275 (16.33 percent), belonged to deictic expressions. The majority of NCIs are, therefore, of background nature.
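The class shares quoted above follow from the reported counts and the overall total (a quick check; the variable names are ours):

```python
# NCI class counts as reported for the episode; the total of roughly
# 1,684 expressions is taken from the text.
counts = {"regulators": 1274, "illustrators and symbols": 136, "deictics": 275}
total = 1684

for name, n in counts.items():
    print(f"{name}: {100 * n / total:.2f}%")
```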

A rough classification of NCIs and DAs according to their underlying nature, which can be of background and/or foreground nature, is presented in **Table 1**. It must be emphasized that this classification is purely provisional, as the foreground-background barrier is vague and can, depending on the wider context, be crossed by both NCIs and DAs.

| | Background nature | Foreground nature |
| --- | --- | --- |
| NCIs | Regulators, Batons, Deictics | Illustrators, Symbols, Deictics |
| DAs | Turn management, Time management, Discourse structuring, Feedback, Communication management | Task, Social obligations management |

**Table 1.**
*A coarse-grained classification of the underlying nature of NCI classes and DA dimensions.*

#### **2.4 Annotation agreement**

In total, five annotators, two with a linguistic background and three with a technical background in machine interaction, were involved in this phase of annotation. Annotations were performed in separate sessions, each session describing a specific signal. The annotation was performed in pairs or triples, i.e., two or three annotators annotated the same signal. After the annotation, consensus was reached by observing and commenting on the values where there was little or no annotation agreement among multiple annotators (including those not involved in the annotation of the signal). The final corpus was generated after all disagreements were resolved. Procedures for checking inconsistencies were finally applied by an expert annotator. Before starting each session, the annotators were given an introductory presentation defining the nature of the signal they were observing and the exact meaning of the finite set of values they could use. An experiment measuring agreement was also performed. It included an introductory annotation session in which the preliminary inconsistencies were resolved. Overall, given the complexity of the task and the fact that the values in **Table 2** also cover cases with a possible duality of meaning, the level of agreement is acceptable and comparable to other multimodal corpus annotation tasks [29].

For the less complex signals, influenced primarily by a single modality (e.g., pitch, gesture unit, gesture phrase, body-part/modality, sentence type), the annotators' agreement measured in terms of Cohen's kappa [30] was high, namely between 0.75 and 0.9. Signals such as Part-of-Speech, Syntax, and Word Segmentation were annotated (semi-)automatically; the two expert annotators (linguists) oversaw the process and corrected the tags manually, and agreement was measured over the corrections made. Pitch was annotated completely automatically, so no agreement was measured. The only exceptions among the less complex, unimodal signals were Gesture phrase (0.53) and Prosodic phrases (0.71). These disagreements were expected, since in some cases it is quite ambiguous to identify where a certain phrase ends and the next starts. Moreover, in many cases, the retraction phase of a gesture can be recognized as the stroke phase of the next gesture phrase.

As summarized in Table 3, for the more complex signals that involve multiple modalities for their comprehension (including speech, gestures, and text), the disagreements in interpretation were expectedly higher.

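Agreement figures like those reported in this section can be reproduced for any pair of annotators with a short routine; the following is a minimal sketch for two label sequences (the example labels are hypothetical, not corpus data):

```python
from collections import Counter

def cohens_kappa(a, b):
    """Chance-corrected agreement between two annotators' label sequences."""
    if len(a) != len(b):
        raise ValueError("annotations must cover the same items")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    ca, cb = Counter(a), Counter(b)
    # Expected agreement if both annotators labeled independently at their own rates.
    p_e = sum(ca[label] * cb[label] for label in ca.keys() | cb.keys()) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical foreground/background judgments by two annotators:
a1 = ["fg", "fg", "bg", "bg"]
a2 = ["fg", "bg", "bg", "bg"]
print(cohens_kappa(a1, a2))  # → 0.5
```

Note that kappa is undefined for perfect chance agreement (the denominator becomes zero), which is why agreement experiments typically require varied label distributions.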