**5. Effects of task demand, listener characteristics, and speaker characteristics on brain responses during vocal expression decoding**

Decoding emotion from voice is subject to many sources of variation; one notable factor is the communication context. Task relevance modulates how explicitly the emotional content of a vocal expression is processed. One study presented matching and mismatching emotional prosody to listeners and asked them either to judge the emotional congruency (where the emotional information is task relevant) or to verify the consistency between a visually presented lexical item and the statement [22]. Three ERP effects were elicited: an early negativity from 150 to 250 ms regardless of task relevance and the pattern of mismatch; an early positivity from 250 to 450 ms only for an angry voice preceded by a neutral voice, but again regardless of task relevance; and a late positivity after 450 ms only in the task that directed the listener's attention to the emotional aspects of the vocal expression. Explicitly processing task-relevant emotionality thus enhanced vigilance in perceiving emotional change in the voice.
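
To make the windowing concrete, the sketch below shows how mean ERP amplitudes in the three reported latency windows could be extracted from epoched EEG data. This is a minimal illustration rather than the authors' pipeline: the array shapes, sampling rate, and names (`epochs`, `SFREQ`, `EPOCH_START_S`) are assumptions, and the data are random placeholders.

```python
import numpy as np

# Hypothetical epoched EEG: (n_trials, n_channels, n_samples), sampled at
# SFREQ Hz with each epoch starting 200 ms before stimulus onset.
SFREQ = 500
EPOCH_START_S = -0.2
epochs = np.random.randn(120, 64, int(SFREQ * 1.4))  # placeholder data

def mean_amplitude(epochs, t_start, t_end):
    """Mean amplitude in a latency window, averaged over trials and time."""
    i0 = int((t_start - EPOCH_START_S) * SFREQ)
    i1 = int((t_end - EPOCH_START_S) * SFREQ)
    return epochs[:, :, i0:i1].mean(axis=(0, 2))  # one value per channel

# The three windows reported for the prosody congruency study [22]:
early_negativity = mean_amplitude(epochs, 0.150, 0.250)
early_positivity = mean_amplitude(epochs, 0.250, 0.450)
late_positivity = mean_amplitude(epochs, 0.450, 0.900)  # ">450 ms" window
```

Condition labels (task-relevant vs. task-irrelevant attention, match vs. mismatch) would then be used to contrast these amplitudes across trial subsets.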

Vocal emotion decoding is also shaped by the listener's characteristics. Developmental studies revealed that the neurophysiological correlates of emotional voice processing (especially for negative emotion) were similar in children and adults [23]. Using emotional interjections ("ah"), Chronaki et al. [23] compared angry, happy, and neutral voices in 6- to 11-year-old typically developing children. The N400 was attenuated for angry relative to other expression types over parietal and occipital regions. Comparing neurocognitive processes across stages of early human development merits further examination [24].

Another topic is how listeners' linguistic and cultural backgrounds affect their perception of vocal expressions. In a recent EEG study, native North-American English and Chinese speakers were asked to detect the emotion of the vocal or facial expression in a voice-face pair [25]. The emotional information in the voice and face was either congruent or incongruent. Both groups were sensitive to the emotional differences between voice and face, showing lower accuracy and a larger N400 amplitude for incongruent voice-face pairs. However, English speakers showed a more pronounced N400 enlargement and a greater reduction in accuracy when vocal information was attended, suggesting that those from a Western culture suffered a larger interference effect from irrelevant face information. Another study used a passive oddball paradigm in which the two groups of listeners were presented with deviant or standard facial expressions that were either paired with a vocal expression or not [26]. Chinese speakers showed a larger mismatch negativity when the vocal expression was presented together with a facial expression, suggesting that individuals from an Eastern culture were more sensitive to interference from task-irrelevant vocal cues. These findings suggest that cultural learning and different cultural practices in communication shape the neurocognitive processes associated with the early perception of voice-face emotional cues.
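
The mismatch negativity in the oddball study can be pictured as a deviant-minus-standard difference wave. The sketch below assumes trial-averaged waveforms and a typical MMN latency window; the channel count, window, and variable names are illustrative, not taken from [26].

```python
import numpy as np

SFREQ = 500
# Hypothetical trial-averaged ERPs, shape (n_channels, n_samples):
standard_erp = np.random.randn(64, 400)  # response to standard faces
deviant_erp = np.random.randn(64, 400)   # response to deviant faces

# The MMN is the deviant-minus-standard difference wave, commonly
# quantified around 100-250 ms at fronto-central channels.
mmn = deviant_erp - standard_erp
win = slice(int(0.100 * SFREQ), int(0.250 * SFREQ))
mmn_amplitude = mmn[:, win].mean(axis=1)  # mean amplitude per channel
```

Comparing `mmn_amplitude` between the voice-present and voice-absent conditions, separately for each listener group, indexes the cross-modal interference effect described above.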

The listener's biological sex plays a central role in modulating the integration of emotional information in vocal and verbal channels [27, 28]. Recent evidence has extended this idea beyond basic emotions. Jiang and Pell [15] examined sex differences in evaluating confidence in both confidence- and neutral-intending vocal expressions and the associated neural responses. They revealed that the delayed positivity effect elicited by neutral-intending expressions was observed only in female listeners, suggesting an inferential process aimed at deriving speaker meaning from nonexpression-intending vocal expressions. Further analysis revealed that, when vocal statements were led by lexical phrases signaling some level of certainty (LEX + VOC), females showed a more pronounced N1 for confident expressions and a larger late positivity (550–1200 ms) for unconfident and close-to-confident expressions. When these statements were compared with those carrying only vocal cues to confidence (VOC only), a reduced N1, P2, and N400 were observed in females [16]. These findings suggest an enhanced sensitivity to socioemotional information in females during vocal communication. Females and males also engage different strategies in resolving conflicting information in vocal expressions. Jiang and Pell [17] demonstrated that conflicting messages in vocal confidence expressions elicited different ERP effects in female vs. male listeners. A confident statement following an unconfident phrase elicited a larger delayed positivity only in female participants, while an unconfident statement following a confident phrase elicited an N400 and a P600 effect in male participants. These findings provide a picture of how mixed messages are dealt with in the female vs. the male brain: in the face of a mismatch in vocal expressions, females attempted to unify the separate pieces of information into an integrated representation, while males updated the initially built representation by switching to an alternative interpretation (for example, by saying "She has access to the building" in an unconfident voice following "I'm certain," the speaker reveals some level of hesitation).
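
The sex differences above amount to between-group contrasts on component amplitudes. As a toy example on simulated data, the snippet below compares a per-participant late-positivity measure (e.g., the 550–1200 ms mean over centro-parietal channels) between female and male listeners with an independent-samples t-test; the actual studies used richer repeated-measures designs, and all values here are placeholders.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical per-participant late-positivity amplitudes (uV),
# 550-1200 ms, averaged over centro-parietal channels:
lpc_female = rng.normal(1.5, 1.0, size=20)
lpc_male = rng.normal(0.5, 1.0, size=20)

t, p = stats.ttest_ind(lpc_female, lpc_male)
print(f"female vs. male late positivity: t = {t:.2f}, p = {p:.3f}")
```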

Given its sociointeractive nature, inferring speaker meaning from interactive emotive expressions is susceptible to the listener's traits and personality characteristics. One factor that has been overlooked but should be evaluated is the individual's interpersonal sensitivity. Jiang and Pell [16, 17] measured interpersonal sensitivity using the interpersonal reactivity index (IRI) [29] and regressed the early and late ERP responses toward perceiving a certain level of confidence on interpersonal sensitivity. They found that those with higher IRI scores showed more pronounced delayed positivity effects for close-to-confident and unconfident congruent expressions [16] and for incongruent confident expressions preceded by an unconfident phrase [17]. A further examination of this individual difference revealed that a larger positivity in female listeners fully mediated their perceptual adjustment toward the incongruent expression (e.g., judging the incongruent confident expression to be less confident than the congruent one).
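
The individual-differences analysis is, at its core, a regression of an ERP measure on a questionnaire score. A minimal sketch with simulated stand-ins for the IRI scores and delayed-positivity amplitudes follows; the variable names and effect size are placeholders, not values from [16, 17].

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
iri = rng.normal(60, 10, size=30)                  # interpersonal reactivity index
late_pos = 0.05 * iri + rng.normal(0, 1, size=30)  # delayed positivity (uV)

# Does interpersonal sensitivity predict the size of the positivity effect?
slope, intercept, r, p, se = stats.linregress(iri, late_pos)
print(f"slope = {slope:.3f} uV per IRI point, r = {r:.2f}, p = {p:.3f}")
```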

The listener's level of anxiety also plays an important role in modulating neural responses during the decoding of vocal emotions. In Jiang and Pell [15], both early (N100) and late ERP responses (P200, late positivity) were associated with trait anxiety: those exhibiting higher trait anxiety showed a reduced N100 and a reduced late positivity effect for both vocalizations and speech, but an enhanced P200 effect for vocalizations. Jiang and Pell [17] further found that the P200 in response to confident vs. unconfident vocal expressions was larger in those with a lower level of trait anxiety, and this modulation mediated the reduced P200 in male listeners, who showed lower anxiety than female listeners.
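
This mediation claim can be illustrated with a classic three-regression (Baron-Kenny style) sketch: listener sex as the predictor, trait anxiety as the mediator, and P200 amplitude as the outcome. All data below are simulated and the coding scheme is an assumption; the published analyses may have used a different mediation procedure.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
sex = rng.integers(0, 2, size=60).astype(float)    # 0 = male, 1 = female
anxiety = 5 * sex + rng.normal(40, 5, size=60)     # trait-anxiety score
p200 = -0.1 * anxiety + rng.normal(3, 1, size=60)  # P200 amplitude (uV)

def ols(y, X):
    """Ordinary least squares with an intercept term."""
    return sm.OLS(y, sm.add_constant(X)).fit()

total = ols(p200, sex)                               # path c: sex -> P200
a_path = ols(anxiety, sex)                           # path a: sex -> anxiety
direct = ols(p200, np.column_stack([sex, anxiety]))  # paths c' and b

# Mediation is indicated when paths a and b are reliable and the direct
# effect c' shrinks relative to the total effect c.
print(total.params, a_path.params, direct.params)
```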
