**1. Introduction**

Theoretical models based on electrophysiological studies have identified early and late neurophysiological markers that index the online perception of vocal emotion expressions in speech, as well as other higher-order socioemotive expressions (e.g., confidence, sarcasm, sincerity), and that roughly correspond to the hypothesized processing stages [1, 2]. Studies using event-related potentials (ERPs), which analyze the averaged electrophysiological response to a given vocal or speech event, have shed light on these neurocognitive processes at a fine-grained temporal scale. The early fronto-central auditory N1 is elicited by a wide range of auditory stimulus types and is taken as a measure of sensory-perceptual processing. In vocal emotion processing, the N1 has been linked to the extraction of acoustic cues, such as frequency and intensity parameters, that differentiate types of vocal signals [3, 4], and it is unaffected by differences in emotional meaning. The fronto-central P200 has been associated with early attentional allocation to, or relevance evaluation of, vocal signals [2, 5], which ensures preferential processing of emotional stimuli. P200 amplitude differs between basic emotions [6] and between emotional and neutral speech [3, 7], suggesting that this component may reflect an early "tagging" of emotionally or motivationally relevant stimuli. The P200 tended to be associated with a higher mean and range of f0, a larger mean and range of speech amplitude, and a slower speech rate [6], implying that the early P200 modulation is partially explained by early meaning encoding as well as continued sensory processing [8]. A late centro-parietal positivity (also termed the LPC) evoked by vocal emotion expressions has been defined as a positive-going wave that starts about 500 ms after the onset of the vocal stimulus and may be sustained until 1200 ms, depending on stimulus features.
The LPC is thought to reflect a continued or second-pass evaluation of the meaning of vocal emotional signals [2, 5]. The LPC was larger for emotional vocal stimuli, with larger amplitude differences among basic emotion types [6], suggesting more elaborate processing of vocal information at this stage. In addition to these ERP effects, a more delayed, sustained positivity may reflect a listener's attempt to infer the speaker's goal, especially when the way of speaking mismatches what the utterance context leads the listener to expect [9]. Together, these ERP components provide a useful tool for examining the temporal neural dynamics of emotional decoding in voice and speech.
