**2. Entropy indices**

Entropy was first defined in thermodynamics, referring to the probability distribution of the molecules in a fluid system [15]. In information theory and signal analysis, this concept was adapted by Shannon, who defined entropy as a measure of the information provided by a time series, describing its complexity, irregularity, or unpredictability [16]. With respect to EEG analysis, many entropy indices have been introduced and successfully applied to the study of various physical and mental disorders, such as epilepsy [17], Alzheimer's disease [18], autism [19], or depression [20], among others. As a result of these valuable outcomes, entropy metrics have also been introduced in the research field of emotion recognition from EEG recordings [14]. The following subsections give a brief mathematical description of the entropy metrics most commonly applied for emotion detection.

#### **2.1 Regularity-based entropy indices**

The irregularity of a signal represents the rate of repetitiveness of its patterns, reaching higher values for non-repetitive and disordered time series, and lower values for sequences with a high rate of pattern recurrence [21]. One of the most widely used regularity-based entropy metrics is the approximate entropy (ApEn), which evaluates the probability of finding repetitive patterns and assigns a non-negative number to each sequence in terms of its repetitiveness, with lower values for more recurrent patterns [22]. Mathematically, ApEn is computed as

$$\text{ApEn}(m, r) = C^m(r) - C^{m+1}(r),\tag{1}$$

where $C^m(r)$ and $C^{m+1}(r)$ are the correlation integrals that represent the likelihood of two sequences matching for $m$ and for $m+1$ points, respectively, within the threshold $r$ [22]. Nevertheless, ApEn also counts the self-matching of each sequence, thus biasing the final result. Therefore, the sample entropy (SampEn) was developed to address this issue [23]. SampEn eliminates self-matching and makes the results largely independent of the data length, while maintaining relative consistency across different choices of the vector length $m$. If $B^m(r)$ is the probability that two patterns match for $m$ points, and $B^{m+1}(r)$ is the probability that two patterns match for $m+1$ points, both defined to exclude self-matches, then SampEn is calculated as

$$\text{SampEn}(m, r) = -\ln\left[\frac{B^{m+1}(r)}{B^m(r)}\right].\tag{2}$$
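
To make this definition concrete, the following Python sketch estimates SampEn directly from Eq. (2). It is a minimal illustration rather than the implementation used in the cited works, and it assumes the common convention of expressing the tolerance $r$ as a fraction of the signal's standard deviation.

```python
import numpy as np

def sample_entropy(x, m=2, r=0.2):
    """Minimal SampEn estimate (Eq. 2): m is the pattern length and r the
    tolerance, given here as a fraction of the standard deviation of x."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    tol = r * np.std(x)

    def matches(length):
        # All overlapping templates of the requested length.
        templ = np.array([x[i:i + length] for i in range(n - length)])
        count = 0
        for i in range(len(templ) - 1):
            # Chebyshev distance to the *following* templates only, so each
            # pair is counted once and self-matches are excluded.
            dist = np.max(np.abs(templ[i + 1:] - templ[i]), axis=1)
            count += np.sum(dist <= tol)
        return count

    # Eq. (2): negative logarithm of the ratio of (m+1)- to m-point matches.
    return -np.log(matches(m + 1) / matches(m))

# Example: white noise is highly irregular, so its SampEn is relatively high.
print(sample_entropy(np.random.randn(1000)))
```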

Moreover, the quadratic sample entropy (QSampEn) emerged as an improvement of SampEn intended to make the estimates insensitive to the value of the threshold $r$ chosen for the computation [24]. This independence from the parameter $r$ is achieved by simply adding the term $\ln(2r)$ to the SampEn equation:

$$\text{QSampEn}(m, r) = \text{SampEn}(m, r) + \ln(2r). \tag{3}$$
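
Building on the `sample_entropy` sketch above, QSampEn is a one-line correction; note that, under the same assumptions, the $\ln(2r)$ term uses the absolute tolerance actually applied to the data.

```python
import numpy as np

def quadratic_sample_entropy(x, m=2, r=0.2):
    # Eq. (3): add ln(2r) so that estimates computed with different
    # tolerances become directly comparable (r converted to absolute units).
    tol = r * np.std(x)
    return sample_entropy(x, m, r) + np.log(2 * tol)
```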

#### **2.2 Predictability-based and symbolic entropy indices**

The predictability of a nonstationary system is related to its stable and deterministic evolution in time. Most entropy metrics for predictability measurement are symbolic indices that convert the original signal into sequences of discrete symbols [25]. After this symbolization, the predictability of the time series can be evaluated with multiple techniques. The most commonly used is the Shannon entropy (ShEn), which quantifies the predictability of a signal in terms of the probability distribution of its amplitudes [16]. The mathematical expression of ShEn is

$$\text{ShEn}(m) = -\sum\_{i=1}^{m} p(x\_i) \cdot \ln \left( p(x\_i) \right),\tag{4}$$

where $p(x_i)$ is the probability of appearance of each symbolic sequence $x_i$ of length $m$.
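
As an illustrative sketch, ShEn can be estimated from any symbolization of the signal; here the amplitudes are simply quantized into a handful of bins, which is only one of many possible symbolization schemes.

```python
import numpy as np

def shannon_entropy(probs):
    """Eq. (4): ShEn of a discrete probability distribution; symbols with
    zero probability contribute nothing to the sum."""
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# Toy symbolization: quantize the amplitudes into 8 bins and estimate p(x_i).
x = np.random.randn(5000)
counts, _ = np.histogram(x, bins=8)
print(shannon_entropy(counts / counts.sum()))
```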

The Rényi entropy (REn) is a generalization of ShEn that is also widely used for quantifying the underlying dynamics of symbolized signals [26]. Specifically, REn provides a better characterization of rare and frequent ordinal sequences, and it is defined as

$$\text{REn}(m, q) = \frac{1}{1 - q} \ln \sum\_{i=1}^{m} p(x\_i)^q,\tag{5}$$

where $q$ ($q \geq 0$ and $q \neq 1$) is the bias parameter that enables a more accurate characterization of a nonlinear signal [26]. Indeed, ShEn is the limiting case of REn as $q \to 1$, thus making REn a more flexible index than ShEn [26].
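
A sketch of REn built on the same probability vector makes the relation between the two indices visible: for values of $q$ close to 1, REn numerically approaches ShEn.

```python
import numpy as np

def renyi_entropy(probs, q=2.0):
    """Eq. (5): REn of order q (q >= 0, q != 1)."""
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]
    return np.log(np.sum(p ** q)) / (1.0 - q)

# REn tends to ShEn as q -> 1 (compare with shannon_entropy above):
p = np.array([0.5, 0.25, 0.125, 0.125])
print(renyi_entropy(p, q=0.999), renyi_entropy(p, q=2.0))
```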

The version of ShEn for continuous random variables, called differential entropy (DEn), has received growing interest in recent years [27]. This entropy index can be expressed as

$$\text{DEn}(X) = -\int\_{\mathcal{X}} f(x) \log \left( f(x) \right) dx, \tag{6}$$

where $X$ is a random signal and $f(x)$ is its probability density function. In the case of time series governed by the Gaussian distribution $N(\mu, \sigma^2)$, with $\mu$ and $\sigma^2$ being its mean and variance, respectively, DEn reduces to

$$\text{DEn}(X) = -\int\_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}} \log\left(\frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}\right) dx = \frac{1}{2} \log\left(2\pi e \sigma^2\right). \tag{7}$$

Since band-pass-filtered EEG signals approximately follow a Gaussian distribution, the DEn of each frequency sub-band, previously extracted by means of band-pass filtering or a fast Fourier transform, can be computed directly with this closed-form expression [27].
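
The closed form makes band-wise DEn inexpensive to compute. The sketch below band-passes one EEG channel with a Butterworth filter (a SciPy-based assumption; the cited work may use a different filter bank, and `eeg_channel` is a placeholder) and applies Eq. (7) to the variance of the filtered samples.

```python
import numpy as np
from scipy.signal import butter, filtfilt  # assumption: SciPy filtering

def band_differential_entropy(x, fs, band):
    """Eq. (7): closed-form DEn of a band-passed segment, assuming the
    filtered samples are approximately Gaussian."""
    b, a = butter(4, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
    xf = filtfilt(b, a, x)
    return 0.5 * np.log(2 * np.pi * np.e * np.var(xf))

# e.g., DEn of the alpha band (8-13 Hz) of a 256 Hz recording:
# band_differential_entropy(eeg_channel, fs=256, band=(8, 13))
```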

Another widely used predictability-based entropy metric is the permutation entropy (PerEn), a fast and noise-robust metric that evaluates the order of the symbols within a pattern [28]. Briefly, the original time series is symbolized to obtain ordinal sequences $x_i$ that are associated with $m!$ permutation patterns $\pi_k$. Considering $p(\pi_k)$ as the probability of appearance of each permutation pattern, PerEn can then be computed by means of ShEn:

$$\text{PerEn}(m) = -\frac{1}{\ln\left(m!\right)} \sum\_{k=1}^{m!} p(\pi\_k) \cdot \ln\left(p(\pi\_k)\right). \tag{8}$$
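
A compact sketch of Eq. (8): each window of $m$ samples is mapped to the permutation that sorts it, and the normalized ShEn of the pattern frequencies is returned.

```python
import math
import numpy as np

def permutation_entropy(x, m=3, tau=1):
    """Eq. (8): normalized PerEn from the frequencies of the m! ordinal
    patterns; tau = 1 recovers the non-delayed version."""
    x = np.asarray(x, dtype=float)
    counts = {}
    for i in range(len(x) - (m - 1) * tau):
        # The ordinal pattern is the permutation that sorts the window.
        key = tuple(np.argsort(x[i:i + m * tau:tau]))
        counts[key] = counts.get(key, 0) + 1
    p = np.array(list(counts.values()), dtype=float)
    p /= p.sum()
    return -np.sum(p * np.log(p)) / math.log(math.factorial(m))

print(permutation_entropy(np.random.randn(2000), m=3))
```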

One of the limitations of PerEn is that it only considers the order of the symbols in a pattern, without taking their amplitudes into account. This limitation has recently been addressed by the introduction of the amplitude-aware permutation entropy (AAPE) [29]. This improvement of PerEn computes the probability $p^*(\pi_k)$ of appearance of the patterns by evaluating the average absolute and relative amplitudes of the symbolic sequences, and by applying an adjustment coefficient to weight those two terms. Finally, AAPE is calculated in a similar manner as PerEn [29]:

$$\text{AAPE}(m) = -\frac{1}{\ln\left(m!\right)} \sum\_{k=1}^{m!} p^\*\left(\pi\_k\right) \cdot \ln\left(p^\*\left(\pi\_k\right)\right). \tag{9}$$
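
A sketch of AAPE follows; the exact weighting is defined in [29], and here we assume the commonly cited form in which each pattern occurrence contributes a weight mixing its mean absolute amplitude and its mean absolute increment, balanced by an adjustment coefficient $A \in [0, 1]$.

```python
import math
import numpy as np

def amplitude_aware_permutation_entropy(x, m=3, tau=1, A=0.5):
    """Eq. (9) sketch: ordinal patterns weighted by amplitude information
    (weighting assumed from [29]; A balances absolute vs. relative terms)."""
    x = np.asarray(x, dtype=float)
    weights = {}
    for i in range(len(x) - (m - 1) * tau):
        w = x[i:i + m * tau:tau]
        # Mean absolute amplitude plus mean absolute increment of the window.
        contrib = (A / m) * np.sum(np.abs(w)) \
            + ((1 - A) / (m - 1)) * np.sum(np.abs(np.diff(w)))
        key = tuple(np.argsort(w))
        weights[key] = weights.get(key, 0.0) + contrib
    p = np.array(list(weights.values()))
    p /= p.sum()  # p*(pi_k): weighted relative frequencies
    return -np.sum(p * np.log(p)) / math.log(math.factorial(m))
```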

Another option for assessing the predictability of a time series is the spectral entropy (SpEn) [30]. In this case, the spectral power at each frequency is computed and normalized with respect to the total power, which yields a probability density function $p_f$. SpEn is then computed through ShEn or REn [30].
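
A minimal SpEn sketch, assuming a plain FFT periodogram as the spectral estimator (Welch or other estimators are equally valid):

```python
import numpy as np

def spectral_entropy(x):
    """SpEn: ShEn of the power spectrum normalized into a distribution p_f,
    here also normalized by ln of the number of frequency bins."""
    psd = np.abs(np.fft.rfft(x)) ** 2
    p = psd / psd.sum()
    p = p[p > 0]
    return -np.sum(p * np.log(p)) / np.log(len(p))
```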

#### **2.3 Multilag and multiscale entropy indices**

The time series generated by nonlinear and nonstationary systems like the brain usually present highly complex dynamics derived from different simultaneous mechanisms operating at multiple time scales [31]. As a result, brain behavior cannot be completely described by means of single-scale methods. Therefore, multiscale variants of the aforementioned entropy metrics have been introduced with the purpose of revealing information hidden in the multiscale nature of EEG recordings. For the computation of multiscale entropy (MSE), the original signal $x(n)$ is first decomposed into coarse-grained time series $y^{(\kappa)}$, with $\kappa$ as the scale factor, as follows:

$$y\_j^{(\kappa)} = \frac{1}{\kappa} \sum\_{i=(j-1)\kappa+1}^{j\kappa} x(i), \quad 1 \le j \le \frac{N}{\kappa}. \tag{10}$$

Therefore, all the previously defined entropy indices can be computed in a multiscale form for the coarse-grained series as defined above.
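
The coarse-graining of Eq. (10) reduces to averaging non-overlapping windows. The sketch below reuses the `sample_entropy` function defined earlier to produce a SampEn-based MSE profile, one value per scale.

```python
import numpy as np

def coarse_grain(x, kappa):
    """Eq. (10): averages of consecutive non-overlapping windows of length kappa."""
    x = np.asarray(x, dtype=float)
    n_win = len(x) // kappa
    return x[:n_win * kappa].reshape(n_win, kappa).mean(axis=1)

def multiscale_entropy(x, max_scale=10, m=2, r=0.2):
    # SampEn (Eq. 2) of each coarse-grained series, one value per scale.
    return [sample_entropy(coarse_grain(x, k), m, r)
            for k in range(1, max_scale + 1)]
```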

Another multiscale option is the wavelet entropy (WEn), which makes use of the decomposition of the original signal into different scales by means of the wavelet transform [32]. After the decomposition of the time series, the probability distribution $p_j$ of the energy at each decomposition level is computed. Finally, WEn is estimated through ShEn:

$$\text{WEn} = -\sum\_{j=1}^{q} p\_j \cdot \ln \left( p\_j \right). \tag{11}$$
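
A WEn sketch, assuming PyWavelets for the discrete wavelet transform; the wavelet family and the number of levels are illustrative choices, not those of [32].

```python
import numpy as np
import pywt  # assumption: PyWavelets provides the wavelet decomposition

def wavelet_entropy(x, wavelet="db4", level=5):
    """Eq. (11): ShEn of the relative energy p_j at each decomposition level."""
    coeffs = pywt.wavedec(np.asarray(x, dtype=float), wavelet, level=level)
    energies = np.array([np.sum(c ** 2) for c in coeffs])
    p = energies / energies.sum()
    p = p[p > 0]
    return -np.sum(p * np.log(p))
```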

On the other hand, the characteristics of the autocorrelation function of some signals require the consideration of a lag or time delay $\tau$ for a correct quantification of the complexity and nonlinear dynamics of the time series. In this sense, multilag entropy approaches help to reduce the influence of the autocorrelation function and thus properly characterize a nonlinear signal [33]. One of those multilag approaches is the permutation min-entropy (PerMin), a symbolic time-delayed improvement of PerEn [34]. Starting from a generalization of PerEn in which ShEn is replaced by REn, the Rényi permutation entropy (RPE) is obtained as

$$\text{RPE}(m, q, \tau) = \frac{1}{\ln \left( m! \right)} \cdot \frac{1}{1 - q} \ln \left( \sum\_{k=1}^{m!} p^{\tau}(\pi\_k)^q \right). \tag{12}$$

Finally, PerMin is obtained in the limit $q \to \infty$ and presents the following expression:

$$\text{PerMin}(m,\tau) = -\frac{1}{\ln\left(m!\right)} \ln\left(\max\_{k=1,2,\ldots,m!} \left[p^{\tau}(\pi\_k)\right]\right). \tag{13}$$
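
In the $q \to \infty$ limit only the most frequent ordinal pattern survives inside the logarithm, so PerMin is straightforward to sketch:

```python
import math
import numpy as np

def permutation_min_entropy(x, m=3, tau=1):
    """Eq. (13): PerMin depends only on the most probable ordinal pattern
    extracted with time delay tau."""
    x = np.asarray(x, dtype=float)
    counts = {}
    for i in range(len(x) - (m - 1) * tau):
        key = tuple(np.argsort(x[i:i + m * tau:tau]))
        counts[key] = counts.get(key, 0) + 1
    p_max = max(counts.values()) / sum(counts.values())
    return -math.log(p_max) / math.log(math.factorial(m))
```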
