*3.2.6. Local anisotropic inhibited Gabor patterns in three orthogonal planes*

The *XY* frame contributes the spatial information, and the *XT* and *YT* frames contribute the temporal information. These planes intersect at the center pixel. Whereas VLBP in Eq. (19) captures a truly three‐dimensional microtexture, LBP‐TOP computes LBP codes separately on each plane. The resulting feature vector dimensionality of LBP‐TOP is 3 × 2<sup>*n*</sup>.
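To make the construction concrete, the following is a minimal NumPy sketch of LBP‐TOP, not the implementation used in this work: it computes basic 8‐neighbor LBP codes on a single central slice per plane (the published descriptor uses circularly interpolated neighbors and accumulates over every pixel position), but it produces the 3 × 2<sup>*n*</sup> dimensional concatenated histogram described above.

```python
import numpy as np

def lbp_codes(plane, n=8):
    # Basic LBP with the 8-connected square neighborhood (n = 8); each
    # neighbor is thresholded against the center and weighted by 2**q.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    h, w = plane.shape
    center = plane[1:-1, 1:-1]
    codes = np.zeros((h - 2, w - 2), dtype=np.int64)
    for q, (dy, dx) in enumerate(offsets[:n]):
        neighbor = plane[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes += (neighbor >= center).astype(np.int64) << q
    return codes

def lbp_top(volume, n=8):
    # LBP-TOP: histogram the LBP codes from the XY, XT, and YT planes
    # and concatenate, giving a 3 * 2**n dimensional feature vector.
    t, h, w = volume.shape
    planes = [volume[t // 2, :, :],   # XY plane (one frame)
              volume[:, h // 2, :],   # XT plane (one row over time)
              volume[:, :, w // 2]]   # YT plane (one column over time)
    hists = [np.bincount(lbp_codes(p, n).ravel(), minlength=2 ** n)
             for p in planes]
    return np.concatenate(hists)

# Example: a random 8-bit video volume of 30 frames.
video = np.random.randint(0, 256, size=(30, 64, 64), dtype=np.uint8)
print(lbp_top(video).shape)  # (768,) = 3 * 2**8
```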

In the proposed method, the computational efficiency of LBP‐TOP is applied to images filtered with the anisotropic‐inhibited Gabor filter. Suppressing the background texture yields an image that contains only the edges, separated from the background texture. These edges are the significant boundaries of facial features that are useful for determining expression and emotion. Local anisotropic inhibited binary pattern (LAIBP) code values are computed as follows:

$$\mathit{LAIBP}(x, y) = \sum_{\{u,v\} \in N_{x,y}} \operatorname{sign}\left(\mathit{AIGF}(u, v) - \mathit{AIGF}(x, y)\right) \times 2^{q} \tag{20}$$

where *g*(*u*, *v*) is the maximal edge magnitude from Eq. (16). LAIBP‐TOP features are extracted in a similar fashion to LBP‐TOP: compute LAIBP codes from Eq. (20) in the *XY*, *XT*, and *YT* planes and concatenate the resultant histograms. A comparison of AIGF, LBP, and the proposed method, LAIBP, is given in **Figure 4**. The proposed method (LAIBP‐TOP) differs significantly from LBP‐TOP because we introduce the background texture removal from Eq. (16).
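As an illustration of Eq. (20) and the LAIBP‐TOP concatenation, the sketch below assumes the per‐frame AIGF responses have already been computed (a random placeholder volume stands in for them here) and interprets sign(·) as the usual LBP thresholding step:

```python
import numpy as np

def laibp_codes(aigf_plane):
    # Eq. (20): over the 8-neighborhood N_{x,y}, threshold
    # AIGF(u, v) - AIGF(x, y) and weight the q-th neighbor by 2**q.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    h, w = aigf_plane.shape
    center = aigf_plane[1:-1, 1:-1]
    codes = np.zeros((h - 2, w - 2), dtype=np.int64)
    for q, (dy, dx) in enumerate(offsets):
        neighbor = aigf_plane[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes += ((neighbor - center) >= 0).astype(np.int64) << q
    return codes

def laibp_top(aigf_volume):
    # LAIBP-TOP: Eq. (20) codes from the XY, XT, and YT planes of the
    # filtered video, histogrammed and concatenated as in LBP-TOP.
    t, h, w = aigf_volume.shape
    planes = [aigf_volume[t // 2, :, :],
              aigf_volume[:, h // 2, :],
              aigf_volume[:, :, w // 2]]
    return np.concatenate([np.bincount(laibp_codes(p).ravel(), minlength=256)
                           for p in planes])

# Placeholder volume; in the full pipeline this would hold the per-frame
# anisotropic-inhibited Gabor responses from Eq. (16).
aigf_volume = np.random.rand(30, 64, 64).astype(np.float32)
print(laibp_top(aigf_volume).shape)  # (768,)
```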


**Figure 4.** From left to right: the original frame, anisotropic inhibited Gabor filter (AIGF), local binary patterns (LBP), and the proposed method, local anisotropic inhibited binary patterns (LAIBP). Note that the proposed method has more continuous lines compared to AIGF. LBP is susceptible to JPEG compression artifacts.


**4. Experimental results**

**4.1. Datasets**

Data in this work were provided by Motor Trend Magazine from their Best Driver Car of the Year 2014 and 2015. They consist of frontal face video of a test driver as he drives one of 10 automobiles around a racetrack. Parts of the video will be released publicly on YouTube at a later date. The videos are 1080p HD quality, captured with a GoPro Hero 4, and range from 231 to 720 seconds in length. The camera is mounted on the windshield of the car, facing the driver's face. To quantize emotion, the dataset was labeled with the Fontaine emotional model [2] rather than with facial action units or emotional categories. Emotions such as happiness and sadness occupy a two‐dimensional Euclidean space defined by valence and arousal. The objective of the dataset is to detect the valence and arousal of an individual on a per‐frame basis. Valence, also known as evaluation‐pleasantness, describes the positivity or negativity of a person's feelings about a situation, e.g., happiness versus sadness. Arousal, also known as activation‐arousal, describes a person's interest in the situation, e.g., eagerness versus anxiety.

**4.2. Metrics**

For face detection results, we use the true positive rate and the *F*<sub>1</sub> score. The *F*<sub>1</sub> score is given by:


$$2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{21}$$
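As a quick numeric illustration of Eq. (21), using hypothetical detection counts rather than results from this work:

```python
def f1_score(tp, fp, fn):
    # Eq. (21): harmonic mean of precision and recall.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# 95 correct detections, 5 false positives, 10 missed faces (made up).
print(f1_score(tp=95, fp=5, fn=10))  # ~0.927
```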

For both metrics, higher is better. For full recognition results, we use root mean squared (RMS) error and correlation. The correlation coefficient is given by:

$$\frac{E\left[\left(y_{d} - \mu_{y_{d}}\right)\left(y - \mu_{y}\right)\right]}{\sigma_{y_{d}} \sigma_{y}} \tag{22}$$

where $E[\cdot]$ is the expectation operator; $y_d$ is the vector of ground‐truth labels for a video; $y$ is the vector of predicted labels for a video; $\mu_{y_d}$ and $\mu_y$ are the means of the ground truth and the prediction, respectively; and $\sigma_{y_d}$ and $\sigma_y$ are the standard deviations of the ground truth and the prediction, respectively.
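A minimal NumPy sketch of both recognition metrics, using hypothetical per‐frame labels rather than values from the dataset:

```python
import numpy as np

def rms_error(y_d, y):
    # Root mean squared error between ground truth and prediction.
    return np.sqrt(np.mean((y_d - y) ** 2))

def correlation(y_d, y):
    # Eq. (22): Pearson correlation between ground truth and prediction.
    return np.mean((y_d - y_d.mean()) * (y - y.mean())) / (y_d.std() * y.std())

# Hypothetical per-frame valence labels for one video.
y_d = np.array([0.2, 0.5, 0.7, 0.4, 0.9])
y = np.array([0.3, 0.4, 0.8, 0.5, 0.7])
print(rms_error(y_d, y), correlation(y_d, y))
```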

**4.3. Results comparing different face detectors**

Face detection results are given in **Table 1**. In general, VJ is the worst performer, with the highest variance. Though CLM and SDM achieve acceptable detection rates, they too have high variance, and on some videos they fail entirely, extracting no faces. The proposed algorithm improves detection rates on both datasets and reduces variance.
