**Quantification of Emotions for Facial Expression: Generation of Emotional Feature Space Using Self-Mapping**

Masaki Ishii, Toshio Shimodate, Yoichi Kageyama, Tsuyoshi Takahashi and Makoto Nishida

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/51136

## **1. Introduction**


Facial expression recognition for the purpose of emotional communication between humans and machines has been investigated in recent studies [1-7].

The shape (static diversity) and motion (dynamic diversity) of facial components, such as the eyebrows, eyes, nose, and mouth, manifest expression. From the viewpoint of static diversity, owing to individual variation in facial configurations, it is presumed that the facial expression pattern produced when an expression is manifested includes subject-specific features. In addition, from the viewpoint of dynamic diversity, because dynamic changes in facial expressions originate from subject-specific facial expression patterns, it is presumed that the displacement vectors of facial components have subject-specific features.

On the other hand, although an emotionally generated facial expression pattern of an individual is unique, the internal emotions expressed and recognized by humans via facial expressions are considered person-independent and universal. For example, one person may express the common emotion of happiness using various facial expressions, while another person may recognize happiness from these facial expressions. Pantic et al. argued that a natural facial expression always includes various emotions, and that a pure facial expression rarely appears [1]. Furthermore, they suggested that it is not realistic to classify all facial expressions into the six basic emotion categories: anger, sadness, disgust, happiness, surprise, and fear. Instead, they proposed quantitative classification into many more emotion categories.

© 2012 Ishii et al.; licensee InTech. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Pioneering studies on the quantification of emotions recognized from facial expressions have been conducted in the field of psychology. In particular, the mental space model of Russell et al. is well known: each facial expression is arranged in a space centered on "pleasantness" and "arousal," particularly addressing the semantic antithetical nature of emotion [8]. Russell et al. discovered that facial expression stimuli can be conceptualized as a circular arrangement in this mental space (the circumplex model). Yamada found a significant correlation between the "slantedness" and "curvedness/openness" of facial components and the "pleasantness" and "arousal" in the mental space [9]. This observation highlights the importance of clarifying the correspondence between changes in facial components accompanying emotional expressions (physical parameters) and recognized emotions (psychological parameters).


We address the following issues related to the recognition of emotions from facial expressions.

First, facial expression patterns are considered as physical parameters. Expressions convey personality, and as physical parameters, facial expression patterns vary among individuals. Hence, the classification of facial expressions is fundamentally a problem with an unknown number of categories. Accordingly, the extraction of subject-specific facial expression categories using a common person-independent technique is an important issue.

Second, emotions are considered as psychological parameters. The facial expression pattern of an individual is unique, but as a psychological parameter, emotion is person-independent and universal. Moreover, the grade of a recognized emotion changes according to the grade of physical change in a facial expression pattern. Therefore, it is important to match the amount of physical change in a subject-specific facial expression pattern with the corresponding amount of mental change in order to estimate the grade of emotion.

Previously, we proposed a method for generating a subject-specific feature space to estimate the grade of emotion, i.e., an emotional feature space that expresses the correspondence between physical and psychological parameters [10, 11]. In this chapter, we improve the abovementioned method. In addition, we develop a method for generating a feature space that can express detailed levels of emotion.

## **2. Previous studies**

A method for generating a subject-specific emotional feature space using self-organizing maps (SOMs) [12] and counter propagation networks (CPNs) [13] has been proposed in previous studies [10, 11]. The feature space expresses the correspondence between the changes in facial expression patterns and the degree of emotions in a two-dimensional space centered on "pleasantness" and "arousal." For practical purposes, we created two types of feature spaces, a facial expression map (FEMap) and an emotion map (EMap), by learning facial images using CPNs. When a facial image is fed into the CPN after the learning process, the FEMap can assign the image to a unique emotional category. Furthermore, the EMap can quantize the level of emotion in the image according to the level of change in the facial patterns.
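In inference terms, both maps are read off the same trained CPN: the winner unit for an input image supplies a discrete category label (the FEMap reading) and a complex pleasantness/arousal coordinate (the EMap reading). The following minimal Python sketch assumes trained arrays `Wi` (per-unit input weights), `labels` (per-unit emotion labels), and `Wg2` (per-unit complex Grossberg-layer weights), such as those produced by the training code sketched later in this chapter; the function name is illustrative.

```python
import numpy as np

def recognize(image_vec, Wi, labels, Wg2):
    """Map one normalized face image (a flattened 40 x 48 vector) to its
    FEMap category and EMap coordinate via the winner unit."""
    c = int(np.argmin(np.linalg.norm(Wi - image_vec, axis=1)))  # winner unit
    return labels[c], (Wg2[c].real, Wg2[c].imag)  # category, (pleasantness, arousal)
```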

Figures 1 and 2 respectively show the FEMap and EMap generated using the proposed method. Figure 3 shows the recognition result for the expressions of "fear" and "surprise". These results indicate that the pleasantness and arousal values gradually change with changes in facial expression patterns. Moreover, the changes in the pleasantness and arousal values of two individuals are similar, even though their facial expression patterns are different.

**Figure 1.** Generation results of FEMap.


**Figure 2.** Generation results of EMap.


**Figure 3.** Recognition results for "fear" and "surprise".

## **3. Algorithms of SOM and CPN**

#### **3.1. Self-Organizing Maps (SOM)**

An SOM is a learning algorithm that models the self-organizing and adaptive learning capabilities of the human brain [12]. It comprises two layers: an input layer, to which training data are supplied, and a Kohonen layer, in which self-mapping is performed via competitive learning. The learning procedure of an SOM is described below.

1. Let $w_{i,j}(t)$ be the weight from input layer unit $i$ to Kohonen layer unit $j$ at time $t$. In practice, $w_{i,j}$ is initialized using random numbers.

2. Let $x_i(t)$ be the data input to input layer unit $i$ at time $t$; calculate the Euclidean distance $d_j$ between $x_i(t)$ and $w_{i,j}(t)$ using (1).

$$d_j = \sqrt{\sum_{i=1}^{I} (x_i(t) - w_{i,j}(t))^2} \tag{1}$$

3. Search for the Kohonen layer unit that minimizes $d_j$; this unit is designated as the winner unit.

4. Update the weights $w_{i,j}(t)$ of the Kohonen layer units contained in the neighborhood region $N_c(t)$ of the winner unit using (2), where $\alpha(t)$ is a learning coefficient.

$$w_{i,j}(t+1) = w_{i,j}(t) + \alpha(t)(x_i(t) - w_{i,j}(t)) \tag{2}$$

5. Repeat steps 2–4 up to the maximum number of learning iterations.
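For concreteness, the following is a minimal Python (NumPy) sketch of steps 1–5. The chapter does not specify the decay schedules for $\alpha(t)$ and the neighborhood radius, so the linear decays, the one-dimensional map topology, and all hyperparameter values below are assumptions.

```python
import numpy as np

def train_som(X, n_units=5, n_iter=10_000, alpha0=0.5, seed=0):
    """Minimal 1-D SOM following Eqs. (1)-(2): find the winner unit by
    Euclidean distance, then pull the weights of every unit inside the
    shrinking neighborhood N_c(t) toward the current input."""
    rng = np.random.default_rng(seed)
    n_samples, dim = X.shape
    W = rng.random((n_units, dim))            # step 1: random initialization
    radius0 = n_units / 2
    for t in range(n_iter):
        x = X[rng.integers(n_samples)]        # step 2: present one input
        d = np.linalg.norm(W - x, axis=1)     # Eq. (1): distances d_j
        c = int(np.argmin(d))                 # step 3: winner unit
        frac = t / n_iter
        alpha = alpha0 * (1.0 - frac)         # assumed linear decay of alpha(t)
        radius = max(1.0, radius0 * (1.0 - frac))
        for j in range(n_units):
            if abs(j - c) <= radius:          # unit lies inside N_c(t)
                W[j] += alpha * (x - W[j])    # Eq. (2): weight update
    return W

# Usage sketch: X would hold flattened 40 x 48 face images scaled to [0, 1].
```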

#### **3.2. Counter Propagation Network (CPN)**


A CPN is a learning algorithm that combines the Grossberg learning rule with an SOM [13]. It comprises three layers: an input layer to which training data are supplied, a Kohonen layer in which self-mapping is performed via competitive learning, and a Grossberg layer, which labels the Kohonen layer by the counter propagation of teaching signals. A CPN is useful for automatically determining the labels of a Kohonen layer when the categories to which the training data belong are predetermined. The labeled Kohonen layer is designated as a category map. The learning procedure of a CPN is described below.

1. Let $w^i_{n,m}(t)$ and $w^j_{n,m}(t)$ be the weights to Kohonen layer unit $(n, m)$ at time $t$ from input layer unit $i$ and from Grossberg layer unit $j$, respectively. In practice, $w^i_{n,m}$ and $w^j_{n,m}$ are initialized using random numbers.

2. Let $x_i(t)$ be the data input to input layer unit $i$ at time $t$, and calculate the Euclidean distance $d_{n,m}$ between $x_i(t)$ and $w^i_{n,m}(t)$ using (3).

$$d_{n,m} = \sqrt{\sum_{i=1}^{I} (x_i(t) - w^i_{n,m}(t))^2} \tag{3}$$


3. Search for the Kohonen layer unit that minimizes $d_{n,m}$; this unit is designated as the winner unit.

4. Update the weights $w^i_{n,m}(t)$ and $w^j_{n,m}(t)$ of the Kohonen layer units contained in the neighborhood region $N_c(t)$ of the winner unit using (4) and (5), where $\alpha(t)$ and $\beta(t)$ are learning coefficients, and $t_j(t)$ is the teaching signal to Grossberg layer unit $j$.

$$w^i_{n,m}(t+1) = w^i_{n,m}(t) + \alpha(t)(x_i(t) - w^i_{n,m}(t)) \tag{4}$$

$$w^j_{n,m}(t+1) = w^j_{n,m}(t) + \beta(t)(t_j(t) - w^j_{n,m}(t)) \tag{5}$$

5. Repeat steps 2–4 up to the maximum number of learning iterations.

6. After learning is completed, compare the weights $w^j_{n,m}$ observed from each unit of the Kohonen layer, and let the teaching signal of the Grossberg layer unit with the maximum value be the label of the unit.
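A corresponding Python sketch of the CPN procedure is given below. The two-dimensional toroidal Kohonen layer anticipates the architecture used in Section 4.2; the decay schedules, the circular neighborhood, and the hyperparameters are assumptions, since the chapter fixes only the update rules (3)–(5) and the labeling step.

```python
import numpy as np

def train_cpn(X, T, grid=(30, 30), n_iter=20_000, alpha0=0.5, beta0=0.5, seed=0):
    """Minimal CPN following Eqs. (3)-(5). X holds training vectors, T the
    corresponding teaching signals (one row per sample). Returns the Kohonen
    weights Wi, the Grossberg weights Wj, and one label per unit (step 6)."""
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]
    units = grid[0] * grid[1]
    Wi = rng.random((units, X.shape[1]))     # weights from the input layer
    Wj = rng.random((units, T.shape[1]))     # weights from the Grossberg layer
    pos = np.stack(np.unravel_index(np.arange(units), grid), axis=1).astype(float)
    for t in range(n_iter):
        k = rng.integers(n_samples)
        x, tj = X[k], T[k]
        d = np.linalg.norm(Wi - x, axis=1)   # Eq. (3): distances d_{n,m}
        c = int(np.argmin(d))                # winner unit (n, m)
        frac = t / n_iter
        alpha, beta = alpha0 * (1 - frac), beta0 * (1 - frac)
        radius = max(1.0, (max(grid) / 2) * (1 - frac))
        # toroidal map distance, since the mapping space is a torus
        diff = np.abs(pos - pos[c])
        diff = np.minimum(diff, np.array(grid, dtype=float) - diff)
        inside = np.hypot(diff[:, 0], diff[:, 1]) <= radius
        Wi[inside] += alpha * (x - Wi[inside])    # Eq. (4)
        Wj[inside] += beta * (tj - Wj[inside])    # Eq. (5)
    labels = Wj.argmax(axis=1)               # step 6: label each Kohonen unit
    return Wi, Wj, labels
```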

## **4. Proposed method**

Figure 4 shows the procedure for generating the FEMap and EMap.

**Figure 4.** Flow chart of proposed method.

The proposed method consists of the following three steps. First, facial expression images are hierarchically classified using SOMs, and subject-specific facial expression categories are extracted. Next, a CPN is used for data expansion of the facial expression patterns on the basis of the similarity and continuity of each facial expression category; the CPN is a supervised learning algorithm that combines Grossberg's learning rule with the SOM. The category map generated by this method is defined as a subject-specific FEMap. Then, a subject-specific emotional feature space is generated. This space matches physical and psychological parameters by inputting coordinate values based on the circumplex model proposed by Russell [8] as teaching signals for the CPN. The resulting complex plane is defined as a subject-specific EMap.

#### **4.1. Extraction of facial expression category**


The proposed method extracts a subject-specific facial expression category hierarchically by using an SOM with a narrow mapping space. An SOM is an unsupervised learning algorithm, and it classifies the given facial expression images in a self-organizing manner according to their topological characteristics. Hence, it is suitable for classification with an unknown number of categories. Moreover, an SOM compresses the topological information of the facial expression images using a narrow mapping space, and it performs classification based on features that roughly divide the training data. We speculate that repeating these steps hierarchically makes the amount of change within the classified facial expression patterns comparable; hence, subject-specific facial expression categories can be extracted. Figure 5 shows the extraction of a facial expression category, the details of which are provided below.


**1.** The facial expression images described in Section 5.1 were used as training data, and the following processing was performed for each facial expression. The training data are assumed to constitute $N$ frames.

**2.** Learning was conducted using an SOM with a Kohonen layer of five units and an input layer of 40 × 48 units (Fig. 5(a)); the number of learning sessions was set to 10,000.

**3.** The weights of the Kohonen layer $W_{i,j}$ (0 ≤ $W_{i,j}$ ≤ 1) were converted to values of 0–255 at the end of learning, and visualized images were generated (Fig. 5(b)), where $n_1$–$n_5$ denote the training data classified into each unit.

**4.** The five visualized images can be considered representative vectors of the training data classified into the respective units ($n_1$–$n_5$). Therefore, a thresholding process was adopted to judge whether a visualized image was suitable as a representative vector. Specifically, for the upper and lower parts of the face shown in Fig. 5(c), the correlation coefficient between the visualized image and each item of classified training data was determined for each unit, and the standard deviation of these values was computed. When the standard deviation of both regions was 0.005 or less in all five units, the visualized images were considered to represent the training data, and the subsequent hierarchization processing was cancelled.

**5.** The correlation coefficient of the weights $W_{i,j}$ between each pair of adjacent units in the Kohonen layer was computed, and the Kohonen layer was divided into two parts between the units with the minimum correlation coefficient (Fig. 5(b)).

**6.** The training data ($N_1$ and $N_2$) classified into the two sides of the partition were used as new training data, and the processing described above was repeated recursively. Consequently, a hierarchical structure of SOMs was generated (Fig. 5(b) and Fig. 5(d)).

**7.** The lowest categories of the hierarchical structure were defined as facial expression categories (Fig. 5(e)), and the five visualized images at the end of learning were defined as the representative images of each category. Then, the photographer of the facial expression images visually confirmed each facial expression category and assigned it an emotion category, such as the neutral facial expression or one of the six basic facial expressions.

**Figure 5.** Extraction procedure of facial expression categories.
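The recursion of steps 1–7 can be sketched as follows. Here `som_fn` is a hypothetical helper wrapping the five-unit SOM of step 2, returning the unit weights and the unit index assigned to each frame; the stopping test follows step 4 and the split follows steps 5–6, while the guard against maps with too few frames is an assumption to keep the sketch terminating.

```python
import numpy as np

def extract_categories(frames, som_fn, sd_threshold=0.005):
    """Recursively split N frames (rows of flattened 40 x 48 images) into
    subject-specific facial expression categories (Section 4.1)."""
    W, assign = som_fn(frames)               # 5-unit SOM over the frames
    half = frames.shape[1] // 2              # upper / lower face regions

    def region_sd(j, sl):
        # std of the correlation coefficients between a unit's visualized
        # image and the training data classified into it, for one region
        members = frames[assign == j][:, sl]
        if len(members) == 0:
            return 0.0
        r = [np.corrcoef(W[j, sl], m)[0, 1] for m in members]
        return float(np.std(r))

    uniform = all(region_sd(j, sl) <= sd_threshold
                  for j in range(len(W))
                  for sl in (slice(0, half), slice(half, None)))
    if uniform or len(frames) <= len(W):
        return [frames]                      # leaf: one expression category
    # step 5: correlate adjacent units' weights and cut at the minimum
    adj = [np.corrcoef(W[j], W[j + 1])[0, 1] for j in range(len(W) - 1)]
    cut = int(np.argmin(adj)) + 1
    return (extract_categories(frames[assign < cut], som_fn, sd_threshold) +
            extract_categories(frames[assign >= cut], som_fn, sd_threshold))
```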

#### **4.2. Generation of facial expression map**

The recognition of a natural facial expression requires the generation of facial expression patterns (mixed facial expressions) that interpolate each emotion category. In the proposed method, the representative images obtained in Section 4.1 were used as training data, and data expansion of the facial expression patterns between the emotion categories was performed using a CPN with a large mapping space. A CPN is adopted because the teaching signals of the training data are known from the processing described in Section 4.1. The mapping space of the CPN comprises more units than the training data, and it has a torus structure, because a large mapping space is assumed to enable the CPN to perform data expansion based on the similarity and continuity of the training data. Figure 6 shows the CPN architecture for generating an FEMap. The details of the processing are provided below.

**Figure 6.** CPN architecture for generation of FEMap.


**1.** The CPN structure comprises an input layer of 40 × 48 units and a two- or three-dimensional Kohonen layer. In addition, Grossberg layer 1, consisting of seven units, was prepared; teaching signals for the six basic facial expressions and a neutral facial expression were input to it.

**2.** The representative images obtained in Section 4.1 were used as training data, and learning was carried out for each subject. As the teaching signal to Grossberg layer 1, a value of 1 was input to the unit that represents the emotion category of the representative image; otherwise, 0 was input. The number of learning sessions was set to 20,000.

**3.** The weights ($W_{g1}$) of Grossberg layer 1 were compared for each unit of the Kohonen layer at the end of learning, and the emotion category with the greatest value was used as the label of the unit. The category map generated by this processing was defined as a subject-specific FEMap.
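Continuing the Python sketch, the wiring of steps 1–3 might look as follows, reusing the `train_cpn` function sketched in Section 3.2. The emotion list, its ordering, and the function name are illustrative assumptions; the unit counts come from the text.

```python
import numpy as np

EMOTIONS = ["neutral", "anger", "sadness", "disgust",
            "happiness", "surprise", "fear"]   # seven Grossberg layer 1 units

def generate_femap(rep_images, rep_emotions):
    """rep_images: representative vectors from Section 4.1 (rows of length
    40 * 48); rep_emotions: emotion name per image. Returns the per-unit
    labels of the 30 x 30 Kohonen layer, i.e., the FEMap."""
    X = np.asarray(rep_images, dtype=float)
    T = np.zeros((len(rep_emotions), len(EMOTIONS)))
    for row, name in enumerate(rep_emotions):
        T[row, EMOTIONS.index(name)] = 1.0     # teaching signal: 1 for the
                                               # image's category, 0 otherwise
    _, _, labels = train_cpn(X, T, grid=(30, 30), n_iter=20_000)
    return labels.reshape(30, 30)              # category map = FEMap
```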

#### **4.3. Generation of emotion map**

Although the facial expression patterns of an individual are unique, the emotions expressed and recognized from facial expressions by humans are person-independent and universal. Therefore, it is necessary to match the grade of emotion, based on a common index for each subject, with the grade of change in the facial expression patterns described in Section 4.2. The proposed method uses the circumplex model proposed by Russell as this common index. Specifically, coordinate values based on the circumplex model are input as teaching signals for the CPN, and the processing described in Section 4.2 is carried out simultaneously. Then, an EMap is generated that matches the grade of change in facial expression patterns with the grade of emotion. Figure 7 shows the procedure for generating the EMap, the details of which are provided below.

**Figure 7.** Procedure for generating EMap.

**1.** Grossberg layer 2, consisting of one unit to which the coordinate values of the circumplex model are input, was added to the CPN structure (Fig. 7(a)).

**2.** Each facial expression stimulus was arranged in a circle on a plane centered on "pleasantness" and "arousal" in the circumplex model (Fig. 7(b)). The proposed method expresses this circular space as the complex plane shown in Fig. 7(c), and complex numbers based on the figure were input to Grossberg layer 2 as teaching signals. For example, when the input training data represent the emotion category of happiness, the teaching signal for Grossberg layer 2 is $\cos(\pi/4) + i \sin(\pi/4)$.

**3.** This processing was repeated up to the maximum number of learning iterations.

**4.** Each unit of the Kohonen layer was plotted onto the complex plane at the end of learning, according to the values of the real and imaginary parts of its weight ($W_{g2}$) on Grossberg layer 2. This complex plane was defined as a subject-specific EMap.
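As a worked example of step 2, the teaching signal for Grossberg layer 2 can be computed from an angle on the circumplex circle. Only the happiness angle ($\pi/4$) is stated in the text; the other placements below are illustrative assumptions consistent with a pleasantness/arousal layout, not values from the chapter.

```python
import numpy as np

CIRCUMPLEX_ANGLE = {
    "happiness": np.pi / 4,       # pleasant and aroused (given in the text)
    "surprise":  np.pi / 2,       # high arousal (assumed placement)
    "fear":      3 * np.pi / 4,   # unpleasant and aroused (assumed)
    "anger":     7 * np.pi / 8,   # assumed
    "disgust":   np.pi,           # unpleasant (assumed)
    "sadness":   5 * np.pi / 4,   # unpleasant, low arousal (assumed)
}

def emap_teaching_signal(emotion):
    """Complex teaching signal cos(theta) + i sin(theta) for Grossberg layer 2."""
    theta = CIRCUMPLEX_ANGLE[emotion]
    return complex(np.cos(theta), np.sin(theta))

# After learning, each Kohonen unit is plotted at the complex coordinate held
# in its Grossberg layer 2 weight W_g2, which yields the EMap (step 4).
```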
## **5. Evaluation experiment**

#### **5.1. Expression images**


In general, open facial expression databases are used in conventional studies [14, 15]. These databases contain only a few images per expression per subject. For this study, we acquired facial expression images of ourselves, because the proposed method extracts subject-specific facial expression categories and representative images of each category from large quantities of data.

We address a neutral facial expression and the six basic facial expressions, namely, anger, sadness, disgust, happiness, surprise, and fear. These expressions were deliberately manifested by a subject. The facial expressions were acquired as motion videos covering a process in which the neutral and basic facial expressions were each manifested five times, in turn. The motion videos were converted into static images (30 fps, 8-bit gray, 320 × 240 pixels). Because we process the region containing the facial components, a face region image was extracted and normalized according to the following procedures.

**1.** A face was detected using Haar-like features [16], and a face region image normalized to a size of 80 × 96 pixels was extracted.

**2.** The image was processed using a median filter for noise removal. Then, smoothing was performed after dimension reduction of the image by coarse graining (40 × 48 pixels).

**3.** A pseudo-outline common to all the subjects was generated, and the face region containing the facial components was extracted.

**4.** A histogram linear transformation was performed for brightness value correction.
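A minimal sketch of these four steps using OpenCV is shown below. The stock Haar cascade file, the detection parameters, the median kernel size, and the representation of the pseudo-outline as a binary mask are assumptions; the chapter fixes only the image sizes (80 × 96 and 40 × 48 pixels).

```python
import cv2

# OpenCV's bundled frontal-face Haar cascade (an assumed stand-in for the
# detector of [16])
CASCADE = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def normalize_face(frame_gray, outline_mask=None):
    """frame_gray: 8-bit grayscale frame; outline_mask: optional 48 x 40
    binary mask standing in for the common pseudo-outline."""
    faces = CASCADE.detectMultiScale(frame_gray, 1.1, 5)       # step 1: detect
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]
    face = cv2.resize(frame_gray[y:y + h, x:x + w], (80, 96))  # 80 x 96 pixels
    face = cv2.medianBlur(face, 3)                             # step 2: denoise
    face = cv2.resize(face, (40, 48))                          # coarse graining
    if outline_mask is not None:                               # step 3: keep the
        face = cv2.bitwise_and(face, face, mask=outline_mask)  # face region only
    return cv2.normalize(face, None, 0, 255, cv2.NORM_MINMAX)  # step 4: histogram
                                                               # linear transform
```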
Figure 8 shows an example of face region images after extraction and normalization. Table 1 lists the number of acquired frames and the number of frames extracted by the SOM as the training data for the CPN.


The data used in this study were acquired in accordance with the ethical regulations regarding research on humans at Akita University, Japan.


**Figure 8.** Examples of facial expression images.


**Table 1.** Number of acquired frames and training data.

#### **5.2. Experiment details**

This study examined the training data input method and the number of dimensions of the CPN mapping space. Specifically, the following three methods were compared.


**i.** Method 1: Learning was conducted using a CPN with a two-dimensional Kohonen layer of 30 × 30 units. The training data were randomly selected and input.

**ii.** Method 2: The Kohonen layer of the CPN was set to 30 × 30 units, as in Method 1. However, the training data for each emotion category were input at the same ratio.

**iii.** Method 3: Learning was conducted using a CPN with a three-dimensional Kohonen layer of 10 × 10 × 10 units. The training data input method is the same as that of Method 2.
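For reference, the three settings can be summarized as configuration records; this encoding is hypothetical, and only the unit counts and input schemes come from the text.

```python
# Experimental settings of Section 5.2
METHODS = {
    "method1": {"kohonen_shape": (30, 30),      # 2-D map, random input order
                "balanced_categories": False},
    "method2": {"kohonen_shape": (30, 30),      # 2-D map, equal per-category
                "balanced_categories": True},   # input ratio
    "method3": {"kohonen_shape": (10, 10, 10),  # 3-D map, same input scheme
                "balanced_categories": True},   # as Method 2
}
```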

## **6. Result and discussion**


#### **6.1. Discussion on training data input method**

Tables 2 and 3 list the number of Kohonen layer units on the FEMap for Methods 1 and 2, respectively. Figures 9 and 10 show the FEMaps and the EMaps generated using Methods 1 and 2.


**Table 2.** Number of units on FEMap (Method 1).


**Table 3.** Number of units on FEMap (Method 2).

Table 2 shows that the percentage of the neutral facial expression category is high. Moreover, although mixed facial expressions of the neutral expression and the six basic expressions are generated, as shown in Fig. 10(a), mixtures of the six basic expressions are not generated. On the other hand, the number of units of each emotion category on the FEMap is roughly constant, as shown in Table 3, and many mixed facial expressions are generated between the expressions on the EMap, as shown in Fig. 10(b). These results suggest that the input ratio of the training data should be constant for every emotion category to effectively generate many mixed facial expressions.


**Figure 9.** Generation results of FEMap using Methods 1 and 2.

**Figure 10.** Generation results of EMap using Methods 1 and 2.

#### **6.2. Discussion on number of dimensions of CPN mapping space**

The EMap generated by Method 3 is shown in Fig. 11. Figure 12 shows enlargements of the happiness region in the EMaps generated by Methods 2 and 3.

**Figure 11.** EMap generated by Method 3.


**Figure 12.** Enlargement of the happiness region in the EMap.

Although the numbers of Kohonen layer units in Methods 2 and 3 are almost equal (900 units in the former and 1,000 units in the latter), the generated EMaps differ significantly. In particular, many mixed facial expressions are generated radially from the coordinates of the teaching signals on the circumference, as shown in Fig. 11 and Fig. 12(b). The number of neighboring emotion categories on the FEMap increases as a result of the increase in the number of dimensions of the CPN mapping space.


## **7. Conclusion**

In this chapter, we proposed a method for generating a feature space that expresses the correspondence between the changes in facial expression patterns and the degree of emotions. In addition, we investigated the training data input method and the number of dimensions of the CPN mapping space. The results clearly show that the input ratio of the training data should be constant for every emotion category, and that the number of dimensions of the CPN mapping space should be extended to effectively express detailed levels of emotion. We plan to experimentally evaluate emotion estimation using the generated feature spaces.

## **Acknowledgements**

This work was supported by a Grant-in-Aid for Scientific Research on Innovative Areas "Face perception and recognition: Multidisciplinary approaching to understanding face processing mechanism (No.4002)" (23119717) from the Ministry of Education, Culture, Sports, Science and Technology, Japan.

This paper is based on "Generation of Emotional Feature Space for Facial Expression Recognition using Self-Mapping," by M. Ishii, T. Simodate, Y. Kageyama, T. Takahashi, and M. Nishida, which will be presented at the SICE Annual Conference 2012, an international conference on Instrumentation, Control, Information Technology and System Integration, to be held in Akita, Japan, from August 20-23, 2012.

## **Author details**

Masaki Ishii1\*, Toshio Shimodate2, Yoichi Kageyama2, Tsuyoshi Takahashi2 and Makoto Nishida3

\*Address all correspondence to: ishii@akita-pu.ac.jp

1 Faculty of Systems Science and Technology, Akita Prefectural University, Japan

2 Graduate School of Engineering and Resource Science, Akita University, Japan

3 Akita University, Japan

## **References**

[1] Pantic, M., & Rothkrantz, L. J. M. (2000). Automatic Analysis of Facial Expressions: The State of the Art. *IEEE Trans. Pattern Analysis and Machine Intelligence*, 22(12), 1424-1445.

[2] Tian, Y. L., Kanade, T., & Cohn, J. F. (2001). Recognizing Action Units for Facial Expression Analysis. *IEEE Trans. Pattern Analysis and Machine Intelligence*, 23(2), 97-116.

[3] Akamatsu, S. (2002). Recognition of Facial Expressions by Human and Computer [I]: Facial Expressions in Communications and Their Automatic Analysis by Computer. *The Journal of the Institute of Electronics, Information, and Communication Engineers* (in Japanese), 85(9), 680-685.

[4] Akamatsu, S. (2002). Recognition of Facial Expressions by Human and Computer [II]: The State of the Art in Facial Expression Analysis-1; Automatic Classification of Facial Expressions. *The Journal of the Institute of Electronics, Information, and Communication Engineers* (in Japanese), 85(10), 766-771.

[5] Akamatsu, S. (2002). Recognition of Facial Expressions by Human and Computer [III]: The State of the Art in Facial Expression Analysis-2; Recognition of Facial Actions. *The Journal of the Institute of Electronics, Information, and Communication Engineers* (in Japanese), 85(12), 936-941.

[6] Akamatsu, S. (2003). Recognition of Facial Expressions by Human and Computer [IV: Finish]: Toward Computer Recognition of Facial Expressions Consistent with the Perception by Human. *The Journal of the Institute of Electronics, Information, and Communication Engineers* (in Japanese), 86(1), 54-61.

[7] Fasel, B., & Luettin, J. (2003). Automatic Facial Expression Analysis: A Survey. *Pattern Recognition*, 36, 259-275.

[8] Russell, J. A., & Bullock, M. (1985). Multidimensional Scaling of Emotional Facial Expressions: Similarity from Preschoolers to Adults. *J. Personality and Social Psychology*, 48, 1290-1298.

[9] Yamada, H. (2000). Models of Perceptual Judgments of Emotion from Facial Expressions. *Japanese Psychological Review* (in Japanese), 43(2), 245-255.

[10] Ishii, M., Sato, K., Madokoro, H., & Nishida, M. (2008). Generation of Emotional Feature Space based on Topological Characteristics of Facial Expression Images. *Proc. IEEE Int. Conf. Automatic Face and Gesture Recognition* (CD-ROM), 6 pages.

[11] Ishii, M., Sato, K., Madokoro, H., & Nishida, M. (2008). Extraction of Facial Expression Categories and Generation of Emotion Map Using Self-Mapping. *IEICE Trans. Information and Systems* (in Japanese), J91-D(11), 2659-2672.

[12] Kohonen, T. (1995). *Self-Organizing Maps*. Springer Series in Information Sciences. Springer. doi:10.1007/978-3-642-97610-0.

[13] Hecht-Nielsen, R. (1987). Counterpropagation Networks. *Applied Optics*, 26(23), 4979-4984.

**Chapter 8**

**A Self Organizing Map Based Motion Classifier with an Extension to Fall Detection Problem and Its Implementation on a Smartphone**

Wattanapong Kurdthongmee

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/51002

**1. Introduction**

Automatic classification of human motions is very useful in many application areas. The characteristics of motion types and patterns can be used as indicators of one's mobility level, latent chronic diseases, and aging process [1]. The motion types can be employed to decide whether one is at risk; i.e., a motion type or a transition between motion types may be risky and likely to cause a fall. Alternatively, the motion types may serve as an indication to request close observation or attention; e.g., jogging carries a higher risk and requires closer attention than normal walking, especially for elderly people. Such an indication can support the features of a video surveillance system for monitoring elderly people. Given these application areas, combined with the need to automate the monitoring of elderly people in an "aging society," the demand for automatic motion classification has increased. To observe motions and relate them to motion types, either acceleration sensors or video systems have been widely accepted as useful.

In this chapter, we present the application of the self organizing map (SOM) to the automatic classification of basic human motions, i.e., activities of daily living (ADL). To be specific, a SOM is employed to solve the human motion classification problem. In our proposed approach, the SOM is trained with motion parameters captured from a specific wearer and used to perform adaptive motion type clustering and classification. This results in a codebook with different clusters of motion types whose parameters are similar. Instead of using the codebook alone, an algorithm is proposed in order to distinguish between different basic motion types whose motion parameters are similar and mapped to the same cluster as clear‐

© 2012 Kurdthongmee; licensee InTech. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

