**5.1.2 Facial expression generation model**

The facial expression generation model is the module that accepts the fuzzy facial expression *EX* and its intensity as input and outputs the agent's facial expression. In this paper, we adopted Xface, an MPEG-4 based open-source toolkit for 3D facial animation, to generate facial expressions for the emotions mentioned above. Figure 7 shows some keyframes of these facial expressions.

Fig. 7. Some keyframes of facial expressions.
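Since Xface is driven by MPEG-4 facial animation parameters (FAPs), one plausible way to turn the pair (*EX*, intensity) into an animation keyframe is to scale a per-emotion FAP template by the fuzzy intensity. The sketch below is a minimal illustration under that assumption; the template amplitudes are invented, not values used by Xface or by the system described here.

```python
# Minimal sketch: scale a per-emotion MPEG-4 FAP template by the fuzzy
# intensity of EX. The templates and amplitudes are illustrative values,
# not the ones used by Xface or by the system described in the text.

# Full-intensity FAP amplitudes (FAP name -> displacement in FAP units).
FAP_TEMPLATES = {
    "joy":      {"stretch_l_cornerlip": 400, "stretch_r_cornerlip": 400,
                 "raise_l_cornerlip": 150, "raise_r_cornerlip": 150},
    "surprise": {"raise_l_i_eyebrow": 300, "raise_r_i_eyebrow": 300,
                 "open_jaw": 500},
    "sadness":  {"lower_t_midlip": 120, "raise_l_i_eyebrow": 100,
                 "raise_r_i_eyebrow": 100},
}

def expression_to_faps(ex: str, intensity: float) -> dict:
    """Map a fuzzy expression EX and its intensity in [0, 1] to FAP values."""
    intensity = max(0.0, min(1.0, intensity))  # clamp the fuzzy intensity
    return {fap: amp * intensity
            for fap, amp in FAP_TEMPLATES.get(ex, {}).items()}

# e.g. a half-intensity smile keyframe for the agent tutor:
print(expression_to_faps("joy", 0.5))
```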


& Alexander, 2008). However, this approach has two main shortcomings: firstly, the video recorded only three human tutors' behaviors, and the students' reactions to these behaviors were not considered; what we want to capture are behaviors that motivate students, rather than arouse their aversion. Secondly, the coding scheme cannot be applied to attention information or to speech and text communication. In this paper, we use a traditional questionnaire to obtain the "optimal reaction" of the tutor to the learner's attention information and affective state. The critical observation is that every excellent teacher has commonsense of the kind we want to give our agent tutor. If we can find good ways to extract commonsense from human tutors by prompting them, asking them questions, presenting them with lines of reasoning to confirm or repair, and so on, we may be able to accumulate many of the knowledge structures needed to give our agent tutor the capacity for commonsense reasoning about the student's attention information and affective state. We therefore built a system called Human Tutor Commonsense that makes it easy for human tutors to collaborate on constructing a database of commonsense knowledge. We invited more than 100 excellent teachers to log on to our system and build the database. A group of 50 students was then asked to rate how satisfied they were with each commonsense entry, on a scale from 1 (strongly dissatisfied) to 5 (strongly satisfied), and we chose the two entries with the highest mean scores as the "optimal reaction" for each situation the questions described.
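As a concrete illustration of this selection step, the sketch below averages the 1-5 ratings and keeps the two highest-scoring entries per situation; the sample situations, entries, scores, and data layout are invented for illustration, not taken from the actual database.

```python
# Sketch of the rating-and-selection step: average the 1-5 satisfaction
# scores and keep the two highest-scoring commonsense entries per
# situation. Situations, entries, and scores are invented examples.
from collections import defaultdict
from statistics import mean

# ratings[(situation, entry)] -> list of 1-5 scores from the student raters
ratings = {
    ("student distracted", "gently call the student by name"): [5, 4, 5],
    ("student distracted", "pause and ask a short question"):  [4, 4, 3],
    ("student distracted", "raise the speaking volume"):       [2, 3, 2],
}

def optimal_reactions(ratings, k=2):
    """Return the k entries with the highest mean score per situation."""
    by_situation = defaultdict(list)
    for (situation, entry), scores in ratings.items():
        by_situation[situation].append((mean(scores), entry))
    return {
        situation: [entry for _, entry in
                    sorted(pairs, key=lambda p: p[0], reverse=True)[:k]]
        for situation, pairs in by_situation.items()
    }

print(optimal_reactions(ratings))
```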

Based on the commonsense we obtained, MITS can be represented as a dynamic network, as shown in Figure 9. Whenever the student's pedagogical state, attention information, or affective state changes, the following events happen:

- The case-based method chooses a tutorial action from the commonsense database.
- The tutorial action is taken by the agent tutor "Alice".
- "Alice" waits for the next student action.
- Each time the dynamic network receives new evidence (a change of pedagogical state, attention information, or affective state), a new time slice is added to the existing network.
- The history is updated.

Fig. 9. Dynamic network for MITS.
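The event cycle of Figure 9 can also be summarized in code. The sketch below is a hypothetical rendering of that loop, with invented class, function, and database names standing in for the actual MITS components.

```python
# Hypothetical sketch of the Figure 9 event cycle; all names here are
# illustrative stand-ins, not the actual MITS implementation.

# Commonsense database: situation -> "optimal reaction" (invented examples).
COMMONSENSE_DB = {
    "attention lost": "gently call the student by name",
    "frustrated": "offer an easier practice problem",
}

def perform(action: str) -> None:
    """Have the agent tutor 'Alice' carry out the tutorial action."""
    print(f"Alice: {action}")

class DynamicNetwork:
    def __init__(self):
        self.slices = []   # one time slice per piece of evidence received
        self.history = []  # record of (evidence, action) pairs

    def on_evidence(self, evidence: str) -> None:
        """Handle a change of pedagogical state, attention, or affect."""
        # 1. The case-based method chooses an action from the database.
        action = COMMONSENSE_DB.get(evidence, "encourage the student")
        # 2. The tutorial action is taken by the agent tutor "Alice".
        perform(action)
        # 3. A new time slice is added to the existing network.
        self.slices.append(evidence)
        # 4. The history is updated; "Alice" then waits for the next
        #    student action.
        self.history.append((evidence, action))

network = DynamicNetwork()
network.on_evidence("attention lost")  # e.g. the student's gaze leaves the screen
```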

**7. Conclusion** 
This paper debuts a Multimodal Intelligent Tutoring System. Attention information detection and affective state detection are carried out. Meanwhile, the system adapts to the student's attention information and affective state.
