**2. Architecture for MITS**

An intelligent tutoring system (ITS) is a computer-based educational system that provides individualized instruction, much as a human tutor does. A traditional ITS decides how and what to teach based on the student's pedagogical state. However, it has been demonstrated that an experienced human tutor also manages the student's attention and affective state, in addition to the pedagogical state, to motivate the student and improve the learning process. The interface between student and tutor in a traditional ITS therefore needs to be augmented with an attention information interface and an affective information interface. The ITS also needs to reason about attention information and affective state so that it can give students an adequate response from a pedagogical, attentive and affective point of view; this requires an attention information processing module and an affective information processing module. The attention information processing module analyzes the student's gaze behavior in real time and adapts the presentation flow according to the student's interest or lack of interest. The affective information processing module analyzes the student's facial expression, speech and text input to sense the underlying affective qualities. Once the attention information and affective state have been obtained, the agent tutor has to respond accordingly, so we must provide a mapping from the attention information and affective state to actions of the agent tutor. We refined our tutoring strategy module by means of questionnaires presented to teachers. The questionnaires described several tutoring scenarios and asked the teachers to give the appropriate pedagogical and affective action for each one. The affective action includes the facial expression, emotional speech synthesis and text produced by the Artificial Intelligence Markup Language (AIML) retrieval mechanism. The architecture of MITS is shown in Figure 1.

Fig. 1. Architecture of MITS.

Figure 2 illustrates one example of the interest areas. For each interest area, an interest score is calculated. When the score for an area exceeds a threshold, the agent will react if a reaction is defined.

Fig. 2. Example of "interest areas".

The key functionality of the attention information processing module in our MITS is characterized by three main components:

- Monitor the grounding. In human face-to-face communication, grounding is the process of ensuring that what has been said is understood by the conversational partners, i.e. that there is "common ground". During the learner-tutor interaction, grounding is considered successful if the following condition is met: the student's gaze shows a transition from the screen area of the speaking tutor to the screen area of the referent mentioned by the tutor. When positive evidence of grounding is observed, the course continues, and a window of contextual content, i.e. the related content, may pop up according to the referent.

- Guarantee the attention. The agent performs an interruption if the student attends to interest areas that are not part of the content the agent is currently talking about. An "alert" action is performed if the student does not gaze at the display, for example, if the student gazes out of the window.

- Note the history. This function records which area has been accessed by the student and how much of it. If the student has not paid enough attention to an important area, that area may be proposed again. Conversely, if an area has already been accessed for long enough, the student is unlikely to intend to activate it again.

These components are all based on a modified version of the algorithm described by Qvarfordt and Zhai (Qvarfordt & Zhai, 2005), where it was used for an intelligent virtual tourist information environment (iTourist). Two interest metrics were developed: (1) the Interest Score (IScore) and (2) the Focus of Interest Score (FIScore). IScore is used to determine an area's "arousal" level, or the likelihood that the user is interested in it. When the IScore metric passes a certain threshold, the area is said to become "active". FIScore measures how
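As a rough illustration of the attention information processing module described in this section, the following Python sketch shows how a thresholded interest score, the grounding check, the attention guarantee and the history could fit together. It is a minimal sketch under stated assumptions, not the MITS implementation: the class `AttentionMonitor`, its method names, the area names and both numeric constants are hypothetical, and the score is a plain dwell fraction over recent gaze samples that merely stands in for the IScore of Qvarfordt & Zhai (2005), whose actual formula is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

# Hypothetical constants; the chapter gives no numeric values.
ACTIVATION_THRESHOLD = 0.5  # score above which an area counts as "active"
ENOUGH_DWELL_SAMPLES = 4    # samples regarded as "enough attention"

OFF_SCREEN = None  # a gaze sample that falls outside the display


@dataclass
class AttentionMonitor:
    """Toy attention module fed with one interest-area name per gaze sample.

    The score is a simple dwell fraction, a stand-in for IScore,
    not the metric defined by Qvarfordt & Zhai (2005).
    """
    window: int = 10
    samples: List[Optional[str]] = field(default_factory=list)
    history: Dict[str, int] = field(default_factory=dict)

    def observe(self, area: Optional[str]) -> None:
        """Record one gaze sample and update the access history."""
        self.samples.append(area)
        if area is not OFF_SCREEN:
            self.history[area] = self.history.get(area, 0) + 1

    def score(self, area: str) -> float:
        """Fraction of the most recent samples that fell on this area."""
        recent = self.samples[-self.window:]
        return recent.count(area) / len(recent) if recent else 0.0

    def active_areas(self) -> List[str]:
        """Areas whose score exceeds the activation threshold."""
        return sorted(a for a in self.history
                      if self.score(a) > ACTIVATION_THRESHOLD)

    def grounded(self, tutor_area: str, referent_area: str) -> bool:
        """Monitor the grounding: gaze on the tutor, then on the referent."""
        recent = self.samples[-self.window:]
        if tutor_area not in recent:
            return False
        first_tutor = recent.index(tutor_area)
        return referent_area in recent[first_tutor + 1:]

    def reaction(self, current_areas: Set[str]) -> str:
        """Guarantee the attention: choose the agent's next action."""
        if self.samples and self.samples[-1] is OFF_SCREEN:
            return "alert"      # student is not looking at the display
        if any(a not in current_areas for a in self.active_areas()):
            return "interrupt"  # interest in an area outside the current content
        return "continue"

    def neglected(self, important_areas: List[str]) -> List[str]:
        """Note the history: important areas that may be proposed again."""
        return [a for a in important_areas
                if self.history.get(a, 0) < ENOUGH_DWELL_SAMPLES]


monitor = AttentionMonitor()
for sample in ["tutor", "tutor", "tutor", "map", "map", "map", "map"]:
    monitor.observe(sample)

print(monitor.active_areas())               # ['map']
print(monitor.grounded("tutor", "map"))     # True
print(monitor.reaction({"tutor", "map"}))   # continue
```

Keeping the score, the grounding test and the history in one object makes it easy to swap the dwell fraction for a richer metric later without changing how the agent tutor queries the module.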
