In Section 2.7 we tried to evaluate the effect of using key and speech inputs, especially focusing on the human-robot collaborative demonstration setup.

As we discussed in Section 2.5, vocal communication increases the transparency of the human-robot communication to the audience. We could see from the experiment results that the enjoyment and the fulfillment of the demonstration from the audience perspective was not let down by the imprecision in speech recognition.

One unexpected effect of using speech is that, because it is very intuitive, it decreases the burden of remembering the commands. The error rate of key-based commands is unexpectedly high, despite their preciseness. This may also support the use of speech input.

Despite supportive evidence for speech input, about half of the demonstrators answered that they prefer using key input. In the question sheet, subjects could leave comments. Subjects who were supportive of the speech input commented that they actually enjoyed reacting to mistakes made by speech recognition. This comment can be explained as follows: by increasing the transparency of the communication channel between the human and robot demonstrators, the subject can observe what is happening in the situation (the human demonstrator said the right thing, the robot made a mistake) and find the mistaken response of the robot funny. On the other hand, subjects who were supportive of the key input commented that they want to demonstrate the exercise in a more precise manner.

We are currently preparing an evaluation which is more focused on measuring these effects and searching for a way to realize an appropriate interface both for people who prefer enjoyment and for people who prefer precision.

**2.10 Summary**

As we have seen and discussed in this section, the humanoid robot has a different character than other artifacts. *It has a human shape*, which can attract humans to join in the activity. This character also has the effect of *raising the expectation that natural communication methods (voice) can be used*, as humans do. *It has a physical body and exists in the same world*, which also has an effect on the observers. These characteristics are especially useful for applications such as exercise demonstration.

However, this character sometimes has a negative effect on usefulness. Because humans gain too high an expectation that the robot can use natural communication methods, they tend to use colloquial expressions toward the robot, which are difficult for the robot to understand. In Section 2.7, we have seen the command acceptance rate using voice recognition evaluated by elderly users. In this experiment the command acceptance rate was low, not because the voice recognition rate was low, but mostly because the conversation patterns programmed into the dialog manager were not enough to understand all the varieties of colloquial expressions given by the users.

Because the robot has a physical body and exists in the physical world, the voice recognition system of the robot has to work under the noisy conditions of the real environment.

**2.11 Left problems**

To achieve the ideal benefits of using humanoid robots, the above practical problems need to be solved beforehand to enhance the usefulness of the robots.

We are not only developing applications for humanoid robots, but also developing support tools to assist the development of communication functions for humanoid robots. From the next section, we will introduce our development tools.

At present, we are developing a set of software called Open Source Software Suite for Human Robot Interaction (OpenHRI). Using OpenHRI, we aim to solve the above problems and enable the development of communication functions for robots. For this purpose, we employ the following approach.

**Introduce a uniform component model:** We construct our set of software on RT-Middleware, an Object Management Group (OMG)-compliant robot technology middleware specification (Ando et al, 2005). The RT-Middleware specification can be used to connect all the components without requiring implementation issues to be taken into account. Further, because it is a standard architecture for building robotic systems, individual components developed at different institutes can easily be connected.

**Provide the required functions in a reconfigurable manner:** We implement various functions from audio signal processing to dialog management in a uniform and reconfigurable manner. The developer can develop the entire system at a comparatively low development cost. In addition, the system can easily be adapted to different environments for realizing accurate recognition.

Figure 11 illustrates the overall architecture of the components provided in OpenHRI. The software covers all the functions for the development of the communication system and also incorporates an interface for establishing connections with other components that can provide multi-modal information.

**3.1 Component based system design**

The component architecture of our software is based on RT-Middleware. RT-Middleware is a middleware architecture for robotic applications that has been standardized by the OMG.

In RT-Middleware, each function of the robot is implemented as a "node." An application system can be developed by selecting the required components and connecting them to each other (the connections are called "links"). Figure 12 shows the "RT-SystemEditor" development tool to edit the links between the components.
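The node-and-link composition style can be sketched in plain Python. This is only an illustration of the data-flow idea, not the OpenRTM-aist API; the component names and the `process` functions are made up for the example.

```python
# Minimal data-flow sketch of the "node and link" idea used in
# RT-Middleware: each component ("node") processes incoming data and a
# link forwards its output into the input of another component.
# (Illustrative only; this is not the OpenRTM-aist API.)

class Component:
    def __init__(self, name, process):
        self.name = name          # name of this node
        self.process = process    # data -> data function of this node
        self.links = []           # downstream components ("links")

    def connect(self, other):
        """Create a link from this node's output to other's input."""
        self.links.append(other)

    def push(self, data):
        """Receive data on the input port, process it, forward the result."""
        result = self.process(data)
        for downstream in self.links:
            downstream.push(result)
        return result

# A toy pipeline: an audio-input node feeding a recognizer stub
mic = Component("AudioInput", lambda samples: [s * 2 for s in samples])
rec = Component("Recognizer", lambda samples: f"{len(samples)} samples")
mic.connect(rec)
mic.push([1, 2, 3])
```

An application system is then just a set of such nodes plus the links between them, which is exactly what RT-SystemEditor lets the developer edit graphically.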

In the specification of RT-Middleware, a "data port" is defined as a connection point of a link that realizes the transmission of a data stream. A "service port" is defined as an entry point of each function call for a service function. "Configuration parameters" are defined to configure each component.

Speech Communication with Humanoids: How People React and How We Can Build the System 179

Fig. 12. Screenshot of RT-SystemEditor.

OpenRTM-aist is an implementation of the RT-Component specification that supports C++, JAVA, and Python languages. It runs on various platforms such as Windows, Linux, Mac OS, and FreeBSD.

#### **3.1.1 Audio input/output components**

Audio input components accept the audio information from the sound device as input, convert it to an OpenRTM-aist data stream, and pass on the output to other linked components. Audio output components accept audio data streams from other components as input and pass on the output to the sound device.

We use portaudio, a cross-platform audio input/output library, to implement both the audio input and output components. The components support both Windows and Linux platforms, and handle everything from monaural input to multichannel inputs.

#### **3.1.2 Audio filter components**

Audio filter components contain input and output ports.

The "sample rate conversion component" converts the sample rate of the audio stream by using an up/down sampling algorithm. The "echo cancel component" has two input streams; it subtracts the input of stream 1 from that of stream 2 by finding a maximum correlation. The "emphasis component" applies a signal processing algorithm to enhance or de-enhance the magnitude of the specified frequency in the data stream.
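The correlation-based alignment used by the echo cancel component can be sketched as follows. This is a simplified, offline illustration in plain Python; the text does not give the component's internals, so the exact algorithm here is an assumption, and the real component works on continuous audio streams.

```python
# Sketch of the "echo cancel component" idea: find the delay at which
# stream 1 best correlates with stream 2, then subtract the aligned
# copy of stream 1 from stream 2. (Simplified illustration only.)

def cross_correlation(a, b, lag):
    """Correlation of a, delayed by `lag` samples, with b."""
    n = min(len(a), len(b) - lag)
    return sum(a[i] * b[i + lag] for i in range(n))

def echo_cancel(stream1, stream2, max_lag):
    """Subtract the best-aligned copy of stream1 from stream2."""
    best_lag = max(range(max_lag + 1),
                   key=lambda lag: cross_correlation(stream1, stream2, lag))
    out = list(stream2)
    for i, s in enumerate(stream1):
        if i + best_lag < len(out):
            out[i + best_lag] -= s
    return best_lag, out

# stream2 contains an echo of stream1 delayed by 3 samples
s1 = [1.0, -2.0, 3.0]
s2 = [0.0, 0.0, 0.0, 1.0, -2.0, 3.0, 0.0]
lag, cleaned = echo_cancel(s1, s2, max_lag=5)
```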

#### **3.1.3 Voice recognition and synthesis components**

The voice recognition component is based on Julius (Kawahara et al, 2000) in combination with English and Japanese acoustic models. Our component is designed to possess the following features: (a) the ability to read grammar format in W3C-SRGS XML form and (b) the ability to output the recognized result as an extensible XML stream.

Figure 13 shows an example of voice recognition grammar specified using the W3C-SRGS format. The voice recognition grammar can be visualized by a combination of the "srgstojulius" and "juliustographviz" tools.
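The subset of W3C-SRGS used in Figure 13 is small enough that its expansion into the flat list of accepted phrases can be sketched directly. This is illustrative only: the real "srgstojulius" converter handles much more of the specification, and the `expand` helper below is a hypothetical stand-in.

```python
# Sketch: expand the W3C-SRGS grammar of Figure 13 into the flat list of
# phrases it accepts, supporting only the elements the figure uses
# ("one-of", "item", and same-document "ruleref").
import xml.etree.ElementTree as ET

NS = "{http://www.w3.org/2001/06/grammar}"

SRGS = """<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en"
         version="1.0" mode="voice" root="main">
 <rule id="main">
  <one-of>
   <item><ruleref uri="#greet"/></item>
   <item><ruleref uri="#command"/></item>
  </one-of>
 </rule>
 <rule id="greet">
  <one-of><item>hello</item><item>bye</item></one-of>
 </rule>
 <rule id="command">
  <one-of><item>pick</item><item>give me</item></one-of>
  <one-of><item>apple</item><item>cake</item><item>remote</item></one-of>
 </rule>
</grammar>"""

def expand(elem, rules):
    """Return the list of phrases matched by an SRGS element."""
    tag = elem.tag.replace(NS, "")
    if tag == "ruleref":                      # reference to another rule
        return expand(rules[elem.get("uri").lstrip("#")], rules)
    if tag == "one-of":                       # any one of the child items
        return [p for item in elem for p in expand(item, rules)]
    # "rule" and "item": concatenate literal text and child expansions
    phrases = [elem.text.strip()] if elem.text and elem.text.strip() else [""]
    for child in elem:
        phrases = [(a + " " + b).strip()
                   for a in phrases for b in expand(child, rules)]
    return phrases

root = ET.fromstring(SRGS)
rules = {r.get("id"): r for r in root if r.tag == NS + "rule"}
phrases = expand(rules["main"], rules)
```

For the grammar of Figure 13 this yields eight phrases: "hello", "bye", and the six "pick"/"give me" object commands.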

The voice synthesis component is based on Festival for English and Open\_JTalk for Japanese. The component accepts plain text as input and provides a data stream in the form of a synthesized voice as output.

## **3.2 Incremental script development**


Commands given by the human to the robot are diverse. The following are the factors that cause this diversity.

**The nature of language:** Human language is ambiguous, and different expressions can be used to give instructions that carry the same meaning.

**Tasks:** Robots working in a life environment have to accept a variety of tasks. In order to cope with this, it is necessary for them to understand a variety of commands.

**Ability of the robot itself:** The diversity is also caused by the ability of the robot itself. A command from a human becomes effective due to the functions of the robot. For example, humans do not say "walk N steps" to a robot on wheels.

The language comprehension system of the robot must be able to deal with these diversities.

In the script-based development approach, diversity has been dealt with by stacking a newly developed script onto the existing scripts. By accumulating a number of scripts, the developer can increase the number of commands that the system can deal with.

Incremental development of the state-transition model has previously been conducted using the "extension-by-connection" method (described in the next section). In this section, we propose an "extension-by-unification" method that can cope with the diversities mentioned above (described in Section 3.2.4).

### **3.2.1 Formalization of state-transition model**

A state-transition model is a modeling method that describes the input and output behavior of a system. The model assumes the following form:

$$A := < I, S, O, \gamma, \lambda, s_0 > \tag{1}$$

where *I* represents the input alphabet, *O* represents the output alphabet, *S* represents the internal states, *γ* represents the state transition function, *λ* represents the output function, and $s_0$ is the initial state.

The state transition function *γ* is defined on pairs of a state and an input:

$$\gamma : S \times I \to S \tag{2}$$

The output function *λ* is likewise defined on pairs of a state and an input:

$$\lambda : S \times I \to O \tag{3}$$


```
<?xml version="1.0" encoding="UTF-8" ?>
<grammar xmlns="http://www.w3.org/2001/06/grammar"
        xml:lang="en"
        version="1.0" mode="voice" root="main">
 <rule id="main">
   <one-of>
     <item><ruleref uri="#greet" /></item>
     <item><ruleref uri="#command" /></item>
   </one-of>
 </rule>
 <rule id="greet">
   <one-of>
     <item>hello</item>
     <item>bye</item>
   </one-of>
 </rule>
 <rule id="command">
   <one-of>
     <item>pick</item>
     <item>give me</item>
   </one-of>
   <one-of>
     <item>apple</item>
     <item>cake</item>
     <item>remote</item>
   </one-of>
 </rule>
</grammar>
```
Fig. 13. Example of the voice recognition grammar and its visualization. The grammar is in W3C-SRGS form. "one-of" indicates the grammar matches either one of the child items. "ruleref" indicates reference to "rule" identified by the "id".

When the system is in state $s_t$ and gets input alphabet $i_t$, a state transition to $s_{t+1}$ will occur as follows:

$$s_{t+1} = \gamma_{s_t, i_t} \tag{4}$$

At the same time, we get output alphabet $o_{t+1}$ as follows:

$$o_{t+1} = \lambda_{s_t, i_t} \tag{5}$$

Fig. 14. Example of state-transition model.


Even if the input to the system is the same, the output of the system may be different, because the internal state $s_t$ is updated each time the system gets an input.
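Equations 4 and 5 translate almost literally into code; a minimal sketch, assuming γ and λ are stored as dictionaries keyed by (state, input):

```python
# One step of the state-transition model: Equation 4 updates the state
# with gamma, Equation 5 emits an output with lambda.

def step(state, inp, gamma, lam):
    """Return (next_state, output) for a single input symbol."""
    return gamma[(state, inp)], lam[(state, inp)]

# Tiny two-state example (not the Figure 14 model)
gamma = {("s0", "a"): "s1", ("s1", "a"): "s0"}
lam = {("s0", "a"): "o0", ("s1", "a"): "o1"}
state, out = step("s0", "a", gamma, lam)
```

Repeated calls to `step` make the same input symbol produce different outputs as the state changes, which is the point made above.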

We have explained the state-transition model in an equation form; however, the state-transition model can also be presented in a 2-dimensional diagram called a "state-transition diagram." In the diagram, each state is represented by a circle, and the transitions between states are represented by arrows. In this chapter, we annotate the transition conditions and the associated actions by including text over each arrow. We use a black circle (called a "token") to represent the current state.

For example, Figure 14 represents a conversation modeled by the state-transition model.

In the model presented in Figure 14, the initial state of the system is the "TV control" state. When the model gets the instruction "Turn on" as an input, it will output the command "turn-on-tv" and state transition "(a)" will occur. Then the token returns to the same "TV control" state. When the model gets the instruction "Video" as an input, state transition "(b)" will occur and the token will move to the "VTR control" state. This time, when the instruction "Turn on" is given, state transition "(c)" occurs and the command "turn-on-video" is output. In this way, we can model the context by defining appropriate states and state transitions between the states.
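The Figure 14 model can be transcribed directly from Equations 6 to 11. The `run` helper below is only for illustration, and `None` stands for the "none" entries of λ:

```python
# The TV/VTR conversation of Figure 14, written out from Equations 6-11.
# States: "tv-control" (s0) and "vtr-control" (s1).

gamma = {
    ("tv-control", "Turn on"):   "tv-control",   # transition (a)
    ("tv-control", "Turn off"):  "tv-control",
    ("tv-control", "TV"):        "tv-control",
    ("tv-control", "Video"):     "vtr-control",  # transition (b)
    ("vtr-control", "Turn on"):  "vtr-control",  # transition (c)
    ("vtr-control", "Turn off"): "vtr-control",
    ("vtr-control", "TV"):       "tv-control",
    ("vtr-control", "Video"):    "vtr-control",
}
lam = {
    ("tv-control", "Turn on"):   "turn-on-tv",
    ("tv-control", "Turn off"):  "turn-off-tv",
    ("tv-control", "TV"):        None,
    ("tv-control", "Video"):     None,
    ("vtr-control", "Turn on"):  "turn-on-video",
    ("vtr-control", "Turn off"): "turn-off-video",
    ("vtr-control", "TV"):       None,
    ("vtr-control", "Video"):    None,
}

def run(inputs, state="tv-control"):
    """Feed a sequence of instructions and collect the emitted commands."""
    outputs = []
    for inp in inputs:
        outputs.append(lam[(state, inp)])
        state = gamma[(state, inp)]
    return state, outputs

final, commands = run(["Turn on", "Video", "Turn on"])
```

The same instruction "Turn on" yields "turn-on-tv" the first time and "turn-on-video" the second time, because the intervening "Video" instruction moved the token to the "VTR control" state.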

The above example is expressed in equation form as follows:

$$A := < I, S, O, \gamma, \lambda, s_0 > \tag{6}$$

$$I = (\text{"Turn on"}, \text{"Turn off"}, \text{"TV"}, \text{"Video"}) \tag{7}$$

$$S = (\text{"tv-control"}, \text{"vtr-control"}) \tag{8}$$

$$O = (\text{"turn-on-tv"}, \text{"turn-on-video"}, \text{"turn-off-tv"}, \text{"turn-off-video"}) \tag{9}$$

$$\gamma = \begin{pmatrix} s_0 & s_0 & s_0 & s_1 \\ s_1 & s_1 & s_0 & s_1 \end{pmatrix} \tag{10}$$

$$\lambda = \begin{pmatrix} o_0 & o_2 & \mathit{none} & \mathit{none} \\ o_1 & o_3 & \mathit{none} & \mathit{none} \end{pmatrix} \tag{11}$$

As we have seen here, the expression in equation form has an advantage in formalization, while the expression in diagram form has an advantage in quick understanding. In later discussion, we will use both the equation and the diagram forms to explain the concept quickly and formally.

The state-transition model is a very simple but very powerful modeling method and has been applied to a wide range of applications. Because its structure is so simple, it is frequently misunderstood that the state-transition model can only model simple behavior. However, it can model diverse behavior by applying some extensions (e.g. Huang et al (2000), Denecke (2000)).

#### **3.2.2 Extension-by-connection method**

The simplest way to extend a state-transition model is as follows:

1. Add a new state to the existing state-transition model.
2. Add a new transition from an existing state to the new state.

This process is illustrated in Figure 15.

Fig. 15. Extension of a state machine using the extension-by-connection model.

Here, we formulate the above process. Let the existing state-transition model be *A* and the accumulated state-transition model be *A*′.

As explained in Section 3.2.1, the existing state-transition model *A* can be represented in the following form:

$$A := < I, S, O, \gamma, \lambda, s_0 > \tag{12}$$

Here, *S* is the set of states $s \in S$. The transition function *γ* can be defined in any form. In this chapter, we use a matrix of $S \times I$, in which the transition from state $s_t$ to state $s_{t+1}$ occurs if $\gamma_{s_t, i} = s_{t+1}$.

Similarly, we define the accumulated state-transition model *A*′ as follows:

$$A' := < I, S', O, \gamma', \lambda', s'_0 > \tag{13}$$

Then, the new state Δ*S* can be calculated as follows:

$$S' = S \cup \Delta S \tag{14}$$

Here, $S \cap \Delta S = \emptyset$.

The new state transition Δ*γ* can be calculated as follows:

$$\gamma'_{s_t, i} = \gamma_{s_t, i} \qquad (s_t \in S, i \in I) \tag{15}$$

$$\gamma'_{s'_t, i} = \Delta\gamma_{s'_t, i} \qquad (s'_t \in S', i \in I) \tag{16}$$

$$\lambda'_{s_t, i} = \lambda_{s_t, i} \qquad (s_t \in S, i \in I) \tag{17}$$

$$\lambda'_{s'_t, i} = \Delta\lambda_{s'_t, i} \qquad (s'_t \in S', i \in I) \tag{18}$$

The transition function of the accumulated part, Δ*γ*, needs to be defined based on transitions from the existing states *S*. Therefore, Δ*γ* will be a matrix of $S' \times I$. Note that the new states Δ*S* can be expressed only by the newly defined part, but the transitions of the accumulated part Δ*γ* include both the old states *S* and the new states Δ*S* in their definition.

The state-transition model is easy to understand when drawn as a state-transition diagram. Extension-by-connection can also be carried out very easily by editing this diagram. There are several GUI tools that can add state-transition rules through mouse operations.
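With transition tables stored as dictionaries, extension-by-connection amounts to set and dictionary unions; a minimal sketch (the "audio-control" extension and all names are hypothetical examples):

```python
# Sketch of extension-by-connection (Equations 14-16): the new states
# delta_S are added to S, and the added transitions delta_gamma may
# refer to existing states, which is exactly the dependency discussed
# in Section 3.2.3.

S = {"tv-control", "vtr-control"}
gamma = {("tv-control", "Video"): "vtr-control"}

# A new "audio control" mode reachable from the existing tv-control state
delta_S = {"audio-control"}
delta_gamma = {
    ("tv-control", "Audio"): "audio-control",  # depends on an old state
    ("audio-control", "TV"): "tv-control",     # ...and transitions back
}

S_new = S | delta_S                   # Equation 14
gamma_new = {**gamma, **delta_gamma}  # Equations 15 and 16
```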

#### **3.2.3 Problems with the extension-by-connection method**


Extension-by-connection is a useful method, but it has the following problems.

As we can see in Equation 14 and Equation 16, the definition of Δ*γ* requires both *S* and Δ*S*. This causes problems in the function development of robots. For example, let us consider the following scenario:

1. Robot "A" has function A, and we have already developed a state-transition model $A^A$ to realize the function.
2. For robot "A" to accumulate function C, we have extended the state-transition model to $A^{AC}$.
3. We have developed another robot, "B," which has function B. We want to add function C to this robot.
Here, the state-transition model for function C has already been developed for robot A. We want to reuse the model for robot B. Here, we discuss whether such reuse is possible.

First, the states $S^{AC}$ are easily separable into the states $S^A$ and $S^C$:

$$S^C = S^{AC} - S^A \tag{19}$$

However, the definition of the state-transition function $\gamma^{AC}$ is as follows:

$$\gamma^{AC}_{s^A_t, i} = \gamma^A_{s^A_t, i} \qquad (s^A_t \in S^A, i \in I) \tag{20}$$

$$\gamma^{AC}_{s^{AC}_t, i} = \gamma^C_{s^{AC}_t, i} \qquad (s^{AC}_t \in S^{AC}, i \in I) \tag{21}$$

$\gamma^C$ contains the states $S^A$ in its definition.

Because the states $S^A$ and $S^B$ are defined for different types of robots, A and B are not equal. In addition, because the transitions for function C are defined dependent on the states $S^A$, we cannot simply substitute $S^{AC} = S^{BC}$, which means that we cannot use $\gamma^C$ to extend the state-transition model $A^B$. The state transitions of function C developed for robot A cannot be diverted for the extension of robot B.

Ideally, once a feature is developed, it should be possible to share it with other robots that need the same feature. In order to achieve this, we introduce the extension-by-unification method.
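The reuse problem can be made concrete with a small sketch; all state and input names below are hypothetical. The function-C script references robot A's state names, so attaching it to robot B's model fails:

```python
# Why a function-C script built by extension-by-connection cannot be
# reused on robot B (Section 3.2.3): its added transitions delta_gamma
# are written against robot A's state names, which robot B's model
# does not contain.

delta_S_C = {"c-recording"}                  # new states of function C
delta_gamma_C = {
    ("a-idle", "record"): "c-recording",     # refers to robot A's state
    ("c-recording", "stop"): "a-idle",
}

def extend(gamma, known_states, delta_S, delta_gamma):
    """Extension-by-connection with a safety check: every state the
    script references must exist in the target model or in delta_S."""
    referenced = {s for (s, _) in delta_gamma} | set(delta_gamma.values())
    missing = referenced - known_states - delta_S
    if missing:
        raise ValueError(f"script references unknown states: {missing}")
    return {**gamma, **delta_gamma}

# Works for robot A, whose model defines "a-idle" ...
gamma_A = extend({("a-idle", "go"): "a-idle"}, {"a-idle"},
                 delta_S_C, delta_gamma_C)

# ... but fails for robot B, which has different states.
try:
    extend({("b-idle", "go"): "b-moving"}, {"b-idle", "b-moving"},
           delta_S_C, delta_gamma_C)
    reusable = True
except ValueError:
    reusable = False
```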

Fig. 16. Extension of the state-transition model using the extension-by-unification method.

#### **3.2.4 Extension-by-unification method**

In the extension-by-unification method, we extend the state-transition model by developing the accumulated behavior as a separate state-transition model and unifying it with the existing model. This process is illustrated in Figure 16.

Here, we formulate the above process.

The existing state-transition model *A* is represented, as in Section 3.2.1, as follows:

$$A := < I, S, O, \gamma, \lambda, s_0 > \tag{22}$$

Similarly, the new state-transition model *A*′ is represented as follows:

$$A' := < I, S', O, \gamma', \lambda', s'_0 > \tag{23}$$

We accumulate the state-transition model *A*′′ by unifying *A* and *A*′. First, we calculate the unified states as follows:

$$S'' = S \cup S' \tag{24}$$

Here, $S \cap S' \neq \emptyset$.

Next, the transitions between the states *S*′′ are calculated as follows:

$$\gamma''_{s_t, i} = \gamma_{s_t, i} \qquad (s_t \in S, i \in I) \tag{25}$$

$$\gamma''_{s'_t, i} = \gamma'_{s'_t, i} \qquad (s'_t \in S', i \in I) \tag{26}$$

$$\lambda''_{s_t, i} = \lambda_{s_t, i} \qquad (s_t \in S, i \in I) \tag{27}$$

$$\lambda''_{s'_t, i} = \lambda'_{s'_t, i} \qquad (s'_t \in S', i \in I) \tag{28}$$

By defining the initial state $s''_0$ to be $s''_0 = s_0$, the extended state-transition model *A*′′ will be as follows:

$$A'' := < I, S'', O, \gamma'', \lambda'', s''_0 > \tag{29}$$

As visible in Equation 26, the transition function *γ*′ is an $S' \times I$ matrix that only includes the states *S*′ in its definition. The extension-by-unification method does not require the definition of the original states in the accumulated part of the state-transition model.

As noted in Section 3.2.3, in the conventional extension-by-connection method, the definition of the accumulated part of the state-transition model depends on information about the existing states. It is limited in terms of reusing scripts for this reason. The proposed extension-by-unification method does not have this problem. Using this method, we can significantly increase the reusability of the state-transition model.

#### **3.2.5 Visualization of unifiable states**

By using the above algorithms, the possibility of unification between scripts can be identified as "Unifiable," "Unifiable (occurrence of isolated state)," or "Conflict." Similarly, scripts can be classified as "Executable" or "Unexecutable." By comparing a script and an adaptor definition for the existing scripts, we can obtain a list of scripts annotated with 6 (3 × 2) classes. Our script management server displays this list at the bottom of each wiki page. By displaying the list, the developer can easily find a script that can be included in his/her current application.

Figure 17 shows an overview of the editing system and Figure 18 shows an example of using the web-based interface.

Fig. 17. Overview of the editing system (a wiki server holding documents and scripts, an automaton unifier, a phrase matcher with hash memory, and an automaton controller with state memory, accessed by developers through browsers).