**3. Applying LPN to robotic system control**

**3.1. Application for discrete event dynamic robotic system control**

A discrete event dynamic system is a discrete-state, event-driven system in which the state evolution depends entirely on the occurrence of asynchronous discrete events over time [2]. Petri nets have been used to model many kinds of dynamic event-driven systems, such as computer networks and communication systems. In this section, an LPN is used to model the Sony AIBO learning control system in order to verify the effectiveness of the proposed LPN.

*AIBO voice command recognition system* 

AIBO (Artificial Intelligence roBOt) is a robotic pet designed and manufactured by Sony Corporation. AIBO can execute different actions, such as going ahead, moving back, sitting down, standing up, and crying, and it can "listen" to voice via a microphone. A command and control system is constructed to make AIBO understand several human voice commands in Japanese and English and take the corresponding actions. The simulation system is developed on Sony AIBO's OPEN-R (Open Architecture for Entertainment Robot) [19]. The architecture of the simulation system is shown in Figure 3. Because there are both English and Japanese voice commands for the same AIBO action, the correspondences between voices and actions are established in part (4). The duration of an AIBO action is learned in part (5). After an AIBO action finishes, the rewards for the correctness of the action and for the action duration are given by touching different AIBO sensors.

**Figure 3.** System architecture of voice command recognition

#### *LPN model for AIBO voice command recognition system*

In the LPN model for the AIBO voice command recognition system, an AIBO action change and an action time are modeled as a transition and a transition delay, respectively. A human voice command is modeled by Tokens of different colors. The LPN model is shown in Figure 4. The meaning of each transition is listed below. *Tr input* changes a voice signal into colored Tokens which describe the voice characteristics. *Tr*11, *Tr*12 and *Tr*13 analyze the voice signal: *Tr*11 generates 35 different Tokens *VL1…VL35* according to the voice length; *Tr*12 generates 8 different Tokens *E21…E28* according to the energy characteristic of the first twenty voice samples; *Tr*13 generates 8 different Tokens *E41…E48* according to the energy characteristic of the first forty voice samples [8]. These three types of Token are compounded into a compound Token *<VLl> + <VE2m> + <VE4n>* in *p*2 [12].
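As a minimal sketch of the token construction described above (the 3.5 s maximum voice length, the unit energy range, and the quantization scheme are illustrative assumptions, not values from the original system):

```python
# Sketch (assumed details): build the compound colored Token
# <VLl> + <VE2m> + <VE4n> from voice features, as done by
# transitions Tr11, Tr12 and Tr13 before the Token reaches p2.

def quantize(value, levels, max_value):
    """Map a continuous feature onto one of `levels` discrete bins."""
    bin_width = max_value / levels
    return min(int(value / bin_width), levels - 1)

def compound_token(voice_length, energy_front20, energy_front40):
    # Tr11: 35 voice-length classes VL1..VL35 (max length assumed 3.5 s)
    vl = quantize(voice_length, 35, 3.5) + 1
    # Tr12: 8 energy classes E21..E28 for the first 20 samples
    e2 = quantize(energy_front20, 8, 1.0) + 1
    # Tr13: 8 energy classes E41..E48 for the first 40 samples
    e4 = quantize(energy_front40, 8, 1.0) + 1
    return (f"VL{vl}", f"E2{e2}", f"E4{e4}")

print(compound_token(1.2, 0.4, 0.7))  # -> ('VL12', 'E24', 'E46')
```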

*Tr2j* generates the different voice Tokens. The input arc's weight function is *((<VLl>+<VE2m>+<VE4n>), VWVlmn,2j)* and the output arc's weight function is the corresponding voice Token. A voice Token then generates a different action Token through *Tr3j*. While one of *p4*–*p8* holds a Token, the corresponding AIBO action lasts. *Tr4j* takes the Token out of *p4*–*p8* and makes the corresponding AIBO action terminate. *Tr4j* has delay times *DT4i*, and every *DT4i* has a value *VT4i*. The transition chooses which delay time *DT4i* to adopt according to *VT4i*.
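The selection and learning mechanism around *Tr3j* can be read as a learned weight table *VW* between voice Tokens and action transitions. Below is a hypothetical minimal sketch: soft-max selection over the weights (the selection policy named later in this chapter) together with a simple reward-driven update; the action set, the temperature and the update rule are assumptions for illustration.

```python
import math, random

# Assumed: VW[voice_token][action] holds the learned association
# strength between a voice Token in p3 and an action transition Tr3j.
ACTIONS = ["go", "back", "sit", "stand", "cry"]
VW = {}

def select_action(voice_token, tau=0.5):
    """Soft-max selection of the action transition fired by a voice Token."""
    w = VW.setdefault(voice_token, {a: 0.0 for a in ACTIONS})
    exps = {a: math.exp(w[a] / tau) for a in ACTIONS}
    total = sum(exps.values())
    r, acc = random.random() * total, 0.0
    for a in ACTIONS:
        acc += exps[a]
        if r <= acc:
            return a
    return ACTIONS[-1]

def update(voice_token, action, reward, alpha=0.3):
    """Move the weight toward the observed reward (illustrative rule)."""
    w = VW[voice_token]
    w[action] += alpha * (reward - w[action])

# Training loop sketch: the token VC1 should come to mean "go".
random.seed(0)
for _ in range(200):
    a = select_action("VC1")
    update("VC1", a, 1.0 if a == "go" else -1.0)
print(max(VW["VC1"], key=VW["VC1"].get))  # prints "go"
```

With repeated rewards the weight of the correct action dominates the soft-max, so the random early behaviour converges to the intended voice-action pairing.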

#### *Results of simulation*

After the values of the actions are obtained, the soft-max method is selected as the action selection policy. Then the learning algorithm for the delay time of the Learning Petri net is derived using the function approximation method. It is listed in Table 3.

Transition delay time learning algorithm 2 (function approximation method):

**Step 1.** Initialization: set *Q*(*p*, *t*) of every transition's delay time to zero.

**Step 2.** Initialize the Petri net, i.e. set the Petri net state to *P1*.

i. Randomly select the transition delay time *t*.

ii. After the transition fires and the reward is observed, adjust the value of *Q*(*p*, *t*) using formula (3).

Repeat (i) and (ii) until the system reaches the end state.

**Step 3.** Repeat Step 2 until adequate data are collected. Then evaluate the optimal *t* using the function approximation method.

**Table 3.** Delay time learning algorithm using the function approximation method

In the function approximation method, *Q* = *f*(*t*) is approximated by a polynomial *f*(*t*) = *a*0 + *a*1*t* + … + *an* *t^n* fitted to the *m* observed samples (*tk*, *Qk*) through the least squares condition

∑_{k=1}^{m} ( ∑_{i=0}^{n} a_i t_k^i − Q_k ) t_k^j = 0,  (*j* = 0, 1, …, *n*). (10)

Solution of Equation (10) gives *a*0, *a*1, …, *an*, and *Q* = *f*(*t*) is attained. The solution *t\*opt* of *Q* = *f*(*t*) which makes *Q* maximum is the expected optimal delay time. The candidate solutions are obtained from the stationary condition

d*f*(*t*)/d*t* = 0. (11)

The multi-solution of (11), *t* = *topt* (*opt* = 1, 2, …, *n*−1), is checked by function (5), and the *t\*opt* ∈ *topt* which makes *f*(*t\*opt*) = max *f*(*topt*) (*opt* = 1, 2, …, *n*−1) is the expected optimal delay time. *t\*opt* is used as the delay time, the system is executed, and a new *Q*(*p*, *t\*opt*) is obtained. This pair (*t\*opt*, *Q*(*p*, *t\*opt*)) is used as a new sample, and the least squares method can be applied again to acquire a more precise delay time.
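A small worked example of Equations (10) and (11), with invented sample data: a quadratic *Q* = *f*(*t*) = *a*0 + *a*1*t* + *a*2*t*² is fitted by solving the normal equations of (10) directly, and (11) then reduces to *t\** = −*a*1/(2*a*2).

```python
# Least squares fit of Q = f(t) = a0 + a1*t + a2*t**2 (degree n = 2),
# then the optimal delay time from f'(t) = 0, i.e. t* = -a1 / (2*a2).

def fit_quadratic(samples):
    """Solve the 3x3 normal equations of Eq. (10) by Cramer's rule."""
    # S[j] = sum of t^j over samples, b[j] = sum of Q * t^j
    S = [sum(t**j for t, _ in samples) for j in range(5)]
    b = [sum(q * t**j for t, q in samples) for j in range(3)]
    A = [[S[0], S[1], S[2]],
         [S[1], S[2], S[3]],
         [S[2], S[3], S[4]]]
    def det3(m):
        return (m[0][0]*(m[1][1]*m[2][2]-m[1][2]*m[2][1])
              - m[0][1]*(m[1][0]*m[2][2]-m[1][2]*m[2][0])
              + m[0][2]*(m[1][0]*m[2][1]-m[1][1]*m[2][0]))
    d = det3(A)
    coeffs = []
    for col in range(3):
        M = [row[:] for row in A]
        for r in range(3):
            M[r][col] = b[r]
        coeffs.append(det3(M) / d)
    return coeffs  # a0, a1, a2

# Invented training data from Q = 4 - (t - 2)^2, so the optimum is t = 2.
samples = [(t, 4 - (t - 2)**2) for t in (0.5, 1.0, 1.5, 2.5, 3.0, 3.5)]
a0, a1, a2 = fit_quadratic(samples)
t_opt = -a1 / (2 * a2)   # Eq. (11): df(t)/dt = a1 + 2*a2*t = 0
print(round(t_opt, 3))   # -> 2.0
```

For degrees above 2, the same normal equations are solved numerically and each root of (11) is checked, exactly as the text describes.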

When the system begins running, it cannot recognize the voice commands. When a voice command arrives, it is changed into a compound Token in *p2*. This compound Token randomly generates a voice Token, which is put into *p3*. The voice Token randomly arouses an action Token. A reward for action correctness is obtained, and then *VW* and *VT* are updated. For example, suppose a compound colored Token *(<VLl> + <VE2m> + <VE4n>)* fired *Tr21* and colored Token

*VC1* is put into *p3*. *VC1* fires *Tr32* and AIBO acts "go". A reward is obtained according to the correctness of the action. *VWVC1,32* is updated by this reward, and the updated value of *VWVC1,32* is fed back to *p2* as the next reward value for *(<VLl> + <VE2m> + <VE4n>)* firing *Tr21*. After an action finishes, a reward for the correctness of the action time is obtained and *VT* is updated.

**Figure 4.** LPN model of voice command recognition

**Figure 5.** Relation between training times and recognition probability

Figure 5 shows the relation between the number of training trials and the voice command recognition probability. Probability 1 is the success probability over the most recent 20 training trials. Probability 2 is the success probability over all training trials. From the simulation results, we confirmed that the LPN is correct and effective for the AIBO voice command control system.

**3.2. Application for continuous parameter optimization**

The proposed system is applied to a guide dog robot system which uses RFID (radio-frequency identification) to construct the experimental environment. The RFID is used as navigation equipment for robot motion. The performance of the proposed system is evaluated through computer simulation and a real robot experiment.

*RFID environment construction*

RFID tags are used to construct a blind road, which is shown in Figure 6. There are straight roads, corners, and traffic light signal areas. The straight roads have two groups of tags arranged in two lines of RFID tags. Every tag stores information about the road, and the guide dog robot moves, turns, or stops on the road according to the information in the tags. For example, if the guide dog robot reads a corner RFID tag, it turns at the corner. If it reads either the outer or the inner side RFID tags, this implies that the robot is deviating from the path and its motion direction needs adjusting. If it reads traffic control RFID tags, it stops or keeps running according to the traffic light signal, which is dynamically written to the RFID.

**Figure 6.** The real experimental environment

*LPN model for the guide dog*

The extended LPN control model for the guide dog robot system is presented in Figure 7. The meaning of each place and transition in Figure 7 is listed below:

- *P1* System starting state
- *P2* Getting RFID information
- *P3* Turning corner state
- *P4* Left adjusting state
- *P5* Right adjusting state
- *Tr1* Reading of the RFID environment
- *Tr2* Stop of the guide dog
- *Tr3* Guide dog runs
- *Tr4* Start of the turning corner state
- *Tr5* Start of the left adjusting state
- *Tr6* Start of the right adjusting state
- *Tr7* Stop of the turning corner state
- *Tr8* Stop of the left adjusting state
- *Tr9* Stop of the right adjusting state
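The place/transition structure of Figure 7 can be sketched as a small state machine; the routing below (which place each transition leads to) is an assumption for illustration, not taken from the original model.

```python
# Minimal sketch of the guide dog LPN of Figure 7: places P1-P5 as states,
# transitions Tr1-Tr9 as events. The routing is an assumed illustration.

TRANSITIONS = {
    # (current place, transition) -> next place
    ("P1", "Tr1"): "P2",   # read the RFID environment
    ("P2", "Tr2"): "P1",   # stop of the guide dog (assumed return to start)
    ("P2", "Tr3"): "P1",   # guide dog runs (assumed return to start)
    ("P2", "Tr4"): "P3",   # start of the turning corner state
    ("P2", "Tr5"): "P4",   # start of the left adjusting state
    ("P2", "Tr6"): "P5",   # start of the right adjusting state
    ("P3", "Tr7"): "P1",   # stop of the turning corner state
    ("P4", "Tr8"): "P1",   # stop of the left adjusting state
    ("P5", "Tr9"): "P1",   # stop of the right adjusting state
}

def fire(place, transition):
    """Fire a transition if it is enabled in the current place."""
    if (place, transition) not in TRANSITIONS:
        raise ValueError(f"{transition} is not enabled in {place}")
    return TRANSITIONS[(place, transition)]

# A corner tag is read, the robot turns, then returns to the start state.
state = "P1"
for ev in ("Tr1", "Tr4", "Tr7"):
    state = fire(state, ev)
print(state)  # -> P1
```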
