**Figure 4.** LPN model of voice command recognition

*VC1* is put into *p3*. *VC1* fires *T32* and AIBO acts "go". A reward is obtained according to the correctness of the action. *VWVC1,32* is updated by this reward, and the updated value of *VWVC1,32* is fed back to *p2* as the next-time reward value of *(<VLl> + <VE2m> + <VE4n>)* firing *Tr21*. After an action finishes, a reward for the correctness of the action time is obtained and *VT* is updated.

Figure 5 shows the relation between training times and voice command recognition probability. Probability 1 is the success probability over the most recent 20 training trials. Probability 2 is the success probability over all training trials. From the result of the simulation, we confirmed that the LPN is correct and effective for the AIBO voice command control system.

**Figure 5.** Relation between training times and recognition probability

**3.2. Application for continuous parameter optimization**

The proposed system is applied to a guide dog robot system, in which RFID (radio-frequency identification) is used to construct the experimental environment. The RFID tags serve as navigation equipment for the robot's motion. The performance of the proposed system is evaluated through computer simulation and a real robot experiment.

#### *RFID environment construction*

RFID tags are used to construct a blind road, which is shown in Figure 6. There are forthright (straight) roads, corners, and traffic-light signal areas. The forthright roads have two groups of tags, each group consisting of two lines of RFID tags. Every tag stores information about the road. The guide dog robot moves, turns, or stops on the road according to the information in the tags. For example, if the guide dog robot reads a corner RFID tag, it turns at the corner. If the guide dog robot reads either outer- or inner-side RFID tags, this implies that the robot is deviating from the path and its motion direction needs adjusting. If the guide dog robot reads traffic-control RFID tags, it stops or continues running according to the traffic-light signal, which is dynamically written to the RFID tags.

**Figure 6.** The real experimental environment
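To make this tag-driven behavior concrete, here is a minimal Python sketch. The tag fields (`kind`, `signal`) and the `Robot` methods are illustrative assumptions, not part of the authors' implementation.

```python
# Minimal sketch of the tag-driven control described above. The tag fields
# and Robot methods are illustrative assumptions.

class Robot:
    def turn_corner(self):   print("turning at the corner")
    def adjust(self, side):  print(f"adjusting direction away from the {side} line")
    def stop(self):          print("stopping at the red light")
    def keep_running(self):  print("running")

def handle_tag(robot, tag):
    """Dispatch a motion command from the information stored in an RFID tag."""
    kind = tag["kind"]
    if kind == "corner":
        robot.turn_corner()
    elif kind in ("inner", "outer"):
        robot.adjust(side=kind)        # robot is deviating from the path
    elif kind == "traffic":
        # the traffic-light state is dynamically written into the tag
        if tag["signal"] == "red":
            robot.stop()
        else:
            robot.keep_running()
    else:
        robot.keep_running()           # ordinary road tag: continue

handle_tag(Robot(), {"kind": "corner"})
handle_tag(Robot(), {"kind": "traffic", "signal": "red"})
```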

#### *LPN model for the guide dog*

The extended LPN control model for the guide dog robot system is presented in Figure 7. The meanings of the places and transitions in Figure 7 are listed below:



**Figure 7.** The LPN model for the guide dog robot

When the system begins running, it first reads the RFID environment and gets the information, and a Token is put into *P2*. These Tokens fire one of the transitions *Tr2* to *Tr6* according to the weight functions on the arcs from *P2* to *Tr2*, …, *Tr6*. Then the guide dog enters the stop, running, turning-corner, left-adjusting, or right-adjusting state. Here, in the *P3*, *P4*, *P5* states, the guide dog turns at a specific speed. The delay times of *Tr7*–*Tr9* decide the correctness of the guide dog's adjustment of its motion direction.
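As a sketch of this firing step, the fragment below chooses among *Tr2*–*Tr6* from the arc weights. The softmax (Boltzmann) selection rule is our assumption, consistent with the inverse temperature constant *β* used in the simulations later in this section; the weight values are made up.

```python
import math
import random

# Sketch: fire one of Tr2..Tr6 from P2 according to the learned arc weights.
# Softmax (Boltzmann) selection with inverse temperature beta is an assumed
# choice rule; the weight values below are made up for illustration.

def choose_transition(weights, beta=10.0):
    """weights: dict mapping transition name -> learned weight on the P2 arc."""
    names = list(weights)
    exps = [math.exp(beta * weights[n]) for n in names]
    total = sum(exps)
    return random.choices(names, weights=[e / total for e in exps], k=1)[0]

w = {"Tr2": 0.1, "Tr3": 0.7, "Tr4": 0.2, "Tr5": 0.3, "Tr6": 0.1}
print(choose_transition(w))  # usually "Tr3": its weight dominates under beta=10
```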

#### *Reward getting from environment*

When *Tr7*, *Tr8* or *Tr9* fires, it gets reward *r* as in formula (12-b) if the guide dog does not get Token <Left> or <Right> before getting Token <corner>, i.e. the robot runs in the correct direction until arriving at the corner. It gets reward *r* as in formula (12-a), where *t* is the time from the transition firing until getting Token <Left> or <Right>. On the contrary, it gets the punishment −1 as in (12-c) if the robot runs out of the road.

$$r = \begin{cases} 1/e^t & \text{(a)}\\ 1 & \text{(b)}\\ -1 & \text{(c)} \end{cases} \tag{12}$$
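For readability, a direct transcription of formula (12) into Python; the argument names are ours:

```python
import math

# Direct transcription of formula (12). t is the time from the transition
# firing until the robot reads a <Left>/<Right> tag.

def reward(read_left_right, ran_off_road, t=None):
    if ran_off_road:
        return -1.0             # (12-c): punishment for running out of the road
    if not read_left_right:
        return 1.0              # (12-b): correct direction all the way to the corner
    return 1.0 / math.exp(t)    # (12-a): depends on the time t until a line is read

print(reward(read_left_right=True, ran_off_road=False, t=0.5))  # ~0.607
```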

#### *Computer simulation and real robot experiment*

When the robot reads the <Left>, <Right> and <corner> information, it must adjust the direction of its motion. The amount of adjustment is decided by how long the robot stays in state *P3*, *P4*, or *P5*. So the delay times of *Tr7*, *Tr8* and *Tr9* need to be learnt.

**Figure 8.** Direction adjustment of the guide dog robot motion: (i) on the forthright road; (ii) at the corner

Before the simulation, some robot motion parameter symbols are defined:

- *v* velocity of the robot
- *ω* angular velocity of the robot
- *tpre* duration of the former state
- *t* adjusting time
- *tpost* duration of the state after adjusting

*v*, *ω*, *tpre*, and *tpost* can be measured by the system while the robot is running. The delay times of *Tr7*, *Tr8* and *Tr9*, i.e. the robot's motion-adjusting times, are simulated in two cases.

1. As shown in Figure 8 (i), when the robot is running on the forthright road and meets the inside RFID line, its deviation angle *θ* is:

$$\theta = \arcsin(d\_1/l\_1) = \arcsin(d\_1/(t\_{\text{pre}} \cdot v)), \tag{13}$$

where *d1* and *l1* are the width of the area between the two inside lines and the moving distance between two successive readings of the RFID, respectively (see Figure 8).

The robot's adjusting time (transition delay time) is *t*. If *ωt − θ* ≥ 0, then

$$t\_{\text{post}} = \frac{d\_1}{v \sin(\omega t - \theta)}, \tag{14}$$


else

$$t\_{\text{post}} = \frac{d\_2}{v \sin(\theta - \omega t)}. \tag{15}$$

Here, *tpost* is used to calculate the reward *r* using formula (12). In the same way, the reward *r* can be calculated when the robot meets the outside RFID line.

When the robot is running on the forthright road and meets the outside RFID line, the deviation angle *θ* is

$$\theta = \arcsin(d\_2/(v \cdot t\_{\text{pre}})). \tag{16}$$

The robot's adjusting time (transition delay time) is *t*.

If *ωt − θ* ≥ 0, then

$$t\_{\text{post}} = \frac{d\_2}{v \sin(\omega t - \theta)}, \tag{17}$$

else the robot will run out of the road, and the reward *r* is calculated using formula (12).

2. As shown in Figure 8 (ii), when the robot is running at the corner, it must adjust by *θ* = 90°. If *θ* ≠ 90°, the robot will read <Left> or <Right> after it turns the corner. Now, consider the case in which the robot reads the inner line's <Left>, <Right>. Suppose the robot's adjusting time is *t*. If *ωt − θ* ≥ 0, then

$$t\_{\text{post}} = \frac{d\_1}{2v\sin(\omega t - \theta)}, \tag{18}$$

else

$$t\_{\text{post}} = \frac{d\_2}{2v\sin(\theta - \omega t)}. \tag{19}$$

As in case (1), *tpost* is used to calculate the reward *r* using formula (12). In the same way, the reward *r* can be calculated when the robot meets the outside RFID line. The rewards for the other cases of direction adjustment of the robot, likewise calculated from *t*, are handled as in the above two cases.
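A numerical sketch of equations (13)–(19) follows; the geometry values (*d1*, *d2*, *v*, *ω*, *tpre*, *t*) are illustrative assumptions.

```python
import math

# Numerical sketch of equations (13)-(19). All parameter values are
# illustrative assumptions.

d1, d2 = 0.10, 0.05          # widths between the RFID lines [m]
v = 0.20                     # robot velocity [m/s]
omega = math.radians(30)     # angular velocity [rad/s]
t_pre = 1.5                  # duration of the former state [s]
t = 1.2                      # candidate adjusting time (transition delay) [s]

# (13): deviation angle when the inside line is met on the forthright road
theta = math.asin(d1 / (t_pre * v))

# (14)/(15): duration of the state after adjusting for time t
if omega * t - theta >= 0:
    t_post = d1 / (v * math.sin(omega * t - theta))   # (14)
else:
    t_post = d2 / (v * math.sin(theta - omega * t))   # (15)

print(f"theta = {math.degrees(theta):.1f} deg, t_post = {t_post:.2f} s")

# The corner case (18)/(19) is analogous with theta = 90 deg and a factor
# of 2v in the denominators; t_post then feeds formula (12) as the reward.
```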

In this simulation, the value of the delay time has a single maximum, at the optimal delay-time point. The graph of the relation between the delay time and its value is a parabola. So, when the transition's delay time is learnt by the function approximation method described in section 2.2.3, the relation between the delay time and its value is assumed to be:

$$Q = a\_2 t^2 + a\_1 t + a\_0 \,. \tag{20}$$

Computer simulations of the transition delay-time learning algorithms were executed for all cases of robot direction adjustment. In the simulation of the discretization algorithm, the positive inverse temperature constant *β* was set to 10.0. After the delay times for the different cases were learnt, they were recorded in a delay-time table. Then, the real robot experiment was carried out using the delay-time table obtained from the simulation process.
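For illustration, here is a simplified sketch of the two learning schemes being compared; only *β* = 10.0 comes from the text, and the update rules and other constants are our assumptions, not the exact algorithms of section 2.2.3.

```python
import math
import random
import numpy as np

# Simplified sketch of the two delay-time learning schemes compared below.
# Only beta = 10.0 comes from the text; the rest is assumed for illustration.

BETA = 10.0                                   # positive inverse temperature
candidates = [0.2 * k for k in range(1, 11)]  # discretized delay times [s]
q = {t: 0.0 for t in candidates}              # value estimate per delay time

def pick_delay():
    """Discretization method: Boltzmann selection over candidate delays."""
    exps = [math.exp(BETA * q[t]) for t in candidates]
    total = sum(exps)
    return random.choices(candidates, weights=[e / total for e in exps], k=1)[0]

def update(t, r, lr=0.1):
    """Move the value of the chosen delay toward the observed reward r."""
    q[t] += lr * (r - q[t])

def parabola_peak(samples):
    """Function approximation method: fit Q = a2*t^2 + a1*t + a0 (eq. 20)
    to sampled (t, Q) pairs and return the vertex as the estimated optimum."""
    ts, qs = zip(*samples)
    a2, a1, _ = np.polyfit(ts, qs, 2)
    return -a1 / (2 * a2) if a2 < 0 else max(ts)
```

The learnt delays would then populate the delay-time table used in the real robot run.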

#### *Result of simulation and experiment*


The simulation results of the transition delay-time learning algorithm in the two cases are shown in Figure 9.

**Figure 9.** Result of simulation for the guide dog robot: (i) moving adjustment on the forthright road; (ii) moving adjustment at the corner

The simulation result for *θ* = 5° for the robot's moving adjustment on the forthright road is shown in Figure 9 (i). The simulation result for the robot's moving adjustment at the corner is shown in Figure 9 (ii). From the results, it is found that the function approximation method approaches the optimal delay time more quickly than the discretization method, but the discretization method can get closer to the optimal delay time through longer learning.


**Definition 1** *FPN* is an 8-tuple, given by *FPN* = <*P*, *Tr*, *F*, *D*, *I*, *O*, *α*, *β*>,

where:

*P* = {*p*1, *p*2, … , *pn*} is a finite set of places; *Tr* = {*tr*1, *tr*2, … , *trm*} is a finite set of transitions; *F* ⊆ (*P*×*Tr*)∪(*Tr×P*) is a finite set of directional arcs;

*D* = {*d*1, *d*2, … , *dn*} is a finite set of propositions, where proposition *di* corresponds to place *pi*; *P* ∩ *Tr* ∩ *D* = ∅; cardinality of (*P*) = cardinality of (*D*);

*I*: *tr* → *P∞* is the input function, representing a mapping from transitions to bags of (their input) places, denoted *\*tr*;

*O*: *tr* → *P∞* is the output function, representing a mapping from transitions to bags of (their output) places, denoted *tr\**;

*α*: *P* → [0, 1] and *β*: *P* → *D*. A token value in place *pi* ∈ *P* is denoted by *α*(*pi*) ∈ [0, 1]. If *α*(*pi*) = *yi*, *yi* ∈ [0, 1] and *β*(*pi*) = *di*, then this states that the degree of truth of proposition *di* is *yi*.

A transition *trk* is enabled if for all *pi* ∈ *I*(*trk*), *α*(*pi*) ≥ *th*, where *th* is a threshold value in the unit interval. If this transition is fired, then tokens are removed from its input places and tokens are deposited into each of its output places. The truth values of the output tokens are *yi*•*uk*, where *uk* is the confidence-level value of *trk*. FPN has the capability of modeling fuzzy production rules. For example, the fuzzy production rule (21) can be modeled as shown in Figure 10.

IF *di* THEN *dj* (with Certainty Factor (CF) *uk*) (21)

**Figure 10.** A fuzzy Petri net model (FPN)

#### *The definition of LFPN*

In a FPN, a token in a place represents a proposition, and a proposition has a degree of truth. Now, three extensions are made to the FPN, and the learning fuzzy Petri net (LFPN) is constructed. First, a place may hold different tokens (tokens are distinguished by numbers or colors), and the different tokens represent different propositions, i.e. a place has a set of propositions. Second, a place has a special token, i.e. there is a specified proposition; this proposition may have different degrees of truth toward the different transitions *tr* which take this place as an input place *\*tr*. Third, the weight of each arc is adjustable and is used to record the transition's input and output information.

**Definition 3** *LFPN* is a 10-tuple, given by *LFPN* = <*P*, *Tr*, *F*, *D*, *I*, *O*, *Th*, *W*, *α*, *β*> (an LFPN model is shown in Figure 11),

where: *Tr*, *F*, *I*, *O* are the same as in the definition of FPN.
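To make the enabling and firing semantics of Definition 1 and rule (21) concrete, here is a minimal sketch; the data layout is our assumption, and taking the minimum truth value over multiple input places is a common FPN convention rather than something stated above.

```python
# Minimal sketch of the FPN enabling/firing rule from Definition 1 and
# rule (21). The data layout is an assumption; taking the minimum over
# several input places is a common FPN convention, not stated above.

TH = 0.5                      # threshold th in the unit interval

alpha = {"p1": 0.9}           # token (truth) values alpha(p) in [0, 1]
transitions = {
    # trk: (input places *tr, output places tr*, confidence level uk)
    "tr1": (["p1"], ["p2"], 0.8),
}

def enabled(trk):
    ins, _, _ = transitions[trk]
    return all(alpha.get(p, 0.0) >= TH for p in ins)

def fire(trk):
    """Remove tokens from the input places; deposit tokens of truth yi*uk."""
    ins, outs, uk = transitions[trk]
    y = min(alpha[p] for p in ins)
    for p in ins:
        del alpha[p]
    for p in outs:
        alpha[p] = y * uk     # IF di THEN dj (with CF uk) -- rule (21)

if enabled("tr1"):
    fire("tr1")
print(alpha)                  # {'p2': 0.72}
```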
