158 Petri Nets – Manufacturing and Computer Science

The simulation result for *θ*=5°, the robot's movement adjustment on a straight road, is shown in Figure 9 (i), and the result for the adjustment at a corner is shown in Figure 9 (ii). The results show that the function-approximation method approaches the optimal delay time more quickly than the discretization method, but the discretization method can come closer to the optimal delay time through longer training.

**4. Construction of the learning fuzzy Petri net model**

The Petri net (PN) can represent and analyze concurrency and synchronization phenomena in a straightforward way. The PN approach can also be combined easily with other techniques and theories, such as object-oriented programming, fuzzy theory and neural networks. These modified PNs are widely used in manufacturing, robotics, knowledge-based systems, process control and many other engineering applications [15]. The fuzzy Petri net (FPN), which combines the PN with fuzzy theory, has been used for knowledge representation and reasoning in the presence of inexact data and in knowledge-based systems. However, the traditional FPN lacks a learning mechanism, which is its main weakness when modeling uncertain knowledge systems [25]. In this section we propose a new learning model, the learning fuzzy Petri net (LFPN) [7]. In contrast with the existing FPN, the new model has three extensions: 1) a place can hold different tokens, which represent different propositions; 2) these propositions have different degrees of truth toward different transitions; 3) the truth degree of a proposition can be learned by adjusting the weight functions of the arcs. The LFPN thus acquires the ability to learn fuzzy production rules through truth-degree updating, just as an artificial neural network acquires its learning ability through weight adjustment. An LFPN learning algorithm, which introduces the network learning method into the Petri net, is proposed and the convergence of the algorithm is analyzed.

**4.1. The learning fuzzy Petri net model**

A Petri net is a directed, weighted, bipartite graph with two kinds of nodes, called places and transitions, where arcs run either from a place to a transition or from a transition to a place. Tokens reside in places. The standard Petri net is inappropriate when a system is difficult to describe precisely; consequently, the fuzzy Petri net is designed for these situations, with fuzzified transitions, places, tokens or arcs.

#### *The definition of fuzzy Petri net*

A fuzzy place is associated with a predicate or property. A token in a fuzzy place is characterized by a predicate or property belonging to the place, and this predicate or property has a degree of membership in the place. In this way we obtain a fuzzy proposition or conclusion, for example, *speed is low*. A fuzzy transition may correspond, for instance, to an *if-then* fuzzy production rule, and is realized over truth values by fuzzy inference algorithms [11, 20, 26].

**Definition 1** An *FPN* is an 8-tuple, *FPN* = <*P*, *Tr*, *F*, *D*, *I*, *O*, *α*, *β*>

where:

*P* = {*p*1, *p*2, …, *pn*} is a finite set of places; *Tr* = {*tr*1, *tr*2, …, *trm*} is a finite set of transitions; *F* ⊆ (*P*×*Tr*)∪(*Tr*×*P*) is a finite set of directed arcs; *D* = {*d*1, *d*2, …, *dn*} is a finite set of propositions, where proposition *di* corresponds to place *pi*, *P* ∩ *Tr* ∩ *D* = ∅ and |*P*| = |*D*|; *I*: *Tr* → *P*∞ is the input function, mapping each transition to the bag of its input places, denoted \**tr*; *O*: *Tr* → *P*∞ is the output function, mapping each transition to the bag of its output places, denoted *tr*\*; *α*: *P* → [0, 1] and *β*: *P* → *D*. The token value in place *pi*∈*P* is denoted *α*(*pi*)∈[0, 1]. If *α*(*pi*) = *yi*, *yi*∈[0, 1], and *β*(*pi*) = *di*, then the degree of truth of proposition *di* is *yi*.

A transition *trk* is enabled if *α*(*pi*) ≥ *th* for all *pi*∈*I*(*trk*), where *th* is a threshold value in the unit interval. When the transition fires, tokens are removed from its input places and tokens are deposited in each of its output places. The truth values of the output tokens are *yi*•*uk*, where *uk* is the confidence level of *trk*. The FPN is thus capable of modeling fuzzy production rules. For example, the fuzzy production rule (21) can be modeled as shown in Figure 10.

IF *di* THEN *dj* (with Certainty Factor (CF) *uk*) (21)

**Figure 10.** A fuzzy Petri net model (FPN)
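A minimal sketch of this firing semantics, assuming a single-input rule and illustrative numeric values for the token truth, the confidence level *uk* and the threshold *th*:

```python
# Minimal sketch of firing one FPN rule "IF d_i THEN d_j (CF u_k)".
# alpha_in is the truth value of the token in the input place.

def fire_rule(alpha_in: float, u_k: float, th: float = 0.5):
    """Return the truth value deposited in the output place,
    or None when the transition is not enabled."""
    if alpha_in < th:
        return None          # below threshold: transition not enabled
    return alpha_in * u_k    # output token truth y_i * u_k

# Example: proposition d_i holds with degree 0.8, rule confidence 0.9.
print(fire_rule(0.8, 0.9))   # ~0.72: d_j holds with degree y_i * u_k
print(fire_rule(0.3, 0.9))   # None: rule does not fire
```

The threshold test and the product *yi*•*uk* are exactly the enabling and output rules stated above; everything else (function name, default threshold) is illustrative.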

#### *The definition of LFPN*

In an FPN, a token in a place represents a proposition, and a proposition has a single degree of truth. The FPN is now extended in three ways to construct the learning fuzzy Petri net (LFPN). First, a place may hold different tokens (distinguished by numbers or colors), and the different tokens represent different propositions, i.e. a place has a set of propositions. Second, when a place holds a particular token, i.e. a specified proposition, this proposition may have different degrees of truth toward the different transitions *tr* that take this place as an input place \**tr*. Third, the weight of each arc is adjustable and is used to record the transition's input and output information.

**Definition 3** An *LFPN* is a 10-tuple, *LFPN* = <*P*, *Tr*, *F*, *D*, *I*, *O*, *Th*, *W*, *α*, *β*> (an LFPN model is shown in Figure 11),

where: *Tr*, *F*, *I*, *O* are the same as in the definition of the FPN.

*P* = {*p*1, *p*2, …, *pi*, …, *pn*, *p′*1, *p′*2, …, *p′i*, …, *p′r*} is a finite set of places, where the *pi* are input places and the *p′i* are output places.

Construction and Application of Learning Petri Net 161


*D* = {*d*11, …, *d*1*N*; *d*21, …, *d*2*N*; …; *dij*; …; *dn*1, …, *dnN*; *d′*11, …, *d′*1*N*; *d′*21, …, *d′*2*N*; …; *d′ij*; …; *d′r*1, …, *d′rN*} is a finite set of propositions, where *dij* is the *j*-th proposition for input place *pi* and *d′ij* is the *j*-th proposition for output place *p′i*.

*W* = {*w*11, *w*12, …, *w*1*k*, …, *w*1*m*; …; *wi*1, *wi*2, …, *wik*, …, *wim*; …; *wn*1, *wn*2, …, *wnm*; *w′*11, *w′*12, …, *w′*1*r*; …; *w′k*1, *w′k*2, …, *w′kj*, …, *w′kr*; …; *w′m*1, *w′m*2, …, *w′mr*} is the set of weights on the arcs, where *wik* is the weight from the *i*-th input place to the *k*-th transition and *w′kj* is the weight from the *k*-th transition to the *j*-th output place.


*α*: *D*×*Tr* → [0, 1] and *β*: *P* → *D*. When *pi*∈*P* holds a particular *tokenij* and *β*(*tokenij*, *pi*) = *dij*, the degree of truth of proposition *dij* in place *pi* toward transition *trk* is denoted *α*(*dij*, *trk*)∈[0, 1]. When *trk* fires, the probability of proposition *dij* in *pi* is *α*(*dij*, *trk*).

**Figure 11.** The model of learning fuzzy Petri net (LFPN)


**Figure 12.** A LFPN model with one transition


*Th* = {*th*1, *th*2, …, *thk*, …, *thm*} is a set of threshold values in the interval [0, 1] associated with the transitions *tr*1, *tr*2, …, *trk*, …, *trm*, respectively. If *α*(*dij*, *trk*) ≥ *thk* for all *pi*∈*I*(*trk*), then *trk* is enabled.

As shown in Figure 12, when *pi* holds a *tokenij*, the proposition *dij* is present in *pi*. This proposition *dij* has different degrees of truth toward *tr*1, *tr*2, …, *trk*, …, *trm*. When a transition *trk* fires, tokens are put into *p′*1, …, *p′r* according to the weights *w′k*1, …, *w′kr*, and each of *p′*1, …, *p′r* receives a proposition.

Figure 11 shows an LFPN with *n* input places, *m* transitions and *r* output places. To explain the truth computing, the transition fire rule, the token transfer rule and the expression of fuzzy production rules more clearly, a single transition with its related arcs and places is extracted from Figure 11 and shown in Figure 12.

**Truth computing** As shown in Figure 12, *wik* is the ideal value of *tokenij* for the firing of *trk*. When a set of tokens (*token*1*j*, *token*2*j*, …, *tokenij*, …, *tokennj*) is input to all places of \**trk*, with *β*(*token*1*j*, *p*1) = *d*1*j*, …, *β*(*tokennj*, *pn*) = *dnj*, then *α*(*dij*, *trk*) is computed from the degree of similarity between *tokenij* and *wik*, as given by formula (22).

$$\alpha(d_{ij}, tr_k) = 1 - \frac{|w_{ik} - token_{ij}|}{\max(|w_{ik}|, |token_{ij}|)} \tag{22}$$

Depending on the system being modeled by the LFPN, the tokens and weights may have different data types, and there are correspondingly different methods for computing *α*(*dij*, *trk*). If the tokens and weights are real numbers, *α*(*dij*, *trk*) is computed by formula (22). In a later section, *α*(*dij*, *trk*) will be discussed for an LFPN model whose tokens and weights are textual.
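For real-valued tokens and weights, formula (22) can be written directly as a small function (the numeric values below are illustrative):

```python
# Truth computation of formula (22): similarity between a token value
# and the ideal arc weight w_ik.

def truth(w_ik: float, token_ij: float) -> float:
    """alpha(d_ij, tr_k) = 1 - |w_ik - token_ij| / max(|w_ik|, |token_ij|)."""
    denom = max(abs(w_ik), abs(token_ij))
    if denom == 0:
        return 1.0  # both values zero: identical (edge case not covered by (22))
    return 1.0 - abs(w_ik - token_ij) / denom

print(truth(5.0, 5.0))  # 1.0: identical values give full truth
print(truth(5.0, 4.0))  # 0.8
```

The zero-denominator guard is an assumption added for robustness; formula (22) itself leaves that case undefined.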

**Transition fire rule** As shown in Figure 12, a set of tokens (*token*1*j*, *token*2*j*, …, *tokennj*) is input to all places of \**trk*, with *β*(*token*1*j*, *p*1) = *d*1*j*, …, *β*(*tokennj*, *pn*) = *dnj*. If *α*(*dij*, *trk*) ≥ *thk* holds for all *i* = 1, 2, …, *n*, then *trk* is enabled. Several transitions may be enabled at the same time. If formula (23) holds, *trk* is fired.


$$\alpha(d_{1j}, tr_k) \cdot \alpha(d_{2j}, tr_k) \cdot \dots \cdot \alpha(d_{nj}, tr_k) = \max_{1 \le h \le m} \bigl( \alpha(d_{1j}, tr_h) \cdot \alpha(d_{2j}, tr_h) \cdot \dots \cdot \alpha(d_{nj}, tr_h) \bigr) \tag{23}$$
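The selection in formula (23), restricted to enabled transitions, can be sketched as follows (`alpha[k][i]` stands for *α*(*dij*, *trk*); all values are hypothetical):

```python
# Transition fire rule: among the enabled transitions, fire the one with the
# maximal product of input truths, per formula (23).
from math import prod

def select_transition(alpha, th):
    """Return the index k of the transition to fire, or None if none is enabled."""
    best_k, best_p = None, -1.0
    for k, truths in enumerate(alpha):
        if all(a >= th for a in truths):      # enabling condition: all truths >= th_k
            p = prod(truths)                  # product appearing in formula (23)
            if p > best_p:
                best_k, best_p = k, p
    return best_k

alpha = [[0.9, 0.8],    # tr_0: product 0.72
         [0.7, 0.95],   # tr_1: product 0.665
         [0.99, 0.4]]   # tr_2: one truth below threshold, not enabled
print(select_transition(alpha, th=0.5))  # 0
```

A single shared threshold is assumed here for brevity; Definition 3 allows one threshold *thk* per transition.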


**Token transfer rule** As shown in Figure 12, after *trk* fires, tokens are taken out of *p*1*~pn*. The token take rule is:

If *tokenij* ≤ *wik*, the *tokenij* in *pi* is taken out entirely. If *tokenij* > *wik*, a token equal to *tokenij* − *wik* is left in *pi*.

Thus, after a transition *trk* has fired, enabled transitions may still exist in the LFPN. An enabled transition is selected and fired according to formula (23) until no enabled transition remains.

After *trk* fires, the token determined by *w′ki* is put into *p′i*: if the weight of the arc from *trk* to *p′i* is *w′ki*, then a token equal to *w′ki* is put into *p′i*.
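The take and put rules can be sketched for numeric tokens (the token and weight values are illustrative):

```python
# Token transfer sketch: after tr_k fires, tokens are withdrawn from the input
# places against the input weights w_ik, and tokens equal to w'_ki are
# deposited in the output places.

def transfer(tokens_in, w_in, w_out):
    """Return (remaining input tokens, deposited output tokens)."""
    remaining = []
    for token, w in zip(tokens_in, w_in):
        # token <= w: the whole token is taken out; else token - w is left
        remaining.append(0.0 if token <= w else token - w)
    deposited = list(w_out)  # each p'_i receives a token equal to w'_ki
    return remaining, deposited

rem, out = transfer(tokens_in=[3.0, 7.0], w_in=[5.0, 4.0], w_out=[2.5, 1.0, 6.0])
print(rem)  # [0.0, 3.0]
print(out)  # [2.5, 1.0, 6.0]
```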

**Fuzzy production rules expression** An LFPN is capable of modeling fuzzy production rules, just as an FPN is. For example, in the case described under **Transition fire rule** and **Token transfer rule**, when *trk* is fired the following production rule is expressed:

IF *d*1*j* AND *d*2*j* AND … AND *dnj* THEN *d′*1*k* AND *d′*2*k* AND … AND *d′rk*

$$(\mathrm{CF} = \alpha(d_{1j}, tr_k) \cdot \alpha(d_{2j}, tr_k) \cdot \dots \cdot \alpha(d_{nj}, tr_k)) \tag{24}$$

*The mathematical model of LFPN* 

In this section, the mathematical model of the LFPN is elaborated. First, some concepts are defined. When a *tokenij* is input to a place *pi*, we say that event *pij* occurs, i.e. the proposition *dij* is generated; the probability of event *pij* is *Pr*(*pij*). The firing of *trk* is defined as event *trk*, and the probability of its occurrence is *Pr*(*trk*). Second, we assume that each transition *tr*1, *tr*2, …, *trk*, …, *trm* has the same firing probability over the whole event space, so that

$$Pr(tr\_k) = \frac{1}{m} \tag{25}$$

When event *trk* occurs, the conditional probability of the occurrence of *pij* is defined as *Pr*(*pij* | *trk*), i.e. *α*(*dij*, *trk*), which is the probability that proposition *dij* is generated when *trk* fires.

When *p*1, *p*2, …, *pn* hold *token*1*j*, *token*2*j*, …, *tokennj*, the events *p*1*j*, *p*2*j*, …, *pnj* occur. Then *Pr*(*trk* | *p*1*j*, *p*2*j*, …, *pnj*) is:

$$Pr(tr_k \mid p_{1j}, p_{2j}, \dots, p_{nj}) = \frac{Pr(p_{1j}, p_{2j}, \dots, p_{nj} \mid tr_k)\,Pr(tr_k)}{\sum_{h=1}^{m} Pr(tr_h)\,Pr(p_{1j}, p_{2j}, \dots, p_{nj} \mid tr_h)} \tag{26}$$

When the events *p*1*j*, *p*2*j*, …, *pnj* have occurred, exactly one of the transitions *tr*1, *tr*2, …, *trk*, …, *trm* will be fired; therefore


$$\sum_{h=1}^{m} Pr(tr_h)\,Pr(p_{1j}, p_{2j}, \dots, p_{nj} \mid tr_h) = 1 \tag{27}$$

From (25), (26) and (27), formula (28′) is obtained by the law of total probability and Bayes' formula.

$$\begin{aligned} Pr(tr_k \mid p_{1j}, p_{2j}, \dots, p_{nj}) &= \frac{1}{m}\, Pr(p_{1j}, p_{2j}, \dots, p_{nj} \mid tr_k) \\ &= \frac{1}{m}\, Pr(p_{1j} \mid tr_k) \times Pr(p_{2j} \mid tr_k) \times \dots \times Pr(p_{nj} \mid tr_k) \end{aligned} \tag{28'}$$

$$Pr(tr_k \mid p_{1j}, p_{2j}, \dots, p_{nj}) = \frac{1}{m}\, \alpha(d_{1j}, tr_k) \cdot \alpha(d_{2j}, tr_k) \cdot \dots \cdot \alpha(d_{nj}, tr_k) \tag{28}$$

The transformation from (28′) to (28) follows from the definition of *α*(*dij*, *trk*). As shown in Figure 11, when *p*1, *p*2, …, *pn* hold *token*1*j*, *token*2*j*, …, *tokennj*, the occurrence probabilities of the transitions *tr*1, …, *trk*, …, *trm* are *α*(*d*1*j*, *tr*1)•*α*(*d*2*j*, *tr*1)•…•*α*(*dnj*, *tr*1)/*m*, …, *α*(*d*1*j*, *trk*)•*α*(*d*2*j*, *trk*)•…•*α*(*dnj*, *trk*)/*m*, …, *α*(*d*1*j*, *trm*)•*α*(*d*2*j*, *trm*)•…•*α*(*dnj*, *trm*)/*m*. Thus the transition *trk* with the maximal product *α*(*d*1*j*, *trk*)•*α*(*d*2*j*, *trk*)•…•*α*(*dnj*, *trk*) is selected and fired, in accordance with formula (23).

#### **4.2. Learning algorithm for learning fuzzy Petri net**

#### *Learning algorithm*


The learning fuzzy Petri net (LFPN) can be trained to learn fuzzy production rules. When a set of data is input to the LFPN, a set of propositions is produced in each input place. For example, when the token vectors (*token*1*j*, *token*2*j*, …, *tokennj*) (*j* = 1, 2, …, *N*) are input to *p*1*~pn*, the propositions *d*1*j*, *d*2*j*, …, *dnj* (*j* = 1, 2, …, *N*) are produced. To train a fuzzy production rule IF *d*1*j* AND *d*2*j* AND … AND *dnj* THEN *d′*1*k* AND *d′*2*k* AND … AND *d′rk*, there are two tasks:

1. *α*(*d*1*j*, *trk*)•*α*(*d*2*j*, *trk*)•…•*α*(*dnj*, *trk*) (*k*∈{1, 2, …, *m*}) must be updated so that formula (23) holds;
2. the output weight function of *trk* must be updated so that the correct tokens are put into *p′*1*~p′r*; then *β*(*p′*1) = *d′*1*k*, *β*(*p′*2) = *d′*2*k*, …, *β*(*p′r*) = *d′rk*.

To accomplish these two tasks, the weights *w*1*k*, *w*2*k*, …, *wnk* and *w′k*1, *w′k*2, …, *w′kr* are modified by the LFPN learning algorithm. First, we define the training data set as {(*X*1, *Y*1), (*X*2, *Y*2), …, (*XN*, *YN*)}, where *X* is the input token vector, *Y* is the output token vector, and *Xj*, *Yj* are defined as *Xj* = (*x*1*j*, *x*2*j*, …, *xnj*)*T*, *Yj* = (*y*1*j*, *y*2*j*, …, *yrj*)*T*, respectively. Thus,

$$X = (X_1, X_2, \dots, X_j, \dots, X_N), \quad Y = (Y_1, Y_2, \dots, Y_j, \dots, Y_N), \text{ i.e.}$$

$$X = \begin{bmatrix} x_{11} & x_{12} & \dots & x_{1j} & \dots & x_{1N} \\ x_{21} & x_{22} & \dots & x_{2j} & \dots & x_{2N} \\ \vdots & \vdots & & \vdots & & \vdots \\ x_{n1} & x_{n2} & \dots & x_{nj} & \dots & x_{nN} \end{bmatrix} \qquad Y = \begin{bmatrix} y_{11} & y_{12} & \dots & y_{1j} & \dots & y_{1N} \\ y_{21} & y_{22} & \dots & y_{2j} & \dots & y_{2N} \\ \vdots & \vdots & & \vdots & & \vdots \\ y_{r1} & y_{r2} & \dots & y_{rj} & \dots & y_{rN} \end{bmatrix}$$



1. Net construction: the numbers of input and output places are easily set according to the real problem, but it is difficult to decide the number of transitions when the net is initialized. When the LFPN is used to solve a particular problem, the number of transitions is first set empirically according to the practical situation; transitions can then be appended and deleted dynamically during training. If an input datum *Xj* has maximal truth toward *trk* but one or several *α*(*dij*, *trk*) (1≤*i*≤*n*) are less than *thk* (the threshold of *trk*), then *trk* cannot fire according to Definition 3, so *Xj* cannot fire any existing transition. This means that *W*1, *W*2, …, *Wk*, …, *Wm* cannot describe the vector characteristics of *Xj*. A new transition *trm*+1 and the arcs connecting *trm*+1 with the input and output places are then constructed, and *Xj* can be taken directly as the weight *Wm*+1. Conversely, if during a training episode no datum in *X*1, *X*2, …, *XN* can fire a transition *trd*, then *Wd* does not describe the vector characteristics of any of the data, and the transition *trd* and the arcs connecting it with the input and output places are deleted.

2. Initialization of *W* and *W*′: to improve training efficiency in the first stage of training, *W* and *W*′ are set randomly in [*X*min, *X*max] and [*Y*min, *Y*max] (*X*min is the vector whose every component is the minimal component of the vector set *X*1, *X*2, …, *XN*; *X*max is the vector whose every component is the maximal component of that set; *Y*min and *Y*max have the same meaning with respect to *Y*1, *Y*2, …, *YN*).

3. Training stop condition: depending on the application, *th*1, *th*2, …, *thm* are generally set to one common value *th*. When training begins, the threshold *th* is set low (for example 0.2) and is increased as training proceeds. A threshold value *thlast* (for example 0.9) is set as the stop condition, and the algorithm runs until *α*(*d*1*j*, *trk*) > *thlast*, *α*(*d*2*j*, *trk*) > *thlast*, …, *α*(*dnj*, *trk*) > *thlast*. From the transition-appending analysis above, the number of transitions approaches the number of training data if the transition threshold is set near 1; in that case the results are more accurate, but the training time and LFPN running time increase.
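The appending rule in point 1 can be sketched as follows, with truth computed per formula (22); the weight vectors and threshold below are illustrative:

```python
# Sketch of dynamic transition appending: if a training vector X_j cannot
# enable any existing transition, append a new transition whose input
# weight vector is X_j itself.

def truth(w, x):
    """alpha(d_ij, tr_k) per formula (22)."""
    denom = max(abs(w), abs(x))
    return 1.0 if denom == 0 else 1.0 - abs(w - x) / denom

def ensure_transition(W, x_j, th):
    """Return the index of a transition X_j can enable, appending one if needed."""
    for k, w_k in enumerate(W):
        if all(truth(wi, xi) >= th for wi, xi in zip(w_k, x_j)):
            return k                 # an existing transition is enabled
    W.append(list(x_j))              # append tr_{m+1} with W_{m+1} = X_j
    return len(W) - 1

W = [[5.0, 5.0]]
print(ensure_transition(W, [5.0, 5.0], th=0.9))  # 0: existing transition enabled
print(ensure_transition(W, [1.0, 9.0], th=0.9))  # 1: new transition appended
print(len(W))                                    # 2
```

Deletion of never-fired transitions (the converse rule) would simply drop the corresponding weight vectors after a training episode.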

*Analysis for convergence of LFPN learning algorithm*

In this section, the convergence of the proposed algorithm is analyzed. In step 6 of the LFPN learning algorithm, formula (29) is used to bring *Wk*(new) closer to *Xj* than *Wk*(old) whenever *Xj* has fired a transition *trk*. This is proved as follows.

$$\begin{aligned} W_k(\text{new}) - X_j &= W_k(\text{old}) + \gamma\,(X_j - W_k(\text{old})) - X_j \\ &= (1 - \gamma)\,(W_k(\text{old}) - X_j) \end{aligned} \tag{31'}$$

Formula (31′) is rewritten in scalar form, and both sides are divided by the scalar form of (*Xj* − *Wk*(old)), giving formula (31).

$$\frac{x_{ij} - w_{ik}(\text{new})}{x_{ij} - w_{ik}(\text{old})} = 1 - \gamma \tag{31}$$

Since *γ*∈(0, 1), we have 0 < 1 − *γ* < 1, so each update strictly reduces |*xij* − *wik*| and *Wk* converges toward *Xj* geometrically.

Second, the weight *Wk* = (*w*1*k*, *w*2*k*, …, *wnk*)*T* is the vector of weights on the arcs from \**trk* to *trk*, and *W*′*k* = (*w′k*1, *w′k*2, …, *w′kr*)*T* is the vector of weights on the arcs from *trk* to *trk*\*. *W*1, …, *Wk*, …, *Wm* and *W*′1, …, *W*′*k*, …, *W*′*m* are the input and output arc weights for *tr*1, …, *trk*, …, *trm*. Thus *W* = (*W*1, *W*2, …, *Wk*, …, *Wm*) and *W′* = (*W′*1, *W′*2, …, *W′k*, …, *W′m*), i.e. *W* is the *n*×*m* matrix whose *k*-th column is *Wk*, and *W′* is the *r*×*m* matrix whose *k*-th column is *W′k*.


Last, in the learning algorithm, when *trk* is fired the degrees of truth of *d′*1*j*, *d′*2*j*, …, *d′rj* toward *trk* are defined, according to Definition 3, as *α*(*d′*1*j*, *trk*) = 1 − |*y*1*j* − *w′k*1| / max(|*w′k*1|, |*y*1*j*|), *α*(*d′*2*j*, *trk*) = 1 − |*y*2*j* − *w′k*2| / max(|*w′k*2|, |*y*2*j*|), …, *α*(*d′rj*, *trk*) = 1 − |*yrj* − *w′kr*| / max(|*w′kr*|, |*yrj*|). The learning algorithm of the learning fuzzy Petri net is shown in Table 4.

#### **Learning Algorithm of LFPN:**

**Step 1.** *W* and *W′* are selected randomly.

**Step 2.** For every training pair (*Xj*, *Yj*) (*j* = 1, 2, …, *N*), the propositions *d*1*j*, *d*2*j*, …, *dnj* in *p*1*~pn* and *d′*1*j*, *d′*2*j*, …, *d′rj* in *p′*1*~p′r* are produced; then do steps 3 to 7.

**Step 3.** For *i* = 1 to *n*, for *h* = 1 to *m*: compute *α*(*dij*, *trh*) according to formula (22).

**Step 4.** Compute the transition with maximal truth:
4.1 *Max* = *α*(*d*1*j*, *tr*1)•*α*(*d*2*j*, *tr*1)•…•*α*(*dnj*, *tr*1); *k* = 1;
4.2 For *h* = 1 to *m*: if *α*(*d*1*j*, *trh*)•*α*(*d*2*j*, *trh*)•…•*α*(*dnj*, *trh*) > *Max*, then { *Max* = *α*(*d*1*j*, *trh*)•*α*(*d*2*j*, *trh*)•…•*α*(*dnj*, *trh*); *k* = *h*; }

**Step 5.** Fire *trk*.

**Step 6.** Make *d*1*j*, *d*2*j*, …, *dnj* have greater truth toward *trk*:
*Wk*(new) = *Wk*(old) + *γ*(*Xj* − *Wk*(old)) (29)
(*Wk*(new) is the vector *Wk* after the update and *Wk*(old) before it; *γ*∈(0, 1) is the learning rate.)

**Step 7.** Make *d′*1*j*, *d′*2*j*, …, *d′rj* have greater truth toward *trk*:
*W′k*(new) = *W′k*(old) + *γ*(*Yj* − *W′k*(old)) (30)
(*W′k*(new) is the vector *W′k* after the update and *W′k*(old) before it.)

**Step 8.** Repeat steps 2-7 until the truth values *α*(*d*1*j*, *trk*), *α*(*d*2*j*, *trk*), …, *α*(*dnj*, *trk*) meet the requirement.

**Table 4.** Learning algorithm of learning fuzzy Petri net
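As a concrete illustration, the winner-take-all update of Table 4 can be sketched in Python. This is a minimal sketch, not the authors' implementation: the exact form of formula (2) is assumed here to mirror Definition 3 (α = 1 − |x−w|/max(|w|,|x|)), and all identifiers (`truth`, `train_lfpn`, the fixed learning rate) are hypothetical.

```python
import random

def truth(x, w):
    # Assumed Definition-3 form: alpha = 1 - |x - w| / max(|w|, |x|),
    # with the degenerate x = w = 0 case taken as full truth.
    denom = max(abs(w), abs(x))
    return 1.0 if denom == 0 else 1.0 - abs(x - w) / denom

def train_lfpn(X, Y, m, epochs=200, gamma=0.5, seed=0):
    """Steps 1-8 of Table 4: initialize W, W' randomly; for each sample,
    fire the transition with the maximal product of input truths (Steps 3-5),
    then pull its weight columns toward (Xj, Yj) by formulas (29) and (30)."""
    rng = random.Random(seed)
    n, r = len(X[0]), len(Y[0])
    W  = [[rng.random() for _ in range(n)] for _ in range(m)]  # Wk per transition
    Wp = [[rng.random() for _ in range(r)] for _ in range(m)]  # W'k per transition
    for _ in range(epochs):
        for Xj, Yj in zip(X, Y):
            # Step 4: winner transition trk = argmax of the product of truths
            prods = []
            for h in range(m):
                p = 1.0
                for i in range(n):
                    p *= truth(Xj[i], W[h][i])
                prods.append(p)
            k = max(range(m), key=lambda h: prods[h])
            # Steps 6-7: formulas (29) and (30)
            W[k]  = [w + gamma * (x - w) for w, x in zip(W[k], Xj)]
            Wp[k] = [w + gamma * (y - w) for w, y in zip(Wp[k], Yj)]
    return W, Wp
```

The update is structurally the same as competitive (LVQ-style) learning: only the winning transition's arc weights move, which is what lets each *Wk* settle on one class of training data.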

Some details in the algorithm need to be elaborated further.

Firstly, the *N* training data pairs (*Xj*, *Yj*) are collected into matrices *X* and *Y*, whose *j*-th columns are *Xj*=(*x*1*j*, *x*2*j*, …, *xnj*)*T* and *Yj*=(*y*1*j*, *y*2*j*, …, *yrj*)*T*:

$$X=\begin{pmatrix}x_{11}&x_{12}&\dots&x_{1j}&\dots&x_{1N}\\x_{21}&x_{22}&\dots&x_{2j}&\dots&x_{2N}\\\vdots&\vdots&&\vdots&&\vdots\\x_{n1}&x_{n2}&\dots&x_{nj}&\dots&x_{nN}\end{pmatrix},\qquad Y=\begin{pmatrix}y_{11}&y_{12}&\dots&y_{1j}&\dots&y_{1N}\\y_{21}&y_{22}&\dots&y_{2j}&\dots&y_{2N}\\\vdots&\vdots&&\vdots&&\vdots\\y_{r1}&y_{r2}&\dots&y_{rj}&\dots&y_{rN}\end{pmatrix}$$

Secondly, the weight *Wk*=(*w*1*k*, *w*2*k*, …, *wnk*)*T* is the weight on the arcs from \**trk* to *trk*, and *W′k*=(*w′k*1, *w′k*2, …, *w′kr*)*T* is the weight on the arcs from *trk* to *trk*\*. *W*1, …, *Wk*, …, *Wm* and *W′*1, …, *W′k*, …, *W′m* are the input and output arc weights for *tr*1, …, *trk*, …, *trm*. Thus, *W*=(*W*1, *W*2, …, *Wk*, …, *Wm*) and *W′*=(*W′*1, *W′*2, …, *W′k*, …, *W′m*), i.e.

$$W=\begin{pmatrix}w_{11}&w_{12}&\dots&w_{1k}&\dots&w_{1m}\\w_{21}&w_{22}&\dots&w_{2k}&\dots&w_{2m}\\\vdots&\vdots&&\vdots&&\vdots\\w_{n1}&w_{n2}&\dots&w_{nk}&\dots&w_{nm}\end{pmatrix},\qquad W'=\begin{pmatrix}w'_{11}&w'_{21}&\dots&w'_{k1}&\dots&w'_{m1}\\w'_{12}&w'_{22}&\dots&w'_{k2}&\dots&w'_{m2}\\\vdots&\vdots&&\vdots&&\vdots\\w'_{1r}&w'_{2r}&\dots&w'_{kr}&\dots&w'_{mr}\end{pmatrix}$$


#### *Analysis for convergence of LFPN learning algorithm*

In this section, the convergence of the proposed algorithm will be analyzed. In Step 6 of the LFPN learning algorithm, formula (29) is used to make *Wk*(new) approach *Xj* more closely than *Wk*(old) does when *Xj* fires a transition *trk*. This is proved as follows.

$$\begin{aligned} X_j - W_k^{(\mathrm{new})} &= X_j - \left[ W_k^{(\mathrm{old})} + \gamma \left( X_j - W_k^{(\mathrm{old})} \right) \right] \\ &= X_j - W_k^{(\mathrm{old})} - \gamma X_j + \gamma W_k^{(\mathrm{old})} \\ &= \left( 1 - \gamma \right) \left( X_j - W_k^{(\mathrm{old})} \right) \end{aligned}$$

This vector equation is rewritten componentwise, and both sides are divided by the components of (*Xj*−*Wk*(old)). We get formula (31).


$$\left(\frac{x_{ij} - w_{ik}^{(\mathrm{new})}}{x_{ij} - w_{ik}^{(\mathrm{old})}}\right)_{1 \le i \le n} = \left(\frac{\left(1 - \gamma\right) \left(x_{ij} - w_{ik}^{(\mathrm{old})}\right)}{x_{ij} - w_{ik}^{(\mathrm{old})}}\right)_{1 \le i \le n} = 1 - \gamma \tag{31}$$
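Formula (31) says that each update shrinks the componentwise error by the constant factor 1−*γ*. A quick numeric check (the values below are illustrative only):

```python
# Componentwise check of formula (31): after one update by formula (29),
# the remaining error (x - w_new) is exactly (1 - gamma) times (x - w_old).
gamma = 0.3
Xj = [0.8, 0.2, 0.5]          # one training vector (illustrative values)
Wk_old = [0.1, 0.9, 0.4]      # Wk before the update
Wk_new = [w + gamma * (x - w) for w, x in zip(Wk_old, Xj)]   # formula (29)
ratios = [(x - wn) / (x - wo) for x, wn, wo in zip(Xj, Wk_new, Wk_old)]
# every ratio equals 1 - gamma
```

Since 0 < 1−*γ* < 1, repeated firing by the same class of data contracts the error geometrically, which is the substance of the convergence claim that follows.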

Hence, *Wk* will converge to *Xj* after enough training times.

In the LFPN learning algorithm, there may be a class of training data *Xj* which are able to fire the same transition *trk*. In this case, *Wk* approaches this class of data and converges to a point within it according to formula (31).

Now, we will discuss the point in the class of data *Xj* to which *Wk* converges. Suppose there are *b*1 data among *X*1, *X*2, …, *Xj*, …, *XN* which fire a certain transition *trk* at the first training episode, *b*2 data which fire *trk* at the second episode, and so on. If the total number of training episodes is *ep* and the total number of data which fire *trk* is *t*, then $t=\sum_{i=1}^{ep}b_i$. According to the order in which the data fired *trk*, these *t* data are rewritten as *Xk*1, *Xk*2, …, *Xkt*. The average of the training data *Xk*1, *Xk*2, …, *Xkt* is denoted $\overline{X_k}$. To record the update process of *Wk* simply, the update order of *Wk* is recorded as *Wk*1, *Wk*2, …, *Wkt*.

The learning rate *γ* (0<*γ*<1) decreases as the training time increases and approaches 0 at last, because each training datum should not affect *Wk* too much in the last stage of training; otherwise *Wk* would oscillate there. Let the learning rate *γ* be 1/(*q*+1) (*q*>0) when training begins, and 1/(*q*+2), 1/(*q*+3), …, 1/(*q*+*t*) when *trk* is fired for the 2nd, 3rd, …, *t*-th time. The initial value of *Wk* is set as *Wk*0=*W*(0)×1/*q*, where every component of *W*(0)×1/*q* is a random value in [*X*min, *X*max]. According to formula (29), we get

$$W_{k0} = \frac{1}{q}W^{(0)}$$

$$W_{k1} = W_{k0} + \frac{1}{q+1}(X_{k1} - W_{k0}) = \frac{1}{q}W^{(0)} - \frac{1}{q+1} \times \frac{1}{q}W^{(0)} + \frac{1}{q+1}X_{k1} = \frac{1}{q+1}(W^{(0)} + X_{k1})$$

$$\begin{aligned} W_{k2} &= W_{k1} + \frac{1}{q+2}(X_{k2} - W_{k1}) \\ &= \frac{1}{q+1}W^{(0)} + \frac{1}{q+1}X_{k1} - \frac{1}{q+1} \times \frac{1}{q+2}W^{(0)} - \frac{1}{q+1} \times \frac{1}{q+2}X_{k1} + \frac{1}{q+2}X_{k2} \\ &= \frac{1}{q+2}(W^{(0)} + X_{k1} + X_{k2}) \end{aligned}$$

Continuing in the same way,

$$\begin{aligned} W_{kt} &= W_{k,t-1} + \frac{1}{q+t}(X_{kt} - W_{k,t-1}) \\ &= \frac{1}{q+t-1}W^{(0)} - \frac{1}{q+t} \times \frac{1}{q+t-1}W^{(0)} + \frac{1}{q+t-1}X_{k1} - \frac{1}{q+t} \times \frac{1}{q+t-1}X_{k1} + \dots \\ &\quad + \frac{1}{q+t-1}X_{k,t-1} - \frac{1}{q+t} \times \frac{1}{q+t-1}X_{k,t-1} + \frac{1}{q+t}X_{kt} \\ &= \frac{1}{q+t}(W^{(0)} + X_{k1} + \dots + X_{k,t-1} + X_{kt}) \end{aligned} \tag{32}$$

When the training time increases, the training data set *Xk*1, *Xk*2, …, *Xkt* can be regarded as very large, i.e. *t* is large.

$$\lim_{t \to \infty} W_{kt} = \lim_{t \to \infty} \frac{1}{q+t}(W^{(0)} + X_{k1} + \dots + X_{k,t-1} + X_{kt}) \tag{33}$$

Generally, *q* is a small positive constant and *t* is large. Then,

$$\begin{aligned} \lim_{t \to \infty} W_{kt} &\approx \lim_{t \to \infty} \frac{1}{t}(W^{(0)} + X_{k1} + \dots + X_{k,t-1} + X_{kt}) \\ &= \lim_{t \to \infty} \frac{1}{t}W^{(0)} + \lim_{t \to \infty} \frac{1}{t}(X_{k1} + \dots + X_{k,t-1} + X_{kt}) \\ &\approx \lim_{t \to \infty} \frac{1}{t}(X_{k1} + \dots + X_{k,t-1} + X_{kt}) = \overline{X_k} \end{aligned} \tag{34}$$

From formula (34) we get:

$$W_k \to \overline{X_k} \tag{35}$$

In the same way, *Wk*→$\overline{X_k}$ (*k*=1, 2, …, *m*) and *W′k*→$\overline{Y_k}$ (*k*=1, 2, …, *m*) can be proved. Consequently, the learning algorithm of LFPN converges.

Now, we will analyze the convergence process and the significance of the convergence.

1. *Xk*1, *Xk*2, …, *Xkt* fire a certain transition *trk* during training. As the training time increases, almost the same data fire the transition *trk* in every training episode; these data belong to a class *k*. We suppose that these data are *Xk*1, *Xk*2, …, *Xks*. When training begins, suppose there is a datum *Xu* which does not belong to *Xk*1, *Xk*2, …, *Xks* but fires *trk*. As the training time increases, *Wk* approaches *Xk*1, *Xk*2, …, *Xks* and the probability that *Xu* fires *trk* decreases. Hence, such data *Xu* form a very small part of *Xk*1, *Xk*2, …, *Xkt* and have little effect on *Wk*. On the other hand, when training begins, there may be a datum *Xke* which belongs to *Xk*1, *Xk*2, …, *Xks* but does not fire transition *trk*. As the training time increases, the probability that *Xke* fires *trk* increases; then *Xk*1, *Xk*2, …, *Xks* can be approximately regarded as firing *trk* throughout the training. $\overline{X_k}$ is denoted as the average of the training data *Xk*1, *Xk*2, …, *Xks*.

2. In the convergence demonstration, we use a special series of learning rates *γ*. From the analysis in 1), *Xk*1, *Xk*2, …, *Xks* can be regarded as a class of data which fires one transition *trk*, and the data series *Xk*1, *Xk*2, …, *Xkt* can be regarded as iterations of *Xk*1, *Xk*2, …, *Xks*. *Wk* can converge to a point near $\overline{X_k}$ with any damping learning rate series *γ*.

3. After training, *Wk*=(*w*1*k*, *w*2*k*, …, *wnk*) comes near to the average of the data which belong to class *k*, i.e., *Wk* ≈ $\overline{X_k}$=($\bar{x}_{1k}$, $\bar{x}_{2k}$, …, $\bar{x}_{nk}$). When a datum *Xkj* belonging to class *k* arrives, *Xkj* will have the same vector characteristics as *Xk*1, *Xk*2, …, *Xks*, i.e. *x*1,*kj*, *x*2,*kj*, …, *xn*,*kj* are near to *w*1*k*, *w*2*k*, …, *wnk*. Then each component *xi*,*kj* (1≤*i*≤*n*) of *Xkj* will have a bigger similarity to *wik* (1≤*i*≤*n*) than to the *i*-th components of the other weights *W* according to formula (2), so *Xkj* will have the biggest truth toward *trk*. Thus, when a datum *Xkj* which belongs to the class of *Xk*1, *Xk*2, …, *Xks* is input to the LFPN, it will fire *trk* correctly and produce the correct output.
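The damped-learning-rate derivation in formulas (32)-(35) can be checked numerically: with *γt* = 1/(*q*+*t*), the update of formula (29) reproduces the running average (*W*(0) + *Xk*1 + … + *Xkt*)/(*q*+*t*), which tends to $\overline{X_k}$. A small one-component sketch (all values illustrative):

```python
# With the damped rate gamma_t = 1/(q+t), formula (29) reproduces the running
# average of formula (32): W_kt = (W0 + X_k1 + ... + X_kt) / (q + t).
q = 2.0
W0 = 0.6                                   # one component of W(0)
data = [0.9, 1.1, 1.0, 0.95, 1.05] * 40    # 200 samples of "class k", mean 1.0
w = W0 / q                                 # W_k0 = W(0) / q
for t, x in enumerate(data, start=1):
    w += (x - w) / (q + t)                 # formula (29) with gamma = 1/(q+t)
closed_form = (W0 + sum(data)) / (q + len(data))   # closed form of formula (32)
```

As *t* grows, the contribution of *W*(0) is washed out and `w` approaches the class mean, exactly as formulas (34)-(35) state.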

Construction and Application of Learning Petri Net 169

model (LFPNSD) is proposed. LFPNSD is a 10-tuple, given by LFPNSD = <*P*, *Tr*, *F*, *W*, *D*, *I*, *O*, *Th*, *α*, *β*> (as shown in Figure 13), where *Tr*, *I*, *O*, *Th* and *β* are the same as in the definition of LFPN, and:

*P* = {*Pinput*}∪{*Poutput*} = {*P*11, *P*12, *P*13}∪{*P*21, *P*22, *P*23, *P*24};

*F* ⊆ (*Pinput*×*Tr*)∪(*Tr*×*Poutput*);

*W*: *F*→*Keywords+*, where the weight functions on *Pinput*×*Tr* are different keywords of the service description and the weight functions on *Tr*×*Poutput* are different service invoking information;

*D* = {*d*11,*a*, *d*12,*b*, *d*13,*c*}∪{*d*21,*e*, *d*22,*f*, *d*23,*g*, *d*24,*h*} is a finite set of propositions, where proposition *d*11,*a* is that *P*11 has a service description token; *d*12,*b* is that *P*12 has a free textual description token; *d*13,*c* is that *P*13 has service operation and port parameter tokens; and the propositions *d*21,*e*, *d*22,*f*, *d*23,*g*, *d*24,*h* are that *P*21, *P*22, *P*23, *P*24 have different invoking information tokens of services;

*α*: (*dij*, *trk*)→[0, 1]. *α*(*dij*, *trk*)=*yi*∈[0, 1] is the degree of truth of proposition *dij* toward *trk*, computed by the following rule: if the input description has *n* keywords and the weight *wik* on the arc from *Pi* to *trk* contains *s* of the same keywords, the degree of similarity between the weight keywords and the input description keywords is expressed as:

$$\alpha(d_{ij}, tr_k) = 1 - \frac{|n - s|}{\max(n, s)} \tag{36}$$

The firing rule of a transition: if *α*(*d*11,*a*, *trk*)•*α*(*d*12,*b*, *trk*)•*α*(*d*13,*c*, *trk*) = max((*α*(*d*11,*a*, *tri*)•*α*(*d*12,*b*, *tri*)•*α*(*d*13,*c*, *tri*))1≤*i*≤*m*) and all of *α*(*d*11,*a*, *trk*), *α*(*d*12,*b*, *trk*), *α*(*d*13,*c*, *trk*) are bigger than a threshold value *th*, then *trk* fires, the tokens in *P*11*~P*13 are taken out, and tokens according to *wk*,21, *wk*,22, *wk*,23, *wk*,24 are put into *P*21*~P*24.

**Figure 13.** The learning fuzzy Petri net for Web service discovery (LFPNSD)
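The keyword-based truth degree and firing rule above can be sketched as follows. This is a hedged sketch: formula (36) is reconstructed here as 1 − |*n*−*s*|/max(*n*, *s*), keyword matching is naive set intersection, and the function names and example service data are hypothetical.

```python
def keyword_similarity(input_keywords, weight_keywords):
    """Degree of similarity per formula (36): 1 - |n - s| / max(n, s),
    where n = number of input keywords and s = how many of them also
    appear among the weight keywords. (The exact form of (36) is assumed.)"""
    n = len(input_keywords)
    s = len(set(input_keywords) & set(weight_keywords))
    if max(n, s) == 0:
        return 0.0
    return 1.0 - abs(n - s) / max(n, s)

def fire(service_weights, description_parts, th=0.5):
    """Firing rule: the transition trk with the maximal product of the three
    input truths fires, provided every truth exceeds the threshold th.
    Returns the index k of the fired transition, or None if none fires."""
    best_k, best_p, best_truths = None, -1.0, []
    for k, weights in enumerate(service_weights):  # weights: 3 keyword sets
        truths = [keyword_similarity(part, wk)
                  for part, wk in zip(description_parts, weights)]
        p = truths[0] * truths[1] * truths[2]
        if p > best_p:
            best_k, best_p, best_truths = k, p, truths
    return best_k if all(t > th for t in best_truths) else None
```

When a query's three description parts (service description, free text, operation/port parameters) overlap strongly with one service's arc-weight keywords, that service's transition fires and its invoking-information tokens would be placed into *P*21*~P*24.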
