Given learning capability, FPNs are able to realize self-adapting and self-learning functions; consequently, automatic knowledge reasoning and fuzzy production rule learning are achieved.

Recently, there have been several efforts to give the Petri net learning capability so that it can optimize itself. Global variables have been used to record every state of a colored Petri net while it runs [22]; the global variables are optimized, and the colored Petri net is updated according to them. A learning Petri net model that combines a Petri net with a neural network was proposed by Hirasawa et al. and applied to nonlinear system control [10]. In our former work [5, 6], a learning Petri net model was proposed based on reinforcement learning (RL): RL is applied to optimize the parameters of the Petri net, and the model has been applied to robot system control. Konar gave an algorithm to adjust the thresholds of an FPN through training instances [1]. In [1], the FPN architecture is built on connectionism, just like a neural network, and the model provides semantic justification of its hidden layer; it is capable of approximate reasoning and of learning from noisy training instances. A generalized FPN model was proposed by Pedrycz et al.; it can be transformed into neural networks with OR/AND logic neurons, so that the parameters of the corresponding neural networks can be learned (trained) [24]. Victor and Shen have developed a reinforcement learning algorithm for high-level fuzzy Petri net models [23].

This chapter focuses on combining the Petri net and the fuzzy Petri net with intelligent learning methods to construct a learning Petri net (LPN) and a learning fuzzy Petri net (LFPN), respectively; these are applied to dynamic system control and to system optimization. The rest of this chapter is organized as follows. Section 2 elaborates on the construction of the LPN and its learning algorithm. Section 3 describes how the LPN model is used in robot systems. Section 4 constructs the LFPN. Section 5 shows how the LFPN is used in a Web service discovery problem. Section 6 summarizes the Petri net models described in the chapter and the results of their applications, and discusses future trends concerned with learning Petri nets.

## **2. The learning Petri net model**

The learning Petri net (LPN) model is constructed based on the high-level time Petri net (HLTPN). The definition of HLTPN is given first.

#### **2.1. Definition of HLTPN**

HLTPN is an expanded Petri net.

**Definition** 1: HLTPN has a 5-tuple structure, *HLTPN* = (*NG*, *C*, *W*, *DT*, *M0*) [9], where

i. *NG* = (*P*, *Tr*, *F*) is called the "net graph", in which:

*P* is a finite set of nodes called "Places". *ID*: *P*→*N* is a function marking *P*, where *N* = (1, 2, …) is the set of natural numbers; *p*1, *p*2, …, *pn* represent the elements of *P*, and *n* is the cardinality of *P*;

*Tr* is a finite set of nodes called "Transitions", disjoint from *P* (*P*∩*Tr* = ∅). *ID*: *Tr*→*N* is a function marking *Tr*; *tr*1, *tr*2, …, *trm* represent the elements of *Tr*, and *m* is the cardinality of *Tr*;

*F* ⊆ (*P*×*Tr*)∪(*Tr*×*P*) is a finite set of directed arcs, known as the flow relation;
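To make the definition concrete, here is a minimal Python sketch of the net-graph component *NG* = (*P*, *Tr*, *F*). The class and helper names are ours, introduced only for illustration, and the remaining components of the 5-tuple (*C*, *W*, *DT*, *M0*) are left out.

```python
from dataclasses import dataclass, field

# An arc endpoint is tagged "p" (place) or "tr" (transition) so that the
# flow relation F stays inside (P x Tr) U (Tr x P).
Arc = tuple[str, int, str, int]  # e.g. ("p", 1, "tr", 2) means p1 -> tr2

@dataclass
class NetGraph:
    """Net graph NG = (P, Tr, F) of an HLTPN -- an illustrative sketch."""
    places: set[int] = field(default_factory=set)       # P, indexed by ID: P -> N
    transitions: set[int] = field(default_factory=set)  # Tr, indexed by ID: Tr -> N
    arcs: set[Arc] = field(default_factory=set)         # F, the flow relation

    def is_well_formed(self) -> bool:
        """Every arc must run place -> transition or transition -> place."""
        for s_kind, s, d_kind, d in self.arcs:
            if s_kind == "p" and d_kind == "tr":
                if s not in self.places or d not in self.transitions:
                    return False
            elif s_kind == "tr" and d_kind == "p":
                if s not in self.transitions or d not in self.places:
                    return False
            else:
                return False  # place->place or transition->transition is illegal
        return True

# Tiny usage example: p1 -> tr1 -> p2
ng = NetGraph({1, 2}, {1}, {("p", 1, "tr", 1), ("tr", 1, "p", 2)})
assert ng.is_well_formed()
```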


#### **2.2. Definition of LPN**


In HLTPN, the weight functions of a transition's input and output arcs decide the transition's input and output tokens; these weight functions express the input-output mapping of the transition. If the weight functions can be updated according to changes in the system, the modeling ability of the Petri net is expanded. The delay time of HLTPN expresses how long the pre-state lasts; if the delay time can be learnt while the system is running, the representing ability of the Petri net is enhanced. RL is a learning method in which the learner interacts with a complex, uncertain environment to achieve an optimal policy for selecting actions, and it is well suited to updating dynamic system parameters through interaction with the environment [18]. Hence, we use RL to update the weight functions and transition delay times of the Petri net, thereby constructing the LPN. In other words, the LPN is an expanded HLTPN in which some transitions' input-arc weight functions and delay times carry a value item that records the reward from the environment.

**Definition** 2: LPN has a 3-tuple structure, *LPN* = (*HLTPN*, *VW*, *VT*), where *VW* is the set of value items attached to input-arc weight functions and *VT* is the set of value items attached to transition delay times; each value item records the reward obtained from the environment.
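Under this reading, an LPN can be sketched as an HLTPN wrapped with two value tables. The key layout and names below are assumptions made for illustration, not notation from the source.

```python
from collections import defaultdict

class LPN:
    """LPN = (HLTPN, VW, VT) -- a minimal sketch, assuming the value items
    of Definition 2 accumulate rewards received from the environment."""

    def __init__(self, hltpn):
        self.hltpn = hltpn
        # VW: value of each input-arc weight function, keyed by
        # (token color, stage index i, branch index j) as in VW_{Cij,i,j}.
        self.VW = defaultdict(float)
        # VT: value item attached to each transition's delay time,
        # keyed by transition ID.
        self.VT = defaultdict(float)
```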


**Figure 1.** An example of LPN model

An example of an LPN model is shown in Figure 1. Using the LPN, a mapping of input tokens to output tokens is obtained. For example, in Figure 1, colored tokens *Cij* (*i*=1; *j*=1, 2, …, *n*) are input to *P*1 by *Trinput*. There are *n* weight functions *W*(<*C*1*j*>, *VWC*1*j*,1,*j*) on the same arc *F*1,*j*; the value *VWCij,i,j* determines which weight function in *W*(<*Cij*>, *VWCij,i,j*) the token *C*1*j* obeys to fire a transition. After token *C*1*j* passes through arc *Fi,j* (*i*=1; *j*=1, 2, …, *n*), one of the transitions *Tri,j* (*i*=1; *j*=1, 2, …, *n*) fires and generates tokens *Cij* (*i*=2; *j*=1, 2, …, *n*) in *P*2. Once *P*2 holds colored tokens *Cij* (*i*=2; *j*=1, 2, …, *n*), *Tri,j* (*i*=2; *j*=1, 2, …, *n*) fires and differently colored tokens *Cij* (*i*=3; *j*=1, 2, …, *n*) are generated. Thus a mapping from *C*1*j* to *C*3*j* is obtained. At the same time, a reward is received from the environment according to whether the generation of *C*3*j* from *C*1*j* accords with the system rule. These rewards are propagated to every *VWCij,i,j* and adjust the *VWCij,i,j*. After training, the LPN is able to express a correct mapping of input tokens to output tokens.
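To make the mapping idea concrete: after training, each stage effectively realizes a function from an incoming token color to the color it emits next, and the LPN expresses their composition. A toy sketch, with invented colors and winning functions:

```python
# Hypothetical trained choices: at each stage i, the weight function with the
# highest VW value maps the incoming color to the color it emits.
stage1 = {"C11": "C21", "C12": "C22"}   # P1 -> P2 via the winning Tr_{1,j}
stage2 = {"C21": "C32", "C22": "C31"}   # P2 -> P3 via the winning Tr_{2,j}

def lpn_mapping(color: str) -> str:
    """Composed input-output mapping C1j -> C3j expressed by the trained LPN."""
    return stage2[stage1[color]]

assert lpn_mapping("C11") == "C32"
```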


When the LPN is used to model a dynamic system, the system state is modeled as the Petri net marking, i.e., the set of colored tokens in all places, and a change of the system state (a system action) is modeled as the firing of transitions. Some system parameters can be expressed as token numbers and colors, arc weight functions, transition delay times, and so on; for example, different system signals are expressed as different token colors. When the system is modeled, some parameters are unknown or uncertain, so they are set randomly. As the system runs, appropriate parameter values are obtained gradually through the system's interaction with the environment and the effect of RL.

#### **2.3. Learning algorithm for LPN**

In LPN, there are two kinds of parameters. One is discrete: the arc weight functions, which describe the input and output colored tokens of a transition. The other is continuous: the delay time of transition firing. We now discuss how the two kinds of parameters are learnt using RL.

When every weight function on a transition's input arcs has obtained a value, each transition has a value for its action, and the policy of action selection must be considered. The simplest selection rule is to take the action with the highest estimated state-action value, i.e., the transition corresponding to the maximum *VWCij,i,j*; this is called a greedy action. Selecting a greedy action means the learner (agent) exploits its current knowledge, whereas selecting a non-greedy action means the agent explores in order to improve its policy. Exploitation does the right thing to maximize the expected reward on a single play, while exploration may produce a greater total reward in the long run. Here a near-greedy selection rule, the ε-greedy method, is used: the action is selected randomly with a small probability ε, and the action with the biggest *VWCij,i,j* is selected with probability 1−ε. The algorithm of the LPN is listed in Table 1.

Algorithm 1. Weight function learning algorithm

**Step 1.** Initialization: set the *VW* value and reward *r* of every input arc's weight function to zero.

**Step 2.** Initialize the learning Petri net, i.e., set the Petri net state to *M0*.

**Step 3.**

i. When a place gets a colored token *Cij*, there is a choice of which arc weight function is obeyed, if several functions include this token. The choice follows the ε-greedy selection policy (ε is set by the user according to the execution environment, usually 0 < ε << 1): A: select the function with the biggest *VWCij,i,j*, with probability 1−ε; B: select a function randomly, with probability ε.

ii. The transition that the chosen function correlates with fires, and a reward is observed. Adjust the weight function value using

*VWCij,i,j* = *VWCij,i,j* + *α*[*r* + *γ*·*V̄WCi+1,j,i+1,j* − *VWCij,i,j*] (1)

Repeat i) and ii) until the system reaches the end state.

**Table 1.** Weight function learning algorithm in LPN

where:

i. *α* is the step-size and *γ* is a discount rate.

ii. *r* is the reward that *W*(<*Cij*>, *VWCij,i,j*) gets when *Tri,j* is fired by <*Cij*>. Because the environment gives the system a reward only at the last step, a feedback learning method is used: if *W*(<*Cij*>, *VWCij,i,j*) generates token <*Ci+1,j*> through *Tri,j*, and *W*(<*Ci+1,j*>, *VWCi+1,j,i+1,j*) generates token <*Ci+2,j*> through *Tri+1,j*, then *VWCi+1,j,i+1,j* gets an update value, and this value is fed back as the reward *r* of *W*(<*Cij*>, *VWCij,i,j*) the next time.

iii. *V̄WCi+1,j,i+1,j* is calculated from the feedback values of all *W*(<*Ci+1,j*>, *VWCi+1,j,i+1,j*) as in formula (2):

*V̄WCi+1,j,i+1,j*<sup>*t*</sup> = *V̄WCi+1,j,i+1,j*<sup>*t*−1</sup> + *rt* (2)

where *t* is the time at which <*Ci+1,j*> is generated by *W*(<*Cij*>, *VWCij,i,j*).
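Putting Table 1 and formulas (1)-(2) together, the following sketch shows one way the weight-function learning loop could be implemented. The environment interface (`reset`, `is_end`, `current_choice`, `fire`) and all names are assumptions made for illustration; the source specifies only the selection policy and the update rules themselves.

```python
import random
from collections import defaultdict

def epsilon_greedy(candidates, VW, epsilon):
    """Table 1, step i: choose among the weight functions that include the
    token. B: random choice with probability epsilon; A: the function with
    the biggest VW value with probability 1 - epsilon (0 < epsilon << 1)."""
    if random.random() < epsilon:
        return random.choice(candidates)            # B: explore
    return max(candidates, key=lambda key: VW[key])  # A: exploit

def update_vw(vw, r, feedback_next, alpha, gamma):
    """Formula (1): VW <- VW + alpha * (r + gamma * V_bar_next - VW).
    e.g. update_vw(0.0, 1.0, 2.0, alpha=0.5, gamma=0.5) == 1.0"""
    return vw + alpha * (r + gamma * feedback_next - vw)

def run_episode(env, VW, feedback, alpha=0.1, gamma=0.9, epsilon=0.05):
    """One pass of Table 1 (steps 2-3) over a hypothetical environment.

    VW maps (token color, i, j) -> learned value of that weight function;
    feedback maps the same keys -> accumulated V_bar values per formula (2):
    V_bar^t = V_bar^(t-1) + r_t.
    """
    env.reset()                                   # Step 2: marking back to M0
    while not env.is_end():
        token, candidates = env.current_choice()  # functions including token
        key = epsilon_greedy(candidates, VW, epsilon)
        r, next_key = env.fire(key)               # fire Tr_{i,j}, observe r
        feedback_next = feedback[next_key] if next_key is not None else 0.0
        VW[key] = update_vw(VW[key], r, feedback_next, alpha, gamma)
        feedback[key] += r                        # formula (2): accumulate r_t

# Step 1: a caller initializes everything to zero and repeats episodes:
VW, feedback = defaultdict(float), defaultdict(float)
# for _ in range(num_episodes): run_episode(env, VW, feedback)
```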

