**3. Algorithms for PMWL**

Traditional matching algorithms return complete results, so research on them focuses on improving matching efficiency. Because matching is a search problem, the key to solving it is how to extract and use the information obtained from the text and the pattern. The KMP and BM algorithms use automata to describe the characteristics of the pattern and store the information gathered during scanning in those automata. Whenever a jump distance has to be calculated, the algorithm consults the automaton, which avoids recomputing the pattern information and ensures that jumps made during matching do not affect the final result. The basic idea of the suffix tree is to describe the text with a tree structure, so that the same text is not scanned repeatedly when a set of patterns is matched. We believe that the data structure and the search strategy are crucial to how traditional algorithms access the information in the text and the pattern. A well-chosen data structure can better exploit the capabilities of the computer, as bit-parallel techniques do, and can also represent sequence information more suitably, as automata do. Other data structures include sliding windows and indexes. A well-chosen matching strategy makes better use of the sequence information. These strategies can be roughly divided into prefix searching, suffix searching and factor searching (Navarro & Raffinot, 2001).
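As an illustration of how pattern information can be computed once and then reused for every jump, the following is a minimal sketch of KMP in its usual failure-table form. The function names `build_failure` and `kmp_search` are chosen here for illustration and do not come from the cited works.

```python
def build_failure(pattern):
    """fail[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it (the stored pattern information)."""
    fail = [0] * len(pattern)
    k = 0  # length of the currently matched prefix
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]  # fall back to a shorter border
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(text, pattern):
    """Return the start positions of all occurrences of pattern in text."""
    if not pattern:
        return []
    fail = build_failure(pattern)  # computed once, before scanning the text
    hits = []
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]  # jump using the table; the text is never re-read
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = fail[k - 1]  # continue, allowing overlapping occurrences
    return hits
```

The table is built from the pattern alone, so the scan over the text never moves backwards and no occurrence is skipped.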

Research on Pattern Matching with Wildcards and Length Constraints: Methods and Completeness 305

**Table 1.** Analysis of the traditional pattern matching algorithms

As different extensions of the traditional matching problem, PMWL, approximate matching and swap matching all belong to *Non-standard Stringology*. Most problems in this field are optimization problems, and most of them, such as PMWL and approximate matching with wildcards, have not yet been completely solved.

PMWL and the traditional matching problem have the following in common:

1. From the view of the algorithm itself, the data structures and matching strategies are the key to algorithm design.
2. From the view of describing the object, describing the pattern and text information effectively is the key to solving the problem.

They differ in the following respects:

1. PMWL has no complete solution yet, so its algorithms are evaluated on both time efficiency and solution quality, whereas traditional matching algorithms are concerned only with matching time.
2. The flexibility and complexity of the PMWL problem definition are reflected in the pattern. Compared with traditional matching, PMWL therefore pays more attention to describing the pattern information, and the characteristics of the pattern are closely tied to the solution of the PMWL problem.

Next, we present representative algorithms for solving the PMWL problem and describe their design ideas in detail from the perspective of data structure and matching strategy.

**3.1. The SAIL Algorithm**

Description of the SAIL algorithm (Chen et al., 2006):

*Input*: A text *T* = *t*0*t*1…*tn*-1, a pattern *P* = *p*0*p*1…*pm*-1, local constraints *gi* = *g*(*Ni*, *Mi*), global constraints [*minLen*, *maxLen*].

*Output*: Occurrences of *P* in *T* satisfying the constraints.

The steps of the algorithm:

1. *Location*: ① Search for a position *i* where *t*[*i*] = *p*[*m*-1], and locate a position *k* where *t*[*k*] = *p*[0] by considering the global constraint. ② Cut out the substring of *T* from *t*[*k*] to *t*[*i*] and name it *T'*. ③ Build the table whose rows and columns are given by *T'* and *P*.
2. *Forward*: Scan the table forward and mark all the positions that satisfy the local and global constraints. These are the potential matching positions.
3. *Backward*: Scan the table backward and, in every row, select the *left-most* of the marked cells that compose an occurrence. Then mark the selected positions as used.

Generally, SAIL scans *T* from the beginning for a position *i* where *t*[*i*] = *p*[*m*-1]. It then runs two phases, the *Forward* phase and the *Backward* phase. In the *Forward* phase, SAIL uses a search table to determine whether a potential matching occurrence exists. If one does, the *Backward* phase is triggered to output an optimal occurrence using the *left-most* strategy.
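The three phases of Section 3.1 (*Location*, *Forward*, *Backward*) can be illustrated with a small sketch. This is a simplified reading of the steps, not the implementation of Chen et al. (2006). It rests on the following assumptions: the name `sail_sketch` is hypothetical; each local constraint *g*(*Ni*, *Mi*) is passed as a pair `(N, M)` bounding the number of wildcards between two consecutive pattern characters; the global constraint bounds the span from the first to the last matched position; and each text position may be used in at most one occurrence.

```python
def sail_sketch(text, pattern, gaps, min_len, max_len):
    """Return occurrences as lists of text positions, one per pattern character.

    gaps[j] = (N, M): assumed bounds on the number of wildcards between
    pattern[j] and pattern[j + 1]. [min_len, max_len]: assumed bounds on
    the span of an occurrence (last position - first position + 1).
    """
    m = len(pattern)
    if m == 0 or len(gaps) != m - 1:
        raise ValueError("need a non-empty pattern and len(pattern) - 1 gaps")
    used = [False] * len(text)  # positions already consumed by an occurrence
    occurrences = []
    for i, ch in enumerate(text):
        # Location: i is a candidate end position; the window T' = text[k0..i]
        # cannot start earlier than max_len allows.
        if ch != pattern[-1] or used[i]:
            continue
        k0 = max(0, i - max_len + 1)
        last_start = i - min_len + 1  # latest start that still satisfies min_len

        # Forward: marked[j] holds the window positions where pattern[j] can
        # sit as the end of a valid partial match of pattern[0..j].
        marked = [set() for _ in range(m)]
        for x in range(k0, i + 1):
            if text[x] == pattern[0] and not used[x] and x <= last_start:
                marked[0].add(x)
        for j in range(1, m):
            lo, hi = gaps[j - 1]
            for x in range(k0, i + 1):
                if text[x] != pattern[j] or used[x]:
                    continue
                if any(x - hi - 1 <= y <= x - lo - 1 for y in marked[j - 1]):
                    marked[j].add(x)
        if i not in marked[m - 1]:
            continue  # no potential occurrence ends at i

        # Backward: walk the rows from last to first, taking the left-most
        # marked position that is compatible with the position chosen below it.
        positions = [i]
        for j in range(m - 1, 0, -1):
            lo, hi = gaps[j - 1]
            x = positions[-1]
            y = min(y for y in marked[j - 1] if x - hi - 1 <= y <= x - lo - 1)
            positions.append(y)
        positions.reverse()
        for p in positions:
            used[p] = True  # mark the selected positions as used
        occurrences.append(positions)
    return occurrences
```

For example, with the text `"aabb"`, the pattern `"ab"`, a single local constraint `(0, 1)` and global constraints `[2, 3]`, the sketch first selects positions 0 and 2 and then, because those are marked as used, positions 1 and 3. The per-window sets stand in for the search table; a real implementation would reuse table contents across windows rather than rebuilding them for every end position.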
