**5. Conclusions**

As an extension of the traditional matching problem, the PMWL problem has attracted increasing attention because of its flexibility and complexity. Starting from the problem definition and drawing on research ideas from traditional matching, this article introduces SAIL, RSAIL, SBO and BPBM, four representative algorithms for PMWL, from three perspectives: the data structures, the matching strategies and the characteristics of the pattern. It also analyzes the strengths and weaknesses of these algorithms in terms of solution quality and time complexity, and reports experimental matching results on real DNA data. SAIL was the first algorithm proposed for the PMWL problem; it uses a sliding-window structure and the representative *left-most* matching strategy. The experiments show that for short patterns the approximation ratio of SAIL is higher than 0.9, while for longer patterns the occurrences obtained by SAIL are of poor quality. SBO obtains the best-quality occurrences, but its running time grows non-linearly with the length of the text. BPBM uses bit-parallel techniques to greatly improve matching efficiency, but it is constrained by the machine word size. For patterns with repeated letters in the tail, RSAIL exploits symmetry to improve the quality of occurrences under certain conditions, which offers an idea for solving the PMWL problem; for longer patterns and wider gaps, however, the improvement is not obvious.
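The machine-word constraint on bit-parallel matching can be illustrated with a minimal sketch. The code below is the classic Shift-And algorithm for exact matching, not BPBM itself, and it handles neither wildcards nor length constraints. The name `shift_and` and the 64-bit `WORD_BITS` value are assumptions made for illustration.

```python
WORD_BITS = 64  # assumed machine word width; not taken from the chapter


def shift_and(text, pattern, word_bits=WORD_BITS):
    """Return the end positions of exact occurrences of pattern in text.

    Classic Shift-And: bit i of `state` is set when pattern[0..i]
    matches the text ending at the current position.
    """
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    if m > word_bits:
        # A single word cannot hold one bit per pattern position.
        raise ValueError("pattern longer than one machine word")

    word_mask = (1 << word_bits) - 1
    # masks[ch] has bit i set exactly when pattern[i] == ch.
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)

    accept = 1 << (m - 1)
    state = 0
    ends = []
    for pos, ch in enumerate(text):
        # Extend every partial match by one character in a single step.
        state = (((state << 1) | 1) & masks.get(ch, 0)) & word_mask
        if state & accept:
            ends.append(pos)
    return ends
```

Each text character costs a constant number of word operations as long as the pattern fits in one word. Longer patterns need several words per state, which is one way the word size can limit the speed-up.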

This article then focuses on the relationship between the approximation ratio ε and the alphabet size |Σ|, the pattern length *m*, the wildcard span *gap* and the repeatability *rep*. First, it proposes the model ε = F(Σ, *m*, *gap*), which approximately describes the functional relationship between pattern characteristics and the approximation ratio. Second, it proves PMWL's completeness under the condition *rep* = 0. Finally, it analyzes the relationships among the pattern features and proposes the relationship $E(rep) = C_m^2 / |\Sigma|$.
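As a hedged illustration, one reading of the proposed relationship is that the expected repeatability equals $C_m^2 / |\Sigma|$. The sketch below checks that reading by exhaustive enumeration. It rests on two assumptions that may differ from the chapter's own definition: *rep* is taken to be the number of position pairs holding the same letter, and patterns are drawn uniformly at random. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import comb


def count_repeated_pairs(pattern):
    """Count position pairs (i, j), i < j, that hold the same letter.

    This is an assumed stand-in for the chapter's repeatability `rep`.
    """
    m = len(pattern)
    return sum(
        1
        for i in range(m)
        for j in range(i + 1, m)
        if pattern[i] == pattern[j]
    )


def expected_rep_exact(alphabet, m):
    """Average the pair count over every length-m pattern on alphabet."""
    total = sum(count_repeated_pairs(p) for p in product(alphabet, repeat=m))
    return Fraction(total, len(alphabet) ** m)


def expected_rep_formula(alphabet_size, m):
    """Closed form C(m, 2) / |alphabet| under the same assumptions."""
    return Fraction(comb(m, 2), alphabet_size)


if __name__ == "__main__":
    dna = "ACGT"
    for m in range(2, 7):
        print(m, expected_rep_exact(dna, m), expected_rep_formula(len(dna), m))
```

Under these assumptions the two values agree for every *m*, because each of the $C_m^2$ position pairs matches with probability $1/|\Sigma|$.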

In future work, we will consider a formal description of the PMWL problem in order to better explain its complexity, which should in turn support algorithm design and complexity analysis.

## **Author details**

Haiping Wang, Taining Xiang and Xuegang Hu *Hefei University of Technology, China* 
