**1. Introduction**


> A data stream is a massive sequence of data elements generated at a rapid rate. It is characterized by continuous flow, a high arrival rate, unbounded size, and real-time query requests. The knowledge embedded in a data stream is likely to change as time goes by. Identifying recent changes in a data stream, especially an online data stream, can provide valuable information for its analysis. Frequent patterns in a data stream can provide an important basis for decision making and applications. Because a data stream is fluid and continuous, the information about frequent patterns changes as new data arrive.

> Mining over data streams has been one of the most interesting issues in data mining in recent years. Online mining of data streams is an important technique for handling real-world applications such as traffic flow management, stock ticker monitoring and analysis, and wireless communication management. In most data stream applications, users tend to pay more attention to the pattern information of the recent data stream. Mining frequent patterns in the recent data stream is therefore important, and it is challenging: the mining process requires a one-pass algorithm, efficient updating, limited space cost, and online response to queries. However, most mining algorithms and frequency approximation algorithms over a data stream cannot efficiently differentiate the information of recently generated data elements from the obsolete information of old data elements, which may no longer be useful or may be invalid at present.

> Many previous studies have contributed to the efficient mining of frequent itemsets over streams. Generally, three processing models are used: the landmark model, the sliding window model, and the damped model [1]. The landmark model analyzes the stream in a particular window that starts from a fixed timestamp, called the landmark, and ends at the current timestamp. In the sliding window model, mining is performed over a sliding window of fixed length, and the oldest data is pruned immediately when new data arrives. The damped model uses the entire stream to compute the frequency with a decay factor *d*, which makes recent data more important than earlier data.
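The difference between the three processing models can be made concrete with a small sketch that counts one item under each of them. The function names, the representation of the stream as a list of item sets, and the exact decay rule (every earlier contribution is multiplied by *d* once per new transaction) are assumptions made for illustration only; the cited work may define the damped count differently.

```python
from collections import deque


def landmark_count(stream, item, landmark):
    """Count transactions containing `item` from index `landmark` to the end."""
    return sum(1 for t in stream[landmark:] if item in t)


def sliding_window_count(stream, item, width):
    """Count transactions containing `item` among the most recent `width` ones."""
    window = deque(maxlen=width)  # the oldest transaction is pruned automatically
    for t in stream:
        window.append(t)
    return sum(1 for t in window if item in t)


def damped_count(stream, item, d):
    """Decayed count over the entire stream (assumed rule: old weight * d per step)."""
    count = 0.0
    for t in stream:
        count = count * d + (1.0 if item in t else 0.0)
    return count


# Hypothetical toy stream of five transactions.
stream = [{"a", "b"}, {"a"}, {"b", "c"}, {"a", "c"}, {"c"}]
landmark_result = landmark_count(stream, "a", landmark=1)
window_result = sliding_window_count(stream, "a", width=3)
damped_result = damped_count(stream, "a", d=0.5)
```

In this toy stream the landmark model still counts the early occurrence of `"a"` at index 1, the sliding window model forgets it entirely, and the damped model keeps it with a reduced weight.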

<sup>\*</sup> Supported by Fundamental Research Funds for the Central Universities No. DUT10JR15

Mining Frequent Itemsets over Recent Data Stream Based on Genetic Algorithm 293

**2. Theoretical foundation** 

The study combines sliding window techniques, frequent itemsets, the genetic algorithm, and parallel processing technology.

Sliding windows have been used in network communication, time-series data mining, data stream mining, and so on. This algorithm uses the sliding window [9,10] to obtain the current data stream.

**Definition 1** sliding window: For a positive number ω1 and a certain time T, if the data sets D = (d0, d1, ..., dn) fall into a window SW of size ω1, the window SW is called the sliding window.

**Definition 2** nested sub-window: For a positive number ω2 and a certain time T, if the newest data set dn in the sliding window SW falls into a nested window NSW of size ω2, the nested window NSW is called the nested sub-window.

Figure 1 illustrates how the sliding window is applied to the dynamic updating of data sets.

[Figure 1, panels (a) and (b): the data sequence d0, d1, d2, ..., dn-1, dn, dn+1, with the sliding window SW and the nested sub-window NSW marked over it and the historical data and new data labeled.]

Fig. 1. Dynamic updating of the data in sliding window
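The two window definitions can be sketched as a small data structure. This is a minimal illustration only: the class name `NestedSlidingWindow` and its methods are hypothetical, and it assumes that ω1 and ω2 are measured in numbers of data sets (rather than in time units) with ω2 ≤ ω1.

```python
from collections import deque


class NestedSlidingWindow:
    """Hypothetical helper: SW keeps the newest w1 data sets, NSW the newest w2 of them."""

    def __init__(self, w1, w2):
        if not 0 < w2 <= w1:
            raise ValueError("expected 0 < w2 <= w1")
        self.w2 = w2
        self.sw = deque(maxlen=w1)

    def push(self, data_set):
        """Add the newest data set; return the one that slid out of SW, if any."""
        expired = self.sw[0] if len(self.sw) == self.sw.maxlen else None
        self.sw.append(data_set)
        return expired

    def window(self):
        """Contents of the sliding window SW, oldest first."""
        return list(self.sw)

    def nested(self):
        """Contents of the nested sub-window NSW (the newest w2 data sets)."""
        return list(self.sw)[-self.w2:]


# Feed d0..d5 into a window of size 4 with a nested sub-window of size 2.
nsw = NestedSlidingWindow(w1=4, w2=2)
expired = [nsw.push(f"d{i}") for i in range(6)]
```

After the six pushes, `d0` and `d1` have become historical data, SW holds `d2` to `d5`, and NSW holds the two newest data sets.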

Mining frequent patterns on a data stream has been studied in many ways, and the mining methods include the DStree [2,3,4], FP-Tree [5,6,7], and estDec [11] algorithms.

The FP-Tree structure is generated by reading data from the transaction database. Each tree node contains an item marker and a count, where the count is the number of transactions mapped onto the path through that node. Initially, the FP-Tree contains only a root node, marked with the symbol null. The algorithm first scans the data set to determine the support count of each item, discards the non-frequent items, and lists the frequent items in descending order of support count. It then scans the data set a second time to construct the FP-Tree. After reading the first transaction, it creates the nodes and the path for that transaction, assigns the transaction a code, and sets the frequency count of every node on the path to 1. It then reads each of the remaining transactions in turn, forming new paths and nodes and adjusting the frequency counts until every transaction is mapped to a path in the FP-Tree. After all the transaction information has been read and the FP-Tree has been constructed, the FP-Stream algorithm can be used on the FP-Tree to mine its frequent itemsets.
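The two-scan construction described above can be sketched as follows. This is a simplified illustration, not a full implementation: the names `FPNode` and `build_fp_tree` are hypothetical, ties between items of equal support are broken alphabetically as an assumption, and the header table and node links that a complete FP-Tree normally carries for mining are omitted.

```python
from collections import Counter


class FPNode:
    def __init__(self, item, parent=None):
        self.item = item      # item marker; None stands for the null root
        self.count = 0        # number of transactions mapped onto this path
        self.parent = parent
        self.children = {}    # item -> FPNode


def build_fp_tree(transactions, min_support):
    # First scan: support count of each item, then discard non-frequent items.
    support = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in support.items() if c >= min_support}

    root = FPNode(None)
    # Second scan: map every transaction to a path of its frequent items,
    # ordered by descending support count.
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:             # new node: its count starts at 0 ...
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1              # ... and becomes 1 for the first path
            node = child
    return root, frequent


# Hypothetical toy database with a minimum support count of 2.
transactions = [["a", "b"], ["b", "c", "d"], ["a", "b", "c"], ["b", "c"]]
root, frequent = build_fp_tree(transactions, min_support=2)
```

In this example item `d` is discarded after the first scan, and all four transactions share the prefix `b`, so the root has a single child whose count is 4.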

The DStree algorithm is a relatively new algorithm for mining frequent itemsets that uses the concept of nested sub-windows within a sliding window. It separates the data of the current transaction database into blocks and then counts the frequent itemsets in the current window. When the next block of data arrives, the prior block becomes historical data, and the second block replaces the first one. Some of the information remains available in the current DStree and is prepared for generating the next DStree.
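The block-wise update can be illustrated with a deliberately simplified sketch. It is not the DStree itself: it keeps flat per-item counts for each block instead of a tree, and the class name `BlockWindow` and its methods are hypothetical. It only shows how the counts of the oldest block can be withdrawn when a new block arrives, so that the window statistics never have to be recomputed from scratch.

```python
from collections import Counter, deque


class BlockWindow:
    """Hypothetical block-based window holding at most `num_blocks` blocks."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.blocks = deque()    # one Counter of item supports per block, oldest first
        self.total = Counter()   # item supports over the whole current window

    def add_block(self, transactions):
        counts = Counter(item for t in transactions for item in set(t))
        self.blocks.append(counts)
        self.total.update(counts)
        if len(self.blocks) > self.num_blocks:
            # The oldest block becomes historical data: withdraw its counts.
            self.total.subtract(self.blocks.popleft())
            self.total = +self.total   # unary plus drops items whose count is 0

    def frequent_items(self, min_support):
        return {i for i, c in self.total.items() if c >= min_support}


# Hypothetical stream of three blocks seen through a window of two blocks.
bw = BlockWindow(num_blocks=2)
bw.add_block([["a", "b"], ["a"]])
bw.add_block([["b"], ["b", "c"]])
before = bw.frequent_items(min_support=2)
bw.add_block([["c"], ["c", "b"]])
after = bw.frequent_items(min_support=2)
```

Once the third block arrives, the first block leaves the window, so item `a` is no longer frequent while item `c` becomes frequent.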

The estDec algorithm is an effective way to mine the frequent itemsets of the current online data stream. Each node of the estDec model tree contains a triple (*count, error, Id*). For the relevant item *e*, *count* is its number of occurrences, *error* is the maximum error count of *e*, and *Id* identifies the most recent transaction that contains *e*. The estDec algorithm is divided into four parts: parameter updating, count updating, delayed insertion, and frequent item selection.

Because the FP-Tree, DStree, and estDec algorithms all rely on a tree-structured model, they are difficult to parallelize, and their running time is correspondingly difficult to reduce.

With the development of graphics cards, the GPU (Graphics Processing Unit) has become more and more powerful. For highly parallel workloads it now outperforms the CPU not only in graphics but also in scientific computing. CUDA is a parallel computing framework introduced by NVIDIA that enables the GPU to solve complex computational problems. It comprises the CUDA instruction set architecture and the parallel compute engine inside the GPU. Because the GPU excels at parallel, data-intensive computation, CUDA suits large-scale parallel computing very well [12].

This work proposes NSWGA (Nested Sliding Window Genetic Algorithm). First, NSWGA obtains the current data stream through the sliding window and uses a nested sub-window to divide the data stream in the current window into sub-blocks. Then, the inherent parallelism of the genetic algorithm and the parallel computing ability of the GPU are used to search for frequent itemsets in the nested sub-window. Finally, NSWGA derives the frequent patterns of the current window from the frequent patterns of the nested sub-windows.
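The overall flow can be sketched in a few lines, with the caveat that the text above does not yet specify the chromosome encoding, the fitness function, or the genetic operators. Everything below is therefore an assumption made for illustration: itemsets and transactions are encoded as bitmasks, fitness is the support count, selection is by truncation, and the candidates found in the sub-blocks are recounted over the whole window. The function names are hypothetical, and the support computation runs sequentially on the CPU here; each containment test is independent of the others, which is the kind of work that could be distributed over GPU threads.

```python
import random


def support(mask, transactions):
    """Number of transactions (bitmasks) that contain every item in `mask`."""
    return sum(1 for t in transactions if t & mask == mask)


def ga_frequent_itemsets(transactions, n_items, min_support,
                         pop_size=20, generations=30, seed=0):
    """Assumed GA: bitmask chromosomes, support as fitness (needs n_items >= 2)."""
    rng = random.Random(seed)
    population = [rng.randrange(1, 1 << n_items) for _ in range(pop_size)]
    found = {}
    for _ in range(generations):
        scored = [(support(c, transactions), c) for c in population]
        for s, c in scored:
            if s >= min_support:
                found[c] = s                      # remember every frequent itemset seen
        scored.sort(reverse=True)
        parents = [c for _, c in scored[:pop_size // 2]]   # truncation selection
        children = []
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            low = (1 << rng.randrange(1, n_items)) - 1     # one-point crossover mask
            child = (a & low) | (b & ~low)
            if rng.random() < 0.2:                         # bit-flip mutation
                child ^= 1 << rng.randrange(n_items)
            children.append(child or 1 << rng.randrange(n_items))  # avoid empty itemset
        population = children
    return found


def window_frequent_itemsets(blocks, n_items, block_min_support, window_min_support):
    """Assumed combination rule: recount sub-block candidates over the whole window."""
    candidates = set()
    for block in blocks:
        candidates |= set(ga_frequent_itemsets(block, n_items, block_min_support))
    all_transactions = [t for block in blocks for t in block]
    result = {}
    for c in candidates:
        s = support(c, all_transactions)
        if s >= window_min_support:
            result[c] = s
    return result


# Hypothetical window of two sub-blocks over four items (bit i stands for item i).
blocks = [[0b0111, 0b0011, 0b0101], [0b0011, 0b1011, 0b0001]]
window_result = window_frequent_itemsets(blocks, n_items=4,
                                         block_min_support=2, window_min_support=4)
```

Because a genetic algorithm is a heuristic search, the sketch guarantees only that every itemset it reports is frequent, not that it reports all frequent itemsets.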

This chapter is organized as follows. Section 2 describes the theoretical foundation. Section 3 presents the design of the Nested Sliding Window Genetic Algorithm for mining frequent itemsets in data streams. Section 4 reports comprehensive experiments carried out in the constructed environment, compares the algorithm with other methods, and analyzes the algorithm for mining time-sensitive sliding windows. Finally, Section 5 summarizes the work.
