**2. Theoretical foundation**

292 Bio-Inspired Computational Algorithms and Their Applications

Mining frequent patterns over a data stream has been studied in many ways, and the mining methods include DStree [2,3,4], FP-tree [5,6,7] and estDec [11] algorithms.

The FP-Tree structure is generated by reading data from the transaction database. Each tree node contains an item label and a count; the count records how many transactions are mapped onto the path through that node. Initially the FP-Tree contains only one root node, marked with the symbol null. The algorithm first scans the data set to determine the support count of each item, discards the non-frequent items, and lists the frequent items in descending order of support count. It then scans the data set a second time to construct the FP-Tree. After reading the first transaction, it creates the nodes and the path for that transaction and sets the frequency count of every node on the path to 1. It then reads each of the remaining transactions in turn, forming new paths and nodes and adjusting the frequency counts, until every transaction is mapped to a path of the FP-Tree. Once all the transactions have been read and the FP-Tree constructed, the FP-Stream algorithm can be applied to the FP-Tree to mine its frequent itemsets.

The DStree algorithm is a relatively new algorithm for mining frequent itemsets that introduces nested sub-windows inside the sliding window. DStree separates the current transaction data into blocks and then counts the frequent itemsets in the current window. When the next block of data arrives, the prior block becomes historical data, and the second block of data replaces the first one. Some of the information in the current DStree remains available to prepare the next generation of the DStree.

The estDec algorithm is an effective way to mine frequent itemsets from the current on-line data stream. Each node of the estDec model tree contains a triple (*count, error, Id*). For the relevant item *e*, *count* gives its number of occurrences, *error* gives the maximum error count of *e*, and *Id* identifies the most recent transaction containing *e*. The estDec algorithm is divided into four parts: updating parameters, updating counts, delayed insertion, and selecting the frequent items.

Because FP-Tree, DStree and estDec all rely on a model tree, it is difficult to make these algorithms compute in parallel, and their run time is therefore also difficult to reduce.

With the development of the graphics card, the GPU (Graphics Processing Unit) has become more and more powerful; it has surpassed the CPU not only in graphics but also in scientific computing. CUDA is a parallel computation framework introduced by NVIDIA. The framework enables the GPU to solve complex calculations; it comprises the CUDA instruction set and the GPU's internal computation engine. Because the GPU is characterized by parallel computation over dense data, CUDA suits the large-scale parallel computation field very well [12].

This work proposes the NSWGA (Nested Sliding Window Genetic Algorithm). First, NSWGA obtains the current data stream through the sliding window and uses a nested sub-window to divide the data stream in the current window into sub-blocks; then the parallelism of the genetic algorithm and the parallel computation ability of the GPU are used to seek frequent itemsets in the nested sub-windows; finally, NSWGA derives the frequent patterns of the current window from the frequent patterns of the nested sub-windows.

This chapter is organized as follows. The theoretical foundation is described in Section 2. The Nested Sliding Window Genetic Algorithm for mining frequent itemsets is designed in Section 3.
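The two-scan FP-Tree construction described above can be sketched in Python. This is a minimal illustration under our own naming (node fields, function names), not the original implementation:

```python
from collections import Counter

class FPNode:
    """A node of the FP-Tree: an item label, a count, and child nodes."""
    def __init__(self, item=None):
        self.item = item          # None marks the root ("null") node
        self.count = 0
        self.children = {}        # item -> FPNode

def build_fp_tree(transactions, min_support):
    # First scan: determine the support count of every item.
    support = Counter(item for t in transactions for item in t)
    # Discard non-frequent items; fix a descending frequency order.
    frequent = {i for i, c in support.items() if c >= min_support}
    order = lambda t: sorted((i for i in t if i in frequent),
                             key=lambda i: (-support[i], i))
    # Second scan: map each transaction onto a path of the tree,
    # adjusting the frequency count of every node on that path.
    root = FPNode()
    for t in transactions:
        node = root
        for item in order(t):
            node = node.children.setdefault(item, FPNode(item))
            node.count += 1
    return root
```

Shared prefixes of transactions collapse onto shared paths, which is what makes the counts on a path meaningful.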

The study combines sliding window techniques, frequent itemset mining, the genetic algorithm and parallel processing technology.

The sliding window has been used in network communication, time-series data mining, data stream mining and so on. Our algorithm uses the sliding window [9,10] to obtain the current data stream.

**Definition 1** sliding window: For a positive number ω1 and a certain time T, let the data set D = (d0, d1, ..., dn) fall into the window SW of size ω1; then SW is called the sliding window.

**Definition 2** nested sub-window: For a positive number ω2 and a certain time T, let the newest data dn of the sliding window SW fall into the nested window NSW of size ω2; then NSW is called the nested sub-window.
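Definitions 1 and 2 can be sketched directly with a bounded queue. This is an illustrative sketch (class and method names are our own): the sliding window SW holds the newest ω1 data, and the nested sub-window NSW is simply its newest ω2 elements.

```python
from collections import deque

class SlidingWindow:
    """Sliding window SW of size w1 with a nested sub-window NSW of size w2."""
    def __init__(self, w1, w2):
        assert w2 <= w1
        self.sw = deque(maxlen=w1)   # expired data drops off automatically
        self.w2 = w2

    def push(self, d):
        # New data enters on the right; the oldest datum is discarded
        # once the window holds w1 elements.
        self.sw.append(d)

    def nested_sub_window(self):
        # The newest w2 elements of SW form the nested sub-window NSW.
        return list(self.sw)[-self.w2:]
```

The `maxlen` argument of `deque` performs the periodic deletion of expired data implicitly on every `push`.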

Figure 1 illustrates the application of the sliding window for dynamic updating of the data set.

Fig. 1. Dynamic updating of the data in sliding window

Mining Frequent Itemsets over Recent Data Stream Based on Genetic Algorithm 295

**Definition 3** frequent itemsets in sliding window: For the current data in the sliding window, let I = {i1, i2, ..., in} be a collection of items and S = {s1, s2, ..., sn} a set of transactions, where each transaction s is a collection of items, s ⊆ I. If X ⊆ I, then X is an itemset; if X contains k elements, X is called a k-itemset. If the support of an itemset X is greater than or equal to the minimum support threshold given by the user, then X is called a frequent itemset.

**3. Algorithm design**

**3.1 Algorithm description**

NSWGA uses the sliding window to get the recent data and uses the genetic algorithm to mine the frequent itemsets of the data in the current window.

The NSWGA algorithm is divided into three parts: (1) NSWGA uses the parallelism of the genetic algorithm to search for the frequent itemsets of the latest data in the nested sub-windows. (2) The final frequent itemsets of the sliding window are obtained by integrating this series of frequent itemsets from the nested sub-windows. (3) As new data arrive, the expired data are deleted periodically, and the above two operations are repeated.

In the first part, the current frequent itemsets in NSW are obtained. The process is shown in Figure 3.

Fig. 3. Flowchart of the mining process (from "Input data streams to be mined" to "Output frequent itemsets of recent data stream")

**Step 1.** Set the size ω1 of the sliding window SW and the size ω2 of the nested sub-window NSW. The window sizes are determined by the properties of the data stream: ω1 depends on how many recent transactions we want frequent itemsets for, and ω2 depends on the processing capability of the algorithm and the required statistical frequency. Given the support threshold S and the fitness function Fi = Wi/WZ, when Fi ≥ S the transaction item i is a frequent pattern of the data set in the sliding window.

The number of iterations T depends on the number of attributes a transaction item includes, the range of the attribute values, and the original population size.
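Definition 3 and the fitness test of Step 1 can be sketched as follows. We read Wi as the number of window transactions containing the itemset and WZ as the total number of transactions in the window; that reading, and the function names, are our assumptions:

```python
def support(itemset, window):
    """W_i: number of transactions in the window containing every item of itemset."""
    return sum(1 for t in window if set(itemset) <= set(t))

def is_frequent(itemset, window, s):
    # Fitness F_i = W_i / W_Z, with W_Z the number of transactions in
    # the window; the itemset is frequent when F_i >= S (Definition 3).
    f = support(itemset, window) / len(window)
    return f >= s
```

With the support threshold S expressed as a fraction of the window size, this is exactly the "support ≥ minimum support threshold" condition of Definition 3.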

The genetic algorithm starts its search from an initial population; each individual in the population is a possible frequent pattern. We use the genetic algorithm to achieve the result mainly through crossover, mutation and selection [8]. After several generations of selection, we obtain the final frequent itemsets. The major rules and operators of the genetic algorithm are as follows:



The crossover operator is mainly used to exchange genes between the parent chromosomes. Through the operation on two individuals of the parent generation we obtain the daughter generation, so the daughter generation inherits the effective patterns of the parent generation.

5. Mutation Operator: The algorithm uses Simple Mutation. If the parent chromosome is A (a1a2a3 ... ai ... an), then after mutation the daughter chromosome becomes A1 (a1a2a3 ... bi ... an).

The mutation operation changes some genes at random to generate new individuals and is an important means of reaching the global optimum, as it helps to increase population diversity. In this algorithm, however, the genes required to generate the frequent itemsets already exist in the population, so we use a lower mutation rate.
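The crossover and Simple Mutation operators above can be sketched as follows. We assume, purely for illustration, a binary encoding in which gene i marks whether item i belongs to the candidate itemset; the function names are ours:

```python
import random

def single_point_crossover(p1, p2, point):
    """Exchange the gene segments after `point` between two parent chromosomes."""
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def simple_mutation(chromosome, rate=0.01):
    # Simple Mutation: each gene a_i is replaced by a new value b_i
    # (here: flipped) with a low probability, since the genes needed
    # for the frequent itemsets mostly already exist in the population.
    return [1 - g if random.random() < rate else g for g in chromosome]
```

The daughter chromosomes inherit whole segments from their parents, which is how effective patterns of the parent generation propagate.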

When we have identified the parallel part of the program, we can let that part run on the GPU. A function that runs on the GPU is called a kernel (kernel function). A kernel function is not a complete program but the parallel part of the whole CUDA program [13,14]. The execution of a complete CUDA program is shown in Figure 2. The graph shows that a kernel function has two parallel levels: the parallel blocks in the grid and the parallel threads in each block.

Fig. 2. CUDA programming model
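The chapter runs the sub-window evaluation on the GPU through CUDA; as a language-neutral sketch of the same idea, the snippet below evaluates the nested sub-window blocks in parallel with a CPU thread pool standing in for the grid of CUDA blocks. Function names and the block layout are illustrative assumptions:

```python
from concurrent.futures import ThreadPoolExecutor

def count_in_block(block, candidate):
    """Support of one candidate itemset within one sub-window block."""
    return sum(1 for t in block if set(candidate) <= set(t))

def parallel_support(blocks, candidate):
    # Each block of the nested sub-windows is evaluated independently
    # (a thread pool here stands in for the parallel CUDA blocks);
    # the per-block counts are then merged for the whole window.
    with ThreadPoolExecutor() as pool:
        counts = pool.map(lambda b: count_in_block(b, candidate), blocks)
    return sum(counts)
```

The merge step mirrors part (2) of NSWGA, where the per-sub-window results are integrated into the frequent itemsets of the whole sliding window.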
