**4. Experiment and analysis**

#### **4.1 Experiment**

In this experiment, we use artificial data sets and implement the NSWGA algorithm in MATLAB and CUDA C. The performance of the algorithm is tested on a computer with a 2.61 GHz CPU, 2 GB of memory, an Nvidia C1060 GPU, and the Windows XP operating system.

The size of the sliding window is 100K and the size of the data set is 200K. As the data flows in, we collect statistics after every 10K data items.
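The windowing setup described above can be tried at small scale with the following Python sketch. It is not the NSWGA implementation (which the chapter states was written in MATLAB and CUDA C): it counts itemsets exactly by brute force, purely to illustrate a sliding window with periodic reporting. The function names, the record layout as `(attribute, value)` pairs, and the `min_support` parameter are assumptions made for this illustration.

```python
import random
from collections import Counter, deque
from itertools import combinations


def simulate_stream(n_records, n_attributes=3, n_values=10, seed=0):
    """Yield synthetic records: one (attribute, value) item per attribute (hypothetical generator)."""
    rng = random.Random(seed)
    for _ in range(n_records):
        yield tuple((a, rng.randrange(n_values)) for a in range(n_attributes))


def itemsets_of(record):
    """Yield every non-empty itemset contained in one record."""
    for size in range(1, len(record) + 1):
        yield from combinations(record, size)


def windowed_frequent_itemsets(stream, window_size, report_every, min_support):
    """Exact counting over a sliding window, with a report every `report_every` records."""
    window = deque()
    counts = Counter()
    reports = []
    for seen, record in enumerate(stream, start=1):
        window.append(record)
        counts.update(itemsets_of(record))
        if len(window) > window_size:
            # The oldest record expires: remove its contribution to the counts.
            expired = window.popleft()
            counts.subtract(itemsets_of(expired))
        if seen % report_every == 0:
            threshold = min_support * len(window)
            frequent = {s for s, c in counts.items() if c >= threshold}
            reports.append((seen, len(window), frequent))
    return reports
```

With the sizes used in the experiment, a call such as `windowed_frequent_itemsets(simulate_stream(200_000), 100_000, 10_000, min_support)` would produce 20 reports, one per 10K data items. Exact counting like this is only a baseline for checking results; it does not reflect the runtime behaviour of any of the algorithms compared in this section.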

1. The simulated data stream has three attributes. Each attribute has 10 possible values. The running results of the algorithms are shown in Table 1.

Table 1. Comparison of the FP-tree algorithm and the NSWGA algorithm

2. The simulated data stream is the same as above. The running results of the algorithms are shown in Table 2.

Table 2. Comparison 1 of the DSTree algorithm and the NSWGA algorithm

3. The simulated data stream has three attributes. Each attribute has 20 possible values. The running results of the algorithms are shown in Table 3.

Table 3. Comparison 2 of the DSTree algorithm and the NSWGA algorithm

#### **4.2 Analysis of the experimental results**

As shown in Table 1, as the minimum support increases, the number of frequent patterns found by both algorithms drops rapidly, fewer matching operations are needed, and the runtime falls accordingly. However, the FP-tree algorithm must not only maintain the global frequent pattern tree but also spend additional time building a sub-pattern tree for each data segment and then saving the information of that sub-pattern tree to the global frequent pattern tree. As the number of processed segments grows, the runtime of the FP-tree algorithm becomes longer than that of NSWGA.

Table 2 shows that, as the minimum support increases, algorithms that use a pattern tree to maintain the information of the frequent patterns, such as DSTree, cannot reduce their runtime, whereas NSWGA saves a considerable amount of runtime.

In Table 2, each attribute of the simulated data has 10 possible values, and in Table 3 each has 20. As the number of possible values increases, the runtime of the DSTree algorithm increases greatly, while the runtime of NSWGA remains almost unchanged.

**5. Summary**

Finding frequent items in huge data streams is important for prediction and decision-making. This chapter presents NSWGA (Nested Sliding Window Genetic Algorithm), an approach for mining frequent itemsets in a data stream within the current window. NSWGA uses the parallelism of the genetic algorithm to search for the frequent itemsets of the latest data in a nested sub-window. The final frequent itemsets of the sliding window are obtained by integrating the series of frequent itemsets found in the nested sub-windows. NSWGA captures the latest frequent itemsets of the data stream accurately and in a timely manner, while expired data is deleted periodically. Owing to the use of nested windows and the parallel processing capability of the genetic algorithm, the method reduces the time complexity.

In this chapter, NSWGA, an algorithm for mining frequent patterns in data streams, is proposed. Its main contributions are: (1) the parallelism of the genetic algorithm is used to mine the frequent patterns of the data stream, which reduces the runtime; (2) the algorithm combines the sliding window with the genetic algorithm to propose an improved method for obtaining the initial population; (3) the algorithm guarantees the speed of implementation and query precision.
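The nested-window scheme summarized in this chapter (a genetic search inside each sub-window, integration of the sub-window results, and periodic expiry of old data) could be sketched in Python as follows. This is a toy illustration under stated assumptions, not the authors' NSWGA: the chromosome encoding (one gene per attribute, `None` meaning "attribute unused"), the selection, crossover and mutation operators, the merge rule, and every name in the code are hypothetical. The sketch also runs sequentially, whereas the chapter relies on the independence of the fitness evaluations to run them in parallel.

```python
import random
from collections import Counter, deque


def count(itemset, records):
    """Number of records that contain every (attribute, value) pair of the itemset."""
    return sum(1 for r in records if all(r[a] == v for a, v in itemset))


def ga_search(records, min_support, pop_size=30, generations=20, seed=0):
    """Toy genetic search in one sub-window; returns {itemset: count} for frequent itemsets.

    Fitness is the itemset's count in the sub-window. All operators are assumptions.
    """
    rng = random.Random(seed)
    n_attr = len(records[0])
    threshold = min_support * len(records)

    def sample():
        # Seed from a real record so that each initial itemset occurs at least once.
        r = rng.choice(records)
        return tuple(v if rng.random() < 0.5 else None for v in r)

    def decode(chrom):
        return frozenset((a, v) for a, v in enumerate(chrom) if v is not None)

    population = [sample() for _ in range(pop_size)]
    found = {}
    for _ in range(generations):
        # Empty chromosomes (all genes None) are skipped.
        scored = [(count(decode(c), records), c) for c in population if decode(c)]
        for fitness, chrom in scored:
            if fitness >= threshold:
                found[decode(chrom)] = fitness
        if not scored:
            population = [sample() for _ in range(pop_size)]
            continue
        scored.sort(key=lambda fc: fc[0], reverse=True)
        parents = [c for _, c in scored[: max(2, pop_size // 2)]]  # truncation selection
        population = []
        while len(population) < pop_size:
            p, q = rng.choice(parents), rng.choice(parents)
            child = [rng.choice(pair) for pair in zip(p, q)]  # uniform crossover
            if rng.random() < 0.2:  # mutation: toggle one gene
                a = rng.randrange(n_attr)
                child[a] = rng.choice(records)[a] if child[a] is None else None
            population.append(tuple(child))
    return found


class NestedSlidingWindow:
    """Keeps the results of the most recent `n_sub` sub-windows; older ones expire."""

    def __init__(self, n_sub, sub_size, min_support):
        self.sub_size = sub_size
        self.min_support = min_support
        self.results = deque(maxlen=n_sub)  # appending to a full deque drops the oldest entry
        self.buffer = []

    def add(self, record):
        self.buffer.append(record)
        if len(self.buffer) == self.sub_size:
            self.results.append(ga_search(self.buffer, self.min_support))
            self.buffer = []

    def frequent_itemsets(self):
        """Merge sub-window counts; an itemset not found in a sub-window contributes 0."""
        total = Counter()
        for found in self.results:
            total.update(found)
        threshold = self.min_support * self.sub_size * len(self.results)
        return {s for s, c in total.items() if c >= threshold}
```

Because the genetic search is heuristic, this sketch can miss frequent itemsets, and the merge rule undercounts itemsets that fell below the threshold in some sub-window. Every itemset it does report in a sub-window, however, is checked against the actual records, so the per-sub-window results contain no false positives.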
