**4. Improved Eclat framework**

For the four learning behavior datasets corresponding to L1, L2, T1 and T2, the execution results of the reference items can be described as probabilities, which the classical "Support" calculation mode cannot capture. Instead, the expected "Support" of reference items should be used to describe the execution frequency of uncertain components [20]. This is a feasible and targeted analysis strategy, and it forms the model basis for improving the Eclat framework.

#### **4.1 Related models**

The related models for the improvement of the Eclat framework are as follows.

#### *4.1.1 Expected "Support" of reference items*

Given a probabilistic data set with *N* reference item instances, the expected "Support" of a reference item *X* is expressed as the cumulative value of its probabilities over the data set. The calculation formula is $\mathrm{expect\text{-}sup}(X) = \sum_{i=1}^{N} p_i(X)$.
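As a minimal sketch of this definition, the expected "Support" is simply the sum of the per-instance probabilities (the $p_i(X)$ values below are hypothetical):

```python
def expected_support(probs):
    """expect-sup(X): sum of p_i(X) over the N instances of X."""
    return sum(probs)

probs_x = [0.9, 0.4, 0.7, 0.2]    # hypothetical p_i(X) values, N = 4
print(expected_support(probs_x))  # ≈ 2.2
```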

#### *4.1.2 Frequent itemsets*

Based on the expected "Support" of a reference item, given a probabilistic data set with *N* reference item instances, if $\mathrm{expect\text{-}sup}(X) \ge N \cdot \mathrm{min\_RST}$, the reference item *X* is a frequent itemset. $\mathrm{min\_RST}$ is the minimum relative "Support" threshold, calculated as the ratio of the minimum absolute "Support" threshold to the number of reference item instances. Generally, this value can be specified according to the data distribution.
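The frequency test above can be sketched directly from the definition (the probabilities and threshold are illustrative):

```python
def is_frequent(probs, min_rst):
    """X is a frequent itemset if expect-sup(X) >= N * min_RST."""
    return sum(probs) >= len(probs) * min_rst

# expect-sup = 2.2, threshold = 4 * 0.5 = 2.0 -> frequent
print(is_frequent([0.9, 0.4, 0.7, 0.2], 0.5))  # True
```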

#### *4.1.3 Probability frequency*

Combined with the conditions of frequent itemsets, given a probabilistic data set with *N* reference item instances, the probability frequency of the reference item is defined as $proF(X) = \Pr\{\mathrm{sup}(X) \ge N \cdot \mathrm{min\_RST}\}$, i.e. the probability that the support of *X* reaches the minimum threshold.

#### *4.1.4 Probabilistic frequent itemsets*

Given a probabilistic data set with *N* reference item instances, if $proF(X) \ge \mathrm{min\_proF}$ is satisfied, the reference item *X* is a probabilistic frequent itemset. $\mathrm{min\_proF}$ is the minimum frequent probability threshold, which can also be specified according to the data distribution.

#### **4.2 Algorithm design**

Most algorithms for mining frequent itemsets use a horizontal data format with transactions as vectors [5, 21]. The uncertainty of learning behavior data, however, makes a vertical data format necessary for analyzing learning behaviors. One complete learning behavior of a learner constitutes a transaction. Based on the Eclat framework, it is suitable to adopt the *tidlist* data structure and to add a probability parameter to each item of learning behaviors, indicating the possibility that the item appears in a specific transaction.

The vertical data format of learning behavior data is a binary tuple $(x, tidlist(x))$, which represents an itemset of learning behaviors: *x* is the identifier of each item, that is, the number of each learning behavior, and $tidlist(x)$ is the list of instances of *x*. If each instance contains an identifier $i_i$ and an existence probability $p_X(i_i)$, then $tidlist(x)$ is expressed as the tuple $\langle (i_1, p_X(i_1)), (i_2, p_X(i_2)), \cdots, (i_i, p_X(i_i)) \rangle$. In the algorithm design for the vertical data format, the probability frequency must be calculated. Here we use a two-dimensional array $P_X[i, j]$ to represent the probability mass function, i.e. the probability that *X* occurs at least *i* times among the first *j* reference items. The calculation process of the probability frequency is described as the PFC (Probability Frequency Calculation) program.
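As an illustrative sketch (hypothetical behavior names and probabilities), the vertical format and the tidlist intersection used later by the algorithm might look like:

```python
# tidlist(x): list of (transaction id, existence probability) pairs
tidlists = {
    "watch_video": [(1, 0.9), (2, 0.6), (4, 0.8)],
    "post_forum":  [(1, 0.5), (3, 0.7), (4, 0.4)],
}

def intersect(t1, t2):
    """Join two tidlists on transaction id; probabilities multiply,
    assuming items occur independently within a transaction."""
    d2 = dict(t2)
    return [(tid, p * d2[tid]) for tid, p in t1 if tid in d2]

pair = intersect(tidlists["watch_video"], tidlists["post_forum"])
print(pair)  # tidlist of the 2-itemset {watch_video, post_forum}
```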

#### **PFC program**

Input: itemset $X = \langle (i_i, p_X(i_i)) \rangle$ // $1 \le i \le |I|$, where $|I|$ is the maximum number of transactions.

*Improved Probabilistic Frequent Itemset Analysis Strategy of Learning Behaviors Based on… DOI: http://dx.doi.org/10.5772/intechopen.97219*

Output: $P_X[i, j]$

Process:

1. PFC()

2. For $j = 0$ to $|I|$

3. $P_X[0, j] = 1$

4. EndFor // Initialize the first-row units of $P_X[i, j]$ to 1

5. For $j = 0$ to $|I|$

6. For $i = 0$ to $\mathrm{min\_Value}(j, N \cdot \mathrm{min\_RST})$ // $\mathrm{min\_Value}$ returns the smaller of $j$ and the absolute support threshold $N \cdot \mathrm{min\_RST}$

7. If $i > j$ then $P_X[i, j] = 0$

8. Else if $i = j$ then $P_X[i, j] = \prod_{k=1}^{j} p_X(i_k)$

9. Else if $i < j$

10. then $P_X[i, j] = P_X[i-1, j-1] \cdot p_X(i_j) + \max\left(P_X[i, j-1], P_X[i-1, j]\right) \cdot \left(1 - p_X(i_j)\right)$

/\*This formula is a kind of dynamic decision programming; the maximum probability frequency is obtained from the adjacent units.\*/
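The probability-frequency calculation can be sketched in Python. This is a minimal sketch, not the chapter's implementation: it uses the standard dynamic-programming recurrence $P_X[i][j] = P_X[i-1][j-1] \cdot p_j + P_X[i][j-1] \cdot (1-p_j)$ for the probability that *X* occurs at least *i* times among the first *j* instances (the PFC program above additionally takes a maximum over adjacent units), and the item probabilities are hypothetical:

```python
import math

def pfc(probs):
    """P[i][j]: probability that X occurs at least i times among the
    first j instances. First row P[0][j] = 1, as in the PFC program."""
    n = len(probs)
    P = [[0.0] * (n + 1) for _ in range(n + 1)]
    for j in range(n + 1):
        P[0][j] = 1.0
    for j in range(1, n + 1):
        p = probs[j - 1]
        for i in range(1, j + 1):
            # X occurs in instance j (needs >= i-1 before it)
            # or does not (needs >= i before it)
            P[i][j] = P[i - 1][j - 1] * p + P[i][j - 1] * (1 - p)
    return P

def prob_frequency(probs, min_rst):
    """proF(X): probability that the support of X reaches N * min_RST."""
    n = len(probs)
    return pfc(probs)[math.ceil(n * min_rst)][n]

probs_x = [0.9, 0.4, 0.7, 0.2]       # hypothetical p_i(X) values
print(prob_frequency(probs_x, 0.5))  # Pr(support >= 2), ≈ 0.8092
```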


Based on the calculation results of the probability frequency, the improved Eclat algorithm is designed in three main steps:

**Firstly**, according to the vertical data format, transactions and their corresponding items are extracted from the learning behavior data set; with the help of a bi-directional sorting strategy, the transactions are initialized and the items are stored in *tidlist* form. The "Support" of the transactions stored in each *tidlist* is then analyzed, and transactions with low "Support" (support < *min_RST*) are discarded.

**Secondly**, the items of learning behaviors are pruned and optimized: the *k*-itemsets are extracted from the *tidlist*s by intersection, and the probability frequency of each *k*-itemset is obtained by multiplication.

**Thirdly**, probabilistic frequent itemsets are mined recursively from the candidate itemsets. In the mining process, a pruning strategy based on *tidlist* is implemented to reduce the search time complexity. Furthermore, based on the projection of the *k*-frequent itemsets, the probability data composed of frequent itemsets are obtained.
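The three steps can be sketched as a minimal recursive Eclat-style miner, assuming independent item probabilities and hypothetical items A, B, C; for brevity, expected support stands in here for the full probability-frequency test:

```python
def expected_sup(tidlist):
    return sum(p for _, p in tidlist)

def intersect(t1, t2):
    """Join tidlists on transaction id; probabilities multiply."""
    d2 = dict(t2)
    return [(tid, p * d2[tid]) for tid, p in t1 if tid in d2]

def lb_eclat(items, n, min_rst, prefix=(), out=None):
    """items: sorted (name, tidlist) pairs; n: number of transactions.
    Prunes itemsets whose expected support is below n * min_RST,
    then recursively extends the prefix by tidlist intersection."""
    if out is None:
        out = []
    for idx, (name, tl) in enumerate(items):
        if expected_sup(tl) < n * min_rst:
            continue  # steps 1-2: discard low-support candidates
        itemset = prefix + (name,)
        out.append(itemset)
        # step 3: recurse on intersections with the remaining items
        suffix = [(n2, intersect(tl, tl2)) for n2, tl2 in items[idx + 1:]]
        lb_eclat(suffix, n, min_rst, itemset, out)
    return out

data = {
    "A": [(1, 0.9), (2, 0.8), (3, 0.7)],
    "B": [(1, 0.8), (2, 0.9)],
    "C": [(3, 0.4)],
}
print(lb_eclat(sorted(data.items()), n=3, min_rst=0.4))
# [('A',), ('A', 'B'), ('B',)]
```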

These three steps constitute a recursive process, and the whole algorithm is described as the LB (Learning Behavior)-Eclat program.

#### **LB-Eclat Program**


15. Output: all probabilistic frequent itemsets.
