#### **2.1 Frequent itemset mining based on Apriori**

Frequent itemset mining based on Apriori is the first stage of association rule construction. The mining process works on the horizontal data format (each record is a transaction with its list of items) and extracts itemsets through an iterative, level-wise search. At each level, candidate itemsets are generated by a join step and reduced by a pruning step, and the itemsets that meet the thresholds are retained [8–11]. An itemset that satisfies the minimum support is defined as a frequent itemset; association rules are then derived from the frequent itemsets that also satisfy a minimum confidence. When the Apriori algorithm is used to analyze the relevance of learning behaviors, the main idea is to select the learning platform, locate the components of learning behaviors, and associate learning behaviors with learning effects, with learning behavior defined as the "cause" and learning effect as the "result". The traditional Apriori algorithm has been flexibly improved in this setting: data tracking is realized with the help of clustering, weighted balance, decision tree evaluation and other means. The research target is to optimize learning behaviors and improve learning efficiency. However, the frequent itemset mining process of Apriori needs to scan the original data many times. When the original data set is large, the number of iterative scans becomes excessive, which seriously reduces the efficiency of the algorithm.
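The level-wise search with its join and pruning steps can be illustrated with a minimal Python sketch. It is not the implementation used in this chapter: the function name `apriori`, the absolute support threshold and the five toy transactions (with hypothetical item names such as `forum` and `content`) are assumptions made only for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for itemsets with count >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    # First scan: count the single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: n for s, n in counts.items() if n >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        prev = set(current)
        # Join step: merge frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        # One further scan of the whole data set per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: n for s, n in counts.items() if n >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical toy transactions, not taken from the data set of this chapter.
toy_transactions = [
    {"forum", "content", "pass"},
    {"content", "quiz", "pass"},
    {"forum", "content", "quiz", "pass"},
    {"forum", "quiz"},
    {"content", "pass"},
]
frequent_itemsets = apriori(toy_transactions, min_support=3)
```

Each pass of the `while` loop scans all transactions again, which is the repeated scanning cost described above.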

#### **2.2 Frequent itemset mining based on FP-growth**

Frequent itemset mining based on FP-growth also uses the horizontal data format, but its data structure is essentially different. The analysis is divided into two main steps: constructing the FP tree and mining frequent itemsets from it. The FP tree represents the transactions associated with the itemsets: each transaction, which is composed of items, corresponds to one path of the tree. Different transactions may contain the same items, so their paths overlap. The more the paths overlap, the more the tree is compressed and the more efficiently it can be accessed [12–15]. When FP-growth is used to mine frequent itemsets of learning behaviors, the main idea is similar to that of Apriori. According to the research target, users are required to select the data set of learning behaviors, define the itemsets and research target, put forward hypotheses for testing, explore the rules by means of classification, clustering and decision making, draw conclusions, and verify existing education and teaching practice against the data analysis results. There are, however, some problems. Because learning behaviors are diverse, random and complex, the FP-growth algorithm has obvious limitations in this field. When there are too many learning behavior itemsets or their relationships are complex, the FP tree acquires too many child nodes, which greatly reduces the efficiency of the algorithm and can prevent it from obtaining accurate and complete frequent itemsets. The FP-growth algorithm is also comparatively difficult to learn.
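The path overlap that compresses the FP tree can be seen in a small Python sketch of the first step, tree construction. It covers only the construction, not the mining step, and the names `FPNode`, `build_fp_tree` and `count_nodes`, the tie-breaking rule and the toy transactions are assumptions introduced for illustration.

```python
from collections import Counter


class FPNode:
    """One node of the FP tree: an item, its count, and its children."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    # Scan 1: count the items and keep only the frequent ones.
    item_counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: n for i, n in item_counts.items() if n >= min_support}
    root = FPNode(None)
    # Scan 2: insert every transaction as a path, most frequent items first,
    # so that transactions with common items share a prefix.
    for t in transactions:
        ordered = sorted((i for i in set(t) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root


def count_nodes(node):
    """Number of nodes below `node` (the root itself is not counted)."""
    return sum(1 + count_nodes(c) for c in node.children.values())


# Hypothetical toy transactions, not taken from the data set of this chapter.
toy_transactions = [
    {"forum", "content", "pass"},
    {"content", "quiz", "pass"},
    {"forum", "content", "quiz", "pass"},
    {"forum", "quiz"},
    {"content", "pass"},
]
tree = build_fp_tree(toy_transactions, min_support=3)
node_total = count_nodes(tree)
item_total = sum(len(t) for t in toy_transactions)
```

Comparing `node_total` with `item_total` shows how much the shared prefixes compress the data; when transactions share few items, the tree grows many child nodes and the compression is lost, which is the limitation noted above.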

#### **2.3 Frequent itemset mining based on Eclat framework**

Compared with Apriori and FP-growth (Frequent Pattern growth), the fundamental difference of Eclat is that it analyzes data in the vertical format and is essentially a depth-first search mechanism. The rule search space is effectively divided into sets of subspaces through the concept lattice and equivalence relationships, so the support calculation of each itemset does not require repeated retrieval of the entire data set [16–19]. Using the Eclat framework to study learning behaviors needs the support of a big data set of learning behaviors; through data transposition and standardization, we obtain the itemsets and the transaction set. On this basis, the relevant models and algorithms of the Eclat framework are improved and redesigned. Frequent itemsets and association rules are mined under the constraints of support, confidence and lift, and the final frequent itemsets and association rules are taken as the references. Vertical data analysis based on the Eclat framework can improve the speed of data search, association and analysis, and can also improve the reliability of data validation results to a certain extent.

However, the Eclat framework is rarely used in the data processing of learning behaviors, so improvements to its algorithms and models have not yet produced effective results. This is directly related to the technical difficulty caused by the complexity of learning behaviors. If Eclat transposes and intersects all items and transactions, or if the number of items and transactions is too large, the efficiency of the algorithm suffers. It is therefore more practical for frequent itemset mining in the Eclat framework to be assisted by other algorithms and tools. This chapter integrates the advantages of, and feasible attempts at, applying the Eclat framework, such as technical improvement, model design and tool application, so as to provide more effective methods for follow-up studies of learning behavior big data and of other data.
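The vertical data format and the support calculation by intersection discussed in this section can be sketched in a few lines of Python. This is the basic Eclat idea only, not the improved framework of this chapter; the function names `to_vertical` and `eclat`, the absolute support threshold and the toy transactions are illustrative assumptions.

```python
def to_vertical(transactions):
    """Transpose horizontal data: map each item to the ids of its transactions."""
    vertical = {}
    for tid, t in enumerate(transactions):
        for item in t:
            vertical.setdefault(item, set()).add(tid)
    return vertical


def eclat(prefix, items, min_support, out):
    """Depth-first search over (item, tidset) pairs that share `prefix`."""
    for i, (item, tids) in enumerate(items):
        if len(tids) < min_support:
            continue
        itemset = prefix + (item,)
        out[itemset] = len(tids)
        # Extend only with later items. The support of each extension is the
        # size of a tidset intersection, so the data set is not scanned again.
        suffix = [(other, tids & other_tids)
                  for other, other_tids in items[i + 1:]]
        eclat(itemset, suffix, min_support, out)
    return out


# Hypothetical toy transactions, not taken from the data set of this chapter.
toy_transactions = [
    {"forum", "content", "pass"},
    {"content", "quiz", "pass"},
    {"forum", "content", "quiz", "pass"},
    {"forum", "quiz"},
    {"content", "pass"},
]
vertical = to_vertical(toy_transactions)
start = sorted(vertical.items(), key=lambda pair: pair[0])
eclat_itemsets = eclat((), start, min_support=3, out={})
```

The tidsets grow with the number of transactions, and the number of intersections grows with the number of items, which is why very large inputs reduce the efficiency of this approach.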

#### **3. Elements of learning behaviors and research problems**

We select a big data set of learning behaviors from the Open University in the UK, covering four learning periods in the most recent two years; the data scale reaches hundreds of millions. From the perspective of course category, we track and compare the learning behaviors within the same category and across different categories, and make adaptive decisions. The courses are divided into two categories, Literature and Technology, and two courses are selected for each category: L1 and L2, and T1 and T2. Different courses have different learning periods, and the learning effects are obtained through assessment. Learning behaviors are correlated with learning effects, and the components of learning behaviors restrict and drive one another. Empirical problems and testing strategies are established between learning behaviors and learning effects; the research conclusions and the reflection on decision making form the basis for improving and optimizing data-driven learning behaviors.

#### **Table 1.**

*Components and indicators of L1.*

*Improved Probabilistic Frequent Itemset Analysis Strategy of Learning Behaviors Based on… DOI: http://dx.doi.org/10.5772/intechopen.97219*

#### **Table 2.**

*Components and indicators of L2.*

#### **Table 3.**

*Components and indicators of T1.*

**Tables 1**–**4** show the components and indicators of learning behaviors for the four courses L1, L2, T1 and T2. The four tables involve four learning periods: P1, P2, P3 and P4. The data distribution in the tables indicates that not all courses have learning behaviors in every period. The indicators involve two statistics, the median and the mode, which are used to investigate the population trend; different indicators are selected for different types of components. "assessment" represents the assessment method of a course and is an enumeration type, mainly including CT (Computer Test), TT (Teacher Test) and exam (joint computer and teacher test). "final\_result" represents the result of the course assessment and is also an enumeration type, with four components: excellent, pass, fail and withdrawn. "assessment" and "final\_result" measure the group tendency of courses. The other components are the main parts of the interaction processes. They all describe interaction frequency, which reflects the autonomy and randomness of learners. The strength of interaction frequency is assessed by the median to investigate the distribution range.
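As a small illustration of the two indicators, the following Python sketch computes a median for an interaction-frequency component and a mode for an enumeration component. The values are invented for the example and are not taken from Tables 1–4; the variable names are likewise assumptions.

```python
from statistics import median, multimode

# Hypothetical interaction counts of six learners for one component.
content_clicks = [3, 8, 5, 21, 5, 13]
# Hypothetical course results of the same six learners.
final_result = ["pass", "fail", "pass", "withdrawn", "excellent", "pass"]

# Median: the middle of the sorted interaction frequencies.
content_median = median(content_clicks)
# Mode: the most common value(s) of the enumeration component.
result_mode = multimode(final_result)
```

The median is less sensitive than the mean to a few very active learners, and the mode is the natural summary for components whose values are categories rather than counts.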

From **Tables 1**–**4**, we can see that the group choice of "assessment" is highly concentrated. Most learners completed the course assessment conducted by teachers, but the assessment results differ considerably, and the results of the same course also differ between learning periods. In P4 of L2, as in P2 and P3 of T1, learners tend to give up the assessment.


#### **Table 4.**

*Components and indicators of T2.*

In P4 of T2, most of the learners obtain excellent assessment results, and most of them pass the course. Judging from "assessment" and "final\_result", the group indicators of Literature courses and Technology courses are similar.

As for the other components of learning behaviors, the data show that the categories of components and the participation in components of the same type are strongly dispersed. The types of interaction components are consistent across the two learning periods of L2 and across the three learning periods of T1, and the medians are relatively close, which indicates that the distribution of learners' participation in these interaction components is basically consistent. The two learning periods of L2 have the same "final\_result" mode, whereas the assessment results of T1 differ markedly. Comparing the types or numbers of interaction components of the same course in different learning periods directly reveals the differences: the interactions differ significantly, and there is a gap in the median of the same interaction component, such as "content" in the two learning periods of L1. At the same time, the types of interaction components used in Literature or Technology courses depend on the individual course. The learners of L1 and L2 each have their own interaction components, and the same holds for T1 and T2.

Therefore, the interaction components of L1, L2, T1 and T2 reflect the characteristics of autonomous learning, and the constraints imposed by the assessment methods and results differentiate the learners. The problems and relationships are shown in **Figure 1** and are divided into the following four steps:

1. The mining of frequent itemsets takes different interaction components as reference items, and analyzes and mines the frequent itemsets based on these reference items according to a certain probability;


**Figure 1.** *The research problems and logical relationships.*


The certain probability in the four steps depends on the requirements of the selected algorithm and on the measurement support. Based on the improved Eclat framework, we complete the four steps of the research problems: we use the three indicators "Support", "Confidence" and "Lift" as constraints, analyze the thresholds and test criteria, and mine probabilistic frequent itemsets and association rules.
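The three indicators can be made concrete with a short Python sketch that uses their standard definitions for a rule A → C: support is the share of transactions containing both A and C, confidence is that support divided by the support of A, and lift is the confidence divided by the support of C. The function names, the toy transactions and the example rule are assumptions for illustration, and the sketch assumes that both A and C occur at least once.

```python
def support(itemset, transactions):
    """Share of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def rule_metrics(antecedent, consequent, transactions):
    """Return (support, confidence, lift) of the rule antecedent -> consequent."""
    a, c = frozenset(antecedent), frozenset(consequent)
    rule_support = support(a | c, transactions)
    # Confidence: how often the consequent appears when the antecedent does.
    confidence = rule_support / support(a, transactions)
    # Lift: confidence relative to the baseline frequency of the consequent.
    lift = confidence / support(c, transactions)
    return rule_support, confidence, lift


# Hypothetical toy transactions, not taken from the data set of this chapter.
toy_transactions = [
    {"forum", "content", "pass"},
    {"content", "quiz", "pass"},
    {"forum", "content", "quiz", "pass"},
    {"forum", "quiz"},
    {"content", "pass"},
]
rule_support, rule_confidence, rule_lift = rule_metrics(
    {"content"}, {"pass"}, toy_transactions)
```

A lift above 1 indicates that the antecedent and the consequent occur together more often than they would if they were independent, which is why lift is used alongside support and confidence when rules are filtered.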
