4. An inductive inference perspective on mining HCI data

Figure 3. HCI data mining approach with emphasis on aspects of inductive modeling.

In contrast to widespread earlier approaches (see Figure 1, where the "model" node in the CRISP-like model on the right is hatched because it is missing in the original figure [33]), the authors stress the aspects illustrated by the four groups of darker boxes in Figure 3. First, data are not seen as a monolithic object within the process concept but as an emerging sequence. Second, whereas in the Fayyad process (see Figure 1 and the source [32]) the pattern concept appears from nowhere, the terminology of forming hypotheses is seen as a central issue: the selection of a logic and the design of suitable spaces of hypotheses, both potentially subject to revision over time. Third, the inductive modeling procedure discussed in some more detail throughout this chapter is identification by enumeration.

Involved logical reasoning may easily become confusing, not so much to a computer or to a logic program [4] as to a human being. Within the digital game case study [4, 5], the generation of a single indexed family of logical formulas has been sufficient. Identification by enumeration works well even for identifying somewhat perfidious intentions of human players. Business applications as in [6] are more complex and may require unforeseeable revisions of the terminology in use, i.e., the dynamic generation of spaces of hypotheses on demand [73].

The present section is aimed at a clarification of the core ideas and technicalities. For this purpose, the approach is stripped to the essentials. Recursion-theoretic inductive inference as in [17, 43, 44] is the most lucid area in which problems of inductive learning can be explicated without any need for dealing with syntactic sugar. The underlying apparatus of mathematics can be found in textbooks such as [41, 63].

In Figure 4, the darker boxes with white inscriptions denote conventional concepts of recursion-theoretic inductive inference [44]. The other boxes reflect formalizations of this chapter's core approaches to HCI data mining by means of identification by enumeration. The concepts derived from the present chapter's practical investigations form a previously unknown infinite hierarchy between the previously known concepts NUM and TOTAL.

Throughout the remainder of this section, the authors confine themselves to elementary concepts.

Learning logical theories is very much like learning recursive functions. Both have finite descriptions but determine a usually infinite set of facts: the theorems of a theory and the values of a function, respectively. In both cases, the sets of facts are recursively enumerable but usually undecidable. The deep interplay of logic and recursion theory has been well understood for almost a century and provides a firm basis of seminal results [74]. Inductively learning a recursive function means, in some sense, mining the function's graph, which is presented in growing chunks over time, a process very similar to mining HCI data.

A few notions and notations are inevitable. IN is the set of natural numbers. P^n denotes the class of n-ary partial recursive functions mapping from IN^n into IN. R^n ⊂ P^n is the subclass of all total recursive functions. Assume any ordering of IN written in the form X = {x_0, x_1, x_2, …}. For any function f ∈ R^1, the sequence of observations (x_0, f(x_0)), (x_1, f(x_1)), (x_2, f(x_2)), (x_3, f(x_3)), … provides growing but incomplete information about f. With respect to the ordering X, the amount of information up to time point n is encoded in f_X[n] = ((x_0, f(x_0)), …, (x_n, f(x_n))). If X_0 is the standard ordering 0, 1, 2, 3, 4, …, the index is dropped and the notation is f[n]. Throughout any learning process, hypotheses are natural numbers interpreted as programs according to some Gödel numbering φ. Because any two Gödel numberings are recursively isomorphic, the choice of a numbering does not matter. Learnability transcends the choice of the numbering.
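As a concrete illustration of this encoding, the following Python sketch builds the initial segments f_X[n] from a function and an ordering. The helper prefix and the example functions are our own illustrative choices, not notation from the chapter.

```python
# Minimal sketch: initial segments f_X[n] of a total function f under an
# ordering X of the natural numbers. Helper names are illustrative only.

def prefix(f, n, X=None):
    """Return f_X[n] = ((x_0, f(x_0)), ..., (x_n, f(x_n)))."""
    xs = range(n + 1) if X is None else X[:n + 1]  # default: X_0 = 0, 1, 2, ...
    return tuple((x, f(x)) for x in xs)

square = lambda x: x * x
print(prefix(square, 3))                   # ((0, 0), (1, 1), (2, 4), (3, 9))
print(prefix(square, 3, X=[2, 0, 3, 1]))   # ((2, 4), (0, 0), (3, 9), (1, 1))
```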

Figure 4. Abstractions of fundamental inductive learning concepts compared and related; ascending lines mean the proper set inclusion of the lower learning concept in the upper one.


Assume any class C ⊂ R^1 of total recursive functions. The functions of C are uniformly learnable by an effectively computable learner L ∈ P^1 on the ordering of information X_0, if and only if the following conditions are satisfied. For all f ∈ C and for all n ∈ IN, the learner computes some hypothesis L(f[n]). For every f ∈ C, the sequence of hypotheses converges to some c that correctly describes f, i.e., φ_c = f. EX denotes the family of all function classes learnable as described. EX(L) ∈ EX is the class of all functions learnable by L. In the case that arbitrary arrangements of information X are taken into account, the definition is changed by substituting f_X[n] for f[n]. The class of all functions learnable by L is then named EX_arb(L), and the family of all function classes learnable on arbitrary X is EX_arb. The term EX is intended to resemble explanatory learning; this is exactly what theory of mind induction is aiming at.

The equality of EX and EX_arb is folklore in inductive inference. Therefore, arbitrary orderings are ignored whenever possible without loss of generality.
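As a concrete, hedged reading of this definition, the following Python sketch simulates the learning protocol. Hypotheses are represented as plain values returned by the learner, standing in for Gödel numbers; run_ex_protocol, constant_guesser, and the toy target are our own illustrative names, and a finite loop cannot, of course, verify convergence.

```python
# Hedged sketch of the EX protocol: feed a learner the growing prefixes
# f[0], f[1], ... and record the emitted hypotheses. EX success means the
# hypothesis sequence eventually becomes constant and names a correct
# program; a finite simulation can only display the stream, not decide it.

def run_ex_protocol(learner, f, steps):
    """Collect the hypotheses L(f[n]) for n = 0, ..., steps - 1."""
    stream = []
    for n in range(steps):
        fn = tuple((x, f(x)) for x in range(n + 1))  # the prefix f[n]
        stream.append(learner(fn))
    return stream

# Toy learner that always conjectures the constant function f(0).
constant_guesser = lambda fn: fn[0][1]
print(run_ex_protocol(constant_guesser, lambda x: 7, 4))  # [7, 7, 7, 7]
```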

Intuitively, it seems desirable that a hypothesis reflects the information it is built upon. Formally, ∀m ≤ n (φ_h(x_m) = f(x_m)), where h abbreviates L(f_X[n]). In the simpler case of X_0, every x_m equals m. The property is named consistency. The families of function classes uniformly learnable consistently are CONS and CONS_arb, respectively, and CONS_arb ⊂ CONS ⊂ EX is folklore as well. Apparently, the message is that consistency is a nontrivial property.
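Once hypotheses are modeled as programs one can run, the consistency requirement φ_h(x_m) = f(x_m) for all m ≤ n becomes directly checkable on the data seen so far. The sketch below assumes hypotheses are Python callables; is_consistent is an illustrative name.

```python
# Sketch of the consistency property: a hypothesis must reproduce every
# value it was built upon, i.e. phi_h(x_m) = f(x_m) for all m <= n.

def is_consistent(hypothesis, fn):
    """True iff the hypothesis agrees with all observed pairs (x, f(x))."""
    return all(hypothesis(x) == y for (x, y) in fn)

data = ((0, 0), (1, 1), (2, 4))
print(is_consistent(lambda x: x * x, data))  # True
print(is_consistent(lambda x: 2 * x, data))  # False: disagrees at (2, 4)
```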

Consistency can easily be guaranteed (T) if all hypotheses are in R^1 or (F) if it is decidable whether or not a hypothesis is finally correct. Adding (T) or (F) to the definitions of EX and EX_arb, one gets the learning types denoted by TOTAL, TOTAL_arb, FIN, and FIN_arb, respectively. In inductive inference, FIN = FIN_arb ⊂ TOTAL = TOTAL_arb ⊂ CONS_arb is folklore as well [44].

Under the prior knowledge of FIN = FIN_arb, TOTAL = TOTAL_arb, and EX = EX_arb (see [44]), all the abovementioned inclusions are on display in Figure 4.

NUM is the learning type defined by means of identification by enumeration as discussed in the previous section. A class C ⊂ R^1 belongs to NUM, if and only if there exists a general recursive enumeration h with C ⊆ {φ_{h(n)}}_{n∈IN} ⊂ R^1. A partial recursive learning device L ∈ P^1 learns via identification by enumeration on h, if and only if L(f[n]) = h(μm [φ_{h(m)}[n] = f[n]]). Interestingly, this extremely simple concept reflects exactly the application in [4, 5].
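The defining equation L(f[n]) = h(μm [φ_{h(m)}[n] = f[n]]) translates almost literally into a search loop. In the following sketch, an enumeration is modeled as a function from indices to total programs (Python callables); the enumeration of constant functions is a toy stand-in, not the indexed family of logical formulas from [4, 5].

```python
# Sketch of identification by enumeration: return the least index m whose
# enumerated program agrees with the entire prefix f[n]. For a general
# recursive enumeration of total functions covering the target class,
# this search always terminates.

def identify_by_enumeration(h, fn):
    """Compute mu m [phi_{h(m)}[n] = f[n]] and return the index m."""
    m = 0
    while not all(h(m)(x) == y for (x, y) in fn):
        m += 1
    return m  # the hypothesis is the program h(m)

# Toy enumeration h: h(m) is the constant function with value m.
h = lambda m: (lambda x, m=m: m)
print(identify_by_enumeration(h, ((0, 5), (1, 5), (2, 5))))  # 5
```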

The potential of generalizing the learning principle of identification by enumeration is practically demonstrated in [6]. Accordingly, [73] introduces the novel concept of dynamic identification by enumeration. In terms of recursion theory, this looks as follows.

For simplicity, the authors confine themselves to X_0. A few more notations are needed. If h is an enumeration or, alternatively, if n is an index of the enumeration h, C_h and C_n, respectively, denote the class of all functions enumerated by h. Following Grieser [75], the notation [C] denotes the set of all initial segments of functions in C, i.e., [C] = {f[n] | f ∈ C ∧ n ∈ IN}.

A class of functions C ⊆ R^1 belongs to NUM*, if and only if there exists a computable generator function γ ∈ P^1 such that for all f ∈ C it holds that (I) for all n ∈ IN, γ(f[n]) is defined, φ_{γ(f[n])} ∈ R^1, C_{γ(f[n])} ⊆ R^1, and f[n] ∈ [C_{γ(f[n])}]; (II) there is a critical point m ∈ IN such that for all n ∈ IN larger than m, it holds that γ(f[m]) = γ(f[n]); and (III) f ∈ C_{γ(f[m])}.

The criteria (I), (II), and (III) are practically motivated [6]. They are called operational appropriateness, conversational appropriateness, and semantic appropriateness, respectively. Usually, the change of γ(f[n]) to another γ(f[n + 1]) means an extension of terminology [6, 73]. The condition (II) of conversational appropriateness prevents a Babylonian confusion.
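A rough sketch of the recursion-theoretic reading, under stated assumptions: the generator γ maps a prefix to a space of hypotheses, i.e., an enumeration of total programs, and ordinary identification by enumeration runs inside the currently proposed space. The concrete γ below, switching from constant to linear functions once the data rule the former out, is purely illustrative.

```python
# Hedged sketch of dynamic identification by enumeration: gamma proposes a
# space of hypotheses fitting the data seen so far (criterion I), and the
# learner enumerates within that space. A change of gamma's proposal models
# an extension of terminology; criterion (II) demands only finitely many.

def dynamic_identification(gamma, fn):
    space = gamma(fn)                # the currently proposed space
    m = 0
    while not all(space(m)(x) == y for (x, y) in fn):
        m += 1                       # identification by enumeration inside it
    return space, m

# Toy gamma: propose constant functions while the data allow it, then switch
# to linear functions a*x + b (both coefficients packed into one index).
def gamma(fn):
    if len({y for (_, y) in fn}) <= 1:
        return lambda m: (lambda x, m=m: m)
    return lambda m: (lambda x, a=m // 10, b=m % 10: a * x + b)

space, m = dynamic_identification(gamma, ((0, 1), (1, 3), (2, 5)))
print(m)  # 21, encoding the hypothesis 2*x + 1
```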

According to [73], it holds that NUM* = TOTAL. This demonstrates the enormous gain of learning power achieved by means of dynamic identification by enumeration. Whereas NUM is incomparable to FIN, NUM* lies far above FIN; [44] provides much more information about the space between FIN and TOTAL.

In this chapter, the authors go much further by introducing a family {NUM^k}_{k∈IN} of infinitely many refinements of NUM*. A class C in NUM* belongs to NUM^0, if and only if there exists some generator function γ that is constant and identical for all functions f of C. For a positive number k, a class C in NUM* belongs to NUM^k, if and only if there exists a γ that, for every function f of C, generates at most k different spaces of hypotheses γ(f[n]). Intuitively, γ suggests at most k times an extension of terminology for the purpose of more appropriately expressing hypotheses throughout the process of data analysis and learning.

Jantke [76] provides a detailed discussion of benchmarks to prove that {NUM^k}_{k∈IN} forms an infinite hierarchy as on display in Figure 4. For brevity, just two benchmarks are presented. C^1_{q-like} = {f | f ∈ R^1 ∧ ∀x ∈ IN (x > 0 → f(x) > 0) ∧ φ_{f(0)} = f}. Apparently, C^1_{q-like} ∈ NUM^1 \ NUM^0. C^{k+1}_{q-like} = {f | f ∈ R^1 ∧ ∃g ∈ C^k_{q-like} ∃n ∈ IN (f(n) = 0 ∧ ∀x ∈ IN (x > n → f(x) > 0) ∧ ∀x ∈ IN (x < n → f(x) = g(x)) ∧ ∀x ∈ IN (x > n → f(x) = φ_{f(n+1)}(x - n - 1)))}. This allows separating NUM^{k+1} from NUM^k.
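To see the self-referential structure this first benchmark exploits: φ_{f(0)} = f makes the very first value f(0) a correct program for f, so a learner equipped with a fitting space of hypotheses may output f(0) from the first data point on. Intuitively, that fitting space depends on f, which is why a single generator that is constant and identical for all of C, as NUM^0 demands, does not suffice. The tiny sketch below states this observation in code; the representation of hypotheses as returned Gödel numbers is our own.

```python
# Worked observation on C^1_q-like: since phi_{f(0)} = f, the first observed
# value f(0) already names a correct program for f, so it can be emitted as
# the hypothesis immediately. Assumes the standard ordering X_0, under which
# the first observed pair is (0, f(0)).

def q_like_learner(fn):
    """Hypothesis for f in C^1_q-like: the self-referential index f(0)."""
    x0, y0 = fn[0]
    assert x0 == 0, "standard ordering X_0 expected"
    return y0  # a Goedel number c with phi_c = f
```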

5. Process models and heuristics for mining HCI data

At the end of Section 3, the authors summarized their HCI data mining approach and visualized the essentials of inductive modeling in Figure 3. We take up the thread once again. The selection or the design of a terminology is essential. The terminology determines the space of hypothetical models that may be found. Throughout the process of data mining, model spaces may be subject to revision repeatedly (see the preceding Section 4).

The world of models is overwhelmingly rich. Models may be characterized by properties, by purpose, by function, by model viability, or by model fitness [30]. As Thalheim puts it, "models are developed within a theory" ([30], p. 117).

Every concrete application domain provides such an underlying theory. It is a necessary precondition of data mining to specify all the aspects of the underlying theory that should be taken into account (see [30], p. 115, for mapping, truncation, distortion, and the like). Revisions

