**Algorithm Selection: From Meta-Learning to Hyper-Heuristics**

Laura Cruz-Reyes¹, Claudia Gómez-Santillán¹, Joaquín Pérez-Ortega², Vanesa Landero³, Marcela Quiroz¹ and Alberto Ochoa⁴

*¹Instituto Tecnológico de Cd. Madero, ²Centro Nacional de Investigación y Desarrollo Tecnológico, ³Universidad Politécnica de Nuevo León, ⁴Universidad de Ciudad Juárez, México*

#### **1. Introduction**


For a company to be competitive, efficient management of its resources is an indispensable requirement. This requirement gives rise to many complex optimization problems that must be solved with high-performance computing tools. Moreover, because of the complexity of these problems, the most promising approach is considered to be their solution with approximate algorithms, among which heuristic optimizers stand out. This category includes the basic heuristics, which are experience-based techniques, and the metaheuristic algorithms, which are inspired by natural or artificial optimization processes.

A variety of approximate algorithms that have shown satisfactory performance on optimization problems have been proposed in the literature. However, no single algorithm performs best in all possible situations; given the number of available strategies, it is necessary to select the one that best adapts to the problem. An important point is to know which strategy is the best for a given problem, and why it is better.

The chapter begins with the formal definition of the Algorithm Selection Problem (ASP), starting from its initial formulation. The following sections describe examples of intelligent systems that use an algorithm selection strategy and review the literature related to the solution of ASP. The chapter then presents the proposals of our research group for solving ASP, which are based on machine learning, neural networks and hyper-heuristics, along with experimental results that support conclusions about the advantages and disadvantages of each approach. Finally, because a fully automated solution of ASP is an undecidable problem, we review a less rigid approach that intelligently combines different strategies: hybrid systems of metaheuristics.


#### **2. The Algorithm Selection Problem (ASP)**

Many optimization problems can be solved by multiple algorithms that show different performance for different problem characteristics. Although some algorithms are better than others on average, there is no best algorithm for all possible instances of a given problem. This phenomenon is most pronounced among algorithms for solving NP-hard problems, because their runtimes are often highly variable from instance to instance (Leyton-Brown et al., 2003). In fact, it has long been recognized that there is no single algorithm or system that will achieve the best performance in all cases (Wolpert & Macready, 1997). Instead, we are likely to attain better results, on average, across many different classes of a problem if we tailor the selection of an algorithm to the characteristics of the problem instance (Smith-Miles et al., 2009). To address this concern, in the last decades researchers have developed technology to automatically choose an appropriate optimization algorithm for a given instance of a problem, in order to obtain the best performance.

Recent work has focused on creating algorithm portfolios, which contain a selection of state-of-the-art algorithms. To solve a particular problem with such a portfolio, a pre-processing step is run in which the suitability of each algorithm in the portfolio for the problem at hand is assessed. This step often involves some kind of machine learning, as the actual performance of each algorithm on the given, unseen problem is unknown (Kotthoff et al., 2011).

The Algorithm Selection Problem (ASP) was first described by John R. Rice in 1976 (Rice, 1976). He defined this problem as learning a mapping from feature space to algorithm performance space, and acknowledged the importance of selecting the right features to characterize the hardness of problem instances (Smith-Miles & Lopes, 2012). This definition includes three important characteristics (Rice, 1976):

a. *Problem Space*: The set of all possible instances of the problem. There is a large number of independent characteristics that describe the different instances and that are important for algorithm selection and performance. Some of these characteristics and their influences on algorithm performance are usually unknown.

b. *Algorithm Space*: The set of all possible algorithms that can be used to solve the problem. The dimension of this set could be unimaginable, and the influence of the algorithm characteristics is uncertain.

c. *Performance Measure*: The criteria used to measure the performance of a particular algorithm for a particular problem and to judge how difficult (hard) the instance is to solve. There is considerable uncertainty in the use and interpretation of these measures (e.g. some prefer fast execution, others effectiveness, others simplicity).
Rice proposed a basic model for this problem, which seeks to predict which algorithm from a subset of the algorithm space is likely to perform best, based on measurable features of a collection of instances from the problem space. Given a subset of the problem space *P*, a subset of the algorithm space *A*, a mapping from *P* to *A* and the performance space *Y*, the Algorithm Selection Problem can be formally defined as: for a particular problem instance *p* ∈ *P*, find the selection mapping *S*(*p*) into the algorithm space *A*, such that the selected algorithm *a* ∈ *A* maximizes the performance measure *y* for *y*(*a*,*p*) ∈ *Y*. This basic abstract model is illustrated in Figure 1 (Rice, 1976; Smith-Miles & Lopes, 2012).

Fig. 1. The Algorithm Selection Problem (ASP)
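To make the abstract model concrete, the following sketch (ours, not Rice's) represents the performance mapping *y*(*a*,*p*) as a table of measured results and implements the selection mapping *S*(*p*) as an argmax over the algorithm subset; all instance and algorithm names are illustrative.

```python
# A minimal rendering of Rice's model: S(p) = argmax over a in A of y(a, p).
# The performance table stands in for the mapping from P x A into Y.

performance = {                      # y(a, p) measured on known instances
    ("instance_a", "tabu_search"): 0.91,
    ("instance_a", "genetic_alg"): 0.84,
    ("instance_b", "tabu_search"): 0.72,
    ("instance_b", "genetic_alg"): 0.88,
}

def select_algorithm(instance: str, algorithms: list[str]) -> str:
    """Selection mapping S(p): pick the algorithm maximizing y(a, p)."""
    return max(algorithms, key=lambda a: performance[(instance, a)])

print(select_algorithm("instance_b", ["tabu_search", "genetic_alg"]))
# -> genetic_alg
```

On instances whose performance is already known, this mapping acts as a perfect oracle; the real difficulty, discussed below, is predicting *y* for unseen instances from their features.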


Figure 2 shows the dimensions of ASP and allows the problem to be seen at a higher level of abstraction. There are three dimensions: 1) the *x*-axis expresses a set of solution algorithms *s*, *t*, *w*, *y*, *z*; 2) the *z*-axis shows a set of instances of the problem *a*, *b*, *c*, *d*, together with a new instance *e* to solve; 3) on the *y*-axis, the results of applying the algorithms to each of the instances are represented by vertical lines. As shown in the figure, when solving instances *a* and *b* the algorithms exhibit different performances; it is noteworthy that no algorithm is superior to the others in solving all instances. Moreover, the algorithm *s* performs differently on each of the instances. Finally, the problem to be solved is to select, for the new instance *e*, the algorithm that will solve it best.

Fig. 2. Dimensions of algorithm selection problem

As we can see in the definition of the Algorithm Selection Problem, there are three principal aspects that must be tackled in order to solve the problem:

a. The *selection of the set of features of the problem* that might be indicative of the performance of the algorithms.

b. The *selection of the set of algorithms* that together allow solving the largest number of instances of the problem with the highest performance.

c. The *selection of an efficient mapping mechanism* that permits selecting the best algorithm so as to maximize the performance measure.
Some studies have focused on constructing a suitable set of features that adequately measure the relative difficulty of the instances of the problem (Smith-Miles et al., 2009; Messelis et al., 2009; Madani et al., 2009; Quiroz, 2009; Smith-Miles & Lopes, 2012). Generally, two main approaches are used to characterize the instances: the first is to identify problem-dependent features based on domain knowledge of what makes a particular instance challenging or easy to solve; the second is a more general set of features derived from landscape analysis (Schiavinotto & Stützle, 2007; Czogalla & Fink, 2009). Defining the set of features that describe the characteristics of the instances is a difficult task that requires expert domain knowledge of the problem. The characterization indices should be carefully chosen so as to permit a correct discrimination of the difficulty of the instances and to explain the algorithms' performance. Little will be learned via a knowledge discovery process if the features selected to characterize the instances do not have any differentiation power (Smith-Miles et al., 2009).

On the other hand, portfolio creation and algorithm selection have received a lot of attention in areas that deal with solving computationally hard problems (Leyton-Brown et al., 2003; O'Mahony et al., 2008). The current state of the art is such that there are often many algorithms and systems for solving the same kind of problem, each with its own performance on a particular problem. Machine learning is an established method of addressing ASP (Lobjois & Lemâitre, 1998; Fink, 1998): given the performance of each algorithm on a set of training problems, we try to predict the performance on unseen problems (Kotthoff et al., 2011). There have been many studies in the area of algorithm performance prediction, which is strongly related to algorithm selection in the sense that supervised learning or regression models are used to predict the performance ranking of a set of algorithms, given a set of features of the instances (Smith-Miles & Lopes, 2012).

In the selection of an efficient mapping mechanism, a challenging research goal is to design a run-time system that can repeatedly execute a program, learning over time to make decisions that maximize the performance measure. Since the right decisions may depend on the problem size and parameters, the machine characteristics and load, the data distribution, and other uncertain factors, this can be quite challenging. Some works treat algorithms in a black-box manner: each time, a single algorithm is selected and applied to the given instance, and regression analysis or machine learning techniques are used to build a predictive model of the performance of the algorithms given the features of the instances (Lobjois & Lemâitre, 1998; Fink, 1998; Leyton-Brown et al., 2003; Ali & Smith, 2006). Other works focus on dynamic selection of algorithm components while the instance is being solved; in that sense, each instance is solved by a mixture of algorithms formed dynamically at run-time (Lagoudakis & Littman, 2000; Samulowitz & Memisevic, 2007; Streeter et al., 2007). The use of efficient mapping mechanisms in intelligent systems is described in the next section.
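As a rough illustration of the black-box approach, the sketch below (our toy construction, not any cited system) trains one regression model per algorithm to map instance features to performance and selects the algorithm with the best prediction; the feature values and algorithm names are invented. This is also the core idea behind the empirical hardness models discussed in Section 3.7.

```python
# Sketch of black-box per-instance selection: one regression model per
# algorithm predicts performance from instance features; the selector
# picks the algorithm with the best prediction. All data is illustrative.
import numpy as np
from sklearn.linear_model import LinearRegression

X_train = np.array([[100, 0.2], [500, 0.6], [900, 0.4]])   # instance features
perf = {                                                    # observed performance
    "simulated_annealing": np.array([0.9, 0.6, 0.7]),
    "ant_colony":          np.array([0.7, 0.8, 0.9]),
}
models = {name: LinearRegression().fit(X_train, y) for name, y in perf.items()}

def select(features):
    preds = {name: m.predict([features])[0] for name, m in models.items()}
    return max(preds, key=preds.get)

print(select([700, 0.5]))   # -> ant_colony
```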

#### **3. Applications of algorithm selection to real-world and theoretical problems**

The principles applied to ASP can be used in a wide range of real-world and theoretical applications. Generally, an application that solves a real problem is a version, extended with additional parameters and constraints, of an application that solves a theoretical problem. The nature of the algorithm selection problem is dynamic, because it must incorporate new knowledge periodically in order to preserve the efficacy of the selection strategies. This section describes some applications to real-world complex problems, such as knowledge discovery and data mining, bioinformatics and Web services. It also describes some applications that solve complex theoretical problems; some examples are NP-hard problems, also called combinatorial optimization problems.

#### **3.1 Bioinformatics**


In (Nascimento et al., 2009) the authors investigate the performance of clustering algorithms on gene expression data by extracting rules that relate the characteristics of the gene expression data sets to the performance achieved by the algorithms. This represents a first attempt to solve the problem of choosing the best clustering algorithm independently of the particular gene expression data. In general, the choice of algorithm is driven mainly by the familiarity of biological experts with the algorithm, rather than by the characteristics of the algorithms themselves and of the data. In particular, the bioinformatics community has not reached consensus on which method should preferably be used. This work derives directly from the Meta-Learning framework, originally proposed to support algorithm selection for classification and regression problems. However, Meta-Learning has been extended to other application domains, e.g. to select algorithms for time series forecasting, to support the design of planning systems, and to analyze the performance of metaheuristics for optimization problems. Meta-Learning can be defined by considering four aspects: (a) the problem space, P; (b) the meta-feature space, F; (c) the algorithm space, A; and (d) a performance metric, Y. As a final remark, the authors demonstrated that their rule-based ensemble classifier presented the highest accuracy rates in predicting the best clustering algorithms for gene expression data sets. Besides, the set of rules extracted for the selection of clustering algorithms, using an inductive decision tree algorithm, gave some interesting guidelines for choosing the right method.

#### **3.2 WEB services**

In recent years, many studies have focused on developing feasible mechanisms to select appropriate services from service systems in order to improve performance and efficiency. However, traditional methods do not provide effective guidance to users and, with regard to ubiquitous computing, the services need to be context-aware. In consequence, the work of (Cai et al., 2009) proposed a novel service selection algorithm based on an Artificial Neural Network (ANN) for ubiquitous computing environments. This method can accurately choose the most appropriate service from many service providers, thanks to prior information about the cooperation between the devices. Among the elements that exist in the definition of a service, *Z* represents the evaluation value of each service provider's service quality, and its value is calculated with a function that involves time and the conditions of the current context environment, e.g. user context, computing context and physical context, with a division into static and dynamic information.

Among the advantages of using an ANN to solve the service selection problem is that the method can easily adapt the evaluation process to the varying context information and hence provide effective guidance, so that many invalid selection processes can be avoided. The neural network selected was Back Propagation (BP) because it is the most commonly used; however, the algorithm was improved with a three-term approach: learning rate, momentum factor and proportional factor. The efficiency of this algorithm comes from the added proportional factor, which enhanced the convergence speed and stability. In conclusion, the authors claim that this novel service selection outperforms the traditional service selection scheme.
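A schematic form of the three-term update is sketched below; the exact formulation in (Cai et al., 2009) may differ, so the way the proportional term scales the current output error here is our assumption, and the coefficient values are illustrative.

```python
import numpy as np

def three_term_update(w, grad, prev_delta, error,
                      lr=0.1, momentum=0.8, proportional=0.05):
    """One weight update with learning-rate, momentum and proportional terms.

    Schematic form (our assumption of the three-term BP rule):
        delta = -lr * grad + momentum * prev_delta + proportional * error
    The proportional term, scaled by the current output error, is what the
    cited work credits for faster and more stable convergence.
    """
    delta = -lr * np.asarray(grad) + momentum * prev_delta + proportional * error
    return w + delta, delta

w, d = three_term_update(np.zeros(3), [0.2, -0.1, 0.4], np.zeros(3), 0.3)
print(w)
```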


#### **3.3 Learning systems**

In (Bradzil et al., 2003) a meta-learning method to support the selection of candidate learning algorithms is described. Bradzil et al. use the Instance-Based Learning (IBL) approach because IBL has the advantage that the system is extensible: once a new experimental result becomes available, it can be easily integrated into the existing results without the need to re-initiate complex re-learning. In this work, a k-Nearest Neighbor (k-NN) algorithm is used to identify the datasets that are most similar to the one at hand. The distance between datasets is assessed using a relatively small set of data characteristics, selected to represent properties that affect algorithm performance, and is used to generate a recommendation to the user in the form of a ranking. The prediction is constructed by aggregating performance information for the given candidate algorithms on the selected datasets, using a ranking method based on the relative performance between pairs of algorithms. This work showed how meta-learning can be exploited to pre-select and recommend one or more classification algorithms to the user. The authors claimed that choosing adequate methods in a multistrategy learning system might significantly improve its overall performance. It was also shown that meta-learning with k-NN improves the quality of ranking methods in general.
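The following toy sketch (ours, with invented meta-features and rankings) captures the k-NN recommendation scheme: find the k most similar datasets in meta-feature space and aggregate the algorithm rankings observed on them.

```python
import numpy as np

# Illustrative meta-dataset: one row of meta-features per known dataset,
# plus the ranking of candidate algorithms observed on that dataset.
meta_features = np.array([[0.2, 3.0], [0.8, 1.5], [0.4, 2.7]])
rankings = [["c4.5", "knn", "nb"], ["nb", "c4.5", "knn"], ["c4.5", "nb", "knn"]]

def recommend(new_features, k=2):
    """Rank algorithms for a new dataset by aggregating the rankings
    observed on its k nearest neighbors in meta-feature space."""
    d = np.linalg.norm(meta_features - np.array(new_features), axis=1)
    neighbors = np.argsort(d)[:k]
    score = {}
    for i in neighbors:
        for pos, alg in enumerate(rankings[i]):
            score[alg] = score.get(alg, 0) + pos
    return sorted(score, key=score.get)   # lower average position = better

print(recommend([0.3, 2.8]))   # -> ['c4.5', 'nb', 'knn']
```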

#### **3.4 Knowledge discovery and data mining**

In (Hilario & Kaousis, 2000) the model selection problem in knowledge discovery systems is addressed, defined as the problem of selecting the most appropriate learning model or algorithm for a given application task. The authors propose a framework for characterizing learning algorithms for classification, as well as their underlying models, using learning algorithm profiles. These profiles consist of metalevel feature-value vectors that describe learning algorithms from the point of view of their representation and functionality, efficiency, resilience, and practicality. Values for these features are assigned on the basis of author specifications, expert consensus or previous empirical studies. The authors review past evaluations of the better-known learning algorithms and suggest an experimental strategy for building algorithm profiles on more quantitative grounds. The scope of the paper is limited to learning algorithms for classification tasks, but the approach can be applied to learning models for other tasks such as regression or association.

In (Kalousis & Theoharis, 1999) an Intelligent Assistant called NOEMON is presented, which reduces the effort of the classifier selection task by inducing helpful suggestions from background information. For each registered classifier, NOEMON measures its performance on a collection of datasets constituting a morphologic space. To suggest the most appropriate classifier, NOEMON decides on the basis of the morphological similarity between the new dataset and the existing collection. Rules are induced from those measurements and accommodated in a knowledge database; the suggestions on the most appropriate classifier for a dataset are based on those rules. The purpose of NOEMON is to supply the expert with suggestions based on its knowledge of the performance of the models and algorithms on related problems. This knowledge is accumulated in a knowledge base and updated as new problems are processed.

#### **3.5 Scheduling problem**


In (Kadioglu et al., 2011) the starting point is an algorithm selector for Boolean Satisfiability (SAT) based on a nearest-neighbor classifier. On the one hand, the authors present two extensions to it. The first is based on the concept of distance-based weighting, where larger weights are assigned to instances that are closer to the test instance. The second is based on clustering-based adaptive neighborhood size, where the size of the neighborhood is adapted based on the properties of the given test instance. These two extensions show moderate but consistent performance improvements over algorithm selection using nearest-neighbor classification (Malitsky et al., 2011). On the other hand, the authors develop a new hybrid portfolio that combines algorithm selection and algorithm scheduling, in static and dynamic ways. For static schedules, the problem can be formulated as an integer program, more precisely as a resource-constrained set covering problem, where the goal is to select a number of solver-runtime pairs that together "cover" (i.e., solve) as many training instances as possible. Regarding dynamic schedules, the column generation approach works fast enough while yielding potentially sub-optimal but usually high-quality solutions. This allows the idea of dynamic schedules to be embedded in the previously developed nearest-neighbor approach, which selects optimal neighborhood sizes by random sub-sampling validation. With SAT as the testbed, experimentation demonstrated that the authors' approach can handle highly diverse benchmarks, in particular a mix of random, crafted, and industrial SAT instances, even when entire families of instances were deliberately removed from the training set. In conclusion, the authors presented a heuristic method for computing solver schedules efficiently, which O'Mahony (O'Mahony et al., 2008) identified as an open problem. In addition, they showed that a completely new way of solver scheduling, consisting of a combination of static schedules and solver selection, is able to achieve significantly better results than plain algorithm selection.
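A static schedule in the spirit of the set-covering formulation can be approximated greedily, as in the sketch below; this is our simplification, not the integer program solved in the cited work, and the solver-runtime data is invented.

```python
# Greedy sketch of the static-schedule set-covering idea: repeatedly add
# the (solver, runtime) pair that covers the most still-unsolved training
# instances until the time budget is spent.
solved_within = {                 # instances solved by each solver-runtime pair
    ("solver_a", 10): {1, 2, 3},
    ("solver_a", 30): {1, 2, 3, 4},
    ("solver_b", 10): {4, 5},
    ("solver_b", 30): {4, 5, 6},
}

def greedy_schedule(budget=40):
    covered, schedule = set(), []
    while budget > 0:
        best = max((p for p in solved_within if p[1] <= budget),
                   key=lambda p: len(solved_within[p] - covered),
                   default=None)
        if best is None or not (solved_within[best] - covered):
            break
        schedule.append(best)
        covered |= solved_within[best]
        budget -= best[1]
    return schedule, covered

print(greedy_schedule())
# -> ([('solver_a', 30), ('solver_b', 10)], {1, 2, 3, 4, 5})
```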

#### **3.6 Traveling salesman problem**

The work in (Kanda et al., 2011) focuses on the selection of optimization algorithms for solving TSP instances; the paper proposes a meta-learning approach to recommend optimization algorithms for new TSP instances. Each instance is described by meta-features of the TSP that influence the efficiency of the optimization algorithms. When more than one algorithm reaches the best solution, the resulting multi-label classification problem is addressed in three ways: 1) decomposition of multi-label instances into several single-label instances; 2) elimination of multi-label instances; and 3) binary representation, in order to transform multi-label instances into several binary classification problems (see the sketch below). The features were derived from a graph representation of each instance. The success of this meta-learning approach depends on the correct identification of the meta-features that best relate the main aspects of a problem to the performance of the algorithms used. Finally, the authors claim that it is necessary to refine and expand the set of meta-features that characterize the datasets in order to improve the performance of the selection models.
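The binary-representation step can be illustrated as follows; the meta-features and labels are invented, and `2opt`/`greedy` merely stand in for the optimization algorithms compared in the paper.

```python
# Sketch of the binary-representation step: a multi-label meta-example
# ("these algorithms tied for best") becomes one binary classification
# problem per algorithm.
meta_examples = [
    ([0.1, 5.0], {"2opt", "greedy"}),   # both algorithms reached the best tour
    ([0.9, 2.0], {"greedy"}),
    ([0.5, 3.5], {"2opt"}),
]
algorithms = ["2opt", "greedy"]

binary_problems = {
    alg: [(x, int(alg in best)) for x, best in meta_examples]
    for alg in algorithms
}
# binary_problems["2opt"] == [([0.1, 5.0], 1), ([0.9, 2.0], 0), ([0.5, 3.5], 1)]
print(binary_problems)
```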

#### **3.7 Satisfiability problem**

In (Xu et al., 2009) an algorithm for constructing per-instance algorithm portfolios for SAT is described. It has been widely observed that there is no single "dominant" SAT solver; instead, different solvers perform best on different instances.


SATzilla is an automated approach for constructing per-instance algorithm portfolios for SAT that uses so-called empirical hardness models to choose among its constituent solvers. This approach takes as input a distribution of problem instances and a set of component solvers, and constructs a portfolio optimizing a given objective function (such as mean runtime, percent of instances solved, or score in a competition). The algorithm selection approach is based on the idea of building an approximate runtime predictor, which can be seen as a heuristic approximation to a perfect oracle. Specifically, the authors use machine learning techniques to build an empirical hardness model, a computationally inexpensive predictor of an algorithm's runtime on a given problem instance based on features of the instance and the algorithm's past performance. By modeling several algorithms and, at runtime, choosing the algorithm predicted to have the best performance, empirical hardness models can serve as the basis for an algorithm portfolio that solves the algorithm selection problem automatically.

#### **3.8 Vehicle routing problem**

The main contribution of (Ruiz-Vanoye et al., 2008) is to propose statistical complexity indicators for Vehicle Routing Problem with Time Windows (VRPTW) instances that allow the algorithm that best solves a given VRPTW instance to be selected appropriately. To verify the proposed indicators, they used the discriminant analysis contained in the SPSS software as a machine learning method to find the relation between the characteristics of the problem and the performance of algorithms (Perez et al., 2004), together with the execution of three variants of genetic algorithms and a random search algorithm. The results showed a good prediction percentage, taking into account that the approach is based on statistical techniques and not on data-mining techniques. By means of this experimentation, the authors conclude that it is possible to create indicators for VRPTW that help to predict the algorithm that best solves VRPTW instances.
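As an illustration, scikit-learn's linear discriminant analysis can play the role of the SPSS procedure: indicator values per instance are labeled with the best-performing algorithm, and a new instance is classified accordingly. The indicator values and algorithm names below are invented.

```python
# Sketch of discriminant analysis as an algorithm predictor for VRPTW-like
# instances: rows are complexity indicators, labels are the best algorithm.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X = np.array([[25, 0.3, 120], [80, 0.7, 400], [30, 0.4, 150], [90, 0.8, 420]])
y = np.array(["genetic_v1", "random_search", "genetic_v1", "random_search"])

lda = LinearDiscriminantAnalysis().fit(X, y)
print(lda.predict([[85, 0.75, 410]]))   # -> ['random_search']
```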

#### **4. Related work on automatic algorithm selection**

In this section, examples of related work from the reviewed literature are classified by the methods or methodologies used to establish the relation between problems and algorithms, and thereby to solve the algorithm selection problem.

#### **4.1 Simple statistical tests**

The most common method to compare algorithms experimentally consists in the complementary use of a set of simple, well-known statistical tests: the Sign, Wilcoxon and Friedman tests, among others. These tests are based on determining the differences in average performance observed experimentally: if the differences among the algorithms are statistically significant, the algorithm with the best results is considered superior (Lawler 1985). Reeves comments that a heuristic with good average performance but high dispersion has a very high risk of showing poor performance on many instances (Reeves 1993). He suggests as an alternative to formulate, for each algorithm, a utility function fitted to a gamma distribution, whose parameters permit comparing the heuristics over a range of risk values.
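For reference, the tests mentioned above are available in SciPy; the sketch below compares invented performance samples of three heuristics over the same instance set (the sign test is obtained from a binomial test on the win counts).

```python
# Comparing heuristics over the same instances with the tests named above.
from scipy import stats

alg_a = [0.91, 0.85, 0.78, 0.88, 0.93, 0.80, 0.87]
alg_b = [0.89, 0.80, 0.79, 0.84, 0.90, 0.75, 0.83]
alg_c = [0.70, 0.82, 0.74, 0.86, 0.88, 0.72, 0.81]

print(stats.wilcoxon(alg_a, alg_b))                   # paired, two algorithms
print(stats.friedmanchisquare(alg_a, alg_b, alg_c))   # three or more algorithms

# Sign test: count wins of A over B and test against a fair coin.
wins = sum(a > b for a, b in zip(alg_a, alg_b))
print(stats.binomtest(wins, n=len(alg_a), p=0.5))
```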

#### **4.2 Regression analysis**


Gent and Walsh carried out an empirical study of GSAT, an approximation algorithm for SAT, applying regression analysis to model the growth of the cost of obtaining the solution with the problem size (Gent 1997).

In (Cruz 1999), Pérez and Cruz present a statistical method to build algorithm performance models using polynomial functions that relate performance to problem size. This method first generates a representative sample of the algorithms' performance; then the performance functions are determined by regression analysis and finally incorporated into an algorithm selection mechanism. The polynomial functions are used to predict the best algorithm that satisfies the user requirements.
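A minimal sketch of such polynomial performance models, with invented runtime samples: fit one time-versus-size curve per algorithm and select by predicted runtime at a new size.

```python
# Polynomial performance functions: fit time-vs-size curves per algorithm
# from sampled runs, then select the algorithm with the best prediction.
import numpy as np

sizes = np.array([100, 200, 400, 800])
runtime = {                       # illustrative measured runtimes (seconds)
    "branch_and_bound": np.array([0.2, 1.1, 5.3, 24.0]),
    "tabu_search":      np.array([0.8, 1.6, 3.1, 6.4]),
}
models = {a: np.polyfit(sizes, t, deg=2) for a, t in runtime.items()}

def best_for(size):
    """Pick the algorithm with the lowest predicted runtime at this size."""
    return min(models, key=lambda a: np.polyval(models[a], size))

print(best_for(1600))   # -> tabu_search: its fitted curve grows more slowly
```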

The performance of the local search algorithms Novelty and SAPS for solving instances of the SAT problem was analyzed by (Hutter 2006). The authors used linear regression with linear and quadratic basis functions to build prediction models. First, they built a prediction model using problem features and algorithm performance to predict the algorithm's run time. Second, they built another prediction model using problem features, algorithm parameter settings and algorithm performance. This model is used to automatically adjust the algorithm's parameters on a per-instance basis in order to optimize its performance.

#### **4.3 Functions of probability distribution**

Frost finds that the performance of algorithms that solve CSP instances can be approximated by two standard families of continuous probability distributions (Frost 1997). The solvable instances can be modeled by the Weibull distribution and the unsolvable instances by the lognormal distribution. He uses four parameters to generate instances: the number of constraints; the number of prohibited value pairs per constraint; the probability of a constraint existing between any pair of variables, each constraint being statistically independent of the others; and the probability that a value in the domain of one variable in a constraint will be incompatible with a value in the domain of the other variable in the constraint.
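Fitting the two families to observed search costs is straightforward with SciPy; in the sketch below the cost samples are synthetic stand-ins for measured runs.

```python
# Fit the two distribution families named above to search-cost samples.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

solvable_costs = rng.weibull(1.5, 200) * 100     # stand-in for solvable runs
shape, loc, scale = stats.weibull_min.fit(solvable_costs, floc=0)
print(f"Weibull shape={shape:.2f} scale={scale:.1f}")

unsolvable_costs = rng.lognormal(3.0, 0.5, 200)  # stand-in for unsolvable runs
s, loc, scale = stats.lognorm.fit(unsolvable_costs, floc=0)
print(f"lognormal sigma={s:.2f} scale={scale:.1f}")
```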

Hoos and Stützle present work similar to Frost's. They find that the performance of algorithms that solve SAT instances can be characterized by an exponential distribution (Hoos 2000). The run-time distribution is determined by executing an algorithm *k* times over a set of instances of the same family, using a high cutoff time as the stopping criterion and storing, for each successful run, the execution time required to find the solution. The empirical run-time distribution is the cumulative distribution associated with these observations, and it allows mapping an execution time *t* (given by the user) to the probability of finding a solution within this time. A family is a set of instances with the same values of the parameters that are considered critical for performance.
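The empirical run-time distribution reduces to an empirical CDF over the successful run times, as the sketch below shows with invented run times.

```python
# Empirical run-time distribution: sort the successful run times and read
# off P(solution found within time t) as an empirical CDF.
import numpy as np

def rtd(run_times):
    """Return a function t -> empirical probability of success within t."""
    times = np.sort(np.asarray(run_times))
    return lambda t: np.searchsorted(times, t, side="right") / len(times)

runs = [0.4, 1.2, 0.9, 3.5, 2.2, 0.7, 1.8, 5.0]   # illustrative seconds
p = rtd(runs)
print(p(2.0))   # -> 0.625: probability of solving within 2 s
```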

An algorithm portfolio architecture was proposed in (Silverthorn 2010). This architecture employs three core components: a portfolio of algorithms; a generative model, which is fit to data on those algorithms' past performance and then used to predict their future performance; and a policy for action selection, which repeatedly chooses algorithms based on those predictions. Portfolio operation begins with offline training, in which a) training tasks are drawn from the task distribution, b) each solver is run many times on each training task, and c) a model is fit to the outcomes observed in training. In the test phase that follows, repeatedly, (1) a test task is drawn from the same task distribution, (2) the model predicts the likely outcomes of each solver, (3) the portfolio selects and runs a solver for some duration, (4) the run's outcome conditions later predictions, and (5) the process continues from (2) until a time limit expires.

The models of solver behavior are two latent class models: a multinomial mixture that captures the basic correlations between solvers, runs, and problem instances, and a mixture of Dirichlet compound multinomial distributions that also captures the tendency of solver outcomes to recur. Each model was embedded in a portfolio of diverse SAT solvers and evaluated on competition benchmarks. Both models support effective problem solving, and the DCM-based portfolio is competitive with the most prominent modern portfolio method for SAT (Xu 2009).

#### **4.4 Functions of heuristic rules**

Rice introduced the poly-algorithm concept (Rice 1968) in the context of parallel numerical software. He proposed the use of functions that can select, from a set of algorithms, the best one for a given situation. Following Rice's work, other researchers formulated different selection functions, presented in (Li 1997; Brewer 1995). Most of the proposed functions are simple heuristic rules about structural features of the parameters of the instance being solved, or about the computational environment. Defining these rules requires human experience.

The objective of the methodology proposed in (Beck 2004) is to find the best solution to a new instance when a total time limit T is given. First, selection strategies for a set of algorithms A were formulated as prediction rules: selection based on the cost of the best solution found by each algorithm; selection based on the change in the cost of the best solutions found at 10-second intervals; and selection based on extrapolating the current cost and slope to a predicted cost at T.

These rules are applied to the training dataset, and the optimal sampling time t\* (the time required to select the algorithm with the lowest solution error) is identified for each of them. Afterwards, when a new instance is given, each prediction rule is used to find the algorithm with the best solution found within a time tp = |A| × t\*, and that algorithm is then executed for the remaining time *tr* = T − tp. One advantage is that the methodology can be applied to different problems and algorithms; however, the new dataset must be similar to the training dataset.
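A sketch of the resulting selection procedure under the simplest prediction rule (best cost found so far). Here `run_for` is a hypothetical interface, assumed for illustration, that resumes an algorithm on an instance for a given time budget and reports its best cost.

```python
def low_knowledge_select(algorithms, instance, T, t_star, run_for):
    """Run every algorithm for t* seconds, keep the apparent best,
    then give it the remaining time tr = T - |A| * t_star."""
    best_alg, best_cost = None, float("inf")
    for alg in algorithms:
        cost = run_for(alg, instance, t_star)   # best cost found within t*
        if cost < best_cost:
            best_alg, best_cost = alg, cost
    tr = T - len(algorithms) * t_star           # remaining budget
    return run_for(best_alg, instance, tr)      # final answer after total time T
```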

#### **4.5 Machine learning**

The algorithm selection problem is addressed by Lagoudakis and Littman in (Lagoudakis 2000) as a problem of minimizing total execution time, solved with a Reinforcement Learning (RL) algorithm. Two classical problems were considered: selection and sorting. A function that predicts the best algorithm for a new instance from its problem size is determined by means of training. The learned function permits combining several recursive algorithms to improve performance: the actual problem is divided into subproblems at each recursive step, and the algorithm most adequate for the size of each subproblem is used on it. This work is extended to backtracking algorithms for the SAT problem in (Lagoudakis 2001).
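The idea can be pictured with sorting: a selection function learned offline maps each subproblem size to an algorithm, so the recursion switches methods as instances shrink. The sketch below hard-codes the learned policy as a size threshold between quicksort and insertion sort; it illustrates the scheme only and is not the RL formulation of (Lagoudakis 2000).

```python
def hybrid_sort(a, threshold=16):
    """Recursive sorting that re-selects the algorithm at every step:
    insertion sort for small subproblems, a quicksort step otherwise."""
    if len(a) <= threshold:                    # learned policy: small -> insertion sort
        a = list(a)
        for i in range(1, len(a)):
            key, j = a[i], i - 1
            while j >= 0 and a[j] > key:
                a[j + 1] = a[j]
                j -= 1
            a[j + 1] = key
        return a
    pivot = a[len(a) // 2]                     # large -> quicksort partitioning
    left = [x for x in a if x < pivot]
    mid = [x for x in a if x == pivot]
    right = [x for x in a if x > pivot]
    return hybrid_sort(left, threshold) + mid + hybrid_sort(right, threshold)

print(hybrid_sort([5, 2, 9, 1, 7, 3]))  # -> [1, 2, 3, 5, 7, 9]
```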

A system (PYTHIA-II) to select the most appropriate software to solve a scientific problem is proposed in (Houstis 2002). The user introduces the problem features (operators of the equation, its domain, values of the variables, etc.), the time requirements and the allowed error. The principal components of PYTHIA-II are the statistical analysis module, the pattern extraction module and the inference engine. The first ranks the algorithm performance data by means of Friedman rank sums (Hollander 1973). The second uses different machine learning methods to extract performance patterns and represent them as decision and logic rules. The third matches the features of a new problem against the produced rules; the objective is to predict the best algorithm and the most appropriate parameters to solve the problem.

The METAL research group proposed a method to select the most appropriate classification algorithm for a set of similar instances (Soares 2003). They used a k-nearest neighbors algorithm to identify, from a historical registry, the group of instances whose features are most similar to those of a new instance group. The algorithm performance on instances of the historical registry is known and is used to predict the best algorithms for the new instance group. The similarity among instance groups is computed considering three types of problem features: general, statistical and derived from information theory.
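A compact sketch of the nearest-neighbor step on hypothetical meta-features; in METAL the recommendation is actually a ranking of algorithms, which the majority vote below simplifies.

```python
import numpy as np

def knn_select(meta_features, best_algorithm, new_features, k=3):
    """meta_features: per-group feature vectors from the historical registry;
    best_algorithm: the known best algorithm for each historical group."""
    distances = np.linalg.norm(meta_features - new_features, axis=1)
    nearest = np.argsort(distances)[:k]
    votes = [best_algorithm[i] for i in nearest]
    return max(set(votes), key=votes.count)     # majority vote among neighbors

registry = np.array([[0.2, 0.9], [0.3, 0.8], [0.9, 0.1], [0.8, 0.2]])
winners = ["C4.5", "C4.5", "kNN", "kNN"]
print(knn_select(registry, winners, np.array([0.85, 0.15])))  # -> "kNN"
```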

A Bayesian approach is proposed in (Guo, 2004) to construct an algorithm selection system, applied to the Sorting and Most Probable Explanation (MPE) problems. From a set of training instances, their features and the run time of the best algorithm that solves each instance are used to build a Bayesian network. Guo proposed four representative indices for the Sorting problem: the size of the input permutation and three presortedness measures. For the MPE problem he used general features of the problem and several statistical indices of the Bayesian network that represents the problem.

A methodology for instance-based selection of solver policies for instances of the SAT problem was proposed in (Nikolic 2009). The policies are heuristics that guide the search process, and different configurations of these policies are solution strategies. The problem structure of all instances was characterized by indices, and the instances were grouped by the values of these indices, forming instance families. All problem instances were solved by all solution strategies, and the best solution strategy for each family was selected. A k-nearest neighbor algorithm then selects the solution strategy for a new input instance. The performance of ARGOSmart, the algorithm that implements the proposed methodology, was superior to that of the ARGOSAT algorithm.

#### **5. Approaches to building algorithm selectors**

In this chapter we solve ASP with two approaches: meta-learning and hyper-heuristics. The meta-learning approach is oriented to learning about classification using machine learning methods; three methods are explored to solve an optimization problem: Discriminant Analysis (Pérez, 2004), C4.5 and the Self-Organising Neural Network. The hyper-heuristic approach is oriented to automatically produce an adequate combination of available low-level heuristics in order to effectively solve a given instance (Burke et al., 2010); a hyper-heuristic strategy is incorporated in an ant colony algorithm to select the heuristic that best adjusts one of its control parameters.

#### **5.1 Selection of metaheuristics using meta-learning**

In this section a methodology based on meta-learning is presented for characterizing algorithm performance from past experience data. The characterization is used to select the best algorithm for a new instance of a given problem. The phases of the methodology are described and exemplified with the well-known one-dimensional Bin Packing problem.

#### **5.1.1 Algorithms for the solution of the Bin Packing Problem**

The Bin Packing Problem (BPP) is an NP-hard combinatorial optimization problem in which the objective is to determine the smallest number of bins needed to pack a set of objects. To obtain suboptimal solutions of BPP with less computational effort, we used deterministic and non-deterministic algorithms. Algorithm performance is evaluated with the percentage deviation from the optimum and the processing time (Quiroz, 2009).

The deterministic algorithms always follow the same path to arrive at the same solution. The First Fit Decreasing (FFD) algorithm places each item in the first bin that can hold it. Best Fit Decreasing (BFD) places each item in the best-filled bin that can hold it. The Match to First Fit (MFF) algorithm is a variation of FFD which uses complementary bins for temporarily holding items. The Match to Best Fit (MBF) algorithm is a variation of BFD and, like MFF, uses complementary bins. Modified Best Fit Decreasing (MBFD) partially packs the bins in order to find a "good fit" item combination.
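Minimal implementations of the two base rules, FFD and BFD, tracking only each bin's remaining capacity; the variants with complementary bins and partial packing are omitted from this sketch.

```python
def first_fit_decreasing(sizes, c):
    """Place each item (largest first) in the first bin with room for it."""
    bins = []                                  # remaining capacity of each open bin
    for s in sorted(sizes, reverse=True):
        for i, free in enumerate(bins):
            if s <= free:
                bins[i] -= s
                break
        else:                                  # no open bin fits: open a new one
            bins.append(c - s)
    return len(bins)

def best_fit_decreasing(sizes, c):
    """Place each item (largest first) in the tightest bin that can hold it."""
    bins = []
    for s in sorted(sizes, reverse=True):
        candidates = [i for i, free in enumerate(bins) if s <= free]
        if candidates:
            i = min(candidates, key=lambda i: bins[i])   # best-filled feasible bin
            bins[i] -= s
        else:
            bins.append(c - s)
    return len(bins)

sizes = [4, 8, 1, 4, 2, 1]
print(first_fit_decreasing(sizes, c=10), best_fit_decreasing(sizes, c=10))  # 2 2
```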

The non-deterministic algorithms do not obtain the same solution in different executions, but in many cases they are faster than deterministic algorithms. The Ant Colony Optimization (ACO) algorithm builds a solution with each ant: it starts with an empty bin; each new bin is filled with "selected items" until no remaining item fits in it; a "selected item" is chosen stochastically, mainly using a pheromone trail (Ducatelle, 2001). In the Threshold Accepting (TA) algorithm, a new feasible solution is accepted if the difference with the previous solution is within a threshold temperature; the temperature value is decreased each time until thermal equilibrium is reached (Pérez, 2002).

#### **5.1.2 Methodology**

The methodology proposed for performance characterization and its application to algorithm selection consists of three consecutive phases: Initial Training, Prediction and Training with Feedback. Figure 3 depicts these phases.

In the *Initial Training Phase*, two internal processes build a past-experience database: the Problem Characterization Process obtains statistical indices to measure the computational complexity of a problem instance, and the Algorithm Characterization Process solves instances with the available algorithms to obtain performance indices. The Training Process finally builds a knowledge base using the Problem and Algorithms Database. This knowledge is represented through a learning model which relates algorithm performance to problem characteristics. In the *Prediction Phase*, the learned relationship is used to predict the best algorithm for a new given instance. In the *Training with Feedback Phase*, newly solved instances are incorporated into the characterization process to increase the selection quality. The relationship learned in the knowledge base is improved with the new set of solved instances and is used again in the prediction phase.

Fig. 3. Phases of the algorithm selection methodology

#### **Initial training phase**


The steps of this phase are shown in Figure 4. In step 1 (Characteristics Modeling), indices are derived for measuring the influence of problem characteristics on algorithm performance (see Equations 1 to 5). In step 2 (Statistical Sampling), a set of representative instances is generated with stratified sampling and a sample size derived from survey sampling. In step 3 (Characteristics Measurement), the parameter values of each instance are transformed into indices. In step 4 (Instances Solution), instances are solved using a set of heuristic algorithms. In step 5 (Clustering), groups are formed in such a way that they are constituted by instances with similar characteristics and for which one algorithm outperformed the others. Finally, in step 6 (Classification), the identified grouping is learned into formal classifiers.

Fig. 4. Steps of the initial training phase

We propose five indices to characterize the instances of BPP:

*Instance size p* expresses a relationship between the instance size and the maximum size solved, where *n* is the number of items and *maxn* is the maximum size solved:


$$p = \frac{n}{\mathit{maxn}} \tag{1}$$

a. *Constrained capacity t* expresses a relationship between the average item size and the bin size. The size of item *i* is *si* and the bin size is *c*.

$$t = \frac{\sum_{i} (s_i / c)}{n} \qquad 1 \le i \le n \tag{2}$$

b. *Item dispersion d* expresses the dispersion degree of the item size values.

$$d = \sigma(t) \tag{3}$$

c. *Number of factors f* expresses the proportion of items whose sizes are factors of the bin capacity.

$$f = \frac{\sum_{i} \mathit{factor}(c, s_i)}{n} \qquad 1 \le i \le n \tag{4}$$

d. *Bin usage b* expresses the proportion of the total size that can fit in a bin of capacity *c*.

$$b = \begin{cases} 1 & \text{if } c \ge \sum_{i} s_i \\ \dfrac{c}{\sum_{i} s_i} & \text{otherwise} \end{cases} \qquad 1 \le i \le n \tag{5}$$
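A direct transcription of Equations 1 to 5, assuming integer item sizes so that the factor test of Equation 4 is exact divisibility:

```python
import statistics

def bpp_indices(sizes, c, maxn):
    """Compute the five characterization indices of a BPP instance.
    sizes: integer item sizes, c: bin capacity, maxn: maximum size solved."""
    n = len(sizes)
    ratios = [s / c for s in sizes]
    p = n / maxn                                      # (1) instance size
    t = sum(ratios) / n                               # (2) constrained capacity
    d = statistics.pstdev(ratios)                     # (3) item dispersion
    f = sum(1 for s in sizes if c % s == 0) / n       # (4) proportion of factors of c
    total = sum(sizes)
    b = 1.0 if c >= total else c / total              # (5) bin usage
    return p, t, d, f, b

print(bpp_indices([4, 8, 1, 4, 2, 1], c=10, maxn=1000))
```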

#### **Prediction phase**

The steps of this phase are shown in Figure 5. For a new instance, step 7 (Characteristics Measurement) calculates its characteristic values using the indices. Step 8 uses the learned classifiers to determine, from the characteristics of the new instance, which group it belongs to. The algorithm associated with this group is the expected best algorithm for the instance.

Fig. 5. Steps of the prediction phase

#### **Training with feedback phase**


The steps of this phase are shown in Figure 6. The objective is to feed the system back in order to keep it in continuous training. For each new solved and characterized instance, step 9 (Instance Solution) obtains the real best algorithm. Afterwards, step 10 (Patterns Verification) compares the result: if the prediction is wrong and the average accuracy is beyond a specified threshold, the classifiers are rebuilt using the old and new datasets; otherwise the new instance is stored and the process ends.

Fig. 6. Steps of the training with feedback phase

#### **5.1.3 Experimentation**

For test purposes, 2,430 random instances of the Bin Packing problem were generated, characterized and solved using the seven heuristic algorithms described in Section 5.1.1. Table 1 shows a small instance set selected from the sample.


| Instance | *p* | *b* | *t* | *f* | *d* | Real best algorithms |
|----------|-------|-------|-------|-------|-------|----------------------|
| E1i10.txt | 0.078 | 0.427 | 0.029 | 0.000 | 0.003 | FFD, TA |
| E50i10.txt | 0.556 | 0.003 | 0.679 | 0.048 | 0.199 | BFD, ACO |
| E147i10.txt | 0.900 | 0.002 | 0.530 | 0.000 | 0.033 | TA |

Table 1. Example of random instances with their problem characteristic indices and real best algorithms

The K-means clustering method was used to create groups of similar instances. Four groups were obtained; each group was associated with a set of similar instances and the algorithm with the best performance on it. Three algorithms had poor performance and were outperformed by the other four. The Discriminant Analysis (DA) and C4.5 classification methods were used to build the algorithm selector; we used the machine learning implementations available in SPSS 11.5 and Weka 3.4.2, respectively. Afterwards, for validating the system, 1,369 standard instances were collected (Ross 2002). In selecting the best algorithm for all standard instances, the experimental results showed an accuracy of 76% with DA and 81% with C4.5. This accuracy compares favorably with a random selection from the seven algorithms: 14.2%. For the instances in the remaining percentage (100−76%), the selected algorithms generate a solution close to the optimal.
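The pipeline can be reproduced with current libraries. The sketch below uses scikit-learn's KMeans and a CART decision tree as stand-ins for the SPSS discriminant analysis and Weka C4.5 used in the experiments, over synthetic (*p*, *b*, *t*, *f*, *d*) index vectors rather than the actual instance data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(7)
X = rng.random((2430, 5))                 # synthetic (p, b, t, f, d) index vectors

# Step 5 (Clustering): group instances with similar characteristics.
groups = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
# In the methodology, each group is then labeled with its dominating algorithm.

# Step 6 (Classification): learn the grouping with a formal classifier.
selector = DecisionTreeClassifier(random_state=0).fit(X, groups)

x_new = rng.random((1, 5))                # indices of a new instance
print("predicted group:", selector.predict(x_new)[0])
```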

The selection system with feedback was implemented using a neural network, specifically Kohonen's Self-Organizing Map (SOM) available in Matlab 7.0. The best results were obtained with only two problem characteristic indices (*p*, *t*) in a multi-network. The accuracy increased from 78.8% at 100 epochs up to 100% at 20,000 epochs; these percentages correspond to the network with initial training and with training-with-feedback, respectively. The SOM was gradually fed back with all the available instances. Using all indices (*p*, *b*, *t*, *f*, *d*), the SOM only reached 76.6% even with feedback.

#### **5.2 Selection of heuristics in a hyper-heuristic framework**

A hyper-heuristic is an automated methodology for selecting heuristics to solve hard computational search problems (Burke et al., 2009; Burke et al., 2010; Duarte et al., 2007). It consists of a high-level algorithm that, given a particular problem instance and a number of low-level heuristics or metaheuristics, can select and apply an appropriate low-level heuristic or metaheuristic at each decision step. These procedures raise the level of generality at which search strategies can operate. The general scheme for designing a hyper-heuristic is shown in Figure 7.

Fig. 7. Hyper-heuristic elements: a high-level hyper-heuristic communicates across a domain barrier, via an evaluation function, with a set of low-level heuristics or metaheuristics (H1, H2, …, Hn) that operate on the problem to be solved and its solution space

The first kind of low-level algorithms build a solution incrementally: starting with an empty solution, the goal is to intelligently select the next construction heuristic or metaheuristic to gradually build a complete solution (Garrido & Castro, 2009).
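A generic sketch of the high-level loop for a perturbative variant: the hyper-heuristic sees only heuristic indices and fitness values (the domain barrier) and adapts a score for each low-level heuristic. The credit-assignment scheme here is one simple assumed choice, not the mechanism of any specific system discussed in this chapter.

```python
import random

def hyper_heuristic(initial, low_level_heuristics, evaluate, steps=1000):
    """High-level selection loop: only fitness values cross the domain barrier."""
    scores = [1.0] * len(low_level_heuristics)   # adaptive credit per heuristic
    best, best_cost = initial, evaluate(initial)
    for _ in range(steps):
        i = random.choices(range(len(low_level_heuristics)), weights=scores)[0]
        candidate = low_level_heuristics[i](best)
        cost = evaluate(candidate)
        if cost < best_cost:                     # reward heuristics that improve
            best, best_cost = candidate, cost
            scores[i] += 1.0
        else:                                    # slowly forget unhelpful ones
            scores[i] = max(0.1, 0.99 * scores[i])
    return best

# Toy domain: minimize the sum of a vector with two move heuristics.
h1 = lambda v: [x - 1 for x in v]
h2 = lambda v: v[1:] + v[:1]
print(hyper_heuristic([5, 3, 8], [h1, h2], evaluate=sum, steps=50))
```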

#### **5.2.1 Representative examples**

SQRP is the problem of locating information in a network based on a query formed by keywords. The goal of SQRP is to determine the shortest paths from a node that issues a query to nodes that can appropriately answer it (by providing the requested information). Each query traverses the network, moving from the initiating node to a neighboring node, then to a neighbor of a neighbor and so forth, until it locates the requested resource or gives up in its absence. Due to its complexity (Michlmayr, 2007), solutions proposed to SQRP are typically limited to special cases.

Hyper-Heuristic AdaNAS (HH\_AdaNAS) is an adaptive metaheuristic algorithm that solves SQRP (Hernandez, 2010). This algorithm was created from AdaNAS (Gómez et al., 2010). The *high-level algorithm* is HH\_AdaNAS, which uses as solution algorithm AdaNAS, inspired by an ant colony; the set of *low-level heuristics* is included in the component called HH\_TTL. The goal of the hyper-heuristic HH\_TTL is to define by itself, in real time, the most adequate values for the time-to-live (TTL) parameter during the execution of the algorithm. The main differences between AdaNAS and HH\_AdaNAS are when the modification of the TTL is applied and how the amount of TTL to be allocated is calculated. Figure 8 shows that HH\_AdaNAS is formed by AdaNAS + HH\_TTL.

Fig. 8. HH\_AdaNAS is formed by AdaNAS + HH\_TTL. The low-level heuristics of HH\_TTL are H1: increase TTL by 1 unit; H2: keep TTL constant; H3: decrease TTL by 1 unit; H4: decrease TTL by 2 units

#### **Data structures of HH\_AdaNAS**

HH\_AdaNAS inherits some data structures from AdaNAS, such as the pheromone table τ and the tables *H*, *D* and *N*. Besides the data structures of the high-level metaheuristic, there are structures that help to select the low-level heuristic: the hyper-heuristic pheromone table τ*hh* and the table of hyper-heuristic visibility states η. All the tables store heuristic information or experience gained in the past. The relationship of these structures is shown in Figure 9.

When HH\_AdaNAS searches for the next node in the routing process of a query, it relies on the pheromone table τ and the tables *D*, *N* and *H*: *D* holds distance information, *H* records the successes of past queries, and *N* records the number of documents in the closest nodes that can satisfy the query. In the same way, when HH\_TTL chooses the following low-level heuristic, it does so through the data structures τ*hh* and η. The memory is composed of two data structures that store information about prior queries: the first is the pheromone table τ*hh*, which has three dimensions, and the other is the table of hyper-heuristic visibility states η, which lets the hyper-heuristic know what state SQRP is in, that is, whether it is necessary to add more TTL because few resources have been found and the lifetime is running out.

Fig. 9. Storage structures of HH\_AdaNAS: tables τ, *D*, *N* and *H* (short- and long-term learning, applied in selecting the next neighbor) and tables τ*hh* and η of HH\_TTL (long-term learning, applied in selecting the next low-level heuristic)

The pheromone table τ is divided into *n* two-dimensional tables, one corresponding to each node *i* of the network. These tables contain entries only for a fixed node *i*; therefore, their dimensions are at most |*L*|×|Γ(*i*)|, where *L* is the dictionary, which defines the keywords allowed for queries, and Γ(*i*) is the set of neighboring nodes of *i*. Each entry in turn contains a two-dimensional table |*m*|×|*h*|, where *m* is the set of visibility states of the problem and *h* is the set of available heuristics. The pheromone table is also called the long-term learning structure.

The visibility state table η expresses the weight of the relation between SQRP states and TTL heuristics, and it was inspired by the deterministic survival rule designed by Rivera (Rivera G. 2009). Table η is formed by the combination |*m*|×|*h*|, where a visibility state *mi* is identified mainly by α, which depends on the node selected by AdaNAS to route the query. The variable α in Equation 6 helps to ensure that the node selected by HH\_AdaNAS will not decrease the performance of the algorithm in the future. A TTL heuristic is intelligently selected according to its past performance, given by its pheromone value, and its visibility value, given by an expert. Figure 10 shows the visibility state table used in this work.


|        | *h*1 | *h*2 | *h*3 | *h*4 |
|--------|------|------|------|------|
| *m*1   | 1    | 0.75 | 0.5  | 0.25 |
| *m*2   | 0.75 | 1    | 0.5  | 0.5  |
| *m*3   | 0.5  | 0.5  | 1    | 0.75 |
| *m*4   | 0.25 | 0.5  | 0.75 | 1    |

Fig. 10. Visibility state table

$$\alpha = \left( H_{i,j,l} \, / \, D_{i,j,l} \right) / Z_x \tag{6}$$

where *Hi,j,l* indicates the number of documents consistent with query *l*, *Di,j,l* indicates the length of the route to obtain the documents, *i* represents the current node, *j* is the chosen node, and *Zx* is a measure of current performance. In this work the visibility states are: *m*1 = (α > 1) & (*TTL* < *D*) & (*TTL* ≠ 1); *m*2 = (α > 1) & (*TTL* < *D*) & (*TTL* = 1); *m*3 = (*H* = 0) || ((α > 1) & (*TTL* ≥ *D*)) || ((α ≤ 1) & (*TTL* = 1)); and *m*4 = (α ≤ 1) & (*TTL* > 1). All the visibility states are evaluated to identify which heuristic will be applied to the TTL.
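As a concrete reading of Equation 6 and the state definitions above, the sketch below computes α, classifies the SQRP state, and greedily picks a TTL heuristic by combining a hypothetical pheromone row with the visibility weights of Figure 10. HH\_TTL's actual rule is the stochastic ACO selection, so the argmax and all numbers here are illustrative assumptions.

```python
import numpy as np

# Visibility weights eta from Fig. 10 (rows m1..m4, columns h1..h4).
ETA = np.array([[1.00, 0.75, 0.50, 0.25],
                [0.75, 1.00, 0.50, 0.50],
                [0.50, 0.50, 1.00, 0.75],
                [0.25, 0.50, 0.75, 1.00]])

def visibility_state(alpha, ttl, D, H):
    """Transcription of the four state conditions m1..m4."""
    if alpha > 1 and ttl < D and ttl != 1:
        return 0                                   # m1
    if alpha > 1 and ttl < D and ttl == 1:
        return 1                                   # m2
    if H == 0 or (alpha > 1 and ttl >= D) or (alpha <= 1 and ttl == 1):
        return 2                                   # m3
    return 3                                       # m4: alpha <= 1 and ttl > 1

def select_ttl_heuristic(tau_hh, H, D, Z, ttl):
    alpha = (H / D) / Z                            # Equation 6
    m = visibility_state(alpha, ttl, D, H)
    return int(np.argmax(tau_hh[m] * ETA[m]))      # greedy pheromone x visibility

tau_hh = np.ones((4, 4))                           # flat pheromone at the start
print(select_ttl_heuristic(tau_hh, H=6, D=5, Z=1.0, ttl=3))  # state m1 -> h1 (index 0)
```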

#### **5.2.2 Experimentation**


This section presents the experimental environment and the results obtained. **Software:** Microsoft Windows 7 Home Premium; Java programming language, JDK 1.6; Eclipse 3.4 as integrated development environment. **Hardware:** computer with an Intel(R) Core(TM) i5 CPU M430 at 2.27 GHz and 4 GB of RAM. **Instances:** 90 different SQRP instances, each consisting of three files that represent the topology, queries and repositories; a description of their features can be found in (Cruz et al. 2008).

The average performance was studied by computing three performance measures every 100 queries: **average hops**, defined as the average number of links traveled by a Forward Ant until its death, that is, until reaching either the maximum number of results required or running out of TTL; **average hits**, defined as the average number of resources found by each Forward Ant until its death; and **average efficiency**, defined as the average number of resources found per traversed edge (hits/hops). The initial configuration of HH\_AdaNAS is shown in Table 2. The parameter values were based on values suggested in the literature (Dorigo & Stützle, 2004; Michlmayr, 2007; Aguirre, 2008; Rivera, 2009).

In this section we show experimentally that the HH\_AdaNAS algorithm outperforms the AdaNAS algorithm. HH\_AdaNAS also outperforms the NAS (Aguirre, 2008), SemAnt (Michlmayr, 2007) and random-walk (Cruz et al., 2008) algorithms, as reported in (Gómez et al., 2010), so HH\_AdaNAS is positioned as the best of them.


Table 2. Assignment of values for each HH\_AdaNAS parameter

In this experiment we compare the HH\_AdaNAS and AdaNAS algorithms. The performance achieved is measured by the rate of found documents. The experiments were conducted under equal conditions: each algorithm was run 30 times per instance, and both algorithms used the same configuration parameters, described in Table 2.


Figure 11 shows the average efficiency achieved during a set of queries by the HH\_AdaNAS and AdaNAS algorithms; the behavior of the two algorithms is approximately the same. HH\_AdaNAS starts with an efficiency of around 2.38 hits per hop in the first 100 queries, and AdaNAS starts at approximately 2.37 hits per hop. After processing the 11,000 queries, the efficiency rises to around 3.31 hits per hop for HH\_AdaNAS and 3.21 for AdaNAS. From these results we conclude that HH\_AdaNAS achieves a final improvement in performance of 28.09%, while AdaNAS reaches an improvement of 26.16%.

Fig. 11. Average efficiency during 11,000 queries for the two algorithms

#### **6. Hybrid systems of metaheuristics: an approximate solution of ASP**

The majority of problems related to ASP have a high level of complexity, depending on the application domain. An alternative solution is the use of hybrid systems based on heuristics and metaheuristics. Algorithm selection has attracted the attention of some research in hybrid intelligent systems, for which many algorithms and large datasets are available. Hybrid intelligent systems seek to take advantage of the synergy between various intelligent techniques in solving real problems (Ludermir et al., 2011).

#### **6.1 Relation of meta-learning and hybridization**

Although some algorithms based on hybrid systems of metaheuristics are better than others on average, there is rarely a single best algorithm for a given problem, owing to the complexity and application domain of the proposed solution. Instead, it is often the case that different algorithms perform well on different problem instances. This condition is most accentuated among algorithms for solving NP-hard problems, because the runtimes of these algorithms are often highly variable from instance to instance.

When algorithms present high runtime variability, one is faced with the problem of deciding which algorithm to use. Rice called this the "algorithm selection problem" (Rice, 1976). Algorithm selection has not received widespread attention: the most common approach has been to measure the performance of different algorithms on a given instance set with a certain distribution, and then select the algorithm with the lowest average runtime.


This "winner-take-all" approach has produced recent and important advances in algorithm design and refinement, but has caused the rejection of many algorithms that has an excellent performance on an specific cases, but result uncompetitive on average. The following two questions emerge from the literature (Leyton-Brown, 2003). How to perform an algorithm selection for a given instance? How to evaluate novel hybrid algorithms?


In the previous section we used machine learning algorithms to automatically acquire knowledge for algorithm selection, leading to a reduced need for experts and a potential improvement in performance. In general, the algorithm selection problem can be treated via meta-learning approaches, and the results of this approach can have an important impact on hybridization. To clarify this point, it is worth considering how the empirical results of meta-learning can be analyzed from a theoretical perspective with different intentions:

a. Confirm the sense of the selection rules.
b. Generate insights into algorithm behavior that can be used to refine the algorithms.

The acquired knowledge is confirmed when the performance of the refined algorithms is evaluated. The knowledge can be used to integrate complementary strategies in a hybrid algorithm.

#### **6.2 Use of hybridization to solve ASP in social domains**

The principal advance in the reduction of complexity is related to the amalgam of different perspectives established on different techniques, which have demonstrated their efficiency in different application domains with good results.

Hybridization of algorithms is one of the most adequate ways to improve and solve different ASPs related to the optimization of time. Many applied ASPs have an impact on social domains, especially in solving dynamic and complex models of human behavior. In (Araiza, 2011) it is possible to analyze, with a multi-agent system, the concept of "social isolation", characterizing this behavior over time according to the interchanges involving a minority and the associated health effects when isolation occurs.

In addition, it is possible to specify the depth and impact of a viral marketing campaign using a social model of online social networking. In (Azpeitia, 2011), an adequate ASP determines the future course of such a campaign and permits tracking it to understand its best features.

#### **6.3 Future trends on the resolution of ASP using a hybrid system of metaheuristics**

We expect that future trends for solving ASP with hybridization will be based on models that perform their activities according to a selection framework and a dynamic contextual area. Deciding on the most appropriate actions requires advanced artificial intelligence techniques that satisfy a plethora of application domains in which interaction and conclusive results are needed. This will only be possible with intelligent systems equipped with high processing speed, knowledge bases and an innovative model for designing experiments, something we expect to happen in this decade.

#### **7. Conclusions**

Many real-world problems belong to a special class of problems called NP-hard, which means that no known algorithms can solve them exactly and efficiently in the worst case. The specialized literature offers a variety of heuristic algorithms which have shown satisfactory performance. However, despite the efforts of the scientific community in developing new strategies, to date there is no algorithm that is best for all possible situations. Designing algorithms suited to specific conditions is often the best option. In consequence, several approaches have emerged to deal with the algorithm selection problem. We reviewed hyper-heuristics and meta-learning, two related and promising approaches.

Meta-learning, through machine learning methods like clustering and classification, is a well-established approach to selecting algorithms, particularly for hard optimization problems. Despite this, comparing and evaluating machine learning methods for building algorithm selectors is not a common practice. We compared three machine learning techniques for algorithm selection on standard data sets. The experimental results revealed, in general, high performance with respect to a random algorithm selector, but low performance with respect to other classification tasks. We identified the Self-Organising Neural Network as a promising method for selection; it reached 100% accuracy when feedback was incorporated and the number of problem characteristics was minimal.

On the other hand, hyper-heuristics offer a general framework for designing algorithms that can ideally select and generate heuristics adapted to a particular problem instance. We used this approach to automatically select, among basic heuristics, the most promising one for adjusting a control parameter of an Ant Colony Optimization algorithm for routing messages. Adaptive parameter tuning with hyper-heuristics is a recent, open line of research.
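As an illustration of the idea (not the implementation used in our experiments), the sketch below shows a minimal selection hyper-heuristic: each low-level heuristic perturbs a hypothetical ACO control parameter (the evaporation rate rho), and an epsilon-greedy rule credits whichever adjustment has recently improved a toy reward.

```python
import random

# Low-level heuristics: simple rules that adjust the evaporation rate rho.
heuristics = {
    "increase": lambda rho: min(0.99, rho * 1.1),
    "decrease": lambda rho: max(0.01, rho * 0.9),
    "keep":     lambda rho: rho,
}
scores = {name: 0.0 for name in heuristics}  # credit earned by each heuristic

def hh_step(rho, reward, epsilon=0.2):
    """One hyper-heuristic step: choose an adjustment (epsilon-greedy),
    apply it, and credit the heuristic with the observed reward change."""
    if random.random() < epsilon:
        name = random.choice(list(heuristics))
    else:
        name = max(scores, key=scores.get)
    new_rho = heuristics[name](rho)
    scores[name] += reward(new_rho) - reward(rho)
    return new_rho

# Toy stand-in for algorithm performance: pretend it peaks at rho = 0.3.
reward = lambda rho: -(rho - 0.3) ** 2
rho = 0.9
for _ in range(100):
    rho = hh_step(rho, reward)
print(round(rho, 2))  # drifts towards the better-rewarded region near 0.3
```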

To get a bigger picture of algorithm performance we need to know the algorithms in depth. However, most algorithmic performance studies have focused exclusively on identifying sets of instances of different degrees of difficulty, on reducing the time needed to solve these cases, and on reducing solution errors, in many cases following the "winner-take-all" strategy. Although these are important goals, most approaches have been quite particular. In that sense, statistical methods and machine learning will be important elements for building performance models that explain the relationship between the characteristics of optimization problems, the search space that defines the behavior of the algorithms that solve them, and the final performance achieved by these algorithms. We envision that the knowledge gained, in addition to supporting the growth of the area, will be useful for automating the selection of algorithms and for refining algorithms; hyper-heuristics, hybridization and meta-learning go in the same direction and can complement each other.

#### **8. Acknowledgment**

This research was supported in part by CONACYT and DGEST.

#### **9. References**


Aguirre, M. (2008). *Algoritmo de Búsqueda Semántica para Redes P2P Complejas*. Master's thesis, División de Estudio de Posgrado e Investigación del Instituto Tecnológico de Ciudad Madero, Tamaulipas, México.

Ali, S. & Smith, K. (2006). On learning algorithm selection for classification. *Applied Soft Computing*, Vol. 6, No. 2, (January 2006), pp. 119-138.

Azpeitia, D. (2011). Critical Factors for Success of a Viral Marketing Campaign of Real-Estate Sector at Facebook: The strength of weak learnability. *Proceedings of the HIS Workshop at MICAI*.

Beck, J. & Freuder, E. (2004). Simple Rules for Low-Knowledge Algorithm Selection. *Proceedings of the 1st International Conference on Integration of AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems*, Nice, France, April 2004, J. Regin and M. Rueher (Ed.), Springer-Verlag Vol. 3011, pp. 50-64.

Brazdil, P. B., Soares, C. & Pinto, D. C. J. (2003). Ranking Learning Algorithms: Using IBL and Meta-Learning on Accuracy and Time Results. *Machine Learning*, Vol. 50, No. 3, pp. 251-277, ISSN: 08856125.

Brewer, E. (1995). High-Level Optimization Via Automated Statistical Modeling. *Proceedings of Principles and Practice of Parallel Programming*, Santa Barbara, CA, July 1995, ACM Press, New York, USA, pp. 80-91.

Burke, E., Hyde, M., Kendall, G., Ochoa, G., Özcan, E. & Woodward, J. (2009). Exploring hyper-heuristic methodologies with genetic programming. In: *Computational Intelligence: Collaboration, Fusion and Emergence*, Intelligent Systems Reference Library.

Burke, E., Hyde, M., Kendall, G., Ochoa, G., Özcan, E. & Woodward, J. (2010). A Classification of Hyper-heuristic Approaches. In: *International Series in Operations Research & Management Science*, Gendreau, M. and Potvin, J.Y. (Ed.), pp. 449, Springer Science+Business Media, ISBN 978-1-4419-1663-1, NY, USA.

Cai, H., Hu, X., Lü, Q. & Cao, Q. (2009). A novel intelligent service selection algorithm and application for ubiquitous web services environment. *Expert Systems with Applications*, Vol. 36, No. 2, Part 1, pp. 2200-2212, ISSN: 09574174.

Cruz, L. (1999). *Automatización del Diseño de la Fragmentación Vertical y Ubicación en Bases de Datos Distribuidas usando Métodos Heurísticos y Exactos*. Master's thesis, Instituto Tecnológico y de Estudios Superiores de Monterrey, México.

Cruz, L., Gómez, C., Aguirre, M., Schaeffer, S., Turrubiates, T., Ortega, R. & Fraire, H. (2008). NAS algorithm for semantic query routing systems in complex networks. In: *International Symposium on Distributed Computing and Artificial Intelligence 2008 / Advances in Soft Computing 2009*, Corchado, J., Rodríguez, S., Llinas, J. & Molina, J., pp. 284-292, Springer, Berlin/Heidelberg, ISBN 978-3-540-85862-1, DOI 10.1007/978-3-540-85863-8.

Czogalla, J. & Fink, A. (2009). Fitness Landscape Analysis for the Resource Constrained Project Scheduling Problem. *Lecture Notes in Computer Science, Learning and Intelligent Optimization*, Vol. 5851, pp. 104-118.

Dorigo, M. & Stützle, T. (2004). *Ant Colony Optimization*. MIT Press, Cambridge, MA, ISBN 0-262-04219-3, USA.

Duarte, A., Pantrigo, J. & Gallego, M. (2007). *Metaheurísticas*. Ed. Dykinson S.L., España.

Ducatelle, F. & Levine, J. (2001). Ant Colony Optimisation for Bin Packing and Cutting Stock Problems. *Proceedings of the UK Workshop on Computational Intelligence*, Edinburgh.

Fink, E. (1998). How to solve it automatically: Selection among Problem-Solving methods. *Proceedings of ICAPS 1998*, pp. 128-136.

Frost, D.; Rish, I. & Vila, L. (1997). Summarizing CSP hardness with continuous probability distributions. *Proceedings of the 14th National Conference on AI*, American Association for Artificial Intelligence, pp. 327-333.

Garrido, P. & Castro, C. (2009). Stable solving of CVRPs using hyperheuristics. *Proceedings of the 11th Annual Conference on Genetic and Evolutionary Computation (GECCO'09)*, ACM, Montreal, Canada, July 2009.

Gent, I.; Macintyre, E.; Prosser, P. & Walsh, T. (1997). The Scaling of Search Cost. In: *Proceedings of the fourteenth national conference on artificial intelligence and ninth conference on Innovative applications of artificial intelligence*, pp. 315-320, AAAI Press, Retrieved from: https://www.aaai.org/Papers/AAAI/1997/AAAI97-049.pdf

Gómez, C.G., Cruz, L., Meza, E., Schaeffer, E. & Castilla, G. (2010). A Self-Adaptive Ant Colony System for Semantic Query Routing Problem in P2P Networks. *Computación y Sistemas*, Vol. 13, No. 4, pp. 433-448, ISSN 1405-5546.

Guo, H. & Hsu, W. (2004). A Learning-based Algorithm Selection Meta-reasoner for the Real-time MPE Problem. *Proceedings of the 17th Australian Joint Conference on Artificial Intelligence*, Cairns, Australia, Dec 2004, G. I. Webb and Xinghuo Yu (Ed.), Springer-Verlag Vol. 3339, pp. 307-318.

Hernández, P. (2010). *Método Adaptativo para el Ajuste de Parámetros de un Algoritmo Evolutivo Hiperheurístico*. Master's thesis, División de Estudio de Posgrado e Investigación del Instituto Tecnológico de Ciudad Madero, Tamaulipas, México.

Hilario, M. & Kalousis, A. (2000). Building algorithm profiles for prior model selection in knowledge discovery systems. *International Journal of Engineering Intelligent Systems for Electrical Engineering and Communications*, Vol. 8, No. 2, pp. 77-88, ISSN: 09691170.

Hollander, M. & Wolfe, D. (1973). *Non-parametric Statistical Methods*. John Wiley and Sons, New York, USA.

Hoos, H. & Stützle, T. (2000). Systematic vs. Local Search for SAT. *Journal of Automated Reasoning*, Vol. 24, pp. 421-481.

Houstis, E.; Catlin, A. & Rice, J. (2000). PYTHIA-II: A Knowledge/Database System for Managing Performance Data and Recommending Scientific Software. *ACM Transactions on Mathematical Software (TOMS) - Special issue in honor of John Rice's 65th birthday*, Vol. 26, No. 2, (June 2000).

Hutter, F.; Hamadi, Y.; Hoos, H. & Leyton-Brown, K. (2006). Performance prediction and automated tuning of randomized and parametric algorithms. *Lecture Notes in Computer Science, Principles and Practice of Constraint Programming*, Vol. 4204, pp. 213-228.

Kadioglu, S., Malitsky, Y., Sabharwal, A., Samulowitz, H. & Sellmann, M. (2011). Algorithm Selection and Scheduling. *Proceedings of the 17th International Conference on Principles and Practice of Constraint Programming (CP2011)*, Italy, September 2011.

Kalousis, A. & Theoharis, T. (1999). NOEMON: Design, implementation and performance results of an intelligent assistant for classifier selection. *Intelligent Data Analysis*, Vol. 3, No. 5, pp. 319-337, ISSN: 1088467X.

Kotthoff, L.; Gent, I. & Miguel, I. (2011). A Preliminary Evaluation of Machine Learning in Algorithm Selection for Search Problems. In: *AAAI Publications, Fourth International Symposium on Combinatorial Search (SoCS)*, Borrajo, D., Likhachev, M. and López, C. L. (Ed.), pp. 84-91, AAAI Press, Retrieved from: http://www.aaai.org/ocs/index.php/SOCS/SOCS11/paper/view/4006

Lagoudakis, M. & Littman, M. (2000). Algorithm Selection Using Reinforcement Learning. *Proceedings of the Sixteenth International Conference on Machine Learning*, P. Langley (Ed.), AAAI Press, pp. 511-518.

Lagoudakis, M. & Littman, M. (2001). Learning to select branching rules in the DPLL procedure for satisfiability. *Electronic Notes in Discrete Mathematics*, Vol. 9, (June 2001), pp. 344-359.

Lawler, E.; Lenstra, J.; Rinnooy Kan, A. & Shmoys, D. (1985). *The Traveling Salesman Problem: A Guided Tour of Combinatorial Optimization*. John Wiley & Sons, New York, USA.

Leyton-Brown, K.; Nudelman, E.; Andrew, G.; McFadden, J. & Shoham, Y. (2003). A portfolio approach to algorithm selection. *Proceedings of the International Joint Conference on Artificial Intelligence*, Vol. 18, pp. 1542-1543.

Li, J.; Skjellum, A. & Falgout, R. (1997). A Poly-Algorithm for Parallel Dense Matrix Multiplication on Two-Dimensional Process Grid Topologies. *Concurrency, Practice and Experience*, Vol. 9, No. 5, pp. 345-389.

Lobjois, L. & Lemâitre, M. (1998). Branch and bound algorithm selection by performance prediction. In: *AAAI '98/IAAI '98 Proceedings of the fifteenth national/tenth conference on Artificial intelligence/Innovative applications of artificial intelligence*, Jack Mostow, Charles Rich, Bruce Buchanan (Ed.), pp. 353-358, AAAI Press, Retrieved from: http://www.aaai.org/Papers/AAAI/1998/AAAI98-050.pdf

Ludermir, T.B.; Prudêncio, R.B.C. & Zanchettin, C. (2011). Feature and algorithm selection with Hybrid Intelligent Techniques. *International Journal of Hybrid Intelligent Systems*, Vol. 8, No. 3, pp. 115-116.

Madani, O.; Raghavan, H. & Jones, R. (2009). On the Empirical Complexity of Text Classification Problems. *SRI AI Center Technical Report*.

Malitsky, Y., Sabharwal, A., Samulowitz, H. & Sellmann, M. (2011). Non-Model-Based Algorithm Portfolios for SAT. *Proceedings of the 14th International Conference on Theory and Applications of Satisfiability Testing*, Ann Arbor, June 2011.

Messelis, T.; Haspeslagh, S.; Bilgin, B.; De Causmaecker, P. & Vanden Berghe, G. (2009). Towards prediction of algorithm performance in real world optimization problems. *Proceedings of the 21st Benelux Conference on Artificial Intelligence*, BNAIC, Eindhoven, pp. 177-183.

Michlmayr, E. (2007). *Ant Algorithms for Self-Organization in Social Networks*. PhD thesis, Women's Postgraduate College for Internet Technologies (WIT), Vienna, Austria.

Nascimento, A. C. A., Prudêncio, R. B. C., Costa, I. G. & de Souto, M. C. P. (2009). Mining rules for the automatic selection process of clustering methods applied to cancer gene expression data. *Proceedings of the 19th International Conference on Artificial Neural Networks (ICANN)*, Cyprus, September 2009.

Nikolić, M.; Marić, F. & Janičić, P. (2009). Instance-Based Selection of Policies for SAT Solvers. *Lecture Notes in Computer Science, Theory and Applications of Satisfiability Testing*, Vol. 5584, pp. 326-340.

O'Mahony, E., Hebrard, E., Holland, A., Nugent, C. & O'Sullivan, B. (2008). Using Case-based Reasoning in an Algorithm Portfolio for Constraint Solving. *Proceedings of The 19th Irish Conference on Artificial Intelligence and Cognitive Science*, Ireland, August 2008.

Pérez, J., Pazos, R.A., Vélez, L. & Rodríguez, G. (2002). Automatic Generation of Control Parameters for the Threshold Accepting Algorithm. *Lecture Notes in Computer Science*, Vol. 2313, pp. 119-127.

Pérez, O.J., Pazos, R.A., Frausto, J., Rodríguez, G., Romero, D. & Cruz, L. (2004). A Statistical Approach for Algorithm Selection. *Lecture Notes in Computer Science*, Vol. 3059, (May 2004), pp. 417-431, ISSN: 0302-9743.

Quiroz, M. (2009). *Caracterización de Factores de Desempeño de Algoritmos de Solución de BPP*. Master's thesis, Instituto Tecnológico de Cd. Madero, Tamaulipas, México.

Reeves, C. (1993). *Modern heuristic techniques for combinatorial problems*. John Wiley & Sons, ISBN: 0-470-22079-1, New York, USA.

Rice, J.R. (1968). On the Construction of Poly-algorithms for Automatic Numerical Analysis. *Interactive System for Experimental Applied Mathematics*, M. Klerer & J. Reinfelds (Ed.), Academic Press, Burlington, MA, pp. 301-313.

Rice, J. R. (1976). The algorithm selection problem. *Advances in Computers*, Vol. 15, pp. 65-118.

Ruiz-Vanoye, J. A., Pérez, J., Pazos, R. A., Zarate, J. A., Díaz-Parra, O. & Zavala-Díaz, J. C. (2009). Statistical Complexity Indicators Applied to the Vehicle Routing Problem with Time Windows to Discriminate Appropriately the Best Algorithm. *Journal of Computer Science and Software Technology*, Vol. 2, No. 2, ISSN: 0974-3898.

Samulowitz, H. & Memisevic, R. (2007). Learning to solve QBF. In: *AAAI-07*, pp. 255-260, Retrieved from: https://www.aaai.org/Papers/AAAI/2007/AAAI07-039.pdf

Schiavinotto, T. & Stützle, T. (2007). A review of metrics on permutations for search landscape analysis. *Computers & Operations Research*, Vol. 34, No. 10, (October 2007), pp. 3143-3153.

Silverthorn, B. & Miikkulainen, R. (2010). Latent Class Models for Algorithm Portfolio Methods. *Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence*.

Smith-Miles, K. & Lopes, L. (2012). Measuring instance difficulty for combinatorial optimization problems. *Computers & Operations Research*, in press (accepted 6/7/11).

Smith-Miles, K.; James, R.; Giffin, J. & Tu, Y. (2009). Understanding the relationship between scheduling problem structure and heuristic performance using knowledge discovery. In: *Learning and Intelligent Optimization*, LION-3, Vol. 3, Available from: lion.disi.unitn.it/intelligent-optimization/LION3/online_proceedings/35.pdf

Soares, C. & Pinto, J. (2003). Ranking Learning Algorithms: Using IBL and Meta-Learning on Accuracy and Time Results. *Machine Learning*, Vol. 50, No. 3, pp. 251-277.

Streeter, M.; Golovin, D. & Smith, S. F. (2007). Combining multiple heuristics online. In: *AAAI 2007, Proceedings of the 22nd national conference on Artificial intelligence*, Vol. 22, Anthony Cohn (Ed.), pp. 1197-1203, AAAI Press, Retrieved from: http://www.aaai.org/Papers/AAAI/2007/AAAI07-190.pdf

Wolpert, D. H. & Macready, W. G. (1997). No free lunch theorems for optimization. *IEEE Transactions on Evolutionary Computation*, Vol. 1, No. 1, pp. 67-82.

Xu, L.; Hutter, F.; Hoos, H. & Leyton-Brown, K. (2009). SATzilla2009: An automatic algorithm portfolio for SAT. In: *Solver description, 2009 SAT Competition*.



### **Experiences and Obstacles in Industrial Applications of Intelligent Systems**

Leonardo M. Reyneri1 and Valentina Colla2 *1Politecnico di Torino, Dipartimento di Elettronica e Telecomunicazioni, Torino, 2Scuola Superiore Sant'Anna, TeCIP Institute, PERCRO, Pisa, Italy* 

#### **1. Introduction**


*Neural networks* and *fuzzy systems* are well-known soft computing techniques, dating back several decades to the preliminary work of McCulloch and Pitts, Grossberg, Zadeh, and dozens of other precursors. At first, neural networks were believed to be a "simple and workable solution" for all the difficult problems to be dealt with; this gave rise to broad research interest around the world and garnered a lot of funding. During this preliminary period, many theories were developed, analyzed and applied.

Later, the domain of neural networks and fuzzy systems broadened, and many other algorithms and methods were collected under the term *Soft Computing* and, more generally, *Intelligent Systems*. These include, among others, neural networks, fuzzy logic, wavelet networks, genetic algorithms and expert systems. It was then discovered that several simple problems (the so-called "toy problems") actually found very simple solutions using intelligent systems. On the other hand, difficult problems (for example, handwriting recognition and most problems of industrial relevance) still could not be completely solved, even if intelligent systems could contribute to simplifying their solution.

Today, after several decades of alternating interest from the scientific and industrial communities, after the publication of tens of thousands of theoretical and practical papers, and after several attempts to apply them in a large number of application domains, intelligent systems are now reaching a rather *mature phase*. People have begun to understand their real capabilities, potentials, limitations and disadvantages, so intelligent systems are on the right path towards widespread adoption, without excessive and inappropriate enthusiasm but, more importantly, with a good rationale for their use.

This chapter attempts to analyze the actual level of maturity and acceptance achieved by intelligent systems and to assess how, where and why they are (or can be) accepted in industry. Note that, although the focus is on *industrial applications*, the term generally applies also to several other real-world applications in areas such as agronomy, economics, mathematics, weather forecasting, etc.


### **2. Maturity level of intelligent systems**

As mentioned in the introduction, all soft computing and intelligent systems techniques suffered alternating periods of *acceptance* (due to the novelty and the promising preliminary results) and *rejection* (due to the acquired awareness of limits). Hundreds of algorithms, topologies, training rules have been: i) *conceived* and *developed*; ii) *tested*, *evaluated*, *tuned* and *optimized*; iii) *temporarily or partially abandoned* (>>90%); iv) *accepted* and *applied* to real problems (<<10%).

Most of the original theories have been nearly abandoned (like, for instance, *Hopfield networks* and *Boltzmann machines*, *glass spin theories*, *stochastic networks*, etc.) either because they could not offer reasonable performance or because they were too cumbersome to use. Other theories (like, for instance, *perceptrons*, *radial basis functions* and *fuzzy systems*) eventually reached widespread acceptance, since they are more viable.

What is then at present the level of maturity of intelligent systems? This can be evaluated from a series of clues such as:

- how many theories and paradigms *have been developed altogether*. This number should be as high as possible, to ensure that no option has been forgotten;
- how many paradigms *have survived after maybe ten years*. This should be low, to minimize the knowledge one needs to learn (see section 2.2);
- the *level of acquaintance* a typical engineer has with these techniques. This should be high and it should be achieved quickly (see section 2.1);
- the *count of accepted industrial applications*, which should be significant (see sections 3, 4, 5).
Due to the maturity level they reached in about half a century from the preliminary works, intelligent systems now deserve to be in the *knowledge briefcase* of each engineer, economist, agronomist, scientist, etc. *together with, and at the same acceptance level of* several other basic techniques like algebra, statistics, geometry, etc.

A way to reach a widespread industrial acceptance is to avoid using statements like:

*I have used/developed a neural network for...*

but, instead:

*I have just developed a complex system with interacting signal pre-processor, neural network, user interface, a differential equation solver, a post processor, some sensor and actuator interface, etc.*

The major difference between the two approaches is which element(s) of a system receive(s) more attention from the designer. In the former statement, attention (therefore the design effort) is focused on the presence of a neural network, which therefore improperly becomes the most relevant block. In the latter statement, the neural network takes its proper place, that is, at the same level as all the other system elements. In many cases, the blocks surrounding the intelligent subsystem are the most complex to design and use.
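In code, this system-level view might look like the following sketch (all stage names and rules are invented placeholders): the trained network is just one callable in a longer pipeline, and much of the engineering effort lives in the stages around it.

```python
def preprocess(raw):
    """Filtering, normalization, segmentation... often the hardest part."""
    peak = max(raw)
    return [x / peak for x in raw]

def neural_model(features):
    """Stand-in for the trained network: one stage among many."""
    return sum(features) / len(features)

def postprocess(score):
    """Map the raw model output onto an actionable decision."""
    return "accept" if score > 0.5 else "reject"

def pipeline(raw):
    # The intelligent subsystem sits at the same level as every other block.
    return postprocess(neural_model(preprocess(raw)))

print(pipeline([2.0, 4.0, 6.0, 8.0]))  # -> accept
```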

An example for this is in the field of image processing and handwriting recognition, where a successful application relies much more on a proper image pre-processing (filtering, contrast enhancement, segmentation, labelling, skeletonization, etc.) than on the neurofuzzy processing itself.

Despite the level of maturity they have reached, intelligent systems still experience a lot of difficulty in being *accepted* by the industrial community, which still sees them as *academic experiments* or *bizarre techniques* and not as a powerful tool to solve its problems. Thus, *what is still missing for complete industrial maturity?* This will be analyzed in the following.

#### **2.1 Availability of expertise and training personnel**


Knowledge and expertise on intelligent systems cannot easily be found in the industrial domain, at least among decision-making people (namely: *decision staff*, *businessmen* and *engineers*), except perhaps in the newest generation (who are still too few and not yet high enough on the decision-making ladder). Decision-making people are the key actors in having intelligent systems accepted in industry. On the other hand, the real experts in intelligent systems are those who have been trained for a long time in the area of soft computing, but these often have *too little knowledge of the specific problem they are faced with* and therefore might not tackle the problem in the most appropriate or efficient way.

Personnel training is rather time consuming, therefore costly, for industry, and it can seldom be afforded unless there is a reasonable guarantee of appropriate returns. It must be remembered that adopting *any* novel method *may* offer advantages, but it *surely* costs money. The lack of good expertise, together with people's laziness, often leads to oversized networks, oversized training sets, and conservative choices of paradigms and learning coefficients. Altogether: more complex (therefore more costly) networks, longer design and training times, fewer advantages; in conclusion, *less chance of acceptance*.

#### **2.2 The apparent diversity of neurofuzzy paradigms**

At the beginning, neural networks, fuzzy systems and other soft computing techniques like wavelet networks, Bayesian classifiers, clustering methods, etc., were believed to be independent, although complementary, methods, which had to be analyzed and studied independently of each other. This caused an excessive effort to study, analyze, get familiar with a huge variety of methods and therefore to train personnel consequently. It was also believed that each paradigm had its characteristics and preferred application domains, such that a lot of experience was required to choose the best architecture for any application.

On the contrary, Reyneri (Reyneri, 1999) proved that most soft computing techniques are nothing but different *languages* for a few basic paradigms. For instance, he proved that perceptrons, Adaline, wavelet networks, linear transforms and adaptive linear filters are equivalent to each other. Likewise, fuzzy logic, radial basis functions, Bayesian classifiers, Gaussian regressors, kernel methods, Kohonen maps and fuzzy/hard c-means are equivalent methods, as are local-global networks, TSK fuzzy systems and gain-scheduling controllers.

With a good use of such *neurofuzzy unification*, the number of independent paradigms reduces to as few as four. All known topologies for neural, fuzzy, wavelet, Bayesian and clustering paradigms, and their supervised or unsupervised training algorithms, are in practice just particular implementations and interconnections of four elementary blocks, namely: i) *computing elements*; ii) *computing layers*; iii) *normalization layers* and iv) *sensitivity layers*. All traditional neurofuzzy paradigms are then nothing else than specific *languages*, each one more or less appropriate to a given application.
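This equivalence can be checked numerically. The sketch below (with invented centres, widths and weights) shows that a normalized Gaussian radial-basis network and a zero-order Takagi-Sugeno fuzzy system with Gaussian memberships compute exactly the same function, written in two different "languages".

```python
import numpy as np

centres = np.array([0.0, 1.0, 2.0])  # RBF centres = positions of the fuzzy sets
widths  = np.array([0.5, 0.5, 0.5])  # RBF widths  = spreads of the memberships
weights = np.array([1.0, 3.0, 2.0])  # output weights = rule consequents

def rbf_normalized(x):
    """Radial-basis network with normalized Gaussian activations."""
    act = np.exp(-((x - centres) / widths) ** 2)
    return np.sum(weights * act) / np.sum(act)

def fuzzy_ts0(x):
    """Zero-order Takagi-Sugeno inference: consequents weighted by the
    firing strengths of Gaussian membership functions."""
    mu = np.exp(-((x - centres) / widths) ** 2)
    return np.sum(mu * weights) / np.sum(mu)

x = 0.7
print(rbf_normalized(x), fuzzy_ts0(x))  # identical by construction
```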


The efficient use of *neurofuzzy unification* would simplify personnel training, since: i) it helps the practitioner to quickly learn and get familiar with the very few basic paradigms; ii) it augments the flexibility and performance of intelligent systems; iii) it therefore increases the economic return; and iv) it reduces the corresponding risks. Nevertheless, neurofuzzy unification is still far from being widely applied, for several historical reasons. The persistence of tens of apparently different paradigms still creates *confusion*, *noise* and *disaffection*; it increases *personnel training costs* and reduces *advantages*; altogether, it significantly reduces the *appeal of neurofuzzy systems*.

### **3. Relevant characteristics for industry**

In this section we analyze some of the reasons why intelligent systems still experience difficulties in being accepted as an industrial standard.

#### **3.1 Crypticity**

Many intelligent systems are often felt to be rather *cryptic*, in the sense that nobody can really understand why and how a trained network solves a given problem. Apart from the many theoretical proofs that an intelligent system is capable of solving a large variety of problems, the industrially relevant issue is that all the knowledge of a trained network is hidden within a chunk of numbers, usually arranged into *weight* or *centre matrices* or *genomes*. There is usually no clue on how to interpret such "magic numbers", thus engineers are often sceptical about their correctness, reliability or robustness.

In practice, correctness of weights rests on a successful training, although it is often difficult to either guarantee or feel that training has properly succeeded. Quality of training is measured by the residual *error measure*, but there is often no indication of what an appropriate value for this error is, especially when *sum-of-errors measures* are used, as in several commercial simulation tools. The user cannot reliably argue that a trained model is really representative of the desired system/function.
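A simple remedy, sketched below under the assumption that target values with a measurable spread are available, is to report a scale-free figure such as the relative RMSE (closely related to R²) instead of a raw sum of errors, so that even a non-expert can judge whether training plausibly succeeded.

```python
import math

def relative_rmse(y_true, y_pred):
    """RMSE normalized by the spread of the targets: a scale-free
    training-quality measure, easier to judge than a raw sum of errors."""
    n = len(y_true)
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
    mean = sum(y_true) / n
    spread = math.sqrt(sum((t - mean) ** 2 for t in y_true) / n)
    return rmse / spread  # ~0: excellent fit; ~1: no better than the mean

# Illustrative values: clearly better than just predicting the mean.
print(relative_rmse([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.9]))
```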

Furthermore, most training processes are often based on some amount of *randomness*, which is seldom appreciated in the industrial domain. On the other hand, traditional design methods (namely, those not using intelligent systems) are based on some predictable analytical or empirical model which is *chosen by the designer*, together with its parameters. Designer's *knowledge and experience* usually provide enough information to properly solve a problem, even though seldom in an optimal way. Nothing is apparently left to randomness.

In reality, the process of empirical adaptation of a given analytical or empirical model to a given system resembles the approach of training/adapting a soft computing system (which is nothing but a highly generic parametric model) based on a set of training data. Yet everybody considers the former as normal and straightforward, while most designers are still sceptical when facing the latter. Why is that so?

One of the reasons is that traditional (namely, non-intelligent) parametric models currently used in practice are much less generic than any soft computing models; therefore they are always under total control of the engineer, who is capable of properly interpreting parameters and values.

For instance, the model of an *electric motor* can model nothing else than an *electric motor*, and its parameters represent, for instance, winding resistance and inductance, rotor inertia, friction, etc., which are directly measurable and for which the designer can feel whether they assume reasonable values or not. By appropriately varying these parameters, the model can be adapted to either *large or small motors*, either *fast or slow*, but it will never be able to model, for instance, a *chemical process*. The designer can easily become aware that, for instance, an improperly tuned model has too large or too small a winding resistance in comparison with the size of the motor under examination. He can therefore immediately become aware of improper tuning or of some motor fault or damage and behave accordingly.

On the other hand, intelligent systems are so generic that they can adapt to virtually any system, whether electrical, chemical, economic, mechanical or agronomic. The same parameters can therefore mean anything, depending on the actual use of the network (e.g. pollution of a chemical process, yield of a manufacturing process, winding resistance of a motor, rate of infection in an agricultural plant, etc.); in addition, parameters are interchangeable and there is no clue to understand what a given parameter really represents in practice. Further, nobody will ever know whether training has been done correctly and whether the model really represents the given system or not.

#### **3.2 How to avoid crypticity**


The use of modern unification paradigms (Reyneri, 1999) allows one to easily convert neural and wavelet networks into fuzzy systems and vice versa, with several advantages, among which, for instance:

- a given neuro/wavelet network can be converted into fuzzy language, and thus interpreted linguistically by experts, who are then able to "validate" and consequently "accept" an otherwise cryptic neuro/wavelet model;
- human experience, usually expressed as a set of fuzzy rules, can be converted into a neural network and then empirically tuned by means of an appropriate training set. Fuzzy (or expert system) rules are usually understandable by an expert, who can grasp the "concept" behind them. An appropriate neural training of the rules therefore allows fine-tuning of the expert's knowledge based on the available empirical evidence.
It is therefore mandatory to abandon all the older approaches, which were more like "magic formulae" than real engineering methods, and to concentrate on modern approaches that consider neural, wavelet, fuzzy, Bayesian, regressor and clustering techniques as *interchangeable paradigms*. The everlasting fight between neural and fuzzy people is detrimental, as it keeps the level of crypticity high, therefore preventing a widespread acceptance of intelligent systems.

The choice between, for instance, *neural networks* and *fuzzy logic* should therefore be converted into a more appropriate selection between a *neural* and a *fuzzy language*, which should be chosen depending on: i) the available knowledge from human experts; ii) the size of the available training set; iii) the availability of other pieces of information on the problem; iv) the level of crypticity which is accepted; v) if and how the model has to be interpreted by humans or processed by computers.
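Purely as a mnemonic, the five criteria above can be read as a small decision rule; the sketch below is an invented illustration, with made-up thresholds, not a validated procedure.

```python
def choose_language(expert_rules, training_points, must_be_interpretable):
    """Toy reading of the criteria above (thresholds are invented)."""
    if expert_rules > 0 and training_points < 100:
        return "fuzzy"    # prior knowledge available, little empirical data
    if must_be_interpretable:
        return "fuzzy"    # the model has to be read and validated by humans
    return "neural"       # plenty of data, crypticity is acceptable

print(choose_language(expert_rules=12, training_points=40, must_be_interpretable=False))
print(choose_language(expert_rules=0, training_points=5000, must_be_interpretable=False))
```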


#### **3.3 Gathering data for network training**

Many intelligent systems (mostly, those based on *neural languages*) rely on the availability of *empirical data*, which is usually gathered into large *training sets*. Unfortunately, these are often too expensive to obtain, as each data point is usually an appropriate *measurement* of a mechanical or chemical or biological or economical process. Several processes are so slow that each point may require up to several days to be acquired. In some cases, if an accurate numerical model is available, computer simulations can substitute direct measurements.

Some soft computing techniques (in particular, those based on *fuzzy* and *Bayesian languages*) may require much smaller training sets, as they rely on a predefined model, described in *linguistic terms* according to *previous human experience*. This is the main reason why fuzzy logic has been accepted more quickly and extensively by industry than neural systems.

An industrial manager has to consider attentively the trade-off between the cost of gathering a large training set and the reliability of the trained neurofuzzy network. As already said, this trade-off often pushes towards the use of *fuzzy languages* whenever possible and bounds the use of *neural languages* to applications which have enough (historical) data available.

#### **3.4 Analytical vs. empirical methods**

As already said, one of the advantages of intelligent systems is that a given analytical/empirical model is by definition specific and cannot be tailored to a different problem, while neural networks can be. Furthermore, an analytical/empirical model usually comes after years of improvements, while neural networks are trained in a short time. Yet, a purely analytical model can be developed without any field measurement, while an empirical model requires a limited amount of field measurements. Instead, most intelligent systems always require a huge amount of field measurements which, in several cases, can take years to gather.

Last but not least, the amount of field measurements which is required (that is, roughly, the development time) is a function of the reliability asked of the model. A large training set is in fact mandatory in industry to offer an adequately high reliability, while the reliability of analytical models is often independent of field measurements and relies instead on the designer's experience.

#### **3.5 Performance is always optimistic...**

Virtually any paper published in the literature shows that, for a "wide range of applications", neural networks and fuzzy systems offer "tremendously good performance".

Unfortunately, more than 90% of them do not even try to afford a fair performance comparison with other *state-of-the-art* techniques, and it becomes difficult to feel how good such performance really is. Just as an example, a paper (not cited) claimed that *the proposed neural model of a biochemical process is 90% accurate* and the author was *enthusiastic* about that *incredible result*. Since the reviewer had little experience in modelling that specific process, he could do nothing but accept the author's statement. But, when the paper was read by an experienced colleague, he pointed out that the state of the art had already achieved about 95% a few years earlier, making those results useless for industry. It is quite sure that the author was really convinced of the optimality of his result, due perhaps to his limited experience in the specific application domain, which was not enough to judge.

What the author surely did was to try a number of different topologies, paradigms, network sizes and training algorithms, and found that his own network offered the very best performance among all those tests. All tests were neural, and no test was performed against the state of the art using traditional approaches. This method is (partially) correct to *optimize the performance of a novel intelligent system* (namely, to find which choices get the best out of it, among all possible intelligent systems), but **not** to evaluate the *appropriateness of an intelligent system for the given application*, compared against a standard one.

What was true for that specific problem was that the proposed hybrid empirical/analytical model, developed by a team of experienced engineers and biologists, offered a much better performance than the best existing neural network, even at a comparable computational complexity, without even considering the possible performance of the state of the art. The reason for that (which happens much more frequently than one can even imagine) is that the human experience, knowledge and mental capacities used to develop a given "non-intelligent" model boost the overall performance of a system so much that even an optimal intelligent system, trained in the best way but without the available human knowledge, cannot compensate for ignoring that knowledge during its development.

#### **3.6 How to avoid optimism…**


An important step towards acceptance is to avoid unnecessary and inapplicable optimism. Any development, comparison or selection has to be *fair* and based on *real and well proven data*; never on *hypotheses*. Optimism usually tends to push the designer towards a solution which then proves less performing than originally expected, therefore convincing even more the decision-making people that intelligent systems are not yet a viable solution to their problem.

#### **3.7 Tools and support**

An important step towards industrial acceptance (as for many other industrially relevant items) is the availability of appropriate support for the development, use, integration, conduction and maintenance of the system.

An excellent intelligent system will never be applied until its use is straightforward and user-friendly. The only chance to have an intelligent system applied is therefore to develop, around the intelligent system itself together with its surrounding elements (e.g. pre-processors, post-processors, data mining, etc.), an appropriate user interface and development tool which supports, in order:

- the decision-making process, helping to choose the intelligent system instead of any other traditional system
- the preparation phase (e.g. data collection, training, tuning and testing)
- the conduction phase, namely the nominal operation of the intelligent system, when applied to the industrial process under interest
- maintenance, to overcome any problem which might occur during conduction.



### **4. Case studies**

This chapter will present a number of real industrial applications where all the aspects described and commented on in the previous sections have been applied. Most of them come from our personal experience, as acquiring enough reliable and trustworthy details from other people is usually difficult.

#### **4.1 Prediction of Jominy profiles of steels**

The *Jominy profile* of a steel is a curve obtained in a test, where a small cylindrical specimen of steel is kept at a very high temperature (usually more than 1500 °C) and one end of the specimen is cooled by quenching it for at least 10 min. in a water stream, while the other specimen end is cooled in air. This treatment causes a cooling rate gradient to develop over the length of the specimen, with the highest cooling rate corresponding to the quenched end. This procedure affects the steel micro-structure along the length of the specimen and, as a consequence, the steel hardness in the diverse portions of the test bar. The Jominy profile is built by measuring the specimen hardness values *hi* on the Rockwell C scale at increasing distances *di* from the quenched end. Several studies investigated the correlation between the shape of such a curve and the steel chemistry (Doane & Kirkaldy, 1978) and some of them applied neural networks to this purpose, such as (Vermeulen et al., 1996).

In particular Colla et al. (Colla et al., 2000) propose a parametric characterization of the profile, namely the approximation of the generic profile with a parametric curve, and then predict the shape of each profile through a neural network which links the steel chemistry to the curve parameters.
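To make the parametric idea concrete, the following minimal sketch fits an illustrative four-parameter sigmoid-like curve to a hypothetical measured profile; the curve form and the data are our assumptions, not the actual parameterization used by Colla et al.

```python
# Fit an illustrative four-parameter curve h(d) to a measured Jominy profile
# (hardness h_i at distance d_i from the quenched end); a network then only
# has to predict the four parameters from the steel chemistry.
import numpy as np
from scipy.optimize import curve_fit

def jominy_curve(d, h_max, h_min, d0, s):
    """Smooth decay from h_max (quenched end) to h_min (air-cooled end)."""
    return h_min + (h_max - h_min) / (1.0 + np.exp((d - d0) / s))

# Hypothetical measured profile: distances in mm, hardness in HRC.
d_i = np.array([1.5, 3, 5, 7, 9, 11, 13, 15, 20, 25, 30, 40], dtype=float)
h_i = np.array([62, 61, 59, 55, 48, 42, 38, 35, 32, 30, 29, 28], dtype=float)

params, _ = curve_fit(jominy_curve, d_i, h_i, p0=[60.0, 28.0, 10.0, 3.0])
print("fitted (h_max, h_min, d0, s):", params)
```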

This approach proved to be successful when the "shape" of the profile is constant (which happens, for instance, when dealing with the same steel grades produced by one single manufacturer). On the other hand, when facing the prediction of the Jominy curve of many different steel grades manufactured by different steel producers, the actual shape of the curve might considerably vary and the parametric approach is no longer successful.

Fig. 1. Conceptual scheme of the sequential predictor of Jominy curves.

A different approach to the same problem has in fact been proposed by Marin et al. (Marin et al. 2007): a neural sequential predictor in which, apart from the first two points of the curve (i.e. the ones corresponding to the lowest distance values from the quenched end), each single point of the Jominy profile is singularly predicted by a dedicated neural network.


Each of these networks has as inputs the contents of some chemical elements and some of the previously predicted hardenability values, according to the schematic description provided in Fig. 1.
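A minimal sketch of this chained scheme follows, with synthetic data and illustrative network sizes standing in for the real chemistry and hardenability values; during training each per-point network sees the true previous points, while at prediction time its own previous outputs are fed forward.

```python
# Chained per-point predictor: network i takes the chemistry plus the
# previously predicted points as inputs. Synthetic data throughout.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n_points = 10                           # points on the Jominy profile
chem = rng.random((200, 5))             # hypothetical chemistry (5 elements)
profiles = rng.random((200, n_points))  # dummy hardness targets

models = []
for i in range(n_points):
    X = np.hstack([chem, profiles[:, :i]])  # chemistry + true previous points
    m = MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
    m.fit(X, profiles[:, i])
    models.append(m)

def predict_profile(x_chem):
    """Chain the per-point networks, feeding each prediction forward."""
    preds = []
    for m in models:
        x = np.hstack([x_chem, preds])
        preds.append(float(m.predict(x.reshape(1, -1))[0]))
    return preds

print(predict_profile(chem[0]))
```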

*This application proved to be successful because: i) results were accurate and accuracy could easily be measured; ii) the intelligent system (a neural network in both cases) was very simple, with few weights, and these could easily be interpreted by the technician; iii) the neural predictor has been coupled with a user-friendly software interface allowing not only to run the model, but also to collect data and re-train all the neural networks with new data provided by the user, so that each steel company can progressively "specialise" the predictor on its own steel grades; iv) the training time for using the software tool which was developed was very short. It is worth noting that, as pointed out at the beginning, the neural network itself is just a small element of the whole system (software tool, preprocessing, data collection, result presentation, etc.).*

#### **4.2 Prediction of malfunctioning during steel casting**

In the standard steelmaking practice, during continuous casting, the liquid material produced in the blast furnace is cast, after some processing, into the ladle and, subsequently, into the tundish (see Fig. 2). On the bottom of the tundish some nozzles are located, through which the liquid steel passes into the mould or strip casters. The section of such nozzles is far smaller than the tundish dimensions. When particular steel grades are produced, some alumina precipitation on the entry and on the lateral surface of these nozzles can partially or even totally block the flow of the liquid steel. This phenomenon is commonly known as *clogging* and is highly detrimental to casting reliability and to the quality of the cast products.

Fig. 2. a) Location of the nozzles that can be occluded; b) Schematic description of the labelled SOM-based classifier.

The clogging phenomenon is still not deeply understood (Heesom 1988), due to the very high number of chemical and process factors affecting the occurrence of the precipitation of the materials on the nozzle internal surface as well as to the impossibility of installing complex systems of probes and sensors in order to closely observe the phenomenon itself.

For this reason, some attempts have been made to apply intelligent systems to the prediction of clogging occurrence on the basis of the steel chemical composition and of the process parameters. In particular, as frequently found in fault diagnosis applications, the prediction of clogging has been faced as a binary classification problem in which one of the two classes to be distinguished (i.e. the one corresponding to malfunctioning) is far less frequent than the other one.


Firstly, Self Organising Maps (SOM) have been applied (Colla et al., 2006) in order to predict the clogging occurrence, in parallel with a physical model that takes into account some basic mechanisms of the alumina precipitation and the geometry of the nozzles, but is not capable of explaining all the complex relationships between process and chemical variables.

*The performance of the overall system is acceptable (this is also a key element in industry: the aim is seldom to optimize but often to achieve any performance better than a given threshold in a limited time) and the system has been successfully applied in the industrial context, mainly because: i) after a short testing period, the prediction accuracy was proven to be higher; ii) risk was low, as the traditional approach could be used to crosscheck the predictions of the intelligent system; iii) the availability of a simplified end-user interface reduced personnel training to the minimum, allowing the operator to input the relevant process parameters and obtain an immediate indication of the actual danger of clogging occurrence and of the potential countermeasure to adopt (i.e. Calcium Oxide addition to the liquid steel) (Fera et al. 2005).*

Improvements are also possible by taking into account the different importance of misclassification errors. In fact, the erroneous classification of a faulty situation as a normal one (sometimes called *missed detection*) prevents the operators from deploying suitable countermeasures to avoid the clogging, with potentially heavier consequences than in the opposite case, when an unnecessary warning message is raised in a standard condition (the so-called *false alarm*). Actually, standard classifiers are not always capable of providing excellent results when dealing with imbalanced datasets: therefore in (Vannucci & Colla 2011) a classifier has been applied which is explicitly designed to cope with imbalanced datasets and exploits labelled SOMs, according to the scheme depicted in Fig. 2.b. Once trained, each neuron of the SOM is labelled as corresponding to the frequent or infrequent class through a procedure that exploits a Fuzzy Inference System in order to find a suitable compromise between missed detection and false alarm rates.
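The following sketch illustrates the labelled-SOM idea on synthetic data; for simplicity, a fixed weight on the rare class stands in for the Fuzzy Inference System used to balance missed detections against false alarms, and the third-party MiniSom package is assumed to be available.

```python
# Labelled-SOM classifier for imbalanced data (scheme of Fig. 2.b), with a
# fixed rare-class weight in place of the Fuzzy Inference System.
import numpy as np
from minisom import MiniSom   # assumes the third-party `minisom` package

rng = np.random.default_rng(1)
X = rng.random((500, 6))                   # hypothetical process variables
y = (rng.random(500) < 0.05).astype(int)   # 1 = clogging (rare class)

som = MiniSom(8, 8, 6, sigma=1.0, learning_rate=0.5, random_seed=1)
som.train_random(X, 5000)

# Label each neuron: it votes "clogging" when its weighted share of
# rare-class hits exceeds the frequent-class one.
rare_weight = 10.0     # knob trading false alarms for missed detections
counts = np.zeros((8, 8, 2))
for x, label in zip(X, y):
    i, j = som.winner(x)
    counts[i, j, label] += rare_weight if label == 1 else 1.0
labels = (counts[..., 1] > counts[..., 0]).astype(int)

def classify(x):
    i, j = som.winner(x)
    return labels[i, j]          # 1 = predicted clogging

print("predicted:", classify(X[0]), "actual:", y[0])
```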

#### **4.3 Prediction of the end-point in the converter**

In the integrated steelmaking cycle, where steel is produced from primary raw materials, the Basic Oxygen Furnace (BOF) is the plant where steel is produced from pig iron, scrap and lime, by blowing oxygen to burn off the carbon. In the BOF, the sublance device, which rapidly measures carbon content and temperature before the late period of blowing, is the most important tool for BOF process control. The use of such a sublance has been an important step for controlling BOF steelmaking processes. The sublance is basically used to take a sample, usually 3-4 minutes before the end of the blow, for analysis, and it also measures the temperature of the bath. Since the introduction of the sublance, the accuracy of the end-point prediction (hit rate) at most steel plants has gradually increased from approximately 60% to 90%. In (Valentini et al. 2004) a neural network has been applied to predict the final Carbon content *[C]* for an OBM (from the German *Oxygen Bodenblasen Maxhuette*) converter in the steelmaking industry by exploiting the estimates of the Oxygen content *[O]* and of the temperature *T*. These three variables are usually linked by the following approximate mathematical equation:


$$\log \frac{P_c}{[C]^n \, [O]} = \frac{A}{T} + B \tag{1}$$

where *Pc* is a constant pressure value in the range [1, 1.5] atm, *A* and *B* are constants whose nominal values are, respectively, *A=1895 K* and *B=1.6*, and *n* is commonly assumed to be unitary, although some literature results provide *n≈0.5* for *[C]<0.08%* and *n≈1* otherwise.

Equation (1) can easily be inverted in order to predict *[C]* from *[O]* and *T*, but the prediction obtained using the nominal values of the constant parameters is not very reliable when compared to the *[C]* measurements contained in a dataset provided by the steel manufacturer for the steel grades of interest. A reliable prediction of the final Carbon content at the end of the refining process is very important, as it allows evaluation of the process parameters (such as the amount of blown Oxygen and the duration of the refining process) required to achieve the desired results while optimizing the time and cost of the production process. By adopting a simple two-layer MLP with 3 neurons in the hidden layer, the prediction error has been reduced by 64% with respect to the prediction obtained through eq. (1).
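As an illustration, the nominal predictor obtained by inverting eq. (1) can be written in a few lines; the base-10 logarithm, the chosen constants and the way the piecewise exponent *n* is resolved are our assumptions.

```python
# Nominal end-point predictor obtained by inverting eq. (1).
# Assumption: "log" is base-10; constants at their nominal values.
A, B, P_c = 1895.0, 1.6, 1.25   # P_c chosen inside the [1, 1.5] atm range

def carbon_from_eq1(O, T):
    """Estimate the final carbon content [C] from oxygen [O] and temperature T."""
    rhs = P_c / (O * 10.0 ** (A / T + B))   # equals [C]**n by eq. (1)
    C = rhs                                 # first try n = 1
    if C < 0.08:                            # piecewise n (assumption on its use)
        C = rhs ** 2.0                      # n = 0.5  =>  [C] = rhs**(1/n) = rhs**2
    return C

print(carbon_from_eq1(O=0.05, T=1900.0))    # hypothetical [O] and T values
```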

*This system presents the following advantages: i) the performance is acceptable; ii) the neural network is very simple; iii) the training time is negligible. However, the neural model has not been very well received by the end-users, mainly because it is difficult to attribute a precise physical meaning and interpretation to the network parameters, as instead happens for the parameters of formula (1).*

Therefore the alternative solution of a fine tuning of the physical parameters around the nominal values has finally been preferred.

#### **4.4 Prediction of the time required by each stage of hot rolling mills**

The efficiency and productivity of steel hot rolling mills is heavily affected by the possibility of precisely estimating when the different manufacturing stages are completed, as this avoids bottlenecks and provides important time and energy savings. For this reason, several Mill Pacing Control (MPC) systems have been realised and implemented, which allow optimising the production flow starting from the reheating furnaces, where slabs are heated to a temperature between 1100°C and 1300°C for optimal workability before being rolled. Hot steel rolling mills are usually composed of two main stages, namely the *roughing mill*, where the slab is first compressed, and the *finishing mill*, where the target thickness of the hot rolled coil is reached. A further rolling stage, named *cold rolling*, can afterwards be required; it is performed at far lower temperatures in order to produce flat products, such as plate, sheets or coils of various thicknesses.

MPC systems allow shortening the discharging interval between two subsequent slabs while avoiding collisions. To this aim, scheduling systems are developed and simulations are performed in order to test new strategies without affecting the production cycle.

Colla et al. (Colla et al., 2010) applied neural networks to solve a particular mill pacing problem, different from the usual one, namely the prediction of the total roughing time and of the time required for passing the first gauge of the finishing mill. This investigation has been pursued in order to increase the rolling efficiency and decrease the total rolling time. The slabs that are subsequently rolled can differ in steel grade and other features, thus the related rolling processes can require different times and energy amounts.


The time required for the roughing process is on average (but not always) smaller than the finishing time. Thus a slab can be output by the roughing mill while the rolling of the previous coil is still being completed: this fact may cause a collision or may force the second slab to remain stuck while its temperature decreases, which makes its successive rolling more difficult. On the other hand, the time between the inputs of two successive slabs to the roughing mill cannot be excessive, in order to maintain productivity and avoid energy losses. Ideally, a slab should be input to the roughing mill exactly at the time instant that will allow it to arrive at the entrance of the finishing mill when the rolling of the previous coil is just terminated. (Colla et al. 2010) applied various neural network-based approaches to predict the time *ti* (*1≤i≤6*) required by the slab to pass each one of the 6 stages that form the roughing mill (see Fig. 3).
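The pacing rule just stated reduces to simple arithmetic once the stage times are predicted, as in the following toy computation with hypothetical values.

```python
# Toy computation of the ideal input instant: slab k should enter the
# roughing mill so that it reaches the finishing-mill entrance exactly when
# coil k-1 clears it. All times are hypothetical predictions, in minutes.
t_stages = [1.2, 0.9, 1.1, 1.0, 0.8, 1.3]   # predicted t_i for the 6 stages
transfer = 0.5                              # roughing -> finishing transfer
prev_finishing_done = 8.0                   # instant when coil k-1 clears
ideal_input = prev_finishing_done - (sum(t_stages) + transfer)
print(f"input slab k at t = {ideal_input:.1f} min")   # -> 1.2 min
```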

Fig. 3. Scheme of the first stages of a steel hot rolling mill.

In particular, the most successful solution performs a sequential prediction, namely it bases the prediction of *ti* (for *i>1*) not only on product and process parameters, but also on the predictions of the times required to pass the previous stages, i.e. *tk* with *1≤k<i*. Moreover, neural networks have been applied also to predict the time required for passing the finishing mill.

*In this case, the application of neural networks was actually advantageous for the following main reasons: i) neural networks proved to outperform more traditional approaches; ii) the neural system is naturally adaptable to changing operating conditions thanks to its capability of self-learning from data. However, the on-site real-time implementation has not been easy and has required considerable effort, because it has been difficult to interface the system with the control system of the mill.*

#### **4.5 Estimate of train position and speed from wheels velocity measurements**

Within an Automatic Train Protection (ATP) system, two subsystems are usually included: a ground subsystem, which provides updated information on the train position and the line gradient by exploiting fixed balises or another source of absolute information (e.g. GPS), and an on-board subsystem, which estimates the actual train position and speed, according to the scheme depicted in Fig.4.

The ground subsystem communicates to the on-board one the distance from the next reference point on the line, the gradient of the line and the allowed speed at the next reference point. The on-board subsystem then evaluates the distance from the next information point and the minimum distance that allows compliance with the speed limit at the next reference point. If it turns out that the train cannot meet the target speed at the next reference point (as the residual braking resources of the train are not sufficient), the on-board subsystem actuates suitable countermeasures, such as emergency braking. The evaluation of the above-mentioned distance values requires knowledge of the braking parameters and of the actual train speed: a correct estimate of this last variable even in poor adhesion conditions (i.e. when one or more train wheels are sliding on the rails and, thus, the axle angular velocity is not proportional to the train speed) is crucial.
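As a toy illustration of the on-board check, a constant-deceleration braking model (a deliberate simplification of a real braking curve) already shows why an accurate speed estimate is crucial; all values are hypothetical.

```python
# Constant-deceleration check: the train can still meet the speed limit
# v_lim at the next reference point, a distance d ahead, only if the
# minimum braking distance fits within d.
def braking_distance(v, v_lim, decel):
    """Distance needed to slow from v to v_lim at constant deceleration."""
    return max(v * v - v_lim * v_lim, 0.0) / (2.0 * decel)

v, v_lim, d, decel = 45.0, 25.0, 1200.0, 0.7   # m/s, m/s, m, m/s^2
if braking_distance(v, v_lim, decel) > d:      # estimate of v is critical here
    print("trigger emergency braking")
else:
    print("target speed attainable")           # 1000 m needed < 1200 m available
```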

Fig. 4. Working principle of an ATP


Allotta et al. (Allotta et al 2001, Allotta et al 2002) developed a series of algorithms for estimating the actual train speed on the basis of the information collected concerning two axles of the locomotive. A first set of such algorithms has been developed according to expert personnel specifications and following the traditional "crisp" reasoning, which exploits different simple deterministic formulas for calculating the train speed depending on the condition of adhesion of the wheels to the rails. In fact, among the huge number of state variables considered in the procedure, there are two binary variables indicating the adhesion condition of each axle. The technical personnel of the railway company formalised the reasoning that leads the human operators to a correct determination of the adhesion conditions. Then two identical fuzzy systems have been developed, which take two inputs each, namely the difference between the velocities of the two controlled axles and the acceleration of the axle whose adhesion condition is estimated, and return the degree of adhesion of one axle. The design of the two fuzzy systems has been refined by means of a training procedure exploiting a great quantity of the available data and, finally, they have simply been substituted for the old crisp algorithm for adhesion condition estimation, leaving the rest of the speed estimation procedure unchanged. As an alternative, the standard rule-based system merely implementing the human operators' reasoning has been implemented and its parameters (such as thresholds) have been tuned by means of Genetic Algorithms (GA), exploiting the available experimental data and adopting as fitness function the error between the actual and the estimated train speed, to be minimised. A second set of algorithms tested for this application performs a direct estimate of the train velocity, taking as inputs some of the available state variables (in particular axle velocities, accelerations and acceleration variations). Both neural networks and fuzzy inference systems have been tested for this purpose.
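A minimal sketch of one such two-input fuzzy estimator follows; the membership shapes, ranges and rules are illustrative assumptions, not the tuned system of Allotta et al.

```python
# Two-input fuzzy adhesion estimator: velocity difference between the two
# monitored axles and acceleration of the axle under test -> adhesion
# degree in [0, 1]. Two rules, triangular memberships, weighted-average
# defuzzification.
def tri(x, a, b, c):
    """Triangular membership function peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def adhesion_degree(dv, acc):
    dv_small, dv_large = tri(abs(dv), -1, 0, 2), tri(abs(dv), 1, 4, 7)
    acc_small, acc_large = tri(abs(acc), -1, 0, 3), tri(abs(acc), 2, 6, 10)
    w_good = min(dv_small, acc_small)    # rule 1: small dv AND small acc
    w_poor = max(dv_large, acc_large)    # rule 2: large dv OR large acc
    if w_good + w_poor == 0.0:
        return 0.5                       # no rule fires: neutral output
    return w_good / (w_good + w_poor)    # good -> 1.0, poor -> 0.0

print(adhesion_degree(dv=0.3, acc=0.5))  # -> 1.0, i.e. full adhesion
```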

*From a comparison among all the tested approaches it turned out that the algorithms purely based on AI techniques (and, in particular, the neural network (Colla et al. 2003)) outperform the rule-based ones and have also a simpler structure. However, these systems also present the following non negligible disadvantages: i) a difficult physical interpretation of fuzzy rules or of the neural network;* 


*ii) a difficult implementation of the specification requirements; iii) the FIS-based methods are also computationally less efficient; iv) when testing the more frequent fault conditions, all the developed algorithms present an acceptable degree of reliability and robustness, but the crisp algorithm, which is actually adherent to the specifications provided by the expert personnel of the railway company, provides the best guarantee of estimated speed values lying within acceptable limits; v) with respect to the merely crisp algorithm and to its improved version, where some parameters have been optimized through GA, the soft computing-based algorithms provide less control over the internal parameters of the estimator, which increase in number but lose physical meaning (this is especially true for the neural predictor, which has been applied as a black-box parametric estimator); vi) soft computing-based procedures, and especially the ones which exploit neural networks, do not guarantee that a particular input pattern (or series of input patterns) will not lead to unacceptable velocity estimates.*

Thus finally the rule-based algorithm tuned through GA has been preferred to all the other approaches for the final implementation.

#### **4.6 Optimisation of the logistic in an automatic warehouse of steel tubes**

Optimisation of logistics is one of the fields where intelligent systems have been most successfully applied. Colla et al. (Colla et al., 2010) tested several AI-based techniques for the optimisation of product allocation in an automatic warehouse of steel tubes. The warehouse has been designed to stock a large variety of typologies of steel tubes, differing in the quality of the steel, in the length, as well as in the shape and dimensions of the section. As soon as the tubes are produced, they are grouped in packs and automatically transferred to a stocking area, where they are located in piles that must be composed of the same typology of tubes. A non-optimal allocation strategy can cause the available space in the warehouse not to be fully exploited, as, for instance, in the case in which many short piles (i.e. composed of a few packs) are present in place of a few higher ones. To this aim, firstly some Key Performance Indicators (KPIs) have been defined in order to derive objective functions to be optimised by the different allocation strategies. Afterwards, an optimisation problem has been formulated, for which an analytical model was really too complex to define and implement, due to the variability of the workload and to the interaction between the automatic tube conveyors, as traffic control is only partially centralised (for instance, collision avoidance is managed at local level through suitable sensing and communicating devices mounted on each conveyor). Traditional derivative-based optimisation models cannot be applied, while GAs are a very suitable solution for the optimisation problem, as they allow a decoupling between the problem formulation and the search procedure. The destination of each tube pack has been suitably encoded in a chromosome, and GAs have been applied in order to minimise a fitness function obtained from a composition of the above-defined KPIs. Different ways to aggregate the selected KPIs have been tested, from a simple weighted sum up to a Fuzzy Inference System implementing a complex combination according to rules derived from the knowledge of the technical personnel working on the plant. However, this application is intrinsically a Multi-Objective Optimization (MOO) problem, as the KPIs represent requirements that are often in contrast with each other. Any kind of aggregation of the KPIs simplifies it to a Single Objective Optimization problem, but surely the most suitable way to cope with this problem is by exploiting GA-based MOO algorithms. The Strength Pareto Evolutionary Algorithm (Zitzler & Thiele 1999) has been successfully applied to this problem and outperforms all the other approaches.
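The following sketch shows the GA encoding just described on toy data: a chromosome assigns each pack to a pile, and a weighted sum aggregates illustrative KPIs (the simplest of the tested aggregations; the SPEA multi-objective variant is not reproduced here). Pack data, KPIs and GA settings are assumptions.

```python
# Toy GA over pack-to-pile assignments: a chromosome gives each pack a
# destination pile; the fitness is a weighted sum of illustrative KPIs
# (piles used, typology mixing, over-height piles), to be minimised.
import random

random.seed(0)
N_PACKS, N_PILES, MAX_HEIGHT = 30, 12, 5
typology = [random.randrange(4) for _ in range(N_PACKS)]   # hypothetical

def fitness(chrom):
    piles = {}
    for pack, pile in enumerate(chrom):
        piles.setdefault(pile, []).append(typology[pack])
    used = len(piles)                                        # KPI 1
    mixed = sum(len(set(t)) - 1 for t in piles.values())     # KPI 2
    over = sum(max(len(t) - MAX_HEIGHT, 0) for t in piles.values())
    return 1.0 * used + 10.0 * mixed + 10.0 * over           # weighted sum

def evolve(pop_size=60, generations=200):
    pop = [[random.randrange(N_PILES) for _ in range(N_PACKS)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        survivors = pop[:pop_size // 2]           # elitist selection
        children = []
        while len(children) < pop_size - len(survivors):
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, N_PACKS)    # one-point crossover
            child = a[:cut] + b[cut:]
            if random.random() < 0.3:             # single-gene mutation
                child[random.randrange(N_PACKS)] = random.randrange(N_PILES)
            children.append(child)
        pop = survivors + children
    return min(pop, key=fitness)

print("best fitness:", fitness(evolve()))
```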

*The success of this application of intelligent systems with respect to the previous system, which was based on heuristics, depends on the following reasons: i) the correct formalisation of the MOO problem; ii) a suitable simulation system of the automatic warehouse (Colla & Nastasi 2010), realised in order to reproduce the monitoring system of the warehouse itself, which can be fed with the same data files that are used by the real system; iii) the possibility (as a consequence of point ii) of easily testing the different strategies in a realistic way without affecting the normal operations of the warehouse; iv) the possibility (as a consequence of point ii) of performing an easy and user-friendly comparison between the standard and the simulated situation of the warehouse obtained through the previous and improved strategies, which can be of help to the technical personnel in evaluating the advantages of a new strategy; v) the easiness of collecting training data, which are no more than standard system data; vi) the modularity of the software for simulation and for the implementation of the allocation strategy, which makes the substitution of the new code within the control system of the warehouse straightforward.*

### **5. Conclusion**

116 Intelligent Systems

*ii) a difficult implementation of the specification requirements; iii) the FIS-based methods are also computationally less efficient; iv) when testing the more frequent fault conditions, all the developed algorithms present an acceptable degree of reliability and robustness, but the crisp algorithm, which is actually adherent to the specifications provided by the expert personnel of the train society, provides the best guarantee of estimated speed values lying within acceptable limits; v) with respect the merely crisp algorithm and to its improved version, where some parameters have been optimized through GA, the soft computing-based algorithms provide less control over the internal parameters of the estimator, which increase in number but loose in physical meaning (this is especially true for the neural predictor, which has been applied as a black-box parametric estimator); vi) soft computingbased procedures, and especially the ones which exploit neural networks, do not guarantee that a particular input pattern (or series of input patterns) will lead to unacceptable velocity estimates.* 

Thus, finally, the rule-based algorithm tuned through GA has been preferred to all the other approaches for the final implementation.

#### **4.6 Optimisation of the logistics in an automatic warehouse of steel tubes**

Optimisation of logistics is one of the fields where intelligent systems have been most successfully applied. Colla et al. (Colla et al., 2010) tested several AI-based techniques for the optimisation of product allocation in an automatic warehouse of steel tubes. The warehouse has been designed to stock a large variety of typologies of steel tubes, differing in the quality of the steel, in the length, as well as in the shape and dimensions of the section. As soon as the tubes are produced, they are grouped in packs and automatically transferred to a stocking area, where they are located in piles that must be composed of the same typology of tubes. A non-optimal allocation strategy can prevent the available space in the warehouse from being fully exploited, as happens, for instance, when many short piles (i.e. composed of a few packs) are present in place of a few higher ones. To this aim, firstly some Key Performance Indicators (KPIs) were defined in order to derive objective functions to be optimised by the different allocation strategies. Afterwards, an optimisation problem was formulated, for which an analytical model was really too complex to define and implement, due to the variability of the workload and to the interaction between the automatic tube conveyors, as traffic control is only partially centralised (for instance, collision avoidance is managed at local level through suitable sensing and communicating devices mounted on each conveyor). Traditional derivative-based optimisation methods cannot be applied, while GAs are a very suitable solution for this optimisation problem, as they allow a decoupling between the problem formulation and the search procedure. The destination of each tube pack has been suitably codified in a chromosome and GAs have been applied in order to minimise a fitness function obtained from a composition of the above-defined KPIs. Different ways to aggregate the selected KPIs have been tested, from a simple weighted sum up to a Fuzzy Inference System implementing a complex combination according to rules derived from the knowledge of the technical personnel working on the plant. However, this application is intrinsically a Multi-Objective Optimization (MOO) problem, as the KPIs represent requirements that are often in contrast with each other. Any kind of aggregation of the KPIs simplifies it to a Single Objective Optimization problem, but surely the most suitable way to cope with this problem is by exploiting GA-based MOO algorithms. The Strength Pareto Evolutionary Algorithm (Zitzler & Thiele, 1999) has been successfully applied to this problem and outperforms all the other approaches.
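As a rough, self-contained illustration of the GA encoding described above, the following Python sketch evolves a pack-to-pile assignment against a simple weighted sum of penalty-style KPIs. All names, problem sizes and KPI weights are invented for the example; the cited work defines its own KPIs and, in its best-performing variant, uses Pareto-based multi-objective search rather than a single weighted sum.

```python
import random

# Hypothetical setup: N packs to allocate among M piles; each pack has a
# typology, and a pile should hold a single typology (illustrative only).
N_PACKS, N_PILES, MAX_HEIGHT = 30, 12, 6
typology = [random.randrange(4) for _ in range(N_PACKS)]

def fitness(chromosome):
    """Weighted sum of penalty-style KPIs (lower is better)."""
    piles = {}
    for pack, pile in enumerate(chromosome):
        piles.setdefault(pile, []).append(typology[pack])
    mixed = sum(1 for t in piles.values() if len(set(t)) > 1)            # KPI 1: mixed piles
    overfull = sum(max(0, len(t) - MAX_HEIGHT) for t in piles.values())  # KPI 2: over-height
    n_piles = len(piles)                                                 # KPI 3: prefer few, tall piles
    return 10.0 * mixed + 5.0 * overfull + 1.0 * n_piles

def evolve(pop_size=50, generations=200, p_mut=0.1):
    """Plain GA: chromosome = destination pile of each pack."""
    pop = [[random.randrange(N_PILES) for _ in range(N_PACKS)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        survivors = pop[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(survivors):
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, N_PACKS)            # one-point crossover
            child = a[:cut] + b[cut:]
            if random.random() < p_mut:                   # mutation: move one pack
                child[random.randrange(N_PACKS)] = random.randrange(N_PILES)
            children.append(child)
        pop = survivors + children
    return min(pop, key=fitness)

best = evolve()
print("best allocation:", best, "fitness:", fitness(best))
```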

According to the authors' personal experience, it *cannot* be stated that intelligent systems are so advantageous with respect to traditional techniques as to be *universally* accepted for industrial applications. Or better: advantages exist, but they are often too limited when compared with the additional risks, training costs, design time, and documentation/maintenance effort. There are surely applications where they provide advantages, especially in tough problems, but these are rather limited, therefore they do not justify universal acceptance.

**Unfortunately, in most industrial applications that we have encountered so far, very few intelligent systems have offered performance so much better than other techniques as to really convince the sceptical user. Of course, the comparison is made between the** *best* **intelligent system and the correspondingly** *best* **non-intelligent technique.** 

#### **5.1 A global advantage of intelligent systems**

There is perhaps a major advantage which makes intelligent systems attractive in a wider range of applications. In practice, intelligent systems are:

*generic approximation and modelling techniques which allow accurate system modelling/ forecasting/approximation/classification/etc. without any specific experience of the designer*.

In practice, anybody without any experience in a specific subject can afford to solve a problem which could otherwise (namely, with traditional techniques) be afforded only by an expert (or a team of experts) in that field. It is likely that an expert, with appropriate knowledge of the problem and of a bunch of more specific methods, would achieve a better result, but this would be far more expensive for an industry, both because of the higher cost of the expert and because of the longer development time. This is a rather interesting advantage, even when intelligent systems are suboptimal, as it significantly reduces training costs of inexperienced personnel.

#### **5.2 How to help industry accepting intelligent systems**

The authors personally believe that industry strongly needs to be helped to accept intelligent systems and this should be a major role for universities and research institutions.


Yet this has to be done in the most appropriate way, that is, by showing industry *unambiguously if, where and when* intelligent systems offer significant advantages or, more realistically, more advantages than drawbacks, and by associating the intelligent system with an appropriate *development environment* and enough *supporting tools*, which is often the most time-consuming element to be developed.

This is one of the major reasons for the several Special Sessions on Industrial Applications of Intelligent Systems which have been held in the last decade. Authors are usually requested to present their ideas on intelligent systems but, more importantly, to prove that these are either comparable to or significantly better than other standard techniques. *Such a comparison has to be as fair as possible, which is not normally the case*. In practice, in most papers, intelligent systems are usually compared among themselves. The expert reader is left with the question:

*Are you sure that other techniques would not be even better or simpler?* 

Or, when a comparison is attempted with standard techniques, these are usually much older; that is, the paper demonstrates, for instance, that an *up-to-date neurofuzzy network* is much better than an *older-than-my-father standard technique*, which is rather obvious, as technology keeps improving, independently of intelligent systems.

One of the major reasons for this lack of fair comparisons is that comparing an intelligent system against an *up-to-date standard method* requires developing from scratch an appropriate demonstrator, which often requires either a lot of specific experience or a lot of time, and usually nobody wants to afford it.

*Only those research groups who tightly cooperate with an industrial group can merge industrial and academic experiences, to develop both techniques appropriately, although these are seldom done together, due to unaffordable additional costs.* 

#### **5.3 A critical question**

So far very few applications of intelligent systems have provided performance so much better than other techniques as to really convince even the most sceptical user. In most cases, they offer a slightly better performance (when compared with an alternative well-designed method) or a shorter design time but, on the other hand, design risks are often so critical that they definitely impair the advantages. It is therefore time for a critical question:

*In which applications do neural networks and fuzzy logic have a higher chance of being accepted?*

We think that, at present, the most promising areas are, for instance:


 *data mining* and *knowledge based systems*, where information, data, knowledge and models are *valuable items*, but they are often *hidden* in a huge amount of noisy, ambiguous, contradicting data. Data is so wide, contradicting and ambiguous that no method can be accurate and predictable, therefore neural networks may provide advantages, without the need to be 100% correct;

 *prediction/classification of partially random processes*, like *time-series prediction*, *forecasting*, *complex pattern classification*, *semantic Web*, etc., where the randomness of the process/patterns prevents a 100% prediction accuracy, therefore the errors of the intelligent systems can be accepted at no cost;

 *modelling of complex systems*, where any other modelling technique would be as incomprehensible as a neurofuzzy model;

 *consumer applications*, where the appeal of the "fuzzy label" increases the market of an appliance.

#### **6. References**

Allotta, B.; Malvezzi, M.; Toni, P.; Colla, V. (2001). Train speed and position evaluation using wheel velocity measurements, *Proceedings of the 2001 IEEE/ASME International Conference on Advanced Intelligent Mechatronics AIM'01*, Como, Italy.

Allotta, B.; Colla, V.; Malvezzi, M. (2002). Train position and speed estimation using wheel velocity measurements, *Journal of Rail and Rapid Transit, Proceedings of the Institution of Mechanical Engineers Part F*, Vol. 216, No. 3, pp. 207-225.

Colla, V.; Reyneri, L.M.; Sgarbi, M. (2000). Parametric characterization of Jominy profiles in steel industry, *Integrated Computer-Aided Engineering*, Vol. 7, pp. 217-228.

Colla, V.; Vannucci, M.; Allotta, B.; Malvezzi, M. (2003). Comparison of traditional and neural systems for train speed estimation, *Proceedings of the 11th European Symposium on Artificial Neural Networks ESANN 2003*, Brugges, Belgium, 23-25 April.

Colla, V.; Vannucci, M.; Fera, S.; Valentini, R. (2006). Ca-treatment of Al-killed steels: inclusion modification and application of artificial neural networks for the prediction of clogging, *Proceedings of the 5th European Oxygen Steelmaking Conference EOSC'06*, 26-28 June 2006, Aachen, Germany, pp. 387-394.

Colla, V.; Vannucci, M.; Valentini, R. (2010). Neural network based prediction of roughing and finishing times in a hot strip mill, *Revista de Metalurgia*, Vol. 46, No. 1, pp. 15-21.

Colla, V.; Nastasi, G. (2010). Modelling and simulation of an automated warehouse for the comparison of storage strategies, Chap. 21 in *Modelling, Simulation and Optimization*, INTECH, pp. 471-486 (ISBN 978-953-7619-36-7).

Colla, V.; Nastasi, G.; Matarese, N.; Reyneri, L.M. (2010). GA-based solutions comparison for storage strategies optimization for an automated warehouse, *Journal of Hybrid Intelligent Systems*, Vol. 7, pp. 283-297.

Doane, D.V.; Kirkaldy, J.S. (1978). Hardenability concepts with applications to steel, *TMS-AIME*, Warrendale.

Fera, S.; Harloff, A.; Roedl, S.; Mavrommatis, K.; Colla, V.; Santisteban, V.; Roessler, S. (2005). Development of a model predicting inclusions precipitation in nozzles based on chemical composition and process parameters such as casting rate, liquid temperature, nozzle design and slag composition, *European Commission Ed.*, Technical Report EUR 21442.

Haykin, S. (1994). Neural Networks: A Comprehensive Foundation, *Mc Millan College Publishing Company*, New York.

Heesom, M.J. (1988). Physical and chemical aspects of nozzle blockage during continuous casting, *Proceedings of the 1st Int. Calcium Treatment Symposium*, London (UK).

Marin, B.; Bell, A.; Idoyaga, Z.; Colla, V.; Fernàndez, L.M. (2007). Optimization of the influence of boron on the properties of steels, *European Commission Ed.*, Technical Report EUR 22446.

Reyneri, L.M. (1999). Unification of neural and wavelet networks and fuzzy systems, *IEEE Trans. on Neural Networks*, Vol. 10, No. 4, pp. 801-814.

Valentini, R.; Colla, V.; Vannucci, M. (2004). Neural predictor of the end point in a converter, *Revista de Metalurgia*, Vol. 40, No. 6, pp. 416-419.

Vannucci, M.; Colla, V. (2011). Novel classification methods for sensitive problems and uneven datasets based on neural networks and fuzzy logic, *Applied Soft Computing*, Vol. 11, pp. 2383-2390.

Vermeulen, W.G.; Van Der Wolk, P.J.; De Weijer, A.P.; Van Der Zwaag, S. (1996). Prediction of Jominy hardness profiles of steels using artificial neural networks, *Journal of Materials Engineering and Performances*, Vol. 5, No. 1, pp. 57-63.

Zhang, Q. (1997). Using wavelet network in non-parametric estimation, *IEEE Transactions on Neural Networks*, Vol. 8, No. 2, pp. 227-236.

Zitzler, E.; Thiele, L. (1999). Multiobjective evolutionary algorithms: a comparative case study and the Strength Pareto Evolutionary Algorithm, *IEEE Transactions on Evolutionary Computation*, Vol. 3, No. 4, pp. 257-271.

### **Intelligent Problem Solvers in Education: Design Method and Applications**

#### Nhon Van Do

*University of Information Technology Vietnam* 

#### **1. Introduction**


In this chapter we present a method for designing intelligent problem solvers (IPS), especially those in education. An IPS, which is an intelligent system, can consist of AI components such as theorem provers, inference engines, search engines, learning programs, classification tools, statistical tools, question-answering systems, machine-translation systems, knowledge acquisition tools, etc. (Sowa, John F. 2002). An IPS in education (IPSE), as considered here, must have a suitable knowledge base used by the inference engine to solve problems in a certain knowledge domain, and the system must not only give human-readable solutions but also present solutions in the way teachers and students usually write them. Knowledge representation methods used to design the knowledge base should be convenient for users to study and for the inference engine to use. Besides, problems need to be modeled so that we can design algorithms for solving them automatically and propose a simple language for specifying them. The system can solve problems in general forms. Users only declare the hypothesis and the goal of a problem, based on a simple language that is nevertheless expressive enough for specifying problems. The hypothesis can consist of objects, and relations between objects or between attributes. It can also contain formulas, and determination properties of some attributes or their values. The goal can be to compute an attribute, or to determine an object, a relation or a formula. After specifying a problem, users can request the program to solve it automatically or to give instructions that help them to solve it themselves. The second function of the system is "Search for Knowledge". This function helps users to find necessary knowledge quickly. They can search for concepts, definitions, properties, related theorems or formulas, and problem patterns. By the cross-reference system of menus, users can easily get the knowledge they need.

Knowledge representation has a very important role in designing the knowledge base and the inference engine of the system. Many models and methods for knowledge representation have already been suggested and applied in many fields of science. Popular methods such as logic, frames, classes, semantic networks, conceptual graphs, rules of inference, and ontologies can be found in George F. Luger (2008), Stuart Russell & Peter Norvig (2010), or in Sowa, John F. (2000). These methods are very useful in many applications. However, they are not sufficient, and not easy to use, for constructing an IPSE in practice. Knowledge representation should be convenient for users to study and for the inference engine to use. Besides, problems need to be modeled so that we can design algorithms for solving them automatically and propose a simple language for specifying them. Practical intelligent systems require more powerful and useful models for knowledge representation. The *Computational Object Knowledge Base* model (COKB) presented in Nhon Van Do (2010) will be used to design the system. This model can be used to represent the total knowledge and to design the knowledge base component of systems. Next, computational networks (Com-Net) and networks of computational objects (CO-Net) in Nhon Van Do (2009) and Nhon Van Do (2010) can be used for modeling problems in knowledge domains. These models are tools for designing the inference engine of systems.

We have used the COKB model, CO-Net and Com-Net in constructing some practical IPSE, such as the program for studying and solving problems in plane geometry presented in Nhon Van Do (2000) and Nhon Do & Hoai P. Truong & Trong T. Tran (2010), the system that supports studying knowledge and solving problems of analytic geometry, the system for solving algebraic problems in Nhon Do & Hien Nguyen (2011), and programs for solving problems in electricity, in inorganic chemistry, etc. The applications have been implemented using programming tools and computer algebra systems such as C++, C#, JAVA, and MAPLE. They are very easy for students to use in studying knowledge; they solve problems automatically and give human-readable solutions that agree with those written by teachers and students.

The chapter is organized as follows: In Section 2, the system architecture and the design process will be presented. In Section 3, models for knowledge representation are discussed. Designing the knowledge base and the inference engine of an IPSE will be presented in Section 4. Some applications will be introduced in Section 5. Conclusions and future work are drawn in Section 6.

#### **2. System architecture and the design process**

The structure of an IPSE considered here consists of components such as the knowledge base, inference engine, interface, explanation component, working memory and knowledge manager. In this section, these components will be studied together with the relationships between them; we will also study and discuss how an IPSE runs, and present a process to construct the system together with methods and techniques that can be used in each phase of the process.

#### **2.1 Components of the system**

An IPSE is also a knowledge-base system, which supports searching, querying and solving problems based on knowledge bases; it has the structure of an expert system. We can design the system so that it consists of the following components:

 The knowledge base.
 The inference engine.
 The explanation component.
 The working memory.
 The knowledge manager.
 The interface.



Knowledge bases contain the knowledge for solving problems in a specific knowledge domain. The knowledge must be stored in computer-readable form so that the inference engine can use it in the procedure of automated deductive reasoning to solve problems stated in general forms. Knowledge bases can contain concepts and objects, relations, operators and functions, facts and rules.

The inference engine uses the knowledge stored in the knowledge bases to solve problems, to search, or to answer queries. It is the "brain" that systems use to reason about the information in the knowledge base for the ultimate purpose of formulating new conclusions. It must identify problems and use suitable deductive strategies to find the right rules and facts for solving the problem. In an IPSE, the inference engine also has to produce solutions in the way humans read, think, and write.

The working memory contains the data that is received from the user during operation of the system. Consequents from rules in the knowledge base may create new values in working memory, update old values, or remove existing values. It also stores the data, facts and rules used by the inference engine in the process of searching and deduction.

The explanation component helps to explain the phases and concepts involved in the process of solving the problem. It presents the method by which the system reaches a conclusion, which may not be obvious to a human user, and explains the reasoning process that leads to the final answer of the system.

The knowledge manager supports updating the knowledge in the knowledge base. It also supports searching the knowledge and testing its consistency.

The user interface is the means of communication between a user and the system's problem-solving processes. An effective interface has to be able to accept queries, instructions or problems in the form that the user enters and translate them into working problems in the form required by the rest of the system. It also has to be able to translate the answers produced by the system into a form that the user can understand. The interface component of the system is required to have a specification language for communication between the system and learners, and between the system and instructors as well.

Figure 1 below shows the structure of the system.

Fig. 1. Structure of a system

The main process for problem solving is as follows. From the user, a problem is input into the system in the form that the user enters, and the problem written in the specification language is created; it is then translated so that the system receives the working problem in the form required by the inference engine, and this is placed in the working memory. After analyzing the problem, the inference engine generates a possible solution for the problem by applying automated reasoning strategies such as forward chaining, backward chaining, and reasoning with heuristics. Next, the first solution is analyzed, and from this the inference engine produces a good solution for the interface component. Based on the good solution found, the answer solution in human-readable form is created for output to the user.
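As a much-simplified illustration of the forward chaining step in this process, the following Python sketch fires rules of the form "if <facts> then <facts>" until the goal is reached, recording the applied rules as a trace from which a readable solution could be produced. The rule encoding is an assumption made for the example, not the chapter's actual engine.

```python
def forward_chaining(known_facts, rules, goal):
    """Naive forward chaining: repeatedly fire rules whose premises hold.

    `rules` is a list of (premises, conclusions) pairs over hashable facts;
    the recorded firings double as a human-readable trace of the solution.
    Illustrative sketch only.
    """
    facts = set(known_facts)
    solution = []                      # trace of applied rules
    changed = True
    while changed and goal not in facts:
        changed = False
        for premises, conclusions in rules:
            if set(premises) <= facts and not set(conclusions) <= facts:
                facts |= set(conclusions)
                solution.append((premises, conclusions))
                changed = True
    return solution if goal in facts else None

# Tiny example: derive "S" from {"a", "B", "C"} with toy rules.
rules = [(("B", "C"), ("A",)), (("a", "A", "B"), ("b",)), (("a", "b", "C"), ("S",))]
print(forward_chaining({"a", "B", "C"}, rules, "S"))
```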



#### **2.2 Design process**

The process of analysis and design the components of the systems consists of the following stages.

**Stage 1:** Determine the knowledge domain and scope; then collect real knowledge consisting of data, concepts and objects, relations, operators and functions, facts and rules, etc. The knowledge can be classified in some ways, such as by chapters, topics or subjects; this classification helps us to collect problems appropriately and easily. Problems are also classified by some methods, such as frame-based problems and general forms of problems.

**Stage 2:** Knowledge representation, or modeling of the knowledge, to obtain the knowledge base model of the system. This is an important basis for designing the knowledge base. Classes of problems are also modeled to obtain initial problem models.

The above stages can be done by using the COKB model, Com-Nets, CO-Nets, and their extensions. These models will be presented in section 3.

**Stage 3:** Establishing the knowledge base organization for the system based on the COKB model and its specification language. The knowledge base can be organized as structured text files. They include the files below.


 Files store names of concepts and structures of concepts.
 A file stores information of the Hasse diagram representing the component H of the COKB model.
 Files store the specification of relations (the component R of the COKB model).
 Files store the specification of operators (the component Ops of the COKB model).
 Files store the specification of functions (the component Funcs of the COKB model).
 A file stores the definition of kinds of facts.
 A file stores deductive rules.
 Files store certain objects and facts.
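A minimal sketch of how such a file-based organization might be loaded is given below. The file names and the one-component-per-file split are assumptions made for illustration; the COKB specification language itself defines the actual file contents.

```python
import os

# Hypothetical file layout for a COKB knowledge base stored as structured
# text files, one entry per component (names are illustrative assumptions).
KB_FILES = {
    "concepts": "Concepts.txt",        # names and structures of concepts
    "hierarchy": "Hierarchy.txt",      # Hasse diagram of the component H
    "relations": "Relations.txt",      # component R
    "operators": "Operators.txt",      # component Ops
    "functions": "Functions.txt",      # component Funcs
    "fact_kinds": "FactKinds.txt",     # definition of kinds of facts
    "rules": "Rules.txt",              # deductive rules
    "objects": "Objects.txt",          # certain objects and facts
}

def load_kb(folder):
    """Read each component file into a list of raw specification lines."""
    kb = {}
    for component, filename in KB_FILES.items():
        path = os.path.join(folder, filename)
        with open(path, encoding="utf-8") as f:
            kb[component] = f.read().splitlines()
    return kb
```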

**Stage 4:** Modeling of problems and designing algorithms for automated reasoning. General problems can be represented by using Com-Nets, CO-Nets, and their extensions. The CO-Net problem model consists of three parts:

O = {O1, O2, ..., On},  F = {f1, f2, ..., fm},

Goal = [g1, g2, ..., gm].

In the above model the set O consists of n Com-objects, F is the set of facts given on the objects, and Goal is a list, which consists of goals.
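A hedged sketch of this CO-Net problem model as plain data follows; the Python encoding of objects, facts and goals is an assumption made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class ComObject:
    """A Com-object: a name plus valued attributes (None when unknown)."""
    name: str
    attrs: dict = field(default_factory=dict)

@dataclass
class CoNetProblem:
    objects: list          # O = {O1, ..., On}: the Com-objects
    facts: list            # F = {f1, ..., fm}: facts given on the objects
    goal: list             # Goal = [g1, ..., gm]: what must be computed/proved

# Example: compute the area S of triangle ABC from a, B, C.
problem = CoNetProblem(
    objects=[ComObject("ABC", {"a": 5.0, "B": 1.0, "C": 0.8, "S": None})],
    facts=["a = 5", "B = 1", "C = 0.8"],
    goal=["S"],
)
print(problem.goal)
```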

The design of deductive reasoning algorithms for solving problems and the design of the interface of the system can be developed by three steps of modeling:

**Step 1.** Classify problems, such as frame-based problems, problems of a determination or a proof of a fact, problems of finding objects or facts, etc.

**Step 2.** Classify facts and represent them based on the kinds of facts of the COKB model.

**Step 3.** Model the kinds of problems from the classifications in steps 1 and 2. From the models of each kind, we can construct a general model for problems, which are given to the system for solving them.


The basic technique for designing deductive algorithms is the unification of facts. Based on the kinds of facts and their structures, criteria for unification are proposed; these then yield algorithms to check the unification of two facts.
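The following sketch suggests what such a kind-and-structure-based unification check could look like; the fact encoding and the choice of symmetric relations are assumptions made for the example.

```python
def unifiable(fact1, fact2):
    """Two facts unify when they have the same kind and matching structure."""
    kind1, payload1 = fact1
    kind2, payload2 = fact2
    if kind1 != kind2:
        return False
    if kind1 == "equality":            # e.g. kind 4: equality of objects/attributes
        return set(payload1) == set(payload2)        # a = b unifies with b = a
    if kind1 == "relation":            # e.g. kind 6: relation on objects
        rel1, args1 = payload1
        rel2, args2 = payload2
        if rel1 != rel2:
            return False
        symmetric = {"parallel", "perpendicular"}    # order-insensitive relations
        return sorted(args1) == sorted(args2) if rel1 in symmetric else args1 == args2
    return payload1 == payload2        # fall back to structural equality

print(unifiable(("relation", ("parallel", ["d1", "d2"])),
                ("relation", ("parallel", ["d2", "d1"]))))   # True
```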

The next important work is research on strategies for deduction to solve problems on a computer. The most difficult part is modeling human experience, sensible reactions and intuition in order to find heuristic rules able to imitate human thinking in solving problems.

**Stage 5**: Creating a query language for the models. The language helps to design the communication between the system and users by words.

**Stage 6**: Designing the interface of the system and coding to produce the application. Intelligent applications for solving problems in the education of mathematics, physics and chemistry have been implemented by using programming tools and computer algebra systems such as Visual Basic.NET or C#, SQL Server, and Maple. They are very easy for students to use to search, query and solve problems.

**Stage 7**: Testing, maintaining and developing the application. This stage is similar to that of other computer systems.

The main models for knowledge representation used in the above process will be presented in the next section.

#### **3. Knowledge representation models**

In artificial intelligence science, models and methods for knowledge representation play an important role in designing knowledge base systems and expert systems, especially intelligent problem solvers. Nowadays there are many knowledge models which have already been suggested and applied. In the books of Sowa (2002), George F. Luger (2008), Michel Chein & Marie-Laure Mugnier (2009) and Frank van Harmelem & Vladimir & Bruce (2008) we can find popular methods for knowledge representation. They include predicate logic, semantic nets, frames, deductive rules, and conceptual graphs. The above methods are very useful for designing intelligent systems, especially intelligent problem solvers. However, in many cases they are not suitable for representing knowledge in real application domains, especially for systems that solve problems in practice based on knowledge bases. New models have been proposed, such as computational networks and networks of computational objects in Nhon Van Do (2009) and the model for knowledge bases of computational objects (COKB) in Nhon Van Do (2010). The COKB model can be used to represent the total knowledge and to design the knowledge base component of practical intelligent systems. Networks of computational objects can be used for modeling problems in knowledge domains. These models are tools for designing the inference engine of systems. The models have been used in designing some intelligent problem solvers in education (IPSE), such as the program for studying and solving problems in Plane Geometry in Nhon (2000) and the program for solving problems about alternating current in physics. These applications are very easy for students to use in studying knowledge; they solve problems automatically and give human-readable solutions that agree with those written by teachers and students. In this section, the COKB model and the computational networks that are used for designing IPSE will be presented in detail.

#### **3.1 COKB model**

The model for knowledge bases of computational objects (COKB) has been established from an object-oriented approach to knowledge representation together with programming techniques for symbolic computation. There have been many results and tools for object-oriented methods, and some principles as well as techniques were presented in Mike (2005). This approach also gives us a method to model problems and to design algorithms. The models are very useful for constructing the components, and the whole knowledge base, of practical intelligent systems in their knowledge domains.

#### **3.1.1 Computational objects**

In many problems we usually meet many different kinds of objects. Each object has attributes and internal relations between them. Objects also have basic behaviors for solving problems on their attributes.

**Definition 3.1:** A computational object (or Com-object) has the following characteristics:

1. It has valued attributes. The set consisting of all attributes of the object O will be denoted by **M(O)**.
2. There are internal computational relations between the attributes of a Com-object O. These are manifested in the following features of the object:
	- Given a subset A of M(O), the object O can show us the attributes that can be determined from A.
	- The object O will give the value of an attribute.
	- It can also show the internal process of determining the attributes.

The structure computational objects can be modeled by (*Attrs, F, Facts, Rules*). *Attrs* is a set of attributes, *F* is a set of equations called computation relations, *Facts* is a set of properties or events of objects, and *Rules* is a set of deductive rules on facts. For example, knowledge about a triangle consists of elements (angles, edges, etc) together with formulas and some properties on them can be modeled as a class of C-objects whose sets are as follows:

*Attrs* = {A, B, C, a, b, c, R, S, p, ...} is the set of all attributes of a triangle,

*F* = {A + B + C = π; a/sin(A) = 2R; b/sin(B) = 2R; c/sin(C) = 2R; a/sin(A) = b/sin(B); ... },

*Facts* = {a + b > c; a + c > b; b + c > a; ... },

*Rules* = { {a > b} ⇔ {A > B}; {b > c} ⇔ {B > C}; {c > a} ⇔ {C > A}; {a = b} ⇔ {A = B}; {a^2 = b^2 + c^2} ⇒ {A = π/2}; {A = π/2} ⇒ {a^2 = b^2 + c^2, b ⊥ c}; ... }.

An object also has basic behaviors for solving problems on its attributes. Objects are equipped with abilities to solve problems such as:


1. Determines the closure of a set of attributes.
2. Executes deduction and gives answers for questions about problems of the form: determine some attributes from some other attributes.
3. Executes computations.
4. Suggests completing the hypothesis if needed.

*For example*, when a triangle object is requested to give a solution for the problem {a, B, C} ⇒ S, it will give a solution consisting of the three following steps:

Step 1: determine A, by A = π - B - C;
Step 2: determine b, by b = a.sin(B)/sin(A);
Step 3: determine S, by S = a.b.sin(C)/2;
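The behavior of such a triangle Com-object can be sketched in Python as follows. The rule encoding is an illustrative assumption, but the derivation steps it prints mirror the three steps above: applying the computation relations from *F* until the requested attribute is known.

```python
import math

class Triangle:
    """Hedged sketch of a triangle Com-object solving {a, B, C} => S."""
    # each relation: (input attributes, output attribute, function)
    RELATIONS = [
        (("B", "C"), "A", lambda v: math.pi - v["B"] - v["C"]),
        (("a", "A", "B"), "b", lambda v: v["a"] * math.sin(v["B"]) / math.sin(v["A"])),
        (("a", "b", "C"), "S", lambda v: v["a"] * v["b"] * math.sin(v["C"]) / 2),
    ]

    def __init__(self, **known):
        self.values = dict(known)

    def solve(self, target):
        """Apply relations until `target` is known; return the step trace."""
        steps = []
        while target not in self.values:
            progress = False
            for inputs, output, fn in self.RELATIONS:
                if output not in self.values and all(i in self.values for i in inputs):
                    self.values[output] = fn(self.values)
                    steps.append(f"determine {output} from {inputs}")
                    progress = True
            if not progress:
                return None            # hypothesis is insufficient
        return steps

t = Triangle(a=5.0, B=math.pi / 3, C=math.pi / 4)
print(t.solve("S"))    # three steps: A, then b, then S
```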

#### **3.1.2 Components of COKB model**

**Definition 3.2:** The model for knowledge bases of computational objects (COKB model) consists of six components:

(C, H, R, Ops, Funcs, Rules)

The meanings of the components are as follows:

 **C** is a set of concepts of computational objects. Each concept in C is a class of Com-objects.
 **H** is a set of hierarchy relations on the concepts.
 **R** is a set of relations on the concepts.
 **Ops** is a set of operators.
 **Funcs** is a set of functions.
 **Rules** is a set of rules.


There are relations that represent specializations between concepts in the set **C**; **H** represents these special relations on **C**. This relation is an ordered relation on the set **C**, and **H** can be considered as the Hasse diagram for that relation.


**R** is a set of other relations on **C**, and in case a relation r is a binary relation it may have properties such as reflexivity, symmetry, etc. In plane geometry and analytic geometry, there are many such relations: relation "belongs to" of a point and a line, relation "central point" of a point and a line segment, relation "parallel" between two line segments, relation "perpendicular" between two line segments, the equality relation between triangles, etc.

The set **Ops** consists of operators on **C**. This component represents the part of the knowledge concerning operations on the objects. Almost all knowledge domains have a component consisting of operators. In analytic geometry there are vector operators such as addition, multiplication of a vector by a scalar, cross product, vector product; in linear algebra there are operations on matrices. The COKB model helps to organize this kind of knowledge as a component in the knowledge base of intelligent systems.

The set **Funcs** consists of functions on Com-objects. Knowledge about functions is also a popular kind of knowledge in almost all practical knowledge domains, especially in the natural sciences such as mathematics and physics. In analytic geometry we have the functions: distance between two points, distance from a point to a line or a plane, projection of a point or a line onto a plane, etc. The determinant of a square matrix is also a function on square matrices in linear algebra.

The set **Rules** represents deductive rules. The set of rules is a certain part of the knowledge base. The rules represent statements, theorems, principles, formulas, and so forth. Most rules can be written in the form "if <facts> then <facts>". In the structure of a deductive rule, <facts> is a set of facts with a certain classification. Therefore, we use deductive rules in the COKB model. Facts must be classified so that the component **Rules** can be specified and processed in the inference engine of knowledge base systems or intelligent systems.
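Putting the six components together, a minimal container could look like the following sketch; the concrete field types are assumptions made for illustration, not the COKB specification itself.

```python
from dataclasses import dataclass, field

@dataclass
class COKB:
    """Hedged sketch of the six COKB components (C, H, R, Ops, Funcs, Rules)."""
    C: dict = field(default_factory=dict)      # concept name -> (Attrs, F, Facts, Rules)
    H: list = field(default_factory=list)      # hierarchy pairs, e.g. ("RightTriangle", "Triangle")
    R: list = field(default_factory=list)      # relations, e.g. ("parallel", ("Line", "Line"))
    Ops: dict = field(default_factory=dict)    # operator symbol -> handler
    Funcs: dict = field(default_factory=dict)  # function name -> handler
    Rules: list = field(default_factory=list)  # deductive rules: (premise facts, conclusion facts)

kb = COKB()
kb.H.append(("RightTriangle", "Triangle"))     # a right triangle specializes a triangle
kb.Rules.append(({"a > b"}, {"A > B"}))        # a rule of the form "if <facts> then <facts>"
print(kb)
```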

#### **3.1.3 Facts in COKB model**

In the COKB model there are 11 kinds of facts accepted. These kinds of facts have been proposed from research on real requirements and problems in different domains of knowledge. The kinds of facts are as follows:


 **Fact of kind 1**: information about object kind. Some examples are: ABC is a right triangle, ABCD is a parallelogram, matrix A is a square matrix.

 **Fact of kind 2**: a determination of an object or an attribute of an object. The following problem in analytic geometry gives some examples for facts of kind 2.

Problem: Given the points E and F, and the line (d). Suppose E, F, and (d) are determined. (P) is the plane satisfying the relations: E ∈ (P), F ∈ (P), and (d) // (P). Find the general equation of (P).

 **Fact of kind 3**: a determination of an object or an attribute of an object by a value or a constant expression. These are some examples in plane geometry and in analytic geometry: in the triangle ABC, suppose that the length of edge BC = 5; the plane (P) has the equation 2x + 3y - z + 6 = 0; the point M has the coordinate (1, 2, 3). In the above problem we also have three facts of kind 3: (1) point E is determined, i.e. we already know the coordinates of E; (2) point F is determined; (3) line (d) is determined, i.e. we already know the equation of (d).


- **Fact of kind 4**: equality on objects or attributes of objects. This kind of fact is also popular, and many problems on the knowledge base relate to it. The following problem in plane geometry gives some examples for facts of kind 4.

Problem: Given the parallelogram ABCD. Suppose M and N are two points of segment AC such that AM = CN. Prove that two triangles ABM and CDN are equal.

In this problem we have to prove an equality between two Com-objects, which is a fact of kind 4.


The last five kinds of facts are related to knowledge about functions, the component **Funcs** in the COKB model. The problem below gives some examples for facts related to functions.

Problem: Let d be the line with the equation 3x + 4y - 12 = 0. P and Q are the intersection points of d and the axes Ox, Oy.

a. Find the midpoint of PQ.
b. Find the projection of O on the line d.
For each line segment, there exists one and only one point which is the central point of that segment. Therefore, there is a function MIDPOINT(A, B) that outputs the central point M of the line segment AB. Part (a) of the above problem can be represented as finding the point I such that I = MIDPOINT(P, Q), a fact of kind 9. The projection can also be represented by the function PROJECTION(M, d) that outputs the projection point N of the point M onto the line d. Part (b) of the above problem can be represented as finding the point A such that A = PROJECTION(O, d), which is also a fact of kind 9.

Unification algorithms for facts were designed and used in different applications, such as the system that supports studying knowledge and solving analytic geometry problems, the program for studying and solving problems in Plane Geometry, and the knowledge system in linear algebra.
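For illustration, classified facts like the ones above can be encoded as tagged tuples so that an inference engine can branch on the kind of each fact. The following Python fragment is only a sketch of such an encoding; the tags and tuple layouts are our assumptions, not the chapter's implementation.

```
# Illustrative encoding of classified facts as tagged tuples; the layouts
# below are assumptions of this sketch, not the systems' data structures.
fact_kind1 = (1, "TRIANGLE", "ABC")                  # ABC is a triangle
fact_kind2 = (2, "E")                                # point E is determined
fact_kind3 = (3, "BC", 5)                            # attribute = constant
fact_kind4 = (4, "SEGMENT[A,M]", "SEGMENT[C,N]")     # equality of objects
fact_kind9 = (9, "I", ("MIDPOINT", "P", "Q"))        # I = MIDPOINT(P, Q)

def kind(fact):
    """The kind of a fact is its tag; rules and the inference engine branch on it."""
    return fact[0]
```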

#### **3.1.4 Specification language for COKB model**

The language for the COKB model is constructed to specify knowledge bases whose knowledge has the form of the COKB model.

The following are some structures of definitions for expressions, Com-objects, relations, facts, and functions.

**Definitions of expressions:**

```
expr          ::= expr | rel-expr | logic-expr
expr          ::= expr add-operator term | term
term          ::= term mul-operator factor | factor
factor        ::= – factor | element ^ factor | element
element       ::= ( expr ) | name | number | function-call
rel-expr      ::= expr rel-operator expr
logic-expr    ::= logic-expr OR logic-term | logic-expr IMPLIES logic-term |
                  NOT logic-term | logic-term
logic-term    ::= logic-term AND logic-primary | logic-primary
logic-primary ::= expr | rel-expr | function-call | quantify-expr | TRUE | FALSE
quantify-expr ::= FORALL(name <, name>*), logic-expr | EXISTS(name), logic-expr
```

**Definitions of Com-object type:**

```
cobject-type ::= COBJECT name;
                 [isa] [hasa] [constructs] [attributes] [constraints]
                 [crelations] [facts] [rules]
                 ENDCOBJECT;
```

**Definitions of computational relations:**

```
crelations    ::= CRELATION: crelation-def+ ENDCRELATION;
crelation-def ::= CR name;
                  MF: name <, name>*;
                  MFEXP: equation;
                  ENDCR;
equation      ::= expr = expr
```

**Definitions of special relations:**

```
isa  ::= ISA: name <, name>*;
hasa ::= HASA: [fact-def]
```

**Definitions of facts:**

```
facts       ::= FACT: fact-def+
fact-def    ::= object-type | attribute | name | equation | relation | expression
object-type ::= cobject-type (name) | cobject-type (name <, name>*)
relation    ::= relation (name <, name>+)
```

**Definitions of relations based on facts:**

```
relation-def ::= RELATION name;
                 ARGUMENT: argument-def+
                 [facts]
                 ENDRELATION;
argument-def ::= name <, name>*: type;
```

**Definitions of functions – form 1:**

```
function-def ::= FUNCTION name;
                 ARGUMENT: argument-def+
                 RETURN: return-def;
                 [constraint] [facts]
                 ENDFUNCTION;
return-def   ::= name: type;
```

**Definitions of functions – form 2:**

```
function-def  ::= FUNCTION name;
                  ARGUMENT: argument-def+
                  RETURN: return-def;
                  [constraint] [variables] [statements]
                  ENDFUNCTION;
statements    ::= statement-def+
statement-def ::= assign-stmt | if-stmt | for-stmt
assign-stmt   ::= name := expr;
if-stmt       ::= IF logic-expr THEN statements+ ENDIF; |
                  IF logic-expr THEN statements+ ELSE statements+ ENDIF;
for-stmt      ::= FOR name IN [range] DO statements+ ENDFOR;
```

#### **3.2 Computational networks**

In this section, we present two models: computational networks with simple valued variables and networks of computational objects. They have been used to represent knowledge in many domains. The methods and techniques for solving problems on these networks are useful tools for designing intelligent systems, especially IPSE.


#### **3.2.1 Computational networks with simple valued variables**

In this part, a simple model of computational networks will be presented together with related problems and techniques for solving them. Although this model is not very complicated, it is a very useful tool for designing many knowledge base systems in practice.

**Definition 3.3:** A *computational network (Com-Net) with simple valued variables* is a pair (M, F), in which M = {x1, x2, ..., xn} is a set of variables with simple values (or unstructured values), and F = {f1, f2, ..., fm} is a set of computational relations over the variables in the set M. Each computational relation f ∈ F has one of the following forms:

i. An equation over some variables in M, or
ii. A deductive rule f : u(f) → v(f), with u(f) ⊆ M, v(f) ⊆ M, and there are corresponding formulas to determine (or to compute) the variables in v(f) from the variables in u(f). We also define the set M(f) = u(f) ∪ v(f).
Remark: In many applications equations can be represented as deduction rules.

**Problems:** Given a computational net (M, F). The popular problem arising from real applications is to find a solution to determine a set G ⊆ M from a set H ⊆ M. This problem is denoted by the symbol H→G; H is the hypothesis and G is the goal of the problem. To solve the problem we have to answer the two questions below:

Q1: Is the problem solvable based on the knowledge K = (M, F)?

Q2: How to obtain the goal G from the hypothesis H based on the knowledge K = (M, F) in case the problem is solvable?

**Definition 3.4:** Given a computational net K = (M, F).

i. For each A ⊆ M and f ∈ F, denote f(A) = A ∪ M(f) the set obtained from A by applying f. Let S = [f1, f2, ..., fk] be a list consisting of relations in F; the notation S(A) = fk(fk-1(… f2(f1(A)) … )) is used to denote the set of variables obtained from A by applying the relations in S.
ii. The list S = [f1, f2, ..., fk] is called a *solution* of the problem H→G if S(H) ⊇ G. A solution S is called a *good solution* if there is no proper sublist S' of S such that S' is also a solution of the problem. The problem is *solvable* if there is a solution to solve it.
**Definition 3.5:** Given a computational net K = (M, F). Let A be a subset of M. It is easy to verify that there exists a unique largest set Ā ⊆ M such that the problem A→Ā is solvable; the set Ā is called the closure of A.
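As a minimal sketch, the closure of Definition 3.5 can be computed by a fixpoint iteration, assuming every relation f is encoded as a pair (u, v) of variable sets for the rule form u(f) → v(f) (equations would have to be encoded in both directions); the names here are illustrative, not the systems' implementation.

```
# A minimal sketch: compute the closure of a set A on a Com-Net (M, F),
# assuming each relation is a pair (u, v) of sets for the rule u(f) -> v(f).
def closure(A, F):
    closed = set(A)
    changed = True
    while changed:
        changed = False
        for u, v in F:
            # a relation can fire once all variables of u(f) are determined
            if u <= closed and not v <= closed:
                closed |= v
                changed = True
    return closed
```

By Theorem 3.1 below, the problem H→G is then solvable exactly when closure(H, F) contains G.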

The following are some algorithms and results that show methods and techniques for solving the above problems on computational nets.

**Theorem 3.1:** Given a computational net K = (M, F). The following statements are equivalent:

i. Problem H→G is solvable.
ii. H̄ ⊇ G.
iii. There exists a list of relations S such that S(H) ⊇ G.
**Algorithm 3.1:** Find a solution of the problem H→G.

```
Step 1: Solution ← empty;
Step 2: if G ⊆ H then begin Solution_found ← true; goto step 4; end
        else Solution_found ← false;
Step 3: Repeat
          Hold ← H;
          Select f ∈ F;
          while not Solution_found and (f found) do begin
            if (applying f from H produces new facts)
              then begin
                H ← H ∪ M(f); Add f to Solution;
              end;
            if G ⊆ H then
              Solution_found ← true;
            Select new f ∈ F;
          end; { while }
        Until Solution_found or (H = Hold);
Step 4: if not Solution_found then
          There is no solution found;
        else
          Solution is a solution of the problem;
```
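Under the same (u, v) encoding of relations used above, the forward chaining loop of Algorithm 3.1 can be sketched in Python as follows; this is an illustration of the control flow, not the systems' actual implementation.

```
# A sketch of Algorithm 3.1: forward chaining on a Com-Net, returning the
# list Solution of applied relations, or None when no solution is found.
def find_solution(H, G, F):
    known = set(H)
    goal = set(G)
    solution = []
    if goal <= known:
        return solution                  # Solution_found immediately
    changed = True
    while changed:                       # Repeat ... Until (H = Hold)
        changed = False
        for f in F:
            u, v = f
            if u <= known and not v <= known:
                known |= v               # H <- H U M(f)
                solution.append(f)
                changed = True
                if goal <= known:
                    return solution      # Solution_found
    return None                          # there is no solution found
```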
**Algorithm 3.2:** Find a good solution from a solution S = [f1, f2, ..., fk] of the problem H→G on a computational net (M, F).

```
Step 1: NewS ← []; V ← G;
Step 2: for i := k downto 1 do
          if v(fi) ∩ V ≠ ∅ then begin
            Insert fi at the beginning of NewS;
            V ← (V – v(fi)) ∪ (u(fi) – H);
          end
Step 3: NewS is a good solution.
```
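The pruning pass of Algorithm 3.2 can be sketched the same way: the solution list produced above is scanned backwards while maintaining the set V of variables still needed. Names and encodings are again assumptions of this sketch.

```
# A sketch of Algorithm 3.2: keep only the relations whose conclusions are
# still needed for the goal (or for a later kept relation).
def good_solution(solution, H, G):
    hypothesis = set(H)
    needed = set(G)                              # V <- G
    new_s = []
    for u, v in reversed(solution):              # for i := k downto 1
        if v & needed:                           # v(fi) meets V
            new_s.insert(0, (u, v))              # insert fi at the front
            needed = (needed - v) | (u - hypothesis)
    return new_s                                 # NewS is a good solution
```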

On a computational net (M, F), in many cases the problem H→G has a solution S in which some relations produce redundant variables. In those situations, we must determine the necessary variables of each step in the problem solving process. The following theorem shows the way to analyze the solution to determine the necessary variables to compute at each step.

**Theorem 3.2:** Given a computational net K = (M, F). Let [f1, f2, ..., fm] be a good solution of the problem H→G. Denote A0 = H, Ai = [f1, f2, ..., fi](H), with i = 1, ..., m. Then there exists a list [B0, B1, ..., Bm-1, Bm] satisfying the following conditions:

1. Bm = G,
2. Bi ⊆ Ai, with i = 0, 1, ..., m,
3. For i = 1, ..., m, [fi] is a solution of the problem Bi-1 → Bi but not a solution of the problem B → Bi, with B any proper subset of Bi-1.


#### **3.2.2 Networks of computational objects**

In many problems we usually meet many different kinds of objects. Each object has attributes and internal relations between them. Therefore, it is necessary to consider an extension of computational nets in which each variable is a computational object.

**Definition 3.6:** A computational object (or Com-object) has the following characteristics:

1. It has valued attributes. The set consisting of all attributes of the object O will be denoted by M(O).
2. There are internal computational relations between the attributes of a Com-object O. These are manifested in the following features of the object:
	- Given a subset A of M(O), the object O can show us the attributes that can be determined from A.
	- The object O can give the value of an attribute.
	- It can also show the internal process of determining the attributes.

Example 3.1: A triangle with some knowledge (formulas, theorems, etc.) is an object. The attributes of a "triangle" object are its 3 edges, 3 angles, etc. A "triangle" object can also answer questions such as "Is there a solution for the problem of computing the surface from one edge and two angles?".

**Definition 3.7:** A computational relation f between attributes or objects is called a *relation between the objects*. A network of Com-objects consists of a set of Com-objects O = {O1, O2, ..., On} and a set of computational relations F = {f1, f2, ..., fm}. This network of Com-objects is denoted by (O, F).

On the network of Com-objects (O, F), we consider the problem of determining (or computing) the attributes in a set G from the given attributes in a set H. The problem is denoted by H→G.

Example 3.2: In figure 2 below, suppose that AB = AC, the values of the angle A and the edge BC are given (hypothesis). ABDE and ACFG are squares. Compute EG.

Fig. 2. A problem in geometry


The problem can be considered on the network of Com-objects (O, F) as follows:

O = {O1: triangle ABC with AB = AC, O2: triangle AEG, O3: square ABDE, O4: square ACFG}, and F = {f1, f2, f3, f4, f5} consists of the following relations:

f1 : O1.c = O3.a (the edge c of triangle ABC = the edge of the square ABDE)
f2 : O1.b = O4.a (the edge b of triangle ABC = the edge of the square ACFG)
f3 : O2.b = O4.a (the edge b of triangle AEG = the edge of the square ACFG)
f4 : O2.c = O3.a (the edge c of triangle AEG = the edge of the square ABDE)
f5 : O1.A + O2.A = π

**Definition 3.8:** Let (O, F) be a network of Com-objects, and M be a set of concerned attributes. Suppose A is a subset of M.

a. For each f ∈ F, denote f(A) the union of the set A and the set consisting of all attributes in M deduced from A by f. Similarly, for each Com-object Oi ∈ O, Oi(A) is the union of the set A and the set consisting of all attributes (in M) that the object Oi can determine from the attributes in A.
b. Suppose D = [t1, t2, ..., tm] is a list of elements in F ∪ O. Denote A0 = A, A1 = t1(A0), ..., Am = tm(Am-1), and D(A) = Am.

We have A0 ⊆ A1 ⊆ ... ⊆ Am = D(A) ⊆ M. The problem H→G is called *solvable* if there is a list D of elements in F ∪ O such that D(H) ⊇ G. In this case, we say that D is a *solution* of the problem.

Technically, the above theorems and algorithms can be developed into new ones for solving the problem H→G on a network of Com-objects (O, F). They will be omitted here except for the algorithm to find a solution of the problem. It is worth noting that the objects may participate in solutions as computational relations.

**Algorithm 3.3:** Find a solution of the problem H→G on a network of Com-objects.

```
Step 1: Solution ← empty;
Step 2: if G ⊆ H then begin Solution_found ← true; goto step 5; end
        else Solution_found ← false;
Step 3: Repeat
          Hold ← H;
          Select f ∈ F;
          while not Solution_found and (f found) do begin
            if (applying f from H produces new facts) then begin
              H ← H ∪ M(f); Add f to Solution;
            end;
            if G ⊆ H then Solution_found ← true;
            Select new f ∈ F;
          end; { while }
        Until Solution_found or (H = Hold);
Step 4: if not Solution_found then begin
          Select Oi ∈ O such that Oi(H) ≠ H;
          if (the selection is successful) then begin
            H ← Oi(H); Add Oi to Solution;
            if (G ⊆ H) then begin
              Solution_found ← true; goto step 5;
            end;
            goto step 3;
          end;
        end;
Step 5: if not Solution_found then
          There is no solution found;
        else
          Solution is a solution of the problem;
```
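The object steps of Algorithm 3.3 can be added to the earlier sketch by giving every Com-object an assumed method `extend(attrs)` that returns Oi(H); again, this is only an illustration of the control flow under that assumption.

```
# A sketch of Algorithm 3.3: relations are tried first; when none applies,
# a Com-object able to enlarge H by its internal deduction is used.
# `Oi.extend(attrs)` is an assumed interface computing Oi(H); attribute
# sets are assumed finite so the loop terminates.
def find_solution_on_objects(H, G, F, O):
    known = set(H)
    goal = set(G)
    solution = []
    while not goal <= known:
        f = next(((u, v) for u, v in F
                  if u <= known and not v <= known), None)
        if f is not None:                    # step 3: apply a relation
            known |= f[1]
            solution.append(f)
            continue
        Oi = next((Oi for Oi in O
                   if Oi.extend(known) != known), None)
        if Oi is None:
            return None                      # no solution found
        known = Oi.extend(known)             # step 4: apply an object
        solution.append(Oi)
    return solution
```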

Example 3.3: Consider the network (O, F) in example 3.2 and the problem H→G, where H = {O1.a, O1.A} and G = {O2.a}.

Here we have: M(f1) = {O1.c, O3.a}, M(f2) = {O1.b, O4.a}, M(f3) = {O2.b, O4.a}, M(f4) = {O2.c, O3.a}, M(f5) = {O1.A, O2.A}, and

M = {O1.a, O1.b, O1.c, O1.A, O2.b, O2.c, O2.A, O2.a, O3.a, O4.a}.

The above algorithms produce the solution D = [f5, O1, f1, f2, f3, f4, O2] and the following process of extending the set of attributes:

A0 --f5--> A1 --O1--> A2 --f1--> A3 --f2--> A4 --f3--> A5 --f4--> A6 --O2--> A7,

with
A0 = {O1.a, O1.A},
A1 = {O1.a, O1.A, O2.A},
A2 = {O1.a, O1.A, O2.A, O1.b, O1.c},
A3 = {O1.a, O1.A, O2.A, O1.b, O1.c, O3.a},
A4 = {O1.a, O1.A, O2.A, O1.b, O1.c, O3.a, O4.a},
A5 = {O1.a, O1.A, O2.A, O1.b, O1.c, O3.a, O4.a, O2.b},
A6 = {O1.a, O1.A, O2.A, O1.b, O1.c, O3.a, O4.a, O2.b, O2.c},
A7 = {O1.a, O1.A, O2.A, O1.b, O1.c, O3.a, O4.a, O2.b, O2.c, O2.a}.

#### **3.2.3 Extensions of computational networks**

Computational networks with simple valued variables and networks of computational objects can be used to represent knowledge in many domains. The basic components of knowledge consist of a set of simple valued variables and a set of computational relations over the variables. However, there are knowledge domains based on a set of elements in which each element can be a simple valued variable or a function. For example, in the knowledge of alternating current, the current intensity i(t) and the potential u(t) are functions. This requires considering some extensions of computational networks, such as *extensive computational networks* and *extensive computational objects networks*, which are defined below.

**Definition 3.9:** An *extensive computational network* is a structure (M, R) consisting of the two following sets:

- M = Mv ∪ Mf is a set of attributes or elements, with simple or functional values. Mv = {xv1, xv2, …, xvk} is the set of simple valued variables; Mf = {xf1, xf2, …, xfm} is the set of functional valued elements.
- R = Rvv ∪ Rfv ∪ Rvf ∪ Rfvf is the set of deduction rules, the union of the four subsets of rules Rvv, Rfv, Rvf, Rfvf. Each rule r has the form r: u(r)→v(r), with u(r) the hypothesis of r and v(r) the conclusion of r. A rule belongs to one of the four cases below.



Case 1: r ∈ Rvv. For this case, u(r) ⊆ Mv and v(r) ⊆ Mv.
Case 2: r ∈ Rfv. For this case, u(r) ⊆ Mf and v(r) ⊆ Mv.
Case 3: r ∈ Rvf. For this case, u(r) ⊆ Mv and v(r) ⊆ Mf.
Case 4: r ∈ Rfvf. For this case, u(r) ⊆ M, u(r) ∩ Mf ≠ ∅, u(r) ∩ Mv ≠ ∅, and v(r) ⊆ Mf.

Each rule in R has a corresponding computational relation in F = Fvv ∪ Ffv ∪ Fvf ∪ Ffvf.

**Definition 3.10:** An *extensive computational object* (ECom-Object) is an object O with a structure including:

- A set of attributes Attr(O) = Mv ∪ Mf, with Mv a set of simple valued variables and Mf a set of functional variables. Between the variables (or attributes) there are internal relations, which are deduction rules or computational relations.
- Behaviors of reasoning and computing on the attributes of objects or facts, such as: find the closure of a set A ⊆ Attr(O); find a solution of problems of the form A→B, with A ⊆ Attr(O) and B ⊆ Attr(O); perform computations; consider the determination of objects or facts.
**Definition 3.11:** An *extensive computational objects network* is a model **(O, M, F, T)** that has the components below:

- O = {O1, O2, …, On} is the set of extensive computational objects.
- M is a set of object attributes. We will use the following notations: Mv(Oi) is the set of simple valued attributes of the object Oi; Mf(Oi) is the set of functional attributes of Oi; M(Oi) = Mv(Oi) ∪ Mf(Oi); M(O) = M(O1) ∪ M(O2) ∪ … ∪ M(On); and M ⊆ M(O).
- F = F(O) is the set of computational relations on the attributes in M and on the objects in O.
- T = {t1, t2, …, tk} is a set of operators on objects.
On the structure (O,T), there are expressions of objects. Each expression of objects has its attributes as if it is an object.
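As a rough sketch, an ECom-Object of Definition 3.10 could be organized as follows, with the attribute sets Mv and Mf kept separately and the internal rules again encoded as (u, v) pairs; all names here are our assumptions, not the chapter's implementation.

```
# A rough sketch of an ECom-Object: attributes split into simple valued
# (Mv) and functional (Mf) parts, plus internal deduction rules.
from dataclasses import dataclass, field

@dataclass
class EComObject:
    simple_attrs: set = field(default_factory=set)   # Mv
    func_attrs: set = field(default_factory=set)     # Mf
    rules: list = field(default_factory=list)        # internal (u, v) rules

    def attrs(self):
        return self.simple_attrs | self.func_attrs   # Attr(O) = Mv U Mf

    def closure(self, A):
        """One behavior required by Definition 3.10."""
        closed = set(A) & self.attrs()
        changed = True
        while changed:
            changed = False
            for u, v in self.rules:
                if u <= closed and not v <= closed:
                    closed |= v
                    changed = True
        return closed
```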

#### **4. Design of main components**

The main components of an IPSE considered here consist of the knowledge base and the inference engine. The design of these components will be discussed and presented in this section.

#### **4.1 Design of knowledge base**

The design process of the system presented in section 2 consists of seven stages. After stage 1 for collecting real knowledge, the knowledge base design includes stage 2 and stage 3. It includes the following tasks.

- The collected knowledge domain, denoted by K, will be represented or modeled based on the knowledge model COKB, Com-Net, CO-Net and their known extensions or restrictions. Some restrictions of the COKB model that were used to design knowledge bases are the COKB model lacking the operator component (C, H, R, Funcs, Rules), the COKB model without the function component (C, H, R, Ops, Rules), and the simple COKB sub-model (C, H, R, Rules). From studying and analyzing the whole knowledge of the domain, it is not difficult to determine the known forms of knowledge present, together with the relationships between them. In case the knowledge K has a known form such as the COKB model, we can use this model to represent K directly. Otherwise, the knowledge K can be partitioned into knowledge sub-domains Ki (i = 1, 2, …, n) with lower complexity and certain relationships, where each Ki has a known form. The relationships between the knowledge sub-domains must be clear so that we can integrate the knowledge models of the knowledge sub-domains later.
- Each knowledge sub-domain Ki will be modeled by using the above knowledge models, so a knowledge model M(Ki) of the knowledge Ki will be established. The relationships on {Ki} are also specified or represented. The models {M(Ki)} together with their relationships are integrated to produce a model M(K) of the knowledge K. Then we obtain a knowledge model M(K) for the whole knowledge K of the application.
- Next, it is necessary to construct a specification language for the knowledge base of the system. The COKB model and CO-Nets have their own specification languages used to specify knowledge bases of these forms. A restriction or extension of those models also has a suitable specification language. Therefore, we can easily construct a specification language L(K) for the knowledge K. This language allows us to specify the components of the knowledge K in the organization of the knowledge base.
- From the model M(K) and the specification language L(K), the knowledge base organization of the system will be established. It consists of structured text files that store the components concepts, hierarchical relation, relations, operators, functions, rules, facts and objects, together with their specifications. The knowledge base is stored in the system of files listed below.

The names of the concepts are listed in a structured text file with the following structure:

```
begin_concepts
<concept name 1> <concept name 2> ...
end_concepts
```

For each concept, we have a corresponding file with the file name <concept name>.txt, which contains the specification of this concept. This file has the following structure:

```
begin_concept: <concept name>[based objects]
    specification of based objects
begin_variables
    <attribute name> : <attribute type>;
    ...
end_variables
begin_constraints
    specification of constraints
end_constraints
begin_properties
    specification of internal facts
end_properties
begin_computation_relations
    begin_relation
        specification of a relation
    end_relation
    ...
end_computation_relations
begin_rules
    begin_rule
        kind_rule = "<kind of rule>";
        hypothesis_part:
            { facts }
        goal_part:
            { facts }
    end_rule
    ...
end_rules
end_concept
```

- File HIERARCHY.txt stores information of the Hasse diagram representing the component H of the COKB model. It has the following structure:

```
begin_Hierarchy
[<concept name 1>, <concept name 2>]
...
end_Hierarchy
```

- Files RELATIONS.txt and RELATIONS_DEF.txt are structured text files that store the specification of relations. The structure of file RELATIONS.txt is as follows:

```
begin_Relations
[<relation name>, <concept>, <concept>, ... ], <property>, <property>, ...;
[<relation name>, <concept>, <concept>, ... ], <property>, <property>, ...;
...
end_Relations
```

- Files OPERATORS.txt and OPERATORS_DEF.txt are structured text files that store the specification of operators (the component Ops of the COKB model). The structure of file OPERATORS.txt is as follows:

```
begin_Operators
[<operator symbol>, <list of concepts>, <result concept>], <property>, <property>, ...;
[<operator symbol>, <list of concepts>, <result concept>], <property>, <property>, ...;
...
end_Operators
```



- File FUNCTIONS.txt is a structured text file that stores the specification of functions. The structure of file FUNCTIONS.txt is as follows:

```
begin_Functions
<result concept> <function name>(sequence of concepts);
<result concept> <function name>(sequence of concepts);
...
end_Functions
```


- A structured text file stores the kinds of facts. Its structure is as follows:

```
begin_Factkinds
1, <fact structure>, <fact structure>, ...;
2, <fact structure>, <fact structure>, ...;
...
end_Factkinds
```


- Deductive rules are stored in a structured text file with the following structure:

```
begin_Rules
begin_rule
kind_rule = "<rule kind>";
<object> : <concept>;
...
hypothesis_part:
<set of facts>
goal_part:
<set of facts>
end_rule
...
end_Rules
```

- File OBJECTS.txt is a structured text file that stores the specification of objects and facts. Its structure is as follows (a small loading sketch is given after this listing):

```
begin_objects
<object name> : <concept>;
<object name> : <concept>;
...
end_objects
```
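Such structured text files are straightforward to load. The following Python fragment sketches reading the pairs of a HIERARCHY.txt file as specified above; the parsing rules (one bracketed pair per line between the markers) are an assumption of this sketch.

```
# A minimal sketch of reading HIERARCHY.txt: collect the pairs
# [<concept name 1>, <concept name 2>] between the begin/end markers.
def read_hierarchy(path):
    pairs = []
    inside = False
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            if line == "begin_Hierarchy":
                inside = True
            elif line == "end_Hierarchy":
                inside = False
            elif inside and line.startswith("[") and line.endswith("]"):
                a, b = [s.strip() for s in line[1:-1].split(",")]
                pairs.append((a, b))     # an edge of the Hasse diagram
    return pairs
```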
#### **4.2 Design the inference engine**

Designing the inference engine is stage 4 of the design process. The inference engine design includes the following tasks:

- From the collection of problems obtained in stage 1 with an initial classification, we can determine classes of problems based on known models such as Com-Net, CO-Net, and their extensions. This task helps us to model classes of problems as frame-based problem models, or as Com-Nets and CO-Nets for general forms of problems. Techniques for modeling problems are presented in stage 4 of the design process. Problems modeled by using a CO-Net have the form (O, F, Goal), in which O is a set of Com-objects, F is a set of facts on the objects, and Goal is a set consisting of goals.

- The basic technique for designing deductive algorithms is the unification of facts. Based on the kinds of facts and their structures, criteria for unification are proposed; these produce algorithms for checking the unification of two facts. For instance, when we have two facts *fact1* and *fact2* of kinds 1-6, their unification is defined as follows: *fact1* and *fact2* are unified if and only if they satisfy the following conditions (a sketch of this check in code is given after these tasks):
	1. *fact1* and *fact2* have the same kind k, and
	2. *fact1* = *fact2* if k = 1, 2, 6;
	[*fact1*[1], *fact1*[2..nops(*fact1*)]] = [*fact2*[1], *fact2*[2..nops(*fact2*)]] if k = 6 and the relation in *fact1* is symmetric;
	lhs(*fact1*) = lhs(*fact2*) and compute(rhs(*fact1*)) = compute(rhs(*fact2*)) if k = 3;
	( lhs(*fact1*) = lhs(*fact2*) and rhs(*fact1*) = rhs(*fact2*) ) or ( lhs(*fact1*) = rhs(*fact2*) and rhs(*fact1*) = lhs(*fact2*) ) if k = 4;
	evalb(simplify(expand(lhs(*fact1*) - rhs(*fact1*) - lhs(*fact2*) + rhs(*fact2*))) = 0) or evalb(simplify(expand(lhs(*fact1*) - rhs(*fact1*) + lhs(*fact2*) - rhs(*fact2*))) = 0) if k = 5.
- To design the algorithms of the reasoning methods that solve classes of problems, the forward chaining strategy can be used together with artificial intelligence techniques such as the deductive method with heuristics, the deductive method with sample problems, and the deductive method based on an organization of solving methods for classes of frame-based problems. For classes of frame-based problems, designing reasoning algorithms is not very difficult. For classes of general problems, the most difficult thing is to model human experience, sensible reaction and intuition in order to find heuristic rules able to imitate human thinking in problem solving. We can use Com-Nets, CO-Nets, and their extensions to model problems, and use artificial intelligence techniques to design algorithms for automated reasoning. For instance, a reasoning algorithm for the COKB model with sample problems is briefly presented below.
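A unification check following these conditions can be sketched as below. Facts are the tagged tuples used earlier; `same_expr` and `is_symmetric` are assumed helpers (in the actual systems the algebraic tests of kinds 3 and 5 are performed with symbolic computation, e.g. in Maple).

```
# A sketch of the unification check for two classified facts of kinds 1-6.
# `same_expr(e1, e2)` is an assumed helper deciding algebraic equality
# (expressions are assumed to support subtraction, e.g. sympy expressions),
# and `is_symmetric(rel)` an assumed lookup of the relation's properties.
def unified(fact1, fact2, same_expr, is_symmetric):
    if fact1[0] != fact2[0]:                      # condition 1: same kind k
        return False
    k = fact1[0]
    if k in (1, 2):
        return fact1 == fact2
    if k == 6:
        if fact1 == fact2:
            return True
        rel, args1, args2 = fact1[1], fact1[2:], fact2[2:]
        return is_symmetric(rel) and args1 == tuple(reversed(args2))
    lhs1, rhs1, lhs2, rhs2 = fact1[1], fact1[2], fact2[1], fact2[2]
    if k == 3:                                    # equal constant values
        return lhs1 == lhs2 and same_expr(rhs1, rhs2)
    if k == 4:                                    # equality, both directions
        return (lhs1 == lhs2 and rhs1 == rhs2) or \
               (lhs1 == rhs2 and rhs1 == lhs2)
    if k == 5:                                    # equivalent equations
        return same_expr((lhs1 - rhs1) - (lhs2 - rhs2), 0) or \
               same_expr((lhs1 - rhs1) + (lhs2 - rhs2), 0)
    return False
```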


**Definition 4.1**: Given a knowledge domain K = (C, H, R, Ops, Funcs, Rules), a knowledge sub-domain of K is a knowledge domain represented by the COKB model that consists of the following components:

Kp = (Cp, Hp, Rp, Opsp, Funcsp, Rulesp )

where Cp ⊆ C, Hp ⊆ H, Rp ⊆ R, Opsp ⊆ Ops, Funcsp ⊆ Funcs, Rulesp ⊆ Rules.

Knowledge domain Kp is a restriction of knowledge K.

**Definition 4.2**: Given a knowledge sub-domain Kp, a Sample Problem (SP) is a problem represented by a network of Com-objects on the knowledge Kp. It consists of three components (Op, Fp, Goalp); Op and Fp contain objects and facts specified on the knowledge Kp.

**Definition 4.3**: A model of a Computational Object Knowledge Base with Sample Problems (COKB-SP) consists of 7 components: (C, H, R, Ops, Funcs, Rules, Sample), in which (C, H, R, Ops, Funcs, Rules) is a knowledge domain presented by the COKB model, and the Sample component is a set of Sample Problems of this knowledge domain.

**Algorithm 4.1**: To find a solution of problem P modelled by (O,F,Goal) on knowledge K of the form COKB-SP.

**Step 1.** Record the elements in the hypothesis part and the goal part.

**Step 2.** Find *the Sample Problem* that can be applied.

**Step 3.** Check goal G. If G is obtained then goto step 8.

**Step 4.** Using heuristic rules, select a rule for producing new facts or new objects.

**Step 5.** If the selection in step 4 fails, then search for any rule which can be used to deduce new facts or new objects.

**Step 6.** If there is a rule found in step 4 or in step 5, then record the information about the rule, new facts in Solution, and the new situation (previous objects and facts together with new facts and new objects), and goto step 2.

**Step 7.** Else {search for a rule fails} Conclusion: Solution not found, and stop.

**Step 8.** Reduce the solution found by excluding redundant rules and information in the solution.


#### **Algorithm 4.2**: To find sample problems.

Given a problem P = (O, F, Goal) on a knowledge K of the form COKB-SP, the Sample Problem that can be applied to P is found by the following procedure:

```
Step 1: H ← O ∪ F
Step 2: SP ← Sample
        Sample_found ← false
Step 3: Repeat
          Select S in SP
          if facts of H can be applied in (S.Op and S.Fp) then
            begin
              if kind of S.Goalp = kind of Goal then
                Sample_found ← true
              else if S.Goalp ⊆ H then
                Sample_found ← true
            end
          SP ← (SP – S)
        Until SP = {} or Sample_found
Step 4: if Sample_found then
          S is a sample problem of the problem;
        else
          There is no sample problem found;
```
This algorithm simulates a part of the human mind when finding sample problems related to a practical problem. Thereby, the inference of the system becomes quicker and more effective. Moreover, the solution of the problem is natural and precise.
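For illustration, the scan of Algorithm 4.2 can be written as below, with an assumed predicate `applicable(H, S)` implementing the fact matching of the first test and an assumed classifier `kind_of` for goals; the attribute names on S are also assumptions of this sketch.

```
# A sketch of Algorithm 4.2: scan the Sample component for a sample problem
# whose objects/facts match H and whose goal fits the current goal.
# `applicable` and `kind_of` are assumed helpers; `S.goal` stands for Goalp.
def find_sample_problem(O, F, goal, samples, applicable, kind_of):
    H = set(O) | set(F)                      # Step 1: H <- O U F
    for S in samples:                        # Steps 2-3: scan SP
        if applicable(H, S):
            if kind_of(S.goal) == kind_of(goal) or set(S.goal) <= H:
                return S                     # S is a sample problem
    return None                              # no sample problem found
```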

#### **5. Applications**

The design method for IPS and IPSE presented in the previous sections has been used to produce many applications, such as the program for studying and solving problems in Plane Geometry, the program for studying and solving problems in analytic geometry, the program for solving problems about alternating current in physics, the program for solving problems in inorganic chemistry, the program for solving algebraic problems, etc. In this section, we introduce some applications and examples of solutions of problems produced by the computer programs.

- The system that supports studying knowledge and solving analytic geometry problems. The system consists of three components: the interface, the knowledge base, and the knowledge processing modules or the inference engine. The program has menus for users to search for the knowledge they need, and they can access the knowledge base. Besides, there are windows for inputting problems; users are supported by a simple language for specifying problems. There are also windows in which the program shows solutions of problems and figures.
- The program for studying and solving problems in plane geometry. It can solve problems in general forms. Users only declare the hypothesis and goal of problems in a simple language that is nevertheless strong enough for specifying problems. The hypothesis can consist of objects, relations between objects or between attributes; it can also contain formulas and determination properties of some attributes or their values. The goal can be to compute an attribute, or to determine an object, a relation or a formula. After specifying a problem, users can request the program to solve it automatically or to give instructions that help them to solve it themselves. The program also gives a human readable solution, which is easy to read and agrees with the way of thinking and writing of students and teachers. The second function of the program is "Search for Knowledge". This function helps users to find necessary knowledge quickly. They can search for concepts, definitions, properties, related theorems or formulas, and problem patterns.



The examples below illustrate the functions of a system for solving problems of analytic geometry, a system for solving problems in plane geometry, and a system for solving algebraic problems. The systems were implemented using C#, JAVA and MAPLE. Each example presents the problem in natural language, specifies the problem in the specification language used as input to the system, and shows a solution produced by the system.

**Example 5.1**: Let d be the line with the equation 3x + 4y - 12 = 0. P and Q are the intersection points of d and the axes Ox, Oy.

a. Find the midpoint of PQ.
b. Find the projection of O on the line d.
Specification of the problem:

Objects = {[d, line], [P, point], [Q, point]}.
Hypothesis = { d.f = (3*x+4*y-12 = 0), Ox.f = (y = 0), O = [0, 0], P = INTERSECT(Ox, d), Q = INTERSECT(Oy, d), H = PROJECTION(O, d), Oy.f = (x = 0) }.
Goal = { MIDPOINT(P, Q), H }.

Solution found by the system:

Step 1: {d.f = (3*x+4*y-12 = 0), Ox.f = (y = 0), Oy.f = (x = 0)} → {d.f, Ox.f, Oy.f}.


Step 2: {Ox.f, Oy.f, d.f} → {Ox, Oy, d}.
Step 3: {P = INTERSECT(Ox, d), d, Ox} → {P = [4, 0]}.
Step 4: {d, Oy, Q = INTERSECT(Oy, d)} → {Q = [0, 3]}.
Step 5: {P = [4, 0], Q = [0, 3]} → {P, Q}.
Step 6: {P, Q} → {MIDPOINT(P, Q) = [2, 3/2]}.
Step 7: {d, H = PROJECTION(O, d), O} → {H = [36/25, 48/25]}.
Step 8: {H = [36/25, 48/25]} → {H}.

**Example 5.2**: Given two points P(2, 5) and Q(5,1). Suppose d is a line that contains the point P, and the distance between Q and d is 3. Find the equation of line d.

Specification of the problem:

Objects = {[P, point], [Q, point], [d, line]}.
Hypothesis = {DISTANCE(Q, d) = 3, P = [2, 5], Q = [5, 1], ["BELONG", P, d]}.
Goal = [d.f].

Solution found by the system:

Step 1: {P = [2, 5]} → {P}.
Step 2: {DISTANCE(Q, d) = 3} → {DISTANCE(Q, d)}.
Step 3: {d, P} → {2d[1] + 5d[2] + d[3] = 0}.

$$\text{Step 4: } \{DISTANCE(Q, d) = 3\} \Rightarrow \frac{|5d[1] + d[2] + d[3]|}{\sqrt{d[1]^2 + d[2]^2}} = 3$$

$$\text{Step 5: } \left\{ d[1] = 1,\; 2d[1] + 5d[2] + d[3] = 0,\; \frac{|5d[1] + d[2] + d[3]|}{\sqrt{d[1]^2 + d[2]^2}} = 3 \right\} \Rightarrow \left\{ d.f = \left(x + \tfrac{24}{7}y - \tfrac{134}{7} = 0\right),\; d.f = (x - 2 = 0) \right\}$$

$$\text{Step 6: } \left\{ d.f = \left(x + \tfrac{24}{7}y - \tfrac{134}{7} = 0\right),\; d.f = (x - 2 = 0) \right\} \Rightarrow \{d.f\}$$

**Example 5.3**: Given the parallelogram ABCD. Suppose M and N are two points of segment AC such that AM = CN. Prove that two triangles ABM and CDN are equal.

Specification of the problem:

```
Objects = {[A, POINT], [B, POINT], [C, POINT], [D, POINT], [M, POINT], [N, POINT],
           [O1, PARALLELOGRAM[A, B, C, D]], [O2, TRIANGLE[A, B, M]],
           [O3, TRIANGLE[C, D, N]]}.
Hypothesis = { [« BELONG », M, SEGMENT[A, C]],
               [« BELONG », N, SEGMENT[A, C]],
               SEGMENT[A, M] = SEGMENT[C, N] }.
Goal = { O2 = O3 }.
```

Solution found by the system:


```
Step 1: Hypothesis
        → { O2.SEGMENT[A, M] = O3.SEGMENT[C, N],
            O2.SEGMENT[A, B] = O1.SEGMENT[A, B],
            O3.SEGMENT[C, D] = O1.SEGMENT[C, D] }.
Step 2: Produce new objects related to O2, O3, and O1
        → {[O4, TRIANGLE[A, B, C]], [O5, TRIANGLE[C, D, A]]}.
Step 3: {[O1, PARALLELOGRAM[A, B, C, D]]}
        → {O4 = O5, SEGMENT[A, B] = SEGMENT[C, D]}.
Step 4: { O2.SEGMENT[A, B] = O1.SEGMENT[A, B],
          O3.SEGMENT[C, D] = O1.SEGMENT[C, D],
          SEGMENT[A, B] = SEGMENT[C, D] }
        → {O2.SEGMENT[A, B] = O3.SEGMENT[C, D]}.
Step 5: {[« BELONG », M, SEGMENT[A, C]]}
        → {O4.angle_A = O2.angle_A}.
Step 6: {[« BELONG », N, SEGMENT[A, C]]}
        → {O5.angle_A = O3.angle_A}.
Step 7: {O4 = O5}
        → {O4.angle_A = O5.angle_A}.
Step 8: { O4.angle_A = O2.angle_A,
          O5.angle_A = O3.angle_A,
          O4.angle_A = O5.angle_A }
        → {O2.angle_A = O3.angle_A}.
Step 9: { O2.SEGMENT[A, M] = O3.SEGMENT[C, N],
          O2.SEGMENT[A, B] = O3.SEGMENT[C, D],
          O2.angle_A = O3.angle_A }
        → {O2 = O3}.
```
**Example 5.4**: Consider the following equation, where m is a parameter and x is a variable:

$$(m^2 - 4)x + 2 = m$$

Solve this equation, discussing it according to the parameter m.

Solution found by the system:

Solve the equation:

$$(m^2 - 4)x + 2 = m$$

$$(m^2 - 4)x = -2 + m$$

The coefficient of x has the set of roots: {-2, 2}

+ if parameter m = -2, then: "This equation has no root"
+ if parameter m = 2, then: "The set of roots of this equation is the set of real numbers"
+ if parameter m ∉ {-2, 2}, then:

$$x = \frac{-2 + m}{m^2 - 4} = \frac{1}{m + 2}$$
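The case analysis of this example is easy to reproduce with a computer algebra system; the following sympy fragment (an illustration, not the systems' Maple-based implementation) confirms the three cases.

```
# Reproducing Example 5.4 with sympy: solve (m^2 - 4)x + 2 = m for x and
# inspect the degenerate values of the parameter m.
from sympy import symbols, Eq, solve, cancel

m, x = symbols("m x")
equation = Eq((m**2 - 4) * x + 2, m)

root = solve(equation, x)[0]        # (m - 2)/(m**2 - 4)
print(cancel(root))                 # -> 1/(m + 2), valid for m not in {-2, 2}

# m = 2 makes the equation 2 = 2 (every real x is a root); m = -2 makes it
# 2 = -2 (no root), matching the solution produced by the system.
print(equation.subs(m, 2), equation.subs(m, -2))    # True False
```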

Chitta Baral (2003). Knowledge Representation, Reasoning and Declarative Problem

Fatos X.; Leonard B. & Petraq J. P. (2010). Complex intellgent systems and their applications. *Springer Science+Business Media, LLC*. ISBN 978-1-4419-1635-8 Frank van Harmelem, Vladimir & Bruce (2008). Handbook of Knowledge Representation.

George F. Luger (2008). Artificial Intelligence: Structures And Strategies For Complex

Johns M. Tim (2008). Artificial Intelligence – A System Approach. *Infinity Science Press LLC*,

Michel Chein & Marie-Laure Mugnier (2009). Graph-based Knowledge representation:

Mike O'Docherty (2005). Object-oriented analysis and design: understanding

Nhon Do & Hien Nguyen (2011). A Reasoning Method on Computational Network and Its

Nhon Do & Hoai P. Truong & Trong T. Tran (2010). An Approach for Translating

Nhon Van Do (2010). Model for Knowledge Bases of Computational Objects. *International* 

Nhon Van Do (2009). Computational Networks for Knowledge Representation.

Nhon Do (2008). An ontology for knowledge representation And Applications*. Proceedings of* 

Nhon Van Do (2000). A Program for studying and Solving problems in Plane Geometry.

Sowa, John F. (2002). Architectures for Intelligent Systems. *IBM Systems Journal*, vol. 41, no.3,

Sowa, John F. (2000). Knowledge Representation: Logical, Philosophical and Computational

Foundations. *Brooks/Cole Thomson Learning*, ISBN 0 534-94965-7

*Artificial Intelligence and Education*, Hangzhou, China. 10-2010

*Journal of Computer Science Issues*, Vol. 7, Issue 3, No 8, pp. 11-20

Computational foundations of Conceptual Graphs. *Springer-Verlag London Limited*.

system development with UML 2.0. *John Wiley & Sons Ltd*, ISBN-13 978-0-470-

Applications. *Proceedings of the International MultiConference of Engineers and* 

Mathematics Problems in Natural Language to Specification Language COKB of Intelligent Education Software. *Proceedings of 2010 International Conference on* 

*Proceedings of World Academy of Science, Engineering and Technology*, Volume 56,

*World Academy of Science, Engineering and Technology*, Volumn 32, pp. 23-31, ISSN

*Proceedings of International on Artificial Intelligence 2000*, Las Vegas, USA, 2000, pp.

Solving. Cambridge University Press, ISBN 0 521 81802 8.

natural language to specification language, and vice versa.

*Elsevier*, ISBN: 978-0-444-52211-5.

ISBN: 978-0-9778582-3-1

ISBN: 978-1-84800-285-2.

09240-8.

pp. 266-270

2070-3740

1441-1447

pp. 331-349

Problem Solving. Addison Wesley Longman

*Computer Scientists* 2011 (ICAIA'11), pp. 137-141

**7. References** 

The coefficient of x has a set of roots:

{ 2 , 2}

+ if parameter m 2 , then:

"This equation has no root"

+ if parameter m 2 , then:

"This equation has set of roots is the set of real numbers "

+ if parameter m 2 , 2 , then:

$$\infty = \frac{-2 + \text{m}}{\text{m}^2 - 4} = \frac{1}{\text{m} + 2}$$
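The same case analysis can be reproduced with a computer algebra system. The sketch below is hypothetical, using the SymPy library rather than the system's own symbolic engine: it splits on the roots of the leading coefficient and then solves each case.

```python
# A minimal sketch (assuming SymPy is available) of the case analysis for
# (m^2 - 4)*x + 2 = m: split on the roots of the coefficient of x.
from sympy import symbols, Eq, solveset, S, simplify

m, x = symbols("m x")
a = m**2 - 4          # coefficient of x
b = m - 2             # right-hand side after moving 2 across

for root in solveset(Eq(a, 0), m, domain=S.Reals):   # {-2, 2}
    if b.subs(m, root) == 0:
        print(f"m = {root}: every real x is a solution")
    else:
        print(f"m = {root}: no solution")

print("otherwise: x =", simplify(b / a))             # 1/(m + 2)
```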

#### **6. Conclusions and future work**

In this chapter, we proposed a method for designing intelligent problem solvers (IPS), especially those in education (IPSE). These systems have a suitable knowledge base used by the inference engine to solve problems in a certain knowledge domain; they not only give human-readable solutions of problems but also present solutions the way instructors and learners usually write them. Knowledge bases contain concepts of computational objects (Com-Objects), relations, operators, functions, facts and rules. The *Computational Object Knowledge Base* model (COKB) and its specification language can be used for knowledge modeling and for designing and implementing knowledge bases. The COKB model has been established from an Object-Oriented approach to knowledge representation together with programming techniques for symbolic computation. The design of the inference engine requires modeling problems and designing reasoning algorithms with heuristics and sample problems. Computational networks (Com-Net) and networks of computational objects (CO-Net) can be used effectively for modeling problems and constructing reasoning algorithms in practical knowledge domains. These models are tools for designing the inference engines of such systems.
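To make these ingredients concrete, a Com-Object can be pictured roughly as a record of attributes, internal relations among them, and computation rules. The following is a schematic sketch only, with invented field names; it is not the COKB specification language.

```python
# A schematic, hypothetical picture of what a Com-Object bundles together.
from dataclasses import dataclass, field

@dataclass
class ComObject:
    name: str
    attributes: dict = field(default_factory=dict)   # e.g. {"angle_A": None}
    relations: list = field(default_factory=list)    # equations among attributes
    rules: list = field(default_factory=list)        # (premises, conclusion) pairs

triangle = ComObject(
    name="TRIANGLE[A, B, M]",
    attributes={"angle_A": None, "SEGMENT[A, B]": None, "SEGMENT[A, M]": None},
    relations=["angle_A + angle_B + angle_M = pi"],
    rules=[({"two sides and the included angle equal"}, "triangles are equal")],
)
print(triangle.name, "->", list(triangle.attributes))
```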

The proposed design method has been used to produce applications in many fields such as mathematics, physics and chemistry. They support studying knowledge and solving problems automatically based on knowledge bases. Users only declare the hypothesis and goal of a problem, based on a simple language that is nevertheless strong enough for specifying problems. The programs produce a human-readable solution, which is easy to read and agrees with the way of thinking and writing of students and instructors.

Designing an intelligent problem solver in education is a very challenging task, as domain knowledge and human thinking are very complicated and abstract. There are domains of knowledge with functional attributes, such as knowledge of alternating current in physics. This motivates further extensions of the COKB model, Com-Net and CO-Net, and the development of design techniques. They will accept simple-valued variables and also functional variables. On the network of computational objects, operators will be considered. Such future work will make our models more powerful for representing knowledge in practice. Besides, to have a user interface using natural language, we have to develop methods for translating problems from natural language to the specification language, and vice versa.

#### **7. References**

Chitta Baral (2003). Knowledge Representation, Reasoning and Declarative Problem Solving. *Cambridge University Press*, ISBN 0-521-81802-8

Fatos X.; Leonard B. & Petraq J. P. (2010). Complex Intelligent Systems and Their Applications. *Springer Science+Business Media, LLC*, ISBN 978-1-4419-1635-8

Frank van Harmelen; Vladimir Lifschitz & Bruce Porter (2008). Handbook of Knowledge Representation. *Elsevier*, ISBN 978-0-444-52211-5

George F. Luger (2008). Artificial Intelligence: Structures and Strategies for Complex Problem Solving. *Addison Wesley Longman*

M. Tim Jones (2008). Artificial Intelligence – A Systems Approach. *Infinity Science Press LLC*, ISBN 978-0-9778582-3-1

Michel Chein & Marie-Laure Mugnier (2009). Graph-based Knowledge Representation: Computational Foundations of Conceptual Graphs. *Springer-Verlag London Limited*, ISBN 978-1-84800-285-2

Mike O'Docherty (2005). Object-Oriented Analysis and Design: Understanding System Development with UML 2.0. *John Wiley & Sons Ltd*, ISBN-13 978-0-470-09240-8

Nhon Do & Hien Nguyen (2011). A Reasoning Method on Computational Network and Its Applications. *Proceedings of the International MultiConference of Engineers and Computer Scientists 2011* (ICAIA'11), pp. 137-141

Nhon Do; Hoai P. Truong & Trong T. Tran (2010). An Approach for Translating Mathematics Problems in Natural Language to Specification Language COKB of Intelligent Education Software. *Proceedings of 2010 International Conference on Artificial Intelligence and Education*, Hangzhou, China, 10-2010, pp. 266-270

Nhon Van Do (2010). Model for Knowledge Bases of Computational Objects. *International Journal of Computer Science Issues*, Vol. 7, Issue 3, No 8, pp. 11-20

Nhon Van Do (2009). Computational Networks for Knowledge Representation. *Proceedings of World Academy of Science, Engineering and Technology*, Volume 56

Nhon Do (2008). An Ontology for Knowledge Representation and Applications. *World Academy of Science, Engineering and Technology*, Volume 32, pp. 23-31, ISSN 2070-3740

Nhon Van Do (2000). A Program for Studying and Solving Problems in Plane Geometry. *Proceedings of the International Conference on Artificial Intelligence 2000*, Las Vegas, USA, 2000, pp. 1441-1447

Sowa, John F. (2002). Architectures for Intelligent Systems. *IBM Systems Journal*, Vol. 41, No. 3, pp. 331-349

Sowa, John F. (2000). Knowledge Representation: Logical, Philosophical and Computational Foundations. *Brooks/Cole Thomson Learning*, ISBN 0-534-94965-7

Stuart Russell & Peter Norvig (2010). Artificial Intelligence – A Modern Approach (Third edition). *Prentice Hall*, by Pearson Education, Inc.



### **Logic of Integrity, Fuzzy Logic and Knowledge Modeling for Machine Education**

Fatma Khanum Bunyatova
*Intellect School, Baku, Azerbaijan*

#### **1. Introduction**


The fast development of ICT clearly shows how far training lags behind this development. The rapid modification of processing devices leaves the modernization of education far behind. Though new directions in teaching such as e-learning, m-learning, online learning, machine learning, etc. were formed by ICT technical means, these means are not yet in a position to bring about fundamental change in education. The reason is that the educational principles of memory-oriented traditional teaching didactics, existing for more than 300 years, were invested into these technical means. Knowledge was memorized through the transfer of private, unsystematic, non-logically constructed knowledge. But as we know, besides human memory there is human cognition, whose reserves are used at no more than 2-3% (Gordon D., Jannette Voz 2000). Learning can be changed qualitatively and made accessible to all if we make a shift from memorization and quantitative accumulation of knowledge to the organization of the reasoning activities of students on knowledge for understanding. Learning in the mainstream of intellectual activity allows increasing the use of these resources several times and adequately raising the quality of education of all students. Cognitive activity, the activity of the intellect, is the area of psychology, the subject matter of its study.

Didactics is the science of the learning ways of a student, a science that seeks ever more sophisticated ways to teach students certain knowledge. The psychology of intelligence states that the knowledge students receive is a reflection of what they have heard or read; in the best case, it is a converted form of information. Present knowledge should be built with the understanding of students on the basis of their previous experiences. New knowledge must be combined, divided, and associated with previous and subsequent knowledge. Each student, on the basis of his/her design of understanding, constructs the proposed knowledge in his/her own understanding. In practice, it actually looks this way, but apparently only on the condition that a learner designs the structure of knowledge, the structure of mental activity and the structure of learning activities. Furthermore, these structures must be identical to the structures of intelligence. Under these conditions, a student, regardless of learning forms, based on the capabilities of his/her internal understanding, enters an active cognitive activity. Engaging in active learning activities, students gain active knowledge to expand their **understanding**. Knowledge, while at work, in such a learning process destroys its traditional vertical structure of construction and is logically arranged in a horizontal structure of thinking. That is, according to the Swiss psychologist Piaget, the "alogism of learning" is destroyed. Each new structure of knowledge, entering into logical operation with the old ones or with adequate structures of the following knowledge, builds them into the scheme of integrity of structures of knowledge. By taking advantage of high tech and relying on psycho-pedagogy, it is possible to model the contents of subject knowledge by isomorphic structures of intelligence. In this model, the following transfer and alignment occur:

Conversion:

a. The traditional concept of a didactic unit of knowledge will be understood as the structure of knowledge;
b. Knowledge structures will be considered as the logical structures of knowledge of Bunyatova;
c. Traditional learning exercises will be completed by logical operations of thinking;
d. The logical structures of knowledge will be considered as the nanostructures of knowledge.

These conversions originate at the juncture of three sciences: pedagogy, psychology and high technology, which develop a new direction in science, **nanopsychopedagogy** (F.Bunyatova 2009).

#### **2. Necessary fundamental changes in education**

To turn learning from the position of memorization to the position of cognitive activity, and to organize such learning, it is necessary to make the following fundamental changes in education:

a. Changing the way of learning

It is necessary to shift from the position of transferring knowledge to the position of constructing the educational process, in which each student builds knowledge based on his/her competence (Ф.Бунятова 2007). This means the transition from private practices to a technology of training. In this case, the technology of constructive teaching by Bunyatova is applied (Ф.Бунятова 2009).

b. Modeling of structures of knowledge

The structures of subject knowledge should be modeled. This model should be constructed similarly to the models of natural and artificial intelligence of the brain (Лотфи Заде 1976).

c. Model of natural intelligence

The model of natural intelligence was constructed by Piaget, and that of artificial intelligence was constructed by Zadeh. The logic of integrity of J. Piaget, or the work of reasoning at an operational level, that is, at the formal-logical level, is the work of the human brain (Р.Алиев 1999).

d. Model of artificial Intelligence by Zadeh

Zadeh created soft computing (SC), which provides for the joint use of such new numerical approaches as fuzzy logic, neural networks, evolutionary computing, etc. These technologies are important for data compression and for designing systems with high MIQ (machine intelligence quotient), and the best model of SC is the human brain, as noted by Zadeh (Р.Алиев 1999).


e. Building of holistic logic and fuzzy logic models of knowledge on the basis of these two theories

On the basis of these two theories, holistic logic and fuzzy logic models of knowledge are built (F.Bunyatova 2009). This model can be a psychopedagogical and technologically constructed model of subject knowledge not only for machine learning in all its directions, but for learning as a whole. This model is constructed from the perspective of pedagogy, psychology, and high technology.

As a result, these three principles change not only the form and method of learning, but also the structure of the contents of learning. Students here do not receive ready-made knowledge; relying on their experience, they construct it by means of logically constructed structures of knowledge. These structures of knowledge are built by isomorphic structures of intelligence.

#### **Creative learning technology by Bunyatova – CT by Bunyatova**

Changing the way of training means a shift from private practices to a technology of training. In this case, the technology of constructive teaching by Bunyatova is proposed. This technology is based on the cognitive theory of Piaget. Constructive teaching technology aims to develop the structures of intelligence of students and to organize their reasoning activities over knowledge.

Constructive teaching is a creative, activity- and operation-based learning, which provides an opportunity for each student to build new knowledge based on his/her experience and available knowledge. The philosophy of constructive teaching is a synthesis of Eastern and Western philosophies of learning; this is a shift from private knowledge to the integrity of knowledge, or vice versa, from the integrity of knowledge to the private. Constructive teaching aims to change the form of activities of teaching and of the student in the learning process, which in the end leads not only to a change of the learning process, but of all its relevant components.

CT by Bunyatova takes its origins from the psychology schools of Piaget and Vygotsky; it incorporates the cooperative structure of educational activities of the American psychologist Spencer Kagan. When machine learning classes are established, the CT principles do not change in essence; they just bear an individual character of learning, enriching the methods of individual learning.

#### **Principles of constructive teaching by Bunyatova**

#### **Principle 1. Searching meaning of the topic**

The lesson in constructive teaching starts with searching for the **theme of the lesson**, that is, determining the real understanding of the entity under study. The teacher poses questions for discussion. Answers of students generate new questions and new discussions. Students, by discussing, actively adapt their knowledge and attitudes to these logically designed questions. In this process, they may make mistakes, go the wrong way, go back and start again. Toward the scientific definition of a meaning, they come from the definition of that meaning in their personal terms. In the interaction of ideas, in online education, these personal definitions, by self-correcting, self-regulating and self-enriching, transform into a new structure of knowledge.


#### **Principle 2. Integrity scheme of knowledge structure**

Structures of knowledge in constructive teaching are provided for studying in the scheme of integrity. Knowledge is divided into invariant and variable (example: in language studies the invariants are the parts of speech, in mathematics they are figures; the variables are the rules of the language, and in math they are mathematical operations, etc.). For example, if we study the numeral, then the scheme of integrity will be constructed as follows: the numeral in the middle, to the left the previous knowledge (adjective, noun), and to the right the following (pronoun, verb and adverb). In the end, the scheme will be the following: noun, adjective, numeral, pronoun, verb and adverb. In this scheme of integrity, the numeral is studied in depth in a combination of relations, connections and interdependence with the noun, adjective, pronoun and verb in the system of integrity of the language.

#### **Principle 3. Logical structures of knowledge by Bunyatova – LSK by Bunyatova**

Using the tools of the logic of integrity by J. Piaget, the following logical connections, relationships and dependencies among the structures of knowledge have been identified in the contents of subject studies (Ф.Бунятова 2001):


- Two structures of knowledge, agreeing in common relationships, are connected and form a new structure of knowledge.
- United by common relationships, structures of knowledge are reversible and transformed.
- Thinking always retains the ability of rejection and of finding other variants of solution. The result obtained in various ways is in all cases the same.
- The structure of knowledge is annulled, disappears, is canceled if it is multiplied by zero.
- Two identical structures can be combined into one complex structure.

The logical structures of knowledge (LSK) identified in knowledge are the load-bearing parts of constructive teaching.

#### **Principle 4. Logical operations of thinking**

In the analysis of the tasks given for a subject in the traditional way of learning, it is revealed that most of the tasks are exercises, that is, repetitions of the same actions in order to assimilate them, consisting of finding, underlining, determining, etc. These tasks are performed on one or two structures of knowledge and are aimed at determining the level of knowledge and skills.

In constructive teaching, besides this, logical operations of thinking are performed on the structures of knowledge. These operations allow students to integrate structures of knowledge into groups, to clarify their interrelationships, and to classify, enrich, or replace them with other structures. A student, performing such thinking operations on the structures of knowledge, adequately builds the structure of his/her thinking and the logical structure of his/her knowledge. Logical operations on the structures of knowledge are performed by commands or settings given to a student.

1. The operation of classification


With the assistance of this operation, students acquire the intellectual skill of partitioning a set into subsets according to certain criteria.

2. The operation of seriation

Performing the logical operation of seriation on structures of knowledge, students form the intellectual skill of grouping structures of knowledge according to several combining criteria or to only one criterion.

3. The operation of substitution

By this operation some structures are replaced by others (for example, in mathematics, numbers are replaced by letters: (6 + 7) = (a + b); replacement of signs: 5 × 5 = 5², etc.). This is a very basic logical operation on the structures of knowledge and an intellectually important skill that qualitatively transforms knowledge. This operation destroys the vertical structure of knowledge and develops in students the ability to arrange it in thinking in a horizontal structure [6].

4. The operation of enrichment

In carrying out this operation on the structures of knowledge, students who have knowledge enrich it with new structures of knowledge and turn it into new knowledge. The operation of enrichment, like the operation of substitution, generates future knowledge from the acquired knowledge.

5. Multiplicative operations

This operation is used simultaneously on multiple structures of knowledge that have common relations or communications (example: in linguistic knowledge it is the change of some parts of speech by cases and by numbers).

#### **Principle 5. Mental models of students**

According to the psychology of intelligence by Piaget, a child from birth to 8 years old goes through the stage of pre-logical operations; from 8 to 12 years, the stage of concrete operations; and from 12 to the end of youth, the stage of formal operations. Each stage of development corresponds to its settings, its commands. At the pre-logical operations stage, students typically argue based on their life experiences and try as much as possible to extend them to the tasks that are set for them. In the bowels of pre-logical operations, students in a logical setting ask the teacher questions generated by the makings of concrete operations, which are expanded, enriched, balanced and smoothly transition to the stage of concrete operations. In the frequent repetition of the same commands on units of the structures of knowledge, students gain not only stable knowledge, but also reveal their relationships and communication with other structures of knowledge. The concretization of knowledge and its consideration in relations and connections allow the student to generate knowledge based on previous knowledge, or, by identifying stable relationships and connections among the structures of knowledge, he/she formalizes them and moves to the stage of formal operations. It can be said that the knowledge of every student in constructive teaching goes through three stages: intuitive, concrete and formal. The quantity of previous knowledge passed through these stages is the result of intelligence, its richness, its diversity.

#### **Principle 6. Lesson structure**

The lesson in constructive teaching is designed structurally. The learner designs logical structures of knowledge, structures of mental activity and structure of the learning activities of students.

#### *The structure of mental activity*

Constructions of thinking and cooperative structures of activities are designed isomorphically to the stages of intellectual development of students. Over the logical structures of knowledge students perform mental operations. This intellectual activity of students, individually or in cooperation, builds, destroys, strengthens and develops the structures of intelligence of each of them separately, and helps a student to build his/her own individual way of thinking and understanding, which he/she later turns into a personal tool of knowledge.

#### *Structure of educational activities*

Learning activities of students can be individual or interactive. For organization of an interactive activity students use cooperative learning structures of American scientist Spencer Kagan. Interaction of students in these structures leads to reflection, which is one of the major factors in construction of knowledge. Social skills acquired in the structures of interactive activities are combined with the intellectual skills that are generated by thinking activity.

Each lesson in constructive teaching is designed by the teacher in advance. For design of the lesson of constructive teaching 7 elements are used.

#### **Seven elements of constructive teaching**

While establishing a lesson, the CT uses the following elements:

#### **2.1 Search**

The teacher poses questions on the subject to determine what students know about the topic, what they will do, what decisions will be taken and what conclusions they will come to. Search is identifying the essence of the meaning of the theme and determining its place in the system of knowledge; search is a motivation of the cognitive activities of a student and an understanding of the given tasks from his/her pyramid of knowledge. Search is a concept for the teacher of how well students understand and explain the topic. Search is the teacher's definition of the zone of students' knowledge about the given subject and the prospects of its extension and application. Search consists of statements by the teacher to bring students onto the track which he has drawn.

#### **2.2 Structures**


In constructive teaching, as mentioned above, the lesson is built structurally:

1. It is a structure of knowledge. Based on the content of the topic, the teacher finds in the subject knowledge the logical structures of knowledge, that is, he/she can logically combine and separate one structure of knowledge from other structures; each structure of knowledge beyond that may be associated with another, coincide with it, or, generally, after the actions on it, be eliminated.
2. Having identified the logical structures, the teacher determines what mental operations students will perform over the logical structures of knowledge.
3. It is a structure of educational activity. A teacher, making the design of the lesson, determines in which structures of activity students will work: in pairs or in teams.


#### **2.3 Logical operations of thinking**

In a traditional approach to education, work on exercises, examples and tasks is usually expressed as assignments: underline, find, agree, put in the required form, solve, run, etc. All these tasks are aimed at numerous varied repetitions of what was learned and at its application to one or two structures of knowledge. In contrast to this approach, in constructive teaching several logically built mental activities, that is, operations, are performed over the knowledge. Mental operations, or operations of thinking, are commands, setups, expressed by verbs, for example: isolate and connect the appropriate, replace one with another, convert and explain the outcome, express in a different form and create a new one, etc. As students regularly perform tasks under these settings, the settings gradually turn into their instrument of knowledge.

#### **2.4 Connection**

This element serves for connecting a structure of knowledge with others. In constructive teaching, knowledge is represented in the scheme of integrity. In this scheme of integrity, not only the structural connections inside the subject knowledge are clearly visible, but also interdisciplinary communications. This communication allows considering the subject matter under scrutiny from different viewpoints, identifying similarities and differences among the views, determining compatibility, identifying options and making replacements of certain knowledge with other.

#### **2.5 Questions**

In constructive teaching, the teacher, making the lesson design, should determine with what question he will start the lesson and with what question he will finalize it. The posed questions help him determine what students know about the topic, how they imagine it, and how they explain their vision by arguing. Responses of students may be correct or incorrect, complete or short, and the whole spectrum of students' responses gives the teacher an opportunity to lead them to the right answer, that is, to the point of knowledge toward which he purposefully moved.


#### **2.6 Adjunction and connections**

This element of the lesson serves for the communication of the knowledge under study with past and future knowledge. The inclusion of a student in this communication of knowledge often changes it qualitatively, and knowledge of high school, by moving down, becomes knowledge of junior classes. This means the integration of knowledge horizontally, that is, the integrity of knowledge is built, which creates a large space for the flight of thought.

#### **2.7 Reflection or rejection (Presentation or a reflection of the students accumulated)**

Presentation is the last element in the design of the lesson of constructive teaching. The teacher in constructive teaching prepares worksheets in advance, both for individual and for team work, and gives tasks on the basis of the set goal. Tasks and settings should cover the work done in class; they must start from the easy and smoothly move to the more complex, which includes not only understanding, but also a deep comprehension and the transition of this meaning to a higher level, to the level of generation of new knowledge.

#### **3. Modeling knowledge structure**

In traditional training, the structures of knowledge are established vertically and students get comprehensive knowledge over the educational years. In the example of Russian we can say that at primary school students do not get holistic knowledge of the language. A student accomplishes this holistic knowledge only in the 6th year of education. Such a set of knowledge leads to fragmentary and non-systematic knowledge. This knowledge as a whole can only be kept by memorizing. Therefore, it becomes difficult for students to find ties, relations, adequacy, etc. between pieces of knowledge. Logic modeling of the structures of knowledge allows building a structure of subject knowledge in the scheme of integrity. Study of the structures in this scheme allows a student to build his knowledge adequately and to hold it in the manner of the logic operations of thinking.

#### **4. Building a formal logic model of a language on the basis of the logic of integrity by Piaget**

The model of natural intelligence was established by Piaget. The logic of integrity of J. Piaget, or the work of sense at an operational level, that is, the level of formal logic, is the work of the human brain.

Piaget developed this model on a natural language. Hence, we can say that if the cognitive logic is established in a natural language, then this natural language itself can be adequately and logically structured. The fuzzy-logic language model will be constructed on the basis of Russian by the logic of integrity.

1. By the terms **"groups"** and **"groupings"**, Piaget defines a certain equilibrium form of intellectual operations, that is, actions interiorized and organized in the **structure of integrity** (Ж.Пиаже 2001).

The **structure of integrity** of Russian comprises 10 parts of speech.

**Note: In Russian,** 6 of the parts of speech are independent and 4 of them are supportive. Independent parts of speech are the noun, the adjective, the numeral, the pronoun, the verb and the adverb, and the support parts of speech are the proposition, the conjunction, the interjection and the determiners.

The act of intelligence organized in the holistic structure is transferred by the holistic structures of Russian; in this case, these actions are carried by the holistic structure of Russian.

2. Piaget established 4 stipulations for the "groups" of mathematical order and 5 stipulations for "groupings" of quality orders.

According to the psychological theory of Piaget, the definitions of **"groups"** and **"groupings"** are compared with the following definitions of the Grammar of Russian:

"Groups" and "groupings" fall in the framework of integrity. Within the framework of integrity, there are 10 parts of speech in the language. Piaget consecutively compares the psychological definition of "groups" with the parts of speech in Russian (at the same time, in any language).

Hence, the operational "groups" of the natural language will be 10. Encoding these groups or the parts of speech, that is, altering the logical operation, we get the following holistic structure of the language:

1 -Noun; 2 – Adjective; 3 – Numeral; 4 – Pronoun; 5 – Verb; 6 – Adverb; 7 – Proposition; 8 – Conjunction; 9 – Interjection; and 10 – Determiner

**1 ----2----3----4----5----6----7----8----9----10** 


We will consider **"the groupings" in this holistic structure of the language** as word groups bearing common features. For example, *2----3----4----5----6* are grouped as words relating to an object;

Groupings *7----8----9----10* are supportive words, which are not used independently. At the same time, in these "groupings", the parts of speech can be combined with each other in terms of relations and features, that is, within the framework of the holistic system in a new arrangement, related to quality.

For example: **1 ----2; 3 ----4; 1 ----5; 5 ----6; 3 ----4; 4 ----5,** etc.

**Note:** 1 ----2; 3 ----4; 4 ----5 coincide with each other in terms of gender, quantity and the cases of the noun.

3. The way of "grouping"

#### **How are these psychological rules of Piaget interpreted and compared with the language rules?**

Two of the elements of a "grouping" may combine with each other and, as a result, may form a new element or a new unit of knowledge.

From the point of view of language, it is understood like this:

For example: **new – 2; home – 1.** The combination of 1 – noun (**home**) and 2 – adjective (**new**) forms a new element of knowledge – the phrase **new home**;

Two relations A > B > C yield A > C, where they exist.


All forms of transformation are available. For example, *new home*: if we separate this phrase, we get the following: 2 – new; 1 – home.

For example, 1 – *egg*, 5 – *broils*, 6 – *light* can also be expressed like this: *1 – light, 6 – egg: light egg*.

*Note: Although the verb defines the action of an object, the adjective defines the character of the object, and the adverb defines the character of the action.*

The composition of the operations is "associated". A physical definition of composition is interpreted as the consideration of the roles of the "groupings", that is, of the parts of speech, from the points of view of morphology and syntax.

For example: A strong wind started to blow in the morning. In this sentence, from the point of view of morphology, *strong* is a supportive part of the speech and *wind* is a key part. From the point of view of syntax, the phrase *strong wind* is a sequence; that is, these words coincide in terms of gender, quantity and case of the noun.

An operation united with its reverse operation is cancelled, for example:

*Home (noun in singular) + home (noun in singular) = homes (noun in plural); the reverse operation turns homes back into home (noun in singular).*

The identical operation, in the knowledge of language, corresponds to the combination of two simple sentences or words into a complex sentence or phrase.

For example: **The sky suddenly got darkened. It started raining strongly. Suddenly the sky got darkened and it started raining strongly …** 

#### **5. Forming of groupings or the parts of speech**

The system of "groupings" is formed through so-called logical operations. The logical operations of thinking are understood, in the context of language meanings, as logical units for conversion or consecutive exercises.

Such operations are implemented as follows:

A given cluster of words is classified and subjected to seriation:

*книга, стол, дом, земля, девочка, тетрадь, ученик, юноша, знамя, соня, лошадь, поле, окно, время, солнце, день, конь*.

*book, table, house, girl, copybook, school-child, youth, flag, dormouse, horse, field, window, time, sun, day, foal.* 

1. The first operation is the operation of classification, that is, the classification of words in terms of gender.

**Note:** In Russian, the nouns are divided into three genders or three categories in terms of their endings:

The feminine gender includes the words with endings of **–а, -я, -ья** and –**ь**. The masculine gender includes the words with no ending, with ending of **ь,** and in some words, with endings of **–а** and **–я**. The neuter gender includes the nouns with endings of **-о -е , -ье,** and in 10 words, with **–мя.**

Class 1: the words of feminine gender:


книга, земля, девочка, соня (book, land, girl, dormouse)

Class 2: the words of masculine gender:

стол, ученик, юноша (table, student, youth)

Class 3: the words of neuter gender:

знамя, поле, окно, время, солнце (flag, field, window, time, sun)

Class 4: the words whose gender can go two ways:

конь, лошадь, мышь (foal, horse, mouse)

**Note:** Identification of the gender of the words subject to Class 4 is performed through transformation, that is, by changing of these words in terms of cases of the noun.

*нет коня* (masculine gender) – there is no foal; *нет лошади* (feminine gender) – there is no horse.

Class 5: It includes the words with ending of *–мя;* these words are declined as *nouns of neuter gender*

At the present case, the operation of classification coincides with division of nouns into gender.

2. **The second "grouping" is formed through the operation of seriation** of appropriate endings. The operation of seriation coincides with declination.

**Note:** There are three forms of declination in the grammar of Russian:

Form 1 includes the words with endings of -**а** , -**я** and **ья;**  Form 2 includes the words with no endings and endings of -**ь;**  Form 3 includes the words with endings of **-о, -е** and **–ье.**
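As a rough illustration of how this rule can be operationalized, here is a minimal Python sketch (not from the chapter) that assigns a declension form purely by the endings listed in the note; real Russian declension has exceptions that this rule ignores.

```python
# A minimal sketch (not from the chapter): assign one of the three
# declension forms purely by the endings listed in the note above.
# Real Russian declension has exceptions that this rule ignores.
def declension_form(word: str) -> int:
    if word.endswith(("ья", "а", "я")):
        return 1          # Form 1: endings -а, -я, -ья
    if word.endswith(("ье", "о", "е")):
        return 3          # Form 3: endings -о, -е, -ье
    return 2              # Form 2: no ending, or the ending -ь

for w in ["книга", "земля", "стол", "конь", "окно", "поле"]:
    print(w, declension_form(w))   # 1 1 2 2 3 3
```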

Serial 1:

Words with endings of *-а, -я*: книга (book) – feminine; земля (land) – feminine; девочка (girl) – feminine; юноша (youth) – masculine; соня (dormouse) – feminine and masculine.

The words of this serial include the nouns of feminine gender and some words of masculine gender, with endings of –а and –я.

Serial 2:

The words with no endings of the masculine gender and the words of the neuter gender with endings of –о , -е, and -ье.

стол, ученик, поле, окно, солнце (table, student, field, window, sun)


These words are transformed, that is, declined in the same manner.

Serial 3:

This serial includes the words of the feminine gender with ending of -**ь** (лошадь - horse, мышь - mouse, дочь - daughter)

Serial 4:

Words with ending of –мя. There are only 10 words with such an ending in this language and they are exceptions.

The operation of seriation in Russian corresponds to the declination of nouns. So, Serial 1 is subject to the first declination, Serial 2 to the second declination, and Serial 3 to the third declination. Serial 4 is an exception.

The operations of classification and seriation form the "groups" and "groupings". The remaining operations are used to form "sub-groups" and "sub-groupings".


3. **The third major operation is the operation of alteration.** It replaces a definition or a word with another one. In the Russian grammar, it is equal to the definitions of **synonyms, antonyms and homonyms.**

4. **The fourth operation is the operation of enrichment.** It has a relationship uniting elements of one or another class, that is, equivalence.

For example: 1—школа (school), 2—новая (new); 1—школа (school), 3—одна (alone); 1—школа (school), 4—моя (my); 1—школа (school), 5—строится (to be built); 5—строится (to be built), 6—быстро (rapidly).

5. **Multiple operations** are those that are included in more than one system at a given period.

**Note:** For example, the category of declination covers 1---2---3---4, and the category of number covers 1---2---3---4---5.

As said above, the structures of knowledge are divided into invariant and variable ones.

Invariant structures of knowledge are the parts of speech or the "groups".

Variable structures are categorical structures of knowledge.

**Note:** There are 10 categories of language in Russian.

While modeling subject knowledge, the invariant structures are placed along the horizontal of a coordinate plane, and the variable ones are placed vertically. In this coordinate plane, the parts of speech 1, 2, ..., 9, 10 are placed horizontally, and the language categories 1, 2, 3, ..., 10 are placed vertically.

During the study process, the structures of knowledge of the formal logic model of a language are subjected to the logical operations of thinking.

Exactly these two important points, the logical construction of knowledge and the logical operations, transform these separate structures of knowledge into a logical model that is adequate to the way thinking is built.

#### **6. Definition of Zadeh's fuzzy logic theory**

The mathematical theory of fuzzy sets by *Zadeh* has made it possible, for more than a quarter of a century now, to describe fuzzy definitions and knowledge, to operate with this knowledge, and to make fuzzy inferences (Пивкин, Бакулин и др.).

The universal set is indicated by **E**.


The elements of a set are indicated by *x*.

**R** indicates a property, and *x* indicates an element of **E**.

An ordinary (certain) subset **A** of Universal Set **E**, whose elements satisfy the property **R**, is indicated as the set of ordered pairs **A = {mA(*х*)/*х*}**, where **mA(*х*)** is *a characteristic function*, taking the value 1 if *x* satisfies the property **R**, and 0 otherwise.

A fuzzy subset differs from a certain one in that there is no single-valued answer "yes or no" for the elements *x* of E regarding the property **R**.

In this respect, a fuzzy set **A** of Universal Set **E** is indicated as the set of ordered pairs **A = {mA(*х*)/*х*}**, where **mA(*х*)** is the *membership function* of element *х* in subset **A** of Universal Set **E**, taking values in the interval [0, 1].
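For readers who prefer code to notation, here is a minimal Python sketch of the two definitions, with an invented toy vocabulary standing in for E; the membership degrees are illustrative only.

```python
# A minimal sketch of the two definitions above, with an invented toy
# vocabulary standing in for the universal set E.
E = ["дом", "стол", "берег", "океан"]

def characteristic(x, crisp_subset):
    """Crisp subset: returns 1 if x satisfies the property R, else 0."""
    return 1 if x in crisp_subset else 0

# Fuzzy subset A as ordered pairs {element: membership degree in [0, 1]};
# the degrees here are illustrative only.
fuzzy_A = {"дом": 1.0, "стол": 0.8, "берег": 0.4, "океан": 0.1}

for x in E:
    print(x, characteristic(x, {"дом", "стол"}), fuzzy_A.get(x, 0.0))
```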

#### **7. Comparison of psychological theories by Piaget with mathematical theories of Zadeh**

Let us understand Zadeh's term "universal set" as the "groups" and "groupings", or the vocabulary of a language, and denote it by **E**. Then the "elements of a set" by Zadeh and the "groups" by Piaget will be considered as the parts of speech. Let us denote these elements by *x*.


As we mentioned above, there are 10 parts of speech in Russian. Let us denote each part of speech by *x*. Then x1 will be a noun, x2 – an adjective, x3 – a numeral, x4 – a pronoun, x5 – a verb, x6 – an adverb, x7 – a conjunction, x8 – a preposition, x9 – a determiner, and x10 – an interjection. Hence, Universal Set E has 10 elements: x1, x2, …, x10. If we mark the elements of Universal Set E by the numbers 0,1 to 1 as marks of the set, then 0,1 – the word indicates an object; 0,2 – a property of the object; 0,3 – the quantity of the object; 0,4 – points out the object; 0,5 – indicates the action of the object; 0,6 – indicates a sign of the action; 0,7 – a conjunction; 0,8 – a preposition; 0,9 – a determiner; and 1 – an interjection.

The parts of speech of Russian, according to the psychological concepts by Piaget, are "groups", and according to mathematical concepts by Zadeh are "elements of a set".

#### **Parts of speech – "groups" – "elements of a set"**

**X – an element of Set E**

**Let us denote the properties of this element by R**

**Note.** Each part of speech has its own properties – internal rules, which are related to the rules of the other parts of the speech, that is, of the other elements of the set.

The **"fuzzy set"** may be presented in a form of a language vocabulary.

Then, under the property functions of the "fuzzy set", the words will be interpreted in their direct lexical meaning. For example: *дом (house), стол (table), берег (coast), океан (ocean), etc.*

Proceeding from the concepts of linguistic variable by Zadeh, the membership function corresponds to two rules:

1. the syntactical rule, which is set in the form of a grammar that generates the name of the variable in the form of language categories;
2. the semantic rule, which determines the procedure for computing the meaning of each value algorithmically.

Then *У* will indicate the grammatical rules of the language, and *X* the parts of speech, which also have their own rules.

Rule *X* will comply with Rule *У*, as Rule *У* complies with Rule *X.*

Then, in this property function:

*х1* – is a noun; *х2* – is an adjective; *х3* – is a numeral; *х4* – is a pronoun; *х5* – is a verb; *х6* – is an adverb; *х7* – is a conjunction; *х8* – is a preposition; *х9* – is a determiner; and *х10* – is an interjection.

*У* – are the categories of the language: *у1* – is the category of quantity (singular and plural); *у2* – are the categories of gender (feminine (1), masculine (2) and neuter (3)); *у3* – is the category of case (1-6); *у4* – are the categories of person; *у5* – is the category of person; *у6* – is the category of tense; *у7* – is the category of type; *у8* – is the category of declination; *у9*; *у10*.



#### **8. Replacing the formal logic model of the language with the concepts of linguistic logic by Zadeh: the model of formal-fuzzy logic on the example of Russian**

Replacing the concepts of formal logic by Piaget with the linguistic logic by Zadeh, we get the following fuzzy model of the language (F.Bunyatova 2009):

(a) If we replace the definition of "groups" with the definition of "elements of a set", then under "set" we will consider the "groups" and "groupings" within the holistic framework.

*х1, х2, х3, ..., х10* will be the elements of Set E.

Under the "groups" will turn the whole vocabulary of the language, and Universal Set *E* – will consist of the vocabulary of Russian.

*The variable and invariant concepts will be replaced with the concepts of linguistic variable.* 

*The variable concepts – are grammatical categories of the language and they will be indicated by У*.

We can indicate the invariant concepts with *X*.

*The Scheme of Fuzzy Model of Language* 


**This fuzzy model of a natural language,** treated with the conversion into the definition tools of the logic by Zadeh, gets the following form:

1. A fuzzy set may be finite or endless: as the parts of speech it is finite, but as the vocabulary of a language it is endless. Х = (х1, х2, ..., х10)
2. Fuzzy set A may be characterized by a set of pairs (a composition according to the logic by Piaget): А = Е х.

For example: *х3-х1* – один мальчик (one boy); *х2-х1* – высокий мальчик (a tall boy); *х1-х4* – мой друг (my friend), etc.

3. Fuzzy set A may be presented in the following form (we present the noun in the form of А): *0,1 – игра (game); 0,1 + 0,2 – сильная игра (a strong game); 0,1 + 0,3 – одна игра (a game); 0,1 + 0,4 – моя игра (my game); etc.*
4. The operation of unification. The unification of two fuzzy sets *A* and *B* is specified as follows: if *А* (1) is a noun and *В* (2) is an adjective, then the functions of the properties у*1* and у*2* belong to both *А* and *В*.

*Based on the abovementioned conversion, the sentence "На улице стоял сильный мороз" (there was a heavy frost in the street) will be indicated in signs as follows:*

х7 - х1у2у3/6 - х5у2у5 - х2у1у2у3 - х1у1у2у3

*If we present this sentence in a coordinate plane, it may be indicated graphically as points in that plane.*
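The operation of unification in point 4 is, in Zadeh's theory, the pointwise maximum of the two membership functions. A minimal Python sketch, with illustrative words and degrees:

```python
# A minimal sketch of point 4: in Zadeh's theory the unification (union)
# of two fuzzy sets takes the pointwise maximum of the membership degrees.
# The words and degrees below are illustrative only.
A = {"мороз": 0.9, "ветер": 0.6}      # a fuzzy set of nouns (1)
B = {"сильный": 0.8, "мороз": 0.4}    # a fuzzy set of adjectives (2)

def fuzzy_union(a: dict, b: dict) -> dict:
    return {k: max(a.get(k, 0.0), b.get(k, 0.0)) for k in set(a) | set(b)}

print(fuzzy_union(A, B))
# {'ветер': 0.6, 'мороз': 0.9, 'сильный': 0.8} (ordering may vary)
```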


### **9. How is a model of subject knowledge built for machine learning?**

In order to build a model of subject knowledge for machine learning without a teacher, the following needs to be done:

1. Divide the structures of knowledge into variables, that is, into linguistic variables, and into invariants, namely into a universal set.
2. Invariant knowledge, or the universal set of knowledge, is denoted by X.
3. This knowledge is classified, divided into classes or elements of the set, and built on the horizontal. The resulting classes of invariant knowledge, or elements of the set, are denoted by x0,1 ... x0,9. Invariant knowledge, or the set of knowledge, can be finite or infinite according to the classification of the classes themselves.
4. Variable, categorical knowledge, or linguistic variables, are located vertically and denoted by U. They are also classified and denoted by 0,1; 0,2, etc. [3]. In [Nordhausen and Langley, 1990] it was noted that the formation of categories is the basis of a unified theory of scientific research. By denoting the classes and groups of the set, as well as the categorical knowledge of a linguistic variable, with numbers, we can present this knowledge together with its properties. Each property takes a serial number. So the sentence "The day was sunny" can be defined on a coordinate plane as the points xa,1,1 ya,1; xa,5 ya,4/1; xa,2 ya,1/5. Each element of the set, for example x0,1, is characterized by its own linguistic variable y0. By the rules of Russian grammar, the numerical parameters x0,1,1 mean a masculine noun, and y0,1 means the singular number, etc. Analogously to these signs, a set of sentences can be constructed on the basis of the available knowledge. This property can be one of the justifications for machine learning without a teacher, as it was derived from an example of natural language. Each class of knowledge, or element of the set, is at the same time a cluster of the structure of knowledge. Clusters have their own rules and laws. The structures of knowledge clusters can be combined, divided, associated and cancelled according to the logical settings of the rules of a linguistic variable. The belonging of a linguistic variable to elements of the set, or to the logical structures of clusters, is determined by logical operations such as the operations of substitution and enrichment and the identical and multiplicative operations. That is, the rules of the linguistic variable and the rules of the elements of the set are in mobile motion all the time, combining according to the given settings around the logical structures of knowledge, and the elements of the set become nanostructures of knowledge.
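A minimal Python sketch of the coordinate encoding in item 4, using the chapter's 0,1–1 marks for the parts of speech and hypothetical codes for the category values:

```python
# A minimal sketch of the coordinate encoding in item 4: x marks the
# invariant class (part of speech, using the 0,1-1 marks given earlier),
# y marks the values of the linguistic variables (hypothetical codes).
SENTENCE = [
    # (word, x code, y codes)
    ("The", 0.9, [0.0]),      # determiner (no category value: placeholder)
    ("day", 0.1, [0.1]),      # noun, singular
    ("was", 0.5, [0.6]),      # verb, past tense (hypothetical y code)
    ("sunny", 0.2, [0.1]),    # adjective, agrees in number with the noun
]

points = [(x, y) for _, x, ys in SENTENCE for y in ys]
print(points)   # each word becomes one or more (x, y) points on the plane
```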
#### **10. What innovations can this knowledge model bring to education?**

1. Knowledge is considered in the scheme of integrity. Since language is a means of communication and expression of ideas, the conditions of the integrity scheme and the logic of Zadeh can be found in any scientific knowledge, and the rules of the logic of Piaget and fuzzy logic can be applied to them.
2. The structures of knowledge will be divided into invariant and variable, or into syntactic and semantic. The numeric designation of the categorical and invariant properties of knowledge makes it possible to build coordinates of knowledge, on the basis of which the process of knowledge construction will go.
3. The operationality of thinking makes it possible to collect structures of knowledge into clusters, figure out their interrelationships and attitudes, classify them, enrich them, or replace them with other structures. These logical operations gather, like a magnet, the relevant knowledge around the structures of knowledge relating to them and turn them into a nanostructure of knowledge.

4. The logic of integrity by Piaget and the fuzzy logic by Zadeh break down the traditional vertical construction of the structures of knowledge and arrange them into a horizontal structure (Ф.Бунятова 1990). Such a construction of knowledge, based on psychology, pedagogy and high technology, in its turn makes for a nanopsychopedagogical approach to training (F.Bunyatova 2009).

#### **11. References**

Gordon Dryden & Dr. Jeannette Vos (2000). The Learning Revolution. The Learning Web.

F. Bunyatova (2009). Constructive teaching technology and perspectives of nanopsychopedagogy. IETC 2009, Ankara, Turkey.

F. C. Bunyatova (2008). Konstruktiv təlim. Mahiyyəti, prinsip, vəzifələr və dərslərdən nümunələr [Constructive teaching: essence, principles, tasks and samples from lessons]. Bakı.

Ф. Д. Бунятова (1990). Логический способ обучения. Альтернативное образование [The logical method of teaching. Alternative education]. Маариф, Баку.

Ф. Д. Бунятова (2001). Жан Пиаже в школе [Jean Piaget at school]. Elmi axtarışlar, Bakı.

Ф. Д. Бунятова (01.07.2002). Применение нечёткой логики в образовательных технологиях [Application of fuzzy logic in educational technologies]. Copyright Agency of the Azerbaijan Republic, № 328, Baku.

Ф. Д. Бунятова (2007). Педагогическая технология. Конструктивное обучение [Pedagogical technology: constructive teaching]. URL www.Eidos-internet-magazine.

Ф. Д. Бунятова (2009). Логика целостности, нечеткая логика и моделирование содержания предметных знаний [Logic of integrity, fuzzy logic and modeling of the content of subject knowledge]. AICT-2009, Baku.

Жан Пиаже (2001). Избранные труды [Selected works]. Москва.

Лотфи Аскер Заде (1976). Понятие лингвистической переменной и его применение к принятию приближенных решений [The concept of a linguistic variable and its application to approximate decision making]. Мир, Москва.

Nordhausen and Langley (1990). URL www.filosof.historic.ru/books.

В. Я. Пивкин, Е. П. Бакулин, Д. И. Коренькова. Нечеткие множества в системах управления [Fuzzy sets in control systems]. URL www.allmath.ru/appliedmath.

Р. А. Алиев (1999). Soft computing. АзДНА, Баку.


## **Morphosyntactic Linguistic Wavelets for Knowledge Management**

Daniela López De Luise *Universidad de Palermo Argentina* 

### **1. Introduction**


Morphosyntactics studies grammatical categories and linguistic units that have both morphological and syntactic properties. In its proscriptive form, morphosyntactics describes the set of rules that govern linguistic units whose properties are definable by both morphological and syntactic paradigms.

Thus, morphosyntactics establishes a common framework for oral and written language that guides the process of externally encoding ideas produced in the mind. Speech is an important vehicle for exchanging thoughts, and phonetics also has a significant influence on oral communication. Hearing deficiency causes a leveling and distortion of phonetic processes and hinders morphosyntactic development, particularly when present during the second and third years of life (Kampen, 2005).

Fundamental semantic and ontologic elements of speech become apparent through word usage. For example, the distance between successive occurrences of a word has a distinctive Poisson distribution that is well characterized by a stretched exponential scaling (Altmann, 2004). The variance in this analysis depends strongly on semantic type, a measure of the abstractness of each word, and only weakly on frequency.

Distribution characteristics are related to the semantics and functions of words. The use of words provides a uniquely precise and powerful lens into human thought and activity (Altmann, 2004). As a consequence, word usage is likely to affect other manifestations of collective human dynamics.

#### **1.1 Words may follow Zipf's empirical law**

Zipf's empirical law was formulated using mathematical statistics. It refers to the fact that many types of data studied in the physical and social sciences can be approximated with a Zipfian distribution, one of a family of related discrete power law probability distributions (Figure 1)1 (Wolfram, 2011).

 1 In the English language, the probability of encountering the rth most common word is given roughly by P(r)=0.1/r (for r up to about 1000).
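As a rough check of the footnote's approximation, one can rank the words of any large text by frequency and compare the observed relative frequencies with P(r) = 0.1/r. A minimal Python sketch, where "corpus.txt" is a placeholder for any large plain-text file:

```python
# A minimal sketch: rank words of a corpus by frequency and compare the
# observed relative frequency with the Zipf estimate P(r) = 0.1 / r.
# "corpus.txt" is a placeholder for any large plain-text file.
from collections import Counter

text = open("corpus.txt", encoding="utf-8").read().lower()
counts = Counter(text.split())
total = sum(counts.values())

for rank, (word, n) in enumerate(counts.most_common(10), start=1):
    print(f"{rank:2d} {word:15s} observed={n/total:.4f} zipf={0.1/rank:.4f}")
```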


There is no theoretical proof that Zipf's law applies to most languages (Brillouin, 2004), but Wentian Li (Li, 1992) demonstrated empirical evidence supporting the validity of Zipf's law in the domain of language. Li generated a document by choosing each character at random from a uniform distribution including letters and the space character. Its words follow the general trend of Zipf's law. Some experts explain this linguistic phenomenon as a natural conservation of effort in which speakers and hearers minimize the work needed to reach understanding, resulting in an approximately equal distribution of effort consistent with the observed Zipf distribution (Ferrer, 2003).
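Li's construction is easy to replicate. A minimal Python sketch that draws characters uniformly at random (letters plus the space character) and ranks the resulting "words" by frequency:

```python
# A minimal sketch of Li's experiment: characters drawn uniformly at
# random from letters plus the space character still yield "words" whose
# rank/frequency curve follows the general trend of Zipf's law.
import random
import string
from collections import Counter

random.seed(0)
alphabet = string.ascii_lowercase + " "
text = "".join(random.choice(alphabet) for _ in range(1_000_000))

for rank, (word, n) in enumerate(Counter(text.split()).most_common(10), 1):
    # One-letter words share similar frequencies; over the whole ranking
    # the curve falls in length-steps that approximate a power law.
    print(rank, repr(word), n)
```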

Fig. 1. Zipf's law for English

Whatever the underlying cause of this behavior, word distribution has established correspondences between social activities and natural and biological phenomena. As language is a natural instrument for representation and communication (Altmann, 2004), it becomes a particularly interesting and promising domain for exploration and indirect analysis of social activity, and it offers a way to understand how humans perform conceptualization. Word meaning is directly related to its distribution and location in context. A word's position is also related to its thematic importance and its usefulness as a keyword (López De Luise, 2008b, 2008c). This kind of information (recurrence, distribution and position) is strongly correlated with morphosyntactic analysis and strongly supports views of human conceptual structure in which all concepts, no matter how abstract, directly or indirectly engage contextually specific experience; tracing language in the ever larger digital databases of human communications can be a most promising tool for tracing human and social dynamics (Altmann, 2004). Thus, morphosyntactic analysis offers a new and promising tool for the study of dynamic social interaction.

#### **1.2 Why morphosyntactic wavelets?**

The evidence that wavelets offer the best description of such morphosyntactic decomposition is revealed by comparing the details of both traditional and morphosyntactical analyses.



| ID | Topic | Traditional wavelet | MLW |
|----|-------|---------------------|-----|
| 1 | application | just for signals (I) | any text (II) |
| 2 | type of transformation | mathematical | heuristic/statistical |
| 3 | goal | highlight, reinforce and obtain further information that is not readily available in the raw signal | extraction of information that highlights and reinforces knowledge that is not readily available in the raw text |
| 4 | time-domain signals | measured as a function of time; they have not undergone any transformation | the sequence of the sentences is essential for contextualizing spoken/written words |
| 5 | frequency-domain signals | processed to transform them into a useful representation | Eci symbolize morphosyntactic representations of sentences; they are analogous to the knowledge structure model (Hisgen, 2010) |
| 6 | unit | frequency: the number of oscillations per second in a signal, measured in Hertz (Hz, cycles per second) | the Eci, which represent sentence content and retain its main features |
| 7 | domain | any type of data, even with sharp discontinuities (III) | any text |
| 8 | type of result | can represent the signal in both the frequency and time domains (III) | filters; also represents the time and frequency dimensions (IV) |
| 9 | scaling role | important: can process at different scales and resolutions | represents knowledge at different levels of abstraction and detail |
| 10 | data decomposition | decompose data x(t) into a two-dimensional function of time and frequency | decompose data into Eci (representation of concrete/specific knowledge) and Ece (abstract knowledge) (V) |
| 11 | data decomposition procedure | decompose x(t) using a "mother" wavelet W(x) | decompose using morphosyntactic rules and a "mother sequence" of words |

I. Detectable physical quantity or impulse by which information may be sent
II. Although this theory is explained in general, it has only been proved in Spanish
III. This is an advantage over the FFT alternative
IV. This is true within the MLW context, given the statements in rows 4 and 5
V. The knowledge derived from the filtering processing is called Ece in the MLW context

Table 1. Traditional wavelets versus MLW

Figure 2 shows a graphical comparison between a signal and its FFT. Figure 3 is a linguistic version: Eci and ER. The graphics in Figure 2 represent the original signal (time-domain) and the resulting FFT decomposition (Lahm, 2002). The images in Figure 3 represent a translated original Spanish text (content from wikipedia.org, topic Topacio) transformed into an Eci (López De Luise, 2007) that models dialog knowledge (Hisgen, 2010). Statistical modeling of knowledge is beyond the scope of this chapter, but additional information is available in (López De Luise, 2005, 2008, 2008b, 2008c, 2007b, 2007c).


Fig. 2. Signal and frequency decomposition

Fig. 3. Original text and knowledge structure model

Figure 4 shows a sample wavelet decomposition. It is a signature decomposition using a Daubechies wavelet, a wavelet specially suited for this type of image. Figure 5 shows a MLW decomposition of a generic text. There, Ci, and Cj,k stand for abstract knowledge and Fm represents filters. This Figure will be described further in the final section.

Fig. 4. Traditional wavelet decomposition

### **2. Technical overview**

#### **2.1 Wavelets**


Wavelets are mathematical tools that are used to decompose/transform data into different components (coefficients) that describe different levels of detail (Lahm, 2002). Thus, they can extract the main features of a signal while simultaneously and independently analyzing details.
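For comparison with the linguistic case, here is a minimal sketch of a traditional multilevel decomposition, assuming the PyWavelets package (`pywt`); a Daubechies wavelet is used, as in the signature example of Figure 4.

```python
# A minimal sketch of a traditional multilevel wavelet decomposition,
# assuming the PyWavelets package (pip install PyWavelets).
import numpy as np
import pywt

t = np.linspace(0.0, 1.0, 1024)
signal = np.sin(2 * np.pi * 5 * t) + 0.3 * np.random.randn(t.size)

# One coarse approximation plus detail coefficients at each level:
coeffs = pywt.wavedec(signal, "db4", level=3)
for i, c in enumerate(coeffs):
    label = "approximation" if i == 0 else f"detail level {4 - i}"
    print(label, c.size)
```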

These tools have been applied to several problems, including the challenges of linguistic information retrieval. For example, wavelets have been used to build a Fuzzy Wavelet Neural Network (FWNN) for decision making over multiple criteria (Chen, 2008). In that analysis, custom built linguistic labels were used to represent information about events and situations and were processed with the FWNN.

Wavelets are sometimes used to replace linguistic analysis. For example, Tolba (Tolba, 2005) used consonant and vowel segmentation to develop automatic speech recognition for Arabic speech without linguistic information. Segmentation was performed with a combination of wavelet transformation and spectral analysis.

Hui and Wanglu combined the Linguistic Cloud Model (LCM) with wavelets to produce Advanced Synthetic Aperture Radar (ASAR) image target detection (Hui, 2008). This approach first solves image segmentation, avoids noise and recovers errors. Then, it uses LCM to solve the uncertainty of pixels. Representation using LCM bridges the gap between qualitative knowledge and quantitative knowledge, and it is thus used to map linguistic terms with contextually specific meaning to numeric processing.

#### **2.2 Comparison between MLW and traditional wavelets**

To demonstrate the concept of MLW and its relationship to its traditional counterpart, this table summarizes the main characteristics that unite or distinguish them:

| characteristic | morphosyntactic (MLW) | traditional wavelet |
|----------------|------------------------|---------------------|
| goal | content description and ontology classification | scaled decomposition |
| scaling | concept abstraction and summarization | compression |
| uses | extract the main concept of a signal; manage spelling and some grammatical errors; complement knowledge; classification with granularities | extract the main features of a signal; de-noising; reconstruct portions of a corrupted signal; obtain a reduced and representative signal |
| fitting | auto-fitting according to results | must be manually detected |
| types of wavelets | sequence of filters | depends on the functions used as the mother function |

Table 2. Characteristics of traditional wavelets and MLW

#### **2.3 Linguistic cloud model and MLW**

LCM models linguistic knowledge (Li, 2000) using a set of predefined, customized fuzzy linguistic variables. These variables are generated in accordance with two rules:

1. *The atom generation rule* specifies the manner in which a linguistic "atom" may be generated. An atom is a variable that cannot be sliced into smaller parts.
2. *The semantic rule* specifies the procedure by which composite linguistic terms are computed from linguistic atoms. In addition, there are connecting operators ("and", "or", etc.), modifiers ("very", "quite", etc.) and negatives that are treated as soft operators that modify an operand's (atom's) meaning to produce linguistic "terms".
The MSW and the LCM share a common goal. However, the MSW replaces the manual procedure used to obtain linguistic atoms with automated processing that determines an atom's linguistic category (e.g., noun or verb) (López De Luise, 2007d, 2008c). The result is not an atom or a term but is a structure named Eci (an acronym from the Spanish, Estructura de Composición Interna). The Eci is used to model the morphosyntactic configuration within sentences (López De Luise, 2007; Hisgen, 2010). Thus, the core processing is based on Eci structures instead of linguistic variables. An Eci is a plastic representation that can evolve to reflect more detailed information regarding the represented portion of text. While atoms cannot be sliced, any Eci can be partitioned as required during the learning process. Further differences between the LCM and the MSW are shown in Table 3.
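The contrast can be made concrete with a small, purely hypothetical sketch (the real Eci is the graph structure defined in López De Luise, 2007): unlike an atom, the structure below can both evolve and be sliced.

```python
# A purely hypothetical sketch of the two properties contrasted with LCM
# atoms (the real Eci is the graph structure of López De Luise, 2007):
# it can evolve (absorb more detail) and it can be sliced (partitioned).
from dataclasses import dataclass, field

@dataclass
class EciSketch:
    tokens: list
    detail: dict = field(default_factory=dict)

    def enrich(self, key, value):
        """Evolve: attach more detailed information to the structure."""
        self.detail[key] = value

    def split(self, i):
        """Slice: partition the structure, which an atom cannot do."""
        return EciSketch(self.tokens[:i]), EciSketch(self.tokens[i:])

eci = EciSketch("el topacio es un mineral".split())
eci.enrich("pos", ["det", "noun", "verb", "det", "noun"])
left, right = eci.split(2)
print(left.tokens, right.tokens)
```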

#### **2.4 Morphosyntactics as a goal**

Most morphological and syntactical processing is intended for information retrieval, while alignment supports automatic translation. Those approaches are mainly descriptive and are defined by cross-classifying different varieties of features (Harley, 1994) such as number and person. When morphological operations are an autonomous subpart of the derivation, they acquire a status beyond descriptive convenience. They become linguistic primitives, manipulated by the rules of word formation.



Table 3. Comparison Between the LCM and the MSW

In many approaches, they are manipulated as an undifferentiated bundle divided only into nominal and verbal atoms. The following section describes elements of the morphosyntactic approach.

#### **2.4.1 Detecting language tendencies**

Language tendencies denote cultural characteristics, which are represented as dialects and regional practices. Noyer (Noyer, 1992) described a hierarchical tree organization defined by applying manually predefined morphological feature filters to manage morphological contrasts. This organization was used as an indicator of linguistic tendencies in language usage. Extensions of this approach attempt to derive the geometry of morphological features<sup>2</sup> (Harley, 1994, 1998), with the goal of classifying features into subgroups based on a universal geometry while accounting for universals in feature distribution and realization. In MLW, information is organized as a general oriented graph (the Eci structure) for the smallest unit of processing (a sentence), and a hierarchy is defined by a chained sequence of clustering filters (Hisgen, 2010). Language tendencies are therefore visible in the configuration of the current graph.

#### **2.4.2 Sentence generation**

Morphosyntax has also been used to implement a language sentence generator. In an earlier study (Martínez López, 2007), Spanish adverbial phrases were analyzed to extract the reusable structures and discard the remainder, with the goal of using the reusable subset to generate new phrases. Interestingly, the shortest, simplest structures presented the most productive patterns and represented 45% of the corpus.

Another study (López De Luise, 2007) suggested translating Spanish text, represented by sets of Eci, into a graphic representing the main structure of the content. This structure was tested with 44 subjects (López De Luise, 2005). The results showed that this treatment, even without directly managing semantics, could communicate the original content. Volunteers were able to reconstruct the original text content successfully in 100% of the cases. As MLW is based on the Eci structure, it follows that it:

<sup>2</sup> This is a well-known method that is used to model phonological features (Clements, 1985; Sagey, 1986).


#### **2.4.3 Language comprehension detection**

As language is an expression of the mind and its processes, it also becomes the expression of meaning (or lack of meaning) in general. This holds even when the subject matter is language itself. A recent study of the most frequently recurring morphosyntactic usages in a group of students of Spanish as a foreign language (González Negrón, 2011) revealed a peculiar distribution of nouns and personal pronouns. These parts of speech were present at a higher frequency than in the speech of native speakers, probably to guarantee the reader's comprehension of the text. Other findings included preposition repetition and a significant number of misplaced prepositions. Thus, morphosyntactic statistics can detect deficient language understanding. A similar study was performed in (Konopka, 2008) with Mexican subjects living in Chicago (USA). In the case of MLW, the Eci and Ece structures will shape irregular language usage and make detection of incorrect language practices easy.

#### **2.4.4 Semantics detection**

Morphosyntactics can be used to detect certain types of semantics in a text. An analysis of vowel formant structure and vowel space dispersion revealed overall spectral reduction for certain talkers. These findings suggest an interaction between semantic and indexing factors in vowel reduction processes (Cloppera, 2008).

Two morphosyntactic experimental studies of numeral quantifiers in English (*more than* k, *at least* k, *at most* k, and *fewer than* k) (Koster-Moeller, 2008) showed that Generalized Quantifier Theory (GQT)<sup>3</sup> must be extended to manage morphosyntactic differences between denotationally equivalent quantifiers. Formal semantics focuses on the correct set of entailment patterns of expressions but is not concerned with deep comprehension or real-time verification. However, certain systematic distinctions occur during real-time comprehension. The degree of compromise implicit in a semantic theory depends on the types of semantic primitives it assumes, and this also influences its ability to treat these phenomena. In (López De Luise, 2008b), sentences were processed to automatically obtain specific semantic interpretations. The shape of the statistics computed over the Eci's internal weighting value (named po) is strongly biased by the semantics behind sentence content.

<sup>3</sup> Generalized Quantifier Theory is a logical semantic theory that studies the interpretation of noun phrases and determiners. The formal theory of generalized quantifiers already existed as a part of mathematical logic (Mostowski, 1957), and it was implicit in Montague Grammar (Montague, 1974). It was fully developed by Barwise & Cooper (Barwise, 1981) and Keenan & Stavi as a framework for investigating universal constraints on quantification and inferential patterns concerning quantifiers.
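For instance (standard GQT notation, assumed here for illustration and not taken from the studies above), *more than k* and *at least k+1* denote the same relation between sets, so only their morphosyntax can explain their different real-time verification profiles:

```latex
% Truth conditions of two denotationally equivalent numeral quantifiers:
\text{``more than } k \text{ As are B''} \iff |A \cap B| > k
\text{``at least } k+1 \text{ As are B''} \iff |A \cap B| \ge k+1
% Over integer cardinalities both conditions pick out the same situations.
```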


#### **2.4.5 Improvement of translation quality/performance**

Automatic translation has evolved considerably. Translation quality depends on proper pairing or alignment of sources and on appropriate targeting of languages. This sensitive processing can be improved using morphosyntactic tools.

Hwang and colleagues used morphosyntactics intensively for three language pairs (Hwang, 2005). The pairs were matched on the basis of morphosyntactical similarities or differences. They investigated the effects of morphosyntactical information, such as base form, part-of-speech and the relative positional information of a word, in a statistical machine translation framework. They built word- and class-based language models by manipulating morphological and relative positional information.

They used the language pairs Japanese-Korean (languages with the same word order and high inflection/agglutination<sup>4</sup>), English-Korean (a highly inflecting and agglutinating language with partially free word order paired with an inflecting language with rigid word order) and Chinese-Korean (a highly inflecting and agglutinating language with partially free word order paired with a non-inflectional language with rigid word order).

According to the language pairing and the direction of translation, different combinations of morphosyntactic information most strongly improve translation quality. In all cases, however, using morphosyntactic information in the target language optimized translation efficacy. Language models based on morphosyntactic information effectively improved performance. Eci is an important part of the MLW, and it has inbuilt morphophonemic descriptors that contribute significantly to this task.

#### **2.4.6 Speech recognition**

Speech recognition requires real-time speech detection. This is problematic when modeling languages that are highly inflectional, but it can be achieved by decomposing words into stems and endings and storing these word subunits (morphemes) separately in the vocabulary. An enhanced morpheme-based language model has been designed for the inflectional Dravidian language Tamil (Saraswathi, 2007). This enhanced, morpheme-based language model was trained on the decomposed corpus. The results were compared with word-based bi-gram and trigram language models, a distance-based language model, a dependency-based language model and a class-based language model. The proposed model improves the performance of the Tamil speech recognition system relative to the word-based language models. The MLW approach is based on a similar decomposition into stems and endings, but it includes additional morphosyntactical features that are processed with the same importance as full words (for more information, see the last sections). Thus, we expect that this approach will be suitable for processing highly inflectional languages.

<sup>4</sup> This term was introduced by Wilhelm von Humboldt in 1836 to classify languages from a morphological point of view. An agglutinative language is a language that uses agglutination extensively: most words are formed by joining morphemes together. A morpheme is the smallest component of a word or other linguistic unit that has semantic meaning.
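As a toy illustration of such stem/ending decomposition (the suffix list and length threshold are invented for the example; this is not Saraswathi's algorithm):

```python
# Longest-match toy decomposition of inflected Spanish words into
# stem + ending, so both can be stored separately in the vocabulary.
ENDINGS = sorted(["ciones", "ción", "mente", "iendo", "ando", "s"],
                 key=len, reverse=True)

def decompose(word):
    for suffix in ENDINGS:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)], suffix
    return word, ""

print(decompose("orquídeas"))     # ('orquídea', 's')
print(decompose("decoraciones"))  # ('decora', 'ciones')
```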




### **3. Morphosyntactic linguistic wavelet approach**

#### **3.1 A sequential approach to wavelets**

Because language is complex, soft decomposition into a set of base functions (as in traditional wavelets) is a multi-step process with several components.

Developing numeric wavelets usually includes the following steps:

1. Take the original signal sample
2. Apply filtering (decomposition using the mother wavelet)
3. Analyze coefficients defined by the basis function
4. If abstraction granularity and details are insufficient for the current problem, repeat from step 2
5. Take the resulting coefficients as a current representation of the signal

Language requires additional steps, which are described in more detail in the following section. In brief, these steps are:

1. Take the original text sample
2. Compress and translate text into an oriented graph (Eci) preserving most morphosyntactic properties
3. Apply filtering using the most suitable approach
4. If the granularity and details are inadequate for the current problem:
   4.1 Insert a new filter, Ece, in the knowledge organization
   4.2 Repeat from step 3
5. Take the resulting sequence of filtering as a current representation of the knowledge about and ontology of the text
6. Take the resulting Eci as the internal representation of the new text event

A short description of the MLW steps is presented below, with an example in the Use case.

#### **3.2 Details of the MLW process**

Further details of the MLW process are provided in this section, with the considerations relevant to each step included.

#### **3.2.1 Take the original text sample**

Text can be extracted from Spanish dialogs, Web pages, documents, speech transcriptions and other sources. The case study in Section 4 uses dialogs, transcriptions and other documents. Several references mentioned in this chapter were based on Web pages.

#### **3.2.2 Compress and translate text into an oriented graph (called Eci) preserving most morphosyntactic properties**

Original text is processed using predefined and static tables. The main components of this step are as follows: morphemes and syntagms<sup>5</sup> are detected, each sentence is condensed into EBHs (from the Spanish Estructura Básica Homogénea, a uniform basic structure), EBHs are linked with specific connectors, and word categories are derived automatically once they reach sufficient confidence levels (López De Luise, 2007d).

<sup>5</sup> Syntagm (linguistics) is any sequenced combination of morphologic elements that is considered a unit, has stability and is commonly accepted by native speakers.
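A hypothetical miniature of this step (the names `EBH` and `build_eci`, the part-of-speech tags and the purely sequential connectors are assumptions for illustration, not the chapter's implementation):

```python
from dataclasses import dataclass

@dataclass
class EBH:
    root: str        # morpheme stem carried by the node
    category: str    # word category derived for the morpheme

def build_eci(tagged_words):
    """Compress one sentence into an oriented graph: one EBH per word,
    linked by sequential connectors (a deliberately naive layout)."""
    nodes = [EBH(word, cat) for word, cat in tagged_words]
    edges = [(i, i + 1) for i in range(len(nodes) - 1)]
    return nodes, edges

# One sentence from the case-study dataset (tags invented for the example):
nodes, edges = build_eci([("Dactylorhiza", "noun"), ("incarnata", "noun"),
                          ("es", "verb"), ("orquídea", "noun"),
                          ("Europeas", "adjective")])
```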



More details of each of these steps are outside of the scope of this chapter (but see (López De Luise, 2008c) and (López De Luise, 2008)).

#### **3.2.3 Apply filtering using the most suitable approach**

Since knowledge management depends on previous language experiences, filtering is a dynamic process that adapts itself to current cognitive capabilities. Furthermore, as shown in the Case Study section, filtering is a very sensitive step in the MLW transformation.

Filtering is a process composed of several filters. The current chapter uses the following three clustering algorithms: Simple K-means, Farthest First and Expectation Maximization (Witten, 2005). They are applied sequentially for each new Ece. When an Ece is "mature", the filter no longer changes.

The distance used to evaluate clustering is based on the similarity between the descriptor values and the internal morphosyntactic metric, po, which weights EBHs (representing morphemes<sup>6</sup>). It has been shown that clusters generated with po represent consistent word agglomerations (López De Luise, 2008, 2008b). Although this chapter does not use fuzzy clustering algorithms, it is important to note that such filters require a specific distance adaptation using the categorical metrics defined in (López De Luise, 2007e).
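The sketch below chains the three filters over numeric descriptors of an Ece. It is a minimal reading of the procedure, assuming scikit-learn stand-ins for the Weka implementations cited above (`KMeans` for Simple K-means, `GaussianMixture` for Expectation Maximization) and a hand-rolled farthest-first traversal; the settings mirror those of the case study in Section 4.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

def farthest_first(X, n_clusters=5, seed=1):
    """Greedy farthest-first traversal: spread the centers out, then
    assign every point to its nearest center."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    while len(centers) < n_clusters:
        dists = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[int(np.argmax(dists))])
    return np.argmin([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)

def apply_filter(name, X):
    """One step of the KM -> FF -> EM filter sequence over Eci descriptors X."""
    if name == "KM":
        return KMeans(n_clusters=5, random_state=10, n_init=10).fit_predict(X)
    if name == "FF":
        return farthest_first(X, n_clusters=5, seed=1)
    return GaussianMixture(n_components=5, random_state=100).fit(X).predict(X)
```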

#### **3.2.4 If "Abstraction" granularity and details are inadequate for the current problem**

Granularity is determined by the ability to discriminate the topic and by the degree of detail required to represent the Eci. In the MLW context it is the logic distance between the current Eci and the Ece partitions7 (see Figure 5). This distance depends on the desired learning approach. In the example included herein (Section 4), it is the number of elements in the Eci that fall within each Ece partition. The distribution of EBHs determines whether a new Ece is a necessary. When the EBHs are too irregular, a new Ece is built per step 3.2.4.1. Otherwise the new Eci is added to the partition that is the best match.

#### **3.2.4.1 Insert a new filter, Ece, in the knowledge organization**

The current Ece is cleaned so that it keeps all the Ecis that best match its partitions, and a new Ece that includes all the Ecis that are not well represented is created and linked.

<sup>6</sup> A meaningful linguistic unit that cannot be divided into smaller meaningful parts.

<sup>7</sup> Partition in this context is a cluster obtained after the filtering process.
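A minimal sketch of the cleaning/splitting operation of step 3.2.4.1 (the predicate `matches_well` is hypothetical; the real criterion is the cohesiveness of the filter partitions):

```python
def split_ece(ecis, matches_well):
    """Keep the Ecis that best match the current Ece's partitions and
    move the poorly represented ones to a new, linked Ece."""
    kept = [e for e in ecis if matches_well(e)]
    new_ece = [e for e in ecis if not matches_well(e)]
    return kept, new_ece  # the new Ece is then created and linked
```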



#### **3.2.4.2 Repeat from step 3.2.3**

#### **3.2.5 Take the resulting sequence of filtering as a current representation of the knowledge about and ontology of the text**

The learned Eci's ontology is distributed along the chain of Eces.

#### **3.2.6 Take the resulting Eci as the internal representation of the new text event**

The specific acquired, concrete knowledge is now condensed in the Eci. This provides a good representation of the original text and its keywording (López De Luise, 2005).

Real texts include contradictions and ambiguities. As previously shown (López De Luise, 2007b), they are processed and handled despite potentially inadequate contextual information. The algorithm does not include detailed clause analysis or encode linguistic knowledge about the context, because these components complicate the process and make it less automatic.

Furthermore, the po metric can distinguish the following writing profiles: general document, Web forum, Web index and blog. The metric is independent of document size and of the text styles mentioned (López De Luise, 2007c). Consequently, it is useful for rating the quality of the text that is being learned and for deciding whether to accept it as a source of knowledge.

#### **3.3 Gelernter's perspective on reasoning**

Section 3.2.3 specifies that hard clustering algorithms must be used first and fuzzy ones afterwards. This is not a trivial restriction. Its goal is to organize learning across a range from specific, concrete data to abstract and fuzzy information. The filters are therefore organized as a sequence from simple k-means clustering to fuzzy clustering. This approach is compatible with Gelernter's belief that thinking is not a static algorithm that applies to every situation. Thinking requires a set of diverse algorithms that are not limited to reasoning. Some of these algorithms are sharp and deep, allowing clear manipulation of concrete objects, but there are other algorithms with different properties.

David Gelernter's theory (Gelernter, 2010) states that thinking is not the same as reasoning. When your mind wanders, you are still thinking. Your mind is still at work. This free association is an important part of human thought. No computer will be able to think like a man unless it can perform free association.

People have three common misconceptions:

#### **3.3.1 The belief that "thinking" is the same as "reasoning"**

There are several activities in the mind that are not reasoning. The brain keeps working even when the mind is wandering.

#### **3.3.2 The belief that reality and thoughts are different and separated things**

Reality is conceptualized as external while the mental landscape created by thoughts is seen as internal and mental. According to Gelernter, both are essentially the same although the attentional focus varies.

#### **3.3.3 The separation of the thinker and the thought**

Thinking is not a PowerPoint presentation in which the thinker watches the stream of his thoughts. When a person is dreaming or hallucinating, the thinker and his thought-stream are not separate. They are blended together. The thinker inhabits his thoughts.

Gelernter describes thinking as a spectrum of many methods that alternate depending on the current attentional focus. When the focus is high, the method is analytic and sharp. When the brain is not sharply focused, emotions are more involved and objects become fuzzy. That description is analogous to the filtering restriction: define sharp clustering first and leave fuzzy clustering approaches for the final steps. As Gelernter writes, "No computer will be creative unless it can simulate all the nuances of human emotion."

#### **4. Case study**


This section presents a sample case to illustrate the MLW procedure. The database is a set of ten Web pages with the topic "Orchids". From more than 4200 original symbols and morphemes in the original pages, 3292 words were extracted; 67 of them were automatically selected for the example. This section shows the sequential MLW decomposition. Table 4 shows the filtering results for the first six Ecis.

#### **4.1 Build Eci1**

Because the algorithm has no initial information about the text, we start with a transition state and set the d parameter to 20%. This parameter assesses the difference in the number of elements between the most and least populated partitions.
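A minimal sketch of this check (a hypothetical helper, not the chapter's code), computing the gap between the most and least populated partitions as a percentage of the clustered elements:

```python
from collections import Counter

def diff_metric(labels):
    """Percentage gap between the most and the least populated partitions."""
    counts = Counter(labels)
    return 100.0 * (max(counts.values()) - min(counts.values())) / len(labels)

# Three partitions with 3, 2 and 1 elements: Diff = (3 - 1) / 6 = 33.3%
print(diff_metric([0, 0, 0, 1, 1, 2]))
```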

#### **4.2 Apply filters to Eci1**

K-means clustering (KM in the following) is used as the first filter, with settings N=5 clusters, seed 10. Diff=16%<d. Keep KM as the filter.

#### **4.3 Apply filters to Eci2**

Filter using KM with the same settings, and the current Diff=11%<d. Keep KM as the filter.

#### **4.4 Apply filters to Eci3**

Filter using KM with the same settings, and the current Diff=10%<d. Keep KM as the filter and exit the transition state.

#### **4.5 Apply filters to Eci4**

Filter using KM with d=10% for steady state. This process will indicate whether to change the filter or build a new Ece. Clustering settings are the same, and the current Diff=20%>d. Change to Farthest First (FF) as the filter.


#### **4.6 Apply filters to Eci5**

Filter using FF with clustering settings N=5 clusters, seed 1. The current Diff=45%>d. Change to Expectation Maximization (EM) as the next filter.

#### **4.7 Apply filters to Eci6**

Filter using EM. The clustering settings are min stdDev: 1.0E-6, num clusters: -1 (automatic), seed: 100. The current Diff=13%>d, Log likelihood: -18.04546. Split Ece and filter the more cohesive<sup>8</sup> subset of Ecis using EM. Log likelihood: -17.99898, diff=8.


| Cluster | Eci1 | Eci2 | Eci3 | Eci4 | Eci5 | Eci6 | Eci6* |
|---|---|---|---|---|---|---|---|
| 0 | 1 (17%) | 3 (33%) | 3 (20%) | 3 (20%) | 2 (6%) | 13 (29%) | 8 (18%) |
| 1 | 1 (17%) | 1 (11%) | 5 (33%) | 5 (33%) | 4 (11%) | 7 (16%) | 8 (18%) |
| 2 | 1 (17%) | 2 (22%) | 2 (13%) | 2 (13%) | 9 (26%) | 11 (24%) | 12 (26%) |
| 3 | 2 (33%) | 2 (22%) | 3 (20%) | 3 (20%) | 18 (51%) | 14 (31%) | 8 (18%) |
| 4 | 1 (17%) | 1 (11%) | 2 (13%) | 2 (13%) | 2 (6%) | | 9 (20%) |
| diff | 16.00% | 11.00% | 10.00% | 20.00% | 45.00% | 13.00% | 8.00% |

\*This is the result of EM to define the splitting of Ece1.

Table 4. Filtering results for each Eci
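Steps 4.2 to 4.7 follow one escalation rule: keep the current filter while Diff < d, move to the next filter otherwise, and split the Ece once EM (the last filter) is also inadequate. The sketch below replays the published Diff values through this rule; it is a reconstruction of the narrative, not the chapter's code.

```python
FILTERS = ["KM", "FF", "EM"]   # K-means -> Farthest First -> Expectation Maximization

def escalate(diffs, d_transition=20.0, d_steady=10.0, warmup=3):
    f = 0                       # index of the current filter
    for step, diff in enumerate(diffs, start=1):
        d = d_transition if step <= warmup else d_steady
        if diff < d:
            action = "keep " + FILTERS[f]
        elif f + 1 < len(FILTERS):
            f += 1
            action = "change to " + FILTERS[f]
        else:
            action = "split the Ece (EM is already the last filter)"
        print(f"Eci{step}: Diff={diff:.0f}% vs d={d:.0f}% -> {action}")

escalate([16, 11, 10, 20, 45, 13])   # Diff values of Eci1..Eci6 (Table 4)
```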

#### **4.8 Build Ece2 to Eci6**

Keep all the individuals in Ece1, and put in Ece2 the individuals in cluster 1 (one of the three less cohesive clusters, with lower po). This procedure is shown in Figure 6.

Fig. 6. Ece1 after cleaning up the less cohesive Ecis

<sup>8</sup> Cohesiveness is defined according to MLW as distance and sequence of filters. In this case it is implemented using EM, forcing 5 clusters and selecting the four clusters with more elements.

#### **4.9 Apply filters to Eci7**


Detect the Ece1 partition that best suits Eci7 using cohesiveness criteria. The result shows that the partition that holds Eci6 is the best. Eci7 now hangs from this partition as indicated in Figure 7.

Fig. 7. Ece1,1 hanging from Ece1

#### **4.10 Apply filters to Eci8**

Detect the Ece1 partition that best suits Eci8 using the same cohesiveness criteria. The partition that holds Eci5 is the best. Ece1,1 now contains Eci4, Eci5 and Eci6. Filter Ece1,1 using KM with clustering settings of N= 5 clusters, seed 10. The value of Diff=20%>d. Change to Farthest First (FF) as the next filter.

Now the Ece sequence is as indicated in Figure 8.

Fig. 8. Ece1 and Ece1,1 after learning Eci8


#### **4.11 The representation in MLW**

We do not expect Ece content to be understood from the human point of view, but it should be considered a tool to condense and potentially regenerate knowledge from textual sources. This is a first step in the study of this type of tool that uses mathematical and statistical extraction of knowledge to automatically decompose text and represent it in a self-organizational approach.

For instance, the following sentence from the dataset,

"Dactylorhiza incarnata es orquídea de especies Europeas"

(Dactylorhiza incarnata is a European orchid species)

corresponds to the EBH number 04, and can be found (after MLW) as the sequence Ece1-Ece1,2-Eci4.

If there is an interest in understanding the topic, the main entry of the set of Ecis in the cluster can be used as a brief description. To regenerate the concepts saved in the structure for human understanding, it is only necessary to use the symbolic representation of the Eci (López De Luise, 2007).
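As a toy illustration of this lookup (hypothetical structures; the real Eces are clustering filters, not dictionaries), the stored sequence can be recovered by walking the chain of Eces down to the leaf Eci:

```python
# Hypothetical miniature of the knowledge organization after the case study:
# each Ece maps partition keys to a nested Ece or to a leaf Eci.
organization = {"Ece1": {"p0": "Ece1,2"}, "Ece1,2": {"p3": "Eci4"}}

def locate(eci, org, root="Ece1"):
    """Return the Ece path leading to a given Eci, e.g. Ece1-Ece1,2-Eci4."""
    path, node = [root], root
    while node in org:
        for child in org[node].values():
            if child == eci or child in org:
                node = child
                path.append(child)
                break
        else:
            break
    return "-".join(path)

print(locate("Eci4", organization))  # Ece1-Ece1,2-Eci4
```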

#### **5. Conclusion**

MLW is a new approach that attempts to model natural language automatically, without the use of dictionaries, special languages, tagging, external information, adaptation to changes in the language, or other supports. It differs from traditional wavelets in that it depends on previous usage, but it does not require human activities to produce definitions or provide specific adaptations to regional settings. In addition, it compresses the original text into the final Eci. However, the long-term results require further testing, both to further evaluate MLW and to evaluate the correspondence between human ontology and conceptualization and the Eces sequence.

This approach can be completed with the use of a po weighting to filter the results of any query or browsing activity according to quality and to detect additional source types automatically.

It will also be important to test the use of categorical metrics for fuzzy filters and to evaluate MLW with alternate distances, filter sequences and cohesiveness parameters.

#### **6. References**


(Altmann, 2004) E.G. Altmann, J.B. Pierrehumbert & A.E. Motter. Beyond word frequency: Bursts, lulls, and scaling in the temporal distributions of words. *PLoS ONE* 4(11): e7678. ISSN 1932-6203.

(Barwise, 1981) J. Barwise & R. Cooper. Generalized Quantifiers and Natural Language. *Linguistics and Philosophy* 4, pp. 159-219. ISBN 978-94-007-2268-2. USA.

(Brillouin, 2004) L. Brillouin. La science et la théorie de l'information (Science and information theory). Masson, Paris. *Open Library*. ISBN 10-2876470365.

(Chen, 2008) K. Chen & J. Li. Research on Fuzzy MCDM Method based on Wavelet Neural Network Model. *Information Sci. and Eng.* ISISE'08. 2008. ISBN: 978-0-7695-3494-7.

(Clements, 1985) G.N. Clements. The Geometry of Phonological Features. *Phonology Yearbook* 2, pp. 225-252. ISBN 9780521332323. USA.

(Cloppera, 2008) C. G. Cloppera & J. B. Pierrehumbert. Effects of semantic predictability and regional dialect on vowel space reduction. *Journal of the Acoustical Society of America* 124, 1682-1688. ISSN: 0001-4966. USA.

(Ferrer, 2003) R. Ferrer & R.V. Sole. Least effort and the origins of scaling in human language. *Proc. of the National Academy of Sciences of the United States of America* 100 (3): 788-791. ISSN 0027-8424. USA.

(Gelernter, 2010) D. Gelernter. Dream-logic, the internet and artificial thought. *EDGE*. Available from www.edge.org

(González Negrón, 2011) N. González Negrón. Usos morfosintácticos en una muestra de exámenes de estudiantes que cursan el español como idioma extranjero (Morphosyntactic usage in a sample of exams of students of Spanish as a foreign language). *ELENET*, N. 1. ISBN: 2-9524532-0-9. Spain.

(Harley, 1994) H. Harley. Hug a tree: deriving the morphosyntactic feature hierarchy. *MIT Working Papers in Linguistics* 21, pp. 289-320. ISBN: 9780262561211. USA.

(Harley, 1998) H. Harley & E. Ritter. Meaning in Morphology: motivating a feature-geometric analysis of person and number. *Ms. University of Calgary & University of Pennsylvania*.

(Hisgen, 2010) D. Hisgen & D. López De Luise. Dialog Structure Automatic Modeling. *MICAI*. ISBN 978-3-642-16772-0. Mexico.

(Hui, 2008) H. Hui & P. Wanglu. ASAR Image target recognition based on the combined wavelet transformation. *ISPRS Congress*, Beijing, Proceedings of Commission VII.

(Hwang, 2005) Y. Hwang, T. Watanabe & Y. Sasaki. Empirical Study of Utilizing Morph-Syntactic Information in SMT. *2nd IJCNLP*. ISBN 3-540-29172-5. Korea.

(Kampen, 2005) J. Van Kampen. Morph-syntactic development and the effects on the lexicon (A comparison between normal hearing children and children with a temporary hearing deficiency). Poster. *ELA2005*. ISBN 9780387345871. France.

(Konopka, 2008) K. Konopka. Vowels in Contact: Mexican Heritage English in Chicago. Salsa XVI - *Texas Linguistic Forum* 52: 94-103. ISSN 1615-3014. Germany.

(Koster-Moeller, 2008) J. Koster-Moeller, J. Varvoutis & M. Hackl. Verification Procedures for Modified Numeral Quantifiers. *Proc. of the 27th WCCFL*. ISBN 978-1-57473-428-7. USA.

(Lahm, 2002) Z. Lahm. Wavelets: A tutorial. University of Otago. In: Dep. of Computer Science (2011). Available from www.cs.otago.ac.nz

(Li, 1992) W. Li. Random Texts Exhibit Zipf's-Law-Like Word Frequency Distribution. *IEEE Trans. on Information Theory* 38 (6): 1842-1845. ISSN: 0018-9448. USA.

(Li, 2000) D. Li, K. Di, D. Li & X. Shi. Mining Association Rules With Linguistic Cloud Models. *Journal of Software*, Vol. 11, No. 2, pp. 143-158.

(López De Luise, 2005) D. López De Luise. A Morphosyntactical Complementary Structure for Searching and Browsing. *CISSE 2005*, pp. 283-290. ISBN 1-4020-5262-6. USA.

(López De Luise, 2007) D. López De Luise. Una representación alternativa para textos (An alternate representation for [Spanish] texts). *J. Ciencia y Tecnología*, Vol. 6. ISSN 1850-0870. Argentina.

(López De Luise, 2007b) D. López De Luise. Ambiguity and Contradiction From a Morpho-Syntactic Prototype Perspective. *CISSE*. Bridgeport. ISBN 978-1-4020-8740-0. USA.

(López De Luise, 2007c) D. López De Luise. A Metric for Automatic Word Categorization. *SCSS*. Bridgeport. ISBN 978-1-4020-8740-0. USA.

(López De Luise, 2007d) D. López De Luise & J. Ale. Induction Trees for Automatic Word Classification. *CACIC*.

(López De Luise, 2007e) D. López De Luise. Aplicación de Métricas Categóricas en Sistemas Difusos (Applying categorical metrics in fuzzy systems). *IEEE LATIN AMERICA*. ISSN: 1548-0992. Brazil.

(López De Luise, 2008) D. López De Luise & M. Soffer. Modelización automática de textos en castellano (Automatic modeling of Spanish texts). *ANDESCON*. ISBN 978-603-45345-0-6. Peru.

(López De Luise, 2008b) D. López De Luise & M. Soffer. Automatic Text Processing for Spanish Texts. *CERMA 2008*. ISBN: 978-0-7695-3320. Mexico.

(López De Luise, 2008c) D. López De Luise. Mejoras en la usabilidad de la Web a través de una estructura complementaria (Improving Web usability through a complementary structure). PhD thesis. Universidad Nacional de La Plata. Argentina.

(Martínez López, 2007) J. A. Martínez López. Patrones e índice de frecuencia en algunas locuciones adverbiales (Patterns and frequency index in some adverbial locutions). *Forma y Función*, Bogotá, Vol. 20, pp. 59-78. ISSN 0120-338X. Colombia.

(Montague, 1974) R. Montague. *Formal Philosophy*. Yale University Press. ISBN: 0300015275. USA.

(Mostowski, 1957) A. Mostowski. A generalization of Quantifiers. *Fundamenta Mathematicae*, Vol. 44, pp. 12-36. ISSN: 0016-2736. Poland.

(Noyer, 1992) R. Noyer. Features, Positions and Affixes in Autonomous Morphological Structure. MIT PhD dissertation. Cambridge: MITWPL. USA.

(Sagey, 1986) E. Sagey. The representation of Features and Relations in Non-Linear Phonology. PhD dissertation. MIT. MITWPL. USA.

(Saraswathi, 2007) S. Saraswathi & T.V. Geetha. Comparison of Performance of Enhanced Morpheme-Based Language Model with Different Word-based Language Models for Improving the Performance of Tamil Speech Recognition System. *ACM Trans. Asian Lang. Information*, Vol. 6, No. 3, Article 9. ISBN: 978-1-4503-0475-7. USA.

(Tolba, 2005) M. F. Tolba, T. Nazmy, A. A. Abdelhamid & M. E. Gadallah. A novel method for Arabic consonant/vowel segmentation using wavelet transform. *IJICIS*, Vol. 5, No. 1. ISBN: 978-960-474-064-2. USA.

(Witten, 2005) I.H. Witten & E. Frank. Data Mining - *Practical Machine Learning Tools and Techniques*, 2nd Edition. Elsevier. ISBN: 978-0-12-374856-0. New Zealand.

(Wolfram, 2011) Zipf's Law. (2011), Wolfram Research, Inc. In: *Wolfram MathWorld*. Available from http://mathworld.wolfram.com/ZipfsLaw.html




## **Intelligent Distributed eLearning Architecture**

S. Stoyanov<sup>1</sup>, H. Zedan<sup>2</sup>, E. Doychev<sup>1</sup>, V. Valkanov<sup>1</sup>, I. Popchev<sup>1</sup>, G. Cholakov<sup>1</sup> and M. Sandalski<sup>1</sup> *<sup>1</sup>University of Plovdiv, Bulgaria; <sup>2</sup>De Montfort University, Leicester, UK*

#### **1. Introduction**


One of the main characteristics of the eLearning systems today is the 'anytime-anywhere-anyhow' delivery of electronic content, personalized and customized for each individual user. To satisfy this requirement, new types of context-aware and adaptive software architectures are needed, which are able to sense aspects of the environment and use this information to adapt their behavior in response to changing situations. In conformity with [Dey,2000], a context is any information that can be used to characterize the situation of an entity. An entity may be a person, a place, or an object that is considered relevant to the interaction between a user and an application, including the user and the application themselves.

The development of context-aware and adaptive architectures can benefit from some ideas and approaches of pervasive computing. Pervasive computing is a new paradigm for next-generation distributed systems where computers disappear into the background of the users' everyday activities. In such a paradigm, computation is performed on a multitude of small devices interconnected through a wireless network. Fundamental to pervasive computing is that any component (including user, hardware and software) can be mobile and that computations are context-aware. As a result, mobility and context-awareness are important features of any design framework for pervasive computing applications. Context-awareness requires applications to be able to sense aspects of the environment and use this information to adapt their behaviours in response to changing situations.

One of the main goals of the Distributed eLearning Centre (DeLC) project [Ganchev, 2005] is the development of such an architecture and corresponding software that could be used efficiently for on-line eLearning distance education. The approach adopted for the design and development of the system architecture is focused on the development of a service-oriented and agent-based intelligent system architecture providing wireless and fixed access to electronic services and electronic content. This chapter provides a general description of the architecture for two types of access - mobile and fixed.

Furthermore, we present the Calculus of Context-aware Ambients (CCA in short) for the modelling and verification of mobile systems that are context-aware. This process calculus is


built upon the calculus of mobile ambient and introduces new constructs to enable ambients and processes to be aware of the environment in which they are being executed. This results in a powerful calculus where both mobility and context-awareness are first-class citizens. We present the syntax and a formal semantics of the calculus. We also present a new theory of equivalence of processes which allows the identification of systems that have the same context aware behaviours. We prove that CCA encodes the Pi-calculus which is known to be a universal model of computation.

We have used CCA to specify DeLC in its entirety, thereby enabling its correctness to be analysed. Such a dynamic system must enforce complex policies to cope with security, mobility and context-awareness. We show how these policies can be formalised and verified using CCA. In particular, an important liveness property of the mLearning system is proved using the reduction semantics of CCA.
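To give the reader a feel for the notation, the sketch below shows the style of syntax involved. It is an illustration in the spirit of the ambient-calculus family on which CCA builds, not the exact CCA grammar, which is given in the calculus's formal definition.

```latex
\[
\begin{array}{rcll}
P,Q &::=& \mathbf{0} & \text{inactive process}\\
    &\mid& P \mid Q & \text{parallel composition}\\
    &\mid& n[P] & \text{ambient named } n \text{ with body } P\\
    &\mid& \mathsf{in}\ n.\,P & \text{capability: move into ambient } n\\
    &\mid& \mathsf{out}\ n.\,P & \text{capability: move out of ambient } n\\
    &\mid& \kappa?P & \text{context guard: run } P \text{ once the environment satisfies } \kappa
\end{array}
\]
```

The context guard is what distinguishes a context-aware calculus from the plain Calculus of Mobile Ambients: a process can observe properties of its surrounding ambients (its context) and react to them, which is the kind of behaviour needed to model InfoStation hand-overs later in the chapter.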

#### **2. DeLC overview**

Distributed eLearning Center (DeLC) is a reference architecture supporting reactive, proactive and personalized provision of education services and electronic content. The DeLC architecture is modeled as a network (Fig. 1) consisting of separate nodes, called eLearning Nodes (eLNs). Nodes model real units (laboratories, departments, faculties, colleges, and universities) which offer a complete or partial educational cycle. Each eLearning Node is an autonomous host of a set of electronic services. The network edges are configured so as to enable the access, incorporation, use and integration of electronic services located on the different eLNs.

Fig. 1. DeLC Network Model (eLearning Nodes eLN1, eLNk, eLNm forming a cluster, and an isolated node eLNp)

The eLearning Nodes can be isolated (eLNp) or integrated in more complex virtual structures, called clusters. Remote eService activation and integration is possible only within a cluster. In the network model we can easily create new clusters, and reorganize or remove existing ones (the reorganization is done on a virtual level; it does not affect the real organization). For example, an existing cluster can be reorganized not by removing a node but by denying access to the services it offers. The reorganization does not disturb the functioning of other nodes (as nodes are autonomous, self-sufficient educational units providing one or more integral educational services).

An important feature of the eLearning Nodes is the access to supported services and electronic content. In relation to the access there are two kinds of nodes:

- Mobile eLearning Node;
- Fixed eLearning Node.
For both nodes individual reference architectures are proposed within DeLC.

In the current version of DeLC (Fig. 2), two standardized architectures supporting fixed and mobile access to the eLearning services and teaching content have been implemented. The fixed access architecture is adapted for the following domains, implemented as particular nodes:

- Education portal supporting blended learning in the secondary school;
- Specialized node for examination of creative thinking and handling of students (CA). The node adapts the Creativity Assistant environment [Zedan,2008];
- Specialized node for electronic testing (DeLC Test Center);
- Specialized node for education in software engineering (eLSE);
- Intelligent agents that support the eLearning services provided by the DeLC portal (AV).

The Agent Village will be presented in this chapter in more detail.

Fig. 2. Distributed eLearning Center (DeLC Portal, Agent Village (AV), eLearning in Software Engineering (eLSE), Creativity Assistant (CA), DeLC Test Center (DeTC), InfoStations (IS), Education Portal for Secondary School)

#### **3. Mobile eLearning node**

A distinguishable feature of contemporary mobile eLearning (mLearning) systems is the anywhere-anytime-anyhow aspect of delivery of electronic content, which is personalised and customised to suit a particular mobile user [Barker,2000], [Maurer,2001]. In addition, mobile service content is expected to be delivered to users always in the best possible way, through the most appropriate connection type, according to the always best connected and best served communication paradigm [O'Droma,2007], [Passas,2006]. In the light of these trends, the goal is to develop an intelligent mobile eLearning node which uses an InfoStation-based communication environment with distributed control [Frenkiel,1996], [Ganchev,2007]. The InfoStation paradigm is an extension of the wireless Internet, where mobile clients interact directly with Web service providers (i.e. InfoStations). Via their mobile devices, the users request services from the nearest InfoStation over a Bluetooth or WiFi wireless connection.

#### **3.1 InfoStation-based network architecture**

The continuing evolution in the capabilities and resources available within modern mobile devices has precipitated an evolution in the realm of eLearning. The architecture presented here attempts to harness the communicative potential of these devices in order to present learners with a more pervasive learning experience, which can be dynamically altered and tailored to suit them. The following network architecture enables mobile users to access various mLearning services via a set of intelligent wireless access points, or InfoStations, deployed in key points across the University Campus. The InfoStation-based network consists of three tiers, as shown in Figure 3.

The first tier encompasses the user mobile devices (cell phones, laptops, PDAs), equipped with intelligent agents acting as Personal Assistants to users. The Personal Assistant gathers information about the operating environment onboard the mobile device, as well as soliciting information about the user. Supplied with this information, the InfoStation can make better decisions on applicable services and content to deliver to the Personal Assistant.

Fig. 3. The 3-tier InfoStation-based network architecture (1st Tier: Mobile Devices, with Intelligent Agents acting as Personal Assistants for mobile users; 2nd Tier: InfoStations, with cached copies of recently used user/service profiles and a Local Services' Content Repository; 3rd Tier: InfoStations Centre, with Profile Managers and a Global Services' Content Repository)

The second tier consists of InfoStations, satisfying the users' requests for services through Bluetooth and/or WiFi wireless mobile connections. The InfoStations maintain connections with mobile devices, create and manage user sessions, provide an interface to global services offered by the InfoStation Centre, and host local services. The implementation of these local services is an important aspect of this system. By implementing particular services within specific localised regions throughout the University campus, we can enrich the service users' experience within these localities. A prime example of how this type of local service can enrich a learner's experience is the deployment of library-based services [Ganchev,2008a]. Within the library domain, users' experience can be greatly enhanced through the facilitation of services offering resource location capabilities or indeed account notifications. The division of global and local services allows for a reduction of the workload placed on the InfoStation Centre. In the original InfoStation architecture, the InfoStations operated only as mediators between the user mobile devices and a centre, on which a variety of electronic services are deployed and executed. The InfoStations within this architecture not only occupy the role of mediators, they also act as the primary service-providing nodes.

The third tier is the InfoStation Centre concerned with controlling the InfoStations, and overall updating and synchronisation of information across the system. The InfoStation Centre also acts as the host for global services.

#### **3.2 Context-aware service provision**

In order to ensure a context-aware service provision, we propose that an application is built as an integration of two components [Stoyanov,2008]:

- A standardized **middleware**, which is able to detect the dynamic changes in the environment during the processing of user requests for services (*context-awareness*) and correspondingly to ensure their efficient and non-problematic execution (*adaptability*);
- A set of **electronic services** realizing the functionality of the application area (education), which could be activated and controlled by the middleware.
As the middleware is concerned with the context-awareness and adaptability aspects, it is important to clarify these concepts. Within our development approach, Dey's definition [Dey,2000] was adopted, according to which "context is any information that can be used to characterize the situation at an entity". An entity could be a person, place, or object that is considered relevant to the interaction between a user and an application, including the user and applications themselves. Context can be of different types, e.g. location, identity, activity, time.

Dey's definition is utilized here as a basis for further discussions. In order to elaborate this definition into a working one for the creation of the desired middleware architecture, we first solidify it as presented further in the chapter. We want to clearly differentiate context-awareness from adaptability. Context-awareness is the middleware's ability to identify changes in the environment/context as regards:

- Mobile device's location (*device mobility*) - in some cases this mobility leads to changing the serving InfoStation. This is especially important due to the inherent mobility within the system, as users move throughout the University campus. This information has a bearing on the local services deployed within a particular area, i.e. within the University Library;
- User device (*user mobility*) - this mobility offers different options for the delivery of the service request's results back to the user. What is important here is to know the capabilities of the new device activated by the user, so as to adapt the service content accordingly;
- Communication type - depending on the current prevailing wireless network conditions/constraints, the user may avail of different communications possibilities (e.g. Bluetooth or WiFi);
- User preferences - service personalisation may be needed to reflect the changes made by users in their preferences, e.g. the way the service content is visualised to them, etc.;
- Goal-driven sequencing of tasks engaged in by the user;
- Environmental context issues such as classmates and/or learner/educator interactions.

The goal of adaptability is to ensure trouble-free, transparent and adequate fulfilment of user requests for services by taking into account the various aspects of the context mentioned above. In other words, after identifying a particular change in the service environment, the middleware must be able to take compensating actions (counter-measures) such as handover of user service sessions from one InfoStation to another, reformatting/transcoding of service content due to a change of mobile device (varying device capabilities), service personalisation, etc.

To ensure adequate support for user mobility and device mobility (the first two aspects of the context change), the following four main communications scenarios are identified for support in our middleware architecture [Ganchev,2008b]:

- *'No change'* - an mLearning service is provided within the range of the same InfoStation and without changing the user mobile device;
- *'Change of user mobile device'* - due to the inherent mobility, it is entirely possible that during an mLearning service session the user may shift to another mobile device, e.g. one with greater capabilities, in order to experience a much richer service environment and utilize a wider range of resources;
- *'Change of InfoStation'* - within the InfoStation paradigm, the connection between the InfoStations themselves and the user mobile devices is by definition geographically intermittent. With a number of InfoStations positioned around a University campus, the users may pass through a number of InfoStation serving areas during the service session. This transition between InfoStation areas must be completely transparent to the user, ensuring the user has continuous access to the service;
- *'Change of InfoStation and user mobile device'* - the most complicated scenario, whereby the user may change the device simultaneously with the change of the InfoStation.

To support the third aspect of the context change (different communication type), the development of an intelligent component (agent) working within the communication layer (c.f. Figure 4) is envisaged, as sketched below. This component operates with the capability to define and choose the optimal mode of communication, depending on the current prevailing access network conditions (e.g. congestion level, number of active users, average data rate available to each active user, etc.). The user identification and corresponding service personalisation is subject to a middleware adaptation for use in the particular application area. In the case of eLearning, the architecture is extended to support the three fundamental eLearning models - the educational domain model, the user/learner model, and the pedagogical model [Stoyanov,2005], [Ganchev,2008c].
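A minimal sketch of such a mode-selection component is given below. The class names, the thresholds and the inclusion of WiMAX as a third option are illustrative assumptions, not part of the DeLC implementation; the point is only that the selector consumes exactly the condition metrics named above (congestion level, number of active users, average per-user data rate), so the policy can be tuned without touching the rest of the communication layer.

```java
// Illustrative sketch only: class names, thresholds and the WiMAX option
// are assumptions, not part of the DeLC implementation.
public class CommunicationModeSelector {

    public enum Mode { BLUETOOTH, WIFI, WIMAX }

    /** Snapshot of the prevailing access-network conditions. */
    public static class NetworkConditions {
        final double congestionLevel;   // 0.0 (idle) .. 1.0 (saturated)
        final int activeUsers;          // users currently served
        final double avgDataRateKbps;   // average rate available per active user

        public NetworkConditions(double congestionLevel, int activeUsers, double avgDataRateKbps) {
            this.congestionLevel = congestionLevel;
            this.activeUsers = activeUsers;
            this.avgDataRateKbps = avgDataRateKbps;
        }
    }

    /** Choose the mode offering the best expected service under the current conditions. */
    public Mode choose(NetworkConditions bluetooth, NetworkConditions wifi) {
        // Prefer WiFi unless it is congested or starved of per-user bandwidth.
        if (wifi.congestionLevel < 0.7 && wifi.avgDataRateKbps > 256) {
            return Mode.WIFI;
        }
        // Bluetooth suits a small number of lightly loaded, short-range users.
        if (bluetooth.activeUsers < 7 && bluetooth.congestionLevel < 0.5) {
            return Mode.BLUETOOTH;
        }
        // Fall back to WiFi (or, as the platform evolves, perhaps WiMAX).
        return Mode.WIFI;
    }
}
```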

#### **3.3 Layered system architecture**

The layered system architecture (Figure 4) is a distributed architecture, meaning that its functional entities are deployed across the different tiers/nodes, i.e. on mobile devices, InfoStations, and the InfoStation Centre. In this architecture the role of the InfoStations is expanded, enabling them to act (besides their mediation role) as hosts for the local mLearning services (LmS) and for the preparation, adaptation, and conclusive delivery of global mLearning services (GmS). This way the service provision is efficiently distributed across the whole architecture. Each of the system network nodes has a different structure depending on its functioning within the system. However, each node is built upon a Communication Layer whose main task is to initialize, control and maintain communications between different nodes. This layer is also concerned with choosing the most appropriate mode of communication between a mobile device and an InfoStation, whether that be Bluetooth or WiFi, or indeed, as the platform evolves, perhaps WiMAX in the future. The software architecture of the InfoStations and the InfoStation Centre includes a Service Layer on top. The main task of this layer is to prepare the execution of the users' service requests, and to activate and receive the results of the execution of the different services (local and global).

The InfoStations' middle layer is responsible for the execution of scenarios and the control of user sessions. It is at this layer that the user service requests are mainly processed, by taking into account all context-aware aspects and applying corresponding adaptive actions. The middle layer of the InfoStation Centre ensures the needed synchronisation during particular scenarios (c.f. Section 8). In addition, different business supporting components, e.g. for user accounting, charging and billing, may operate here.

The software architecture of the user mobile devices contains two other layers:

- Personal Assistant - its task is to help the user in specifying the service requests sent to the system, accomplish the communication with the InfoStations' software, receive and visualise the service requests' results to the user, etc. Moreover, the assistant can provide information needed for the personalisation of services (based on information stored in the user profile) and/or for the synchronisation of scenario execution;
- Graphical User Interface (GUI) - its task is to prepare and present the forms for setting up the service requests, and to visualise the corresponding results received back from the system.

Fig. 4. The layered system architecture

#### **3.4 Agent-oriented middleware architecture**

The main implementation challenges within this system are related to the support of distributed control, as the system should be capable of detecting all relevant changes in the environment (context-awareness) and, according to these changes, facilitate the service offerings in the most flexible and efficient manner (adaptability). The system architecture presented in the previous section is implemented as a set of cooperating intelligent agents. An agent-oriented approach has been adopted in the development of this architecture in order to:

- Model adequately the real distributed infrastructure;
- Allow for realisation of distributed models of control;
- Ensure pro-active middleware behaviour, which is quite beneficial in many situations;
- Use more efficiently the information resources spread over different InfoStations.

Moreover, the agent-oriented architecture can easily be extended with new agents (where required) that cooperate with the existing ones and communicate by means of a standardized protocol (in this case the FIPA Agent Communication Language (ACL) [FIPA,2002]). Indeed the InfoStations and InfoStation Centre exist as networks of interoperating agents and services, with the agents fulfilling various essential roles necessary for system management. Within each of these platforms, agents take responsibility for selecting and establishing a client-server cross-platform connection, the conveyance of context information, and the delivery of adapted and personalised service content. This multi-agent approach differs from the classic multi-tier architectures, in which the relationships between the components at a particular tier are much stronger.

Conceptually, we define different layers in the system architecture in order to present the functionality of the middleware being developed in a more systematic fashion. Implementation-wise, the middleware architecture is considered as a set of interacting intelligent agents. Communication between the user mobile devices and the serving InfoStations can be realized in two ways:

- An agent operating within the InfoStation discovers all new devices entering its range and subsequently initiates communication with them; or
- Personal Assistant agents on the user mobile devices are the active party in the communication and initiate the connection with the InfoStation.
In the current implementation of the prototype architecture, the former approach is used for Bluetooth communication, whereas the latter applies for WiFi communication.

Figure 4 highlights the main components necessary to ensure continuity of the service provision, i.e. support for the continuous provision of services and user sessions in the case of scenario change or resource deficiency. The agents which handle the connection and session establishment perform different actions, such as:

- Searching for and finding mobile devices within the range of an InfoStation;
- Creating a list of services required by mobile devices;
- Initiation of a wireless connection with mobile devices;
- Data transfer to and from mobile devices.
Also illustrated within Figure 5 are the components which serve to facilitate a level of context sensitivity and personalisation to the presented services. A short description of the various agents (for Bluetooth communication) within the architecture is presented below.

The first step in the delivery of the services involves the Scanner agent, which continuously searches for mobile devices/Personal Assistant agents within the service area of the InfoStation. In addition, this agent retrieves a list of services required by users (registered on their mobile devices upon installation of the client part of the application), as well as the profile information detailing the context (i.e. device capability and user preference information). The Scanner agent receives this information in the form of an XML file, which is itself extracted from the content of an ACL message. The contents of this XML file are then passed on, via the Connection Adviser agent, to the Profile Processor agent, which parses the received profile and extracts meaningful information. This information can in turn be utilized to perform the requisite alterations to services and service content.

The information is also very important in relation to the tasks undertaken by the Scenario Manager agent. The role of this agent is to monitor and respond to changes in the operating environment within which the services are operating (i.e. change of mobile device). In the event of a significant change of the service environment, this agent gathers the new capability and preference information (CPI) via the Scanner agent and then, in conjunction with the Query Manager agent and the Content Adaptation agent, facilitates the dynamic adaptation of the service content to meet the new service context.
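The following fragment sketches this interplay. The interfaces and method names are hypothetical, standing in for the ACL conversations between the real agents.

```java
// Hypothetical sketch of the Scenario Manager's reaction to a device change;
// the interfaces below stand in for ACL exchanges with the real agents.
public class ScenarioManagerSketch {

    interface Scanner { String gatherCpi(String deviceId); }           // capability & preference info (XML)
    interface ContentAdapter { byte[] adapt(byte[] content, String cpi); }

    private final Scanner scanner;
    private final ContentAdapter adapter;

    ScenarioManagerSketch(Scanner scanner, ContentAdapter adapter) {
        this.scanner = scanner;
        this.adapter = adapter;
    }

    /** Called when a significant change of the service environment is detected. */
    byte[] onDeviceChanged(String newDeviceId, byte[] currentContent) {
        String cpi = scanner.gatherCpi(newDeviceId);  // re-gather CPI via the Scanner agent
        return adapter.adapt(currentContent, cpi);    // dynamic adaptation of the service content
    }
}
```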


The main duty of the Connection Adviser agent is to filter the list (received from the Scanner agent) of mobile devices as well as requested services. The filtration is carried out with respect to a given (usually heuristic) criterion. Information needed for the filtration is stored in a local database. The Connection Adviser agent sends the filtered list to the Connection Initiator agent, who takes on the task of initiating a connection with the Personal Assistant onboard the mobile device. This agent generates the so-called Connection Object, through which a communication with the mobile device is established via Bluetooth connection. Once this connection has been established, the Connection Initiator generates an agent to which it hands over the control of the connection, called a Connection agent.

From this point on, all communications between the InfoStation and the Personal Assistant are directed by the Connection agent. The internal architecture of the Connection agent contains three threads: an agent thread used for communication with the Query Manager agent, and a Send thread and Receive thread, which look after each direction of the wireless communication with the mobile device.

The Query Manager performs one of the most crucial tasks within the InfoStation architecture. It determines where information received from the mobile device is to be directed, e.g. directly to simple services, or via Interface agents to sophisticated services. It also transforms messages coming from the Connection agent into messages of the correct protocols to be understood by the relevant services, i.e. UDDI or SOAP for simple services, or more complicated, semantic-oriented protocols (e.g. OWL-S [OWL-S,2010]) for increasingly sophisticated services. The Query Manager agent also interacts with the Content Adaptation agent in order to provide the Personal Assistant with increasingly contextualised service content. This Content Adaptation agent, operating under the remit of the Query Manager agent, essentially performs the role of an adaptation engine, which takes in the profile information provided by the Profile Processor agent and executes the requisite adaptation operations on the service content (e.g. file compression, image resizing, etc.).

Fig. 5. The Agent-Oriented Middleware Architecture


The Query Manager agent receives user service requests via the Connection agent, and may communicate with various services. Once it has passed the request on to the services, all service content is passed back to the Query Manager via the Content Adaptation agent. The Profile Processor agent parses and validates received profiles (XML files) and creates a Document Object Model (DOM) tree [W3C,2010]. Using this DOM tree the XML information may be operated on, to discern the information most pertinent to the adaptation of service content. The Content Adaptation agent receives requests-responses from the services, queries the Profile Processor agent regarding the required context, and then either selects a pre-packaged service content package which closely meets the requirements of the mobile device, or applies a full transformation to the service content to meet the constraints of the operating environment of the device.
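Since the profile arrives as an XML file and is handled through a DOM tree, the parsing step can be pictured as below. This is a minimal sketch using the standard Java DOM API; the profile element names (`screenWidth`, `preferredFormat`) are invented for illustration, as the chapter does not show the actual profile schema.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import java.io.StringReader;

// Minimal sketch of the Profile Processor agent's parsing step. The element
// names are invented for illustration; the real profile schema is not shown.
public class ProfileProcessorSketch {

    public static void main(String[] args) throws Exception {
        String profileXml =
            "<profile>" +
            "  <device><screenWidth>320</screenWidth></device>" +
            "  <preferences><preferredFormat>text/html</preferredFormat></preferences>" +
            "</profile>";

        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document dom = builder.parse(new InputSource(new StringReader(profileXml)));

        // Walk the DOM tree to discern the information pertinent to adaptation.
        int screenWidth = Integer.parseInt(text(dom, "screenWidth"));
        String format = text(dom, "preferredFormat");

        // A Content Adaptation agent could branch on these values, e.g. resize images.
        System.out.println("adapt to width=" + screenWidth + ", format=" + format);
    }

    private static String text(Document dom, String tag) {
        NodeList nodes = dom.getElementsByTagName(tag);
        return nodes.getLength() > 0 ? nodes.item(0).getTextContent() : "";
    }
}
```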

The tasks undertaken by the Content Adaptation agent, the Scenario Manager agent and the Profile Processor agent, enable the system to dynamically adapt to changing service environments, even during a particular service session. Once the connection to a particular service has been initialized and the service content adapted to the requisite format, the Connection agent facilitates the transfer of the information to the user mobile device.

#### **4. Fixed eLearning node**

The fixed nodes of the DeLC are implemented as education portals, which provide personalized educational services and teaching material. A standardized portal architecture, used as a generic framework for the implementation of particular education portals for university and secondary school, is described in this section. The architecture has been extended with intelligent components (agents, called assistants) in order to enhance the flexibility, reactivity and pro-activeness of the portals.

#### **4.1 Education portal architecture**

The architecture of the educational portal is service-oriented and multi-layered, consisting of three logical layers (Figure 6): user interface, e-services and digital libraries.

The user interface supports the connection between the users and the portal. Through it the users can register in the system and create their own personalized educational environment. The user interface visualizes and provides access for the user to services, depending on their role, assigned during the registration.

Two kinds of e-services are located in the middle layer - system services and eLearning services. The system services, called 'engines', are transparent to the users; their basic purpose is to assist in the processing of the eLearning services. Using the information contained in the meta-objects, they can effectively support the activation, execution and finalization of the eLearning services. In the current portal architecture the following engines are implemented:

- SCORM Engine;
- Exams Engine;
- Events and Reminders Engine;
- Integration Engine;
- User Profiling.
The SCORM Engine is implemented in the portal architecture to deliver an interpreter of the electronic content, developed in accordance with the SCORM 2004 standard. The Test Engine assists in performing electronic testing using the portal. It basically processes the meta-objects which describe the questions and the patterns of the tests. The Event Engine supports a model for event management, enabling the users to see and create events and also be notified about them in advance. The events in the system reflect moments important for the users, such as a lecture, examination, test, national holiday, birthday, etc. An event is characterized by attributes such as a name, start and end date and time, details, and information on whether it is a recurring one, as well as rules for its recurrence. The Event Engine supports yearly, monthly and weekly recurrence. The User Profiling implements the user model of the portal. The profiles can be classified by roles, user groups, communities, and organizations. The standard user profile consists of three main groups of attributes:

- Standard attributes - necessary for user identification, such as username, password, e-mail, and others;
- Extended attributes - addresses, phone numbers, Internet pages, IM, social network contacts, and others;
- DeLC custom attributes - other user identifications. Thus, for example, for users with role "student" these can be faculty number, subject, faculty, and course.
The portal gives an opportunity to extend the user profile with some additional attributes. The users' profiles contain all the information needed for the personalization of the services, educational content and user interface provided by the DeLC portal. The profile is created automatically during the user's first login, through a call to the university's database, filling in the standard and custom attributes. The integration with the university database and with other external components is supported by the Integration Engine. Extended attributes are filled in by the user. On each subsequent login to the portal, the information in the user's profile is synchronized, as eventual updates in the university's database are automatically migrated into the profile, for example a passage into the upper course or a change of the subject.
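A compact sketch of this first-login creation and per-login synchronisation is shown below. The `UniversityDb` interface and the flat attribute map are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of first-login profile creation and per-login
// synchronisation; UniversityDb and the attribute map are assumptions.
public class UserProfileSync {

    interface UniversityDb { Map<String, String> fetchAttributes(String username); }

    private final UniversityDb db;
    private final Map<String, Map<String, String>> profiles = new HashMap<>();

    UserProfileSync(UniversityDb db) { this.db = db; }

    /** On every login: create the profile on first use, then migrate DB updates into it. */
    Map<String, String> onLogin(String username) {
        Map<String, String> profile =
            profiles.computeIfAbsent(username, u -> new HashMap<>());
        // Standard and DeLC custom attributes come from the university database;
        // extended attributes (filled in by the user) are left untouched.
        profile.putAll(db.fetchAttributes(username));  // e.g. student moves up a course
        return profile;
    }
}
```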

Educational services serve all stages of the educational process. The services supported by the portal are grouped in three categories:

- Services for training, organizing and planning of the educational process;
- Services for conduction and management of the education process - examples of these services are electronic lectures, electronic testing, online and offline consultations;
- Services for recording and documenting the educational process - these services support the automated generation of the documents recording the educational process (examination protocols, student books, teachers' personal notebooks and archives).
The third layer contains electronic content in the form of repositories, known as digital libraries. The current version supports a lecture courses digital library, a questionnaire library, a test templates library, a course projects library and a diploma theses library. The supported portal services work directly with the digital libraries. The digital libraries' content can be navigated with the help of a generalized catalogue.


Fig. 6. Standardised architecture of the portal (layers: User Interface; Services - eLearning Services, SCORM Engine, Test Engine, Event Engine, AVCall Processor, Integration & User Profiling; Digital Libraries - DLibM, DLibQ, DLibT, DLibP, DLibD, with a generalized Catalogue)

#### **4.2 Education cluster**

In order to provide more effective and personalized user support, we need to enhance the flexibility, reactivity and pro-activeness of the portal by including intelligent components in the architecture. The pro-activity improves the usability and friendliness of the system for the users. Pro-activity means that the software can operate "on behalf" of the user and "activate itself" when it "estimates" that its intervention is necessary. Two approaches are available:

- Direct integration of intelligent components in the currently existing architecture - in this way we extend the existing portal architecture;
- Building an education cluster.
The latter approach is preferable because it matches the DeLC philosophy of building more complex structures. Moreover, the former approach involves difficulties in the integration of two environments with different characteristics - a portal framework and an agent-oriented environment.

The education cluster consists of two nodes - the existing portal and a new node, called Agent Village (AV), where the "assistants" will "live in" (Figure 7). Three basic problems have to be solved in order to create the cluster:

- Architecture of the AV node;
- Interaction between the portal and AV;
- What kind of intelligent assistance is provided for the portal services.
The AV node is implemented as an agent-oriented server with the help of the JADE environment [Bellifemine,2007].
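Since the AV node is built on JADE, each assistant is ultimately a JADE agent. The skeleton below illustrates the general shape such an assistant might take (a cyclic behaviour consuming ACL requests and replying); the assessment logic itself is left as a placeholder, and the class structure is our sketch rather than the project's actual source.

```java
import jade.core.Agent;
import jade.core.behaviours.CyclicBehaviour;
import jade.lang.acl.ACLMessage;

// Sketch of an AV assistant as a JADE agent (cf. [Bellifemine,2007]).
// Only the frame is shown; the assessment logic is a placeholder.
public class EvaluatorAssistant extends Agent {

    @Override
    protected void setup() {
        addBehaviour(new CyclicBehaviour(this) {
            @Override
            public void action() {
                ACLMessage msg = myAgent.receive();   // ACL request from the wrapper service
                if (msg == null) { block(); return; } // wait until a message arrives
                ACLMessage reply = msg.createReply();
                reply.setPerformative(ACLMessage.INFORM);
                reply.setContent(assess(msg.getContent()));
                myAgent.send(reply);
            }
        });
    }

    /** Placeholder for the free-text assessment (see the WM method below). */
    private String assess(String answerText) {
        return "points=0";
    }
}
```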


The connection of the educational portal and the AV node is made through the middle layer of the portal architecture, where the electronic services are located. Depending on the direction of the requested assistance, we distinguish reactive and proactive behavior of the architecture. In the reactive behavior the interaction between the two nodes is initiated by the portal. This is necessary when a user request is being processed and a service needs "expert" assistance. The service addresses the corresponding agent, located in the AV. The problem is that, by their nature, the services are passive and static software modules, intended mainly for the convenient realization and integration of some business functionality. Therefore they must "transfer" the responsibility for the activation and support of the connection to an active component of the architecture, such as an agent. To do this, the service sends a concrete message to the agent's environment, which, on its behalf, identifies the change of the environment and reacts by interpreting the message. Depending on the identified need of assistance, the agent activates the necessary actions. The reactive behavior of the architecture could be implemented using a:

- **Synchronous model** - this model is analogous to calling subroutines in programming languages: the service sends a message to the AV and waits for the result from the corresponding agent before continuing its execution.
- **Asynchronous model** - the interaction is accomplished through some kind of mechanism for sending and receiving messages (both models are sketched in code after Figure 7).

Fig. 7. Cluster architecture
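
To make the two interaction models concrete, the following is a minimal sketch of a portal service calling an AV agent synchronously and asynchronously. All names (the queues, the agent loop) are our own illustration, not code from DeLC:

```python
import queue
import threading

agent_inbox = queue.Queue()     # requests travelling from portal services to the AV
portal_mailbox = queue.Queue()  # the portal "mailbox" polled for messages from the AV

def agent_loop():
    """A stand-in AV agent: consumes requests and posts replies."""
    while True:
        request, reply_to = agent_inbox.get()
        reply_to.put(f"assessed({request})")  # placeholder for the agent's reasoning

threading.Thread(target=agent_loop, daemon=True).start()

def call_agent_sync(request):
    """Synchronous model: send a message to the AV and block until the result arrives."""
    reply = queue.Queue()
    agent_inbox.put((request, reply))
    return reply.get()  # the service waits here, like a subroutine call

def call_agent_async(request):
    """Asynchronous model: send the message and continue; the reply lands in the mailbox."""
    agent_inbox.put((request, portal_mailbox))

print(call_agent_sync("free-text answer #1"))
call_agent_async("free-text answer #2")
print(portal_mailbox.get())  # later picked up by the specialized polling service
```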

In the proactive behavior (agents work "on behalf of the user"), an agent from the AV can determine that in its environment "something is happening" that would be of interest to the user who is assisted by that agent. The agent activates itself and can perform certain actions to satisfy the preferences (wishes) of the user. The agent can inform the user of its actions through the educational portal.

The difficulties associated with the management of the pro-activity of our architecture result from the fact that the portal is designed to react to users' requests. Therefore the pro-activity can be managed only asynchronously, and for this purpose we provide for the development of a specialized service, which periodically checks a "mailbox" for incoming messages from the AV.

According to our architecture, the reactivity and the pro-activity are possible only if the environment of the agents (Agent Village) does not remain passive. In order for the agents to be identified, they need a wrapper (the environment), which "masks" them as a web service for the portal. In this way the portal sends the request to this service (the masked environment), which in its turn transforms the request into an ACL message, understandable by the agents. In a similar manner the active environment transforms ACL messages into SOAP responses, which can be processed by the portal services.
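
The wrapper's two transformations can be pictured with a small sketch. The message shapes here are our own assumptions; only the "performative" field follows the ACL message structure of [FIPA,2002]:

```python
def soap_to_acl(soap_request):
    """Portal -> agent direction: a SOAP request becomes an ACL message."""
    return {
        "performative": "request",            # FIPA-ACL speech act
        "receiver": soap_request["service"],  # e.g. the Evaluator Assistant
        "content": soap_request["body"],
    }

def acl_to_soap(acl_message):
    """Agent -> portal direction: an ACL reply becomes a SOAP response."""
    return {"status": "ok", "body": acl_message["content"]}

acl = soap_to_acl({"service": "EvaluatorAssistant", "body": {"answer": "..."}})
soap = acl_to_soap({"performative": "inform", "content": {"points": 4}})
print(acl, soap)
```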

The following assistants are being developed in the first version of the AV node:

- Evaluator Assistant (EA);
- FraudDetector;
- Statistician;
- Intelbos.



The Evaluator Assistant (EA) provides expert assistance to the teacher in the assessment of electronic tests. In the Exam Engine a service is built for the automated assessment of "choice-like" questions. In the standard version of the architecture, questions of the "free text" type are assessed by the teacher and the ratings are entered manually in the service to prepare the final assessment of the test. In the cluster, the Exam Engine calls the assistant (an intelligent agent), which makes an "external" assessment of the "free text" questions. In the surrounding environment of the EA, the received SOAP Request messages are transformed into ACL messages, understandable by the agent. Some of the basic parameters of the messages are:

- the text, which is the answer to a "free text" question;
- the parameters for the estimation method used;
- the maximum number of points for this answer.

The EA plans the processing of the request. In the current version of the assistant two methods are available for estimation:

- **Word Matching (WM)** method - counts the "exact hits" of the keywords in the answer. The minimum threshold of percentage match (i.e. for a keyword to be considered "guessed"), which was used in the experiments, is between 70% and 80%. Intentionally, the method does not look for a 100% match, in order to give words with some minor typos a chance to be recognized as well. To calculate the points offered by this method, a coefficient is formed in the following way: the number of hits is divided by the number of keywords. The actual number of points for the answer is calculated by multiplying the maximum number of points by this coefficient;

Intelligent Distributed eLearning Architecture 201

method will be presented to the teacher as main result, and the results of the other methods will be presented as an alternative. Another feature of this agent will be also to provide actual statistics on the performance of each of the calculating methods, as the "weakest'' of them goes out of service until new and better performing methods are added to the Evaluator agent. This monitoring of the methods' behavior becomes really significant when the so-called genetic algorithms are added, which we are still working on - as it is known, they can be "trained'' and thus their effectiveness can change. In this process a knowledge base is developing for each specific subject, which supports the methods in their work.

Context-awareness requires applications to be able to adapt themselves to the environment in which they are being used such as user, location, nearby people and devices, and user's social situations. In this section we use small examples to illustrate the ability of CCA to

This section introduces the syntax of the language of CCA. Like in the π-calculus [Milner,1999], [Sangiorgi,2001], the simplest entities of the calculus are *names*. These areused to name for example ambients, locations, resources and sensors data. We assume a countably-infnite set of names, elements of which are written in lower-case letters, e.g. *n, x* and *y*. We let *ỹ* denote a list of names and |��| the arity of such a list. We sometimes use �� as a set of names where it is appropriate. We distinguish three main syntactic categories:

The syntax of processes and capabilities is given in Table 1 where *P*, *Q* and *R* stand for processes, and *M* for capabilities. The first five process primitives (inactivity, parallel composition, name restriction, ambient and replication) are inherited from MA [Cardelli,2000]. The process *0* does nothing and terminates immediately. The process *P* | *Q*  denotes the process *P* and the process *Q* running in parallel. The process (*υn*) *P* states that the scope of the name *n* is limited to the process *P*. The replication !*P* denotes a process which can always create a new copy of *P*. Replication was first introduced by Milner in the π-calculus [Milner,1999]. The process *n*[*P*] denotes an ambient named *n* whose behaviours are described by the process *P*. The pair of square brackets `[' and `]' outlines the boundary of that ambient. This is the textual representation of an ambient. The graphical

CCA departs from MA and other processes calculi such as [Zimmer,2005], [Bucur,2008], [Bugliesi,2004] with the notion of *context-guarded capabilities*, whereby a capability is guarded by a context-expression which specifes the condition that must be met by the environment of the executing process. A process prefxed with a context-guarded capability is called a

The graphical representation highlights the nested structure of ambients.

**5. Calculus of context aware systems - CCA** 

processes *P*, capabilities *M* and context expressions *κ*.

model applications that are contextaware.

**5.1 Syntax of processes and capabilities** 

representation of that ambient is:

- **Optimistic Percentage (OP)** method - makes an optimistic estimation of the points for the answer. Its essence is to iterate over the keyword list and sum up their percentage matches. The calculated sum of rates for all keywords, divided by the maximum possible match (in %), gives the reduction coefficient. The actual number of points for the answer is calculated by multiplying the maximum number of points by this reduction coefficient. This method is more "tolerant" of spelling mistakes in the answers, because low percentage matches are not ignored (unlike in the first method) and are included in the formation of the final amount of points. Both methods are sketched in code after this list.
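
A minimal sketch of the two scoring rules described above. The fuzzy-match helper and all names are our own assumptions (the chapter does not publish the EA's code); a 75% threshold is used, inside the 70-80% band mentioned in the text:

```python
from difflib import SequenceMatcher

def match_pct(keyword, words):
    """Best percentage match of a keyword against the answer's words (assumed helper)."""
    return 100 * max((SequenceMatcher(None, keyword, w).ratio() for w in words),
                     default=0.0)

def wm_score(keywords, answer, max_points, threshold=75.0):
    """Word Matching: count keywords matched above the threshold, scale the maximum."""
    words = answer.lower().split()
    hits = sum(1 for k in keywords if match_pct(k.lower(), words) >= threshold)
    return max_points * hits / len(keywords)

def op_score(keywords, answer, max_points):
    """Optimistic Percentage: sum all percentage matches, even the low ones."""
    words = answer.lower().split()
    total = sum(match_pct(k.lower(), words) for k in keywords)
    return max_points * total / (100 * len(keywords))

answer = "The procesor executes instructions"  # note the typo in "processor"
keywords = ["processor", "instructions"]
print(wm_score(keywords, answer, max_points=5))  # the typo still counts as a hit
print(op_score(keywords, answer, max_points=5))
```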

When the calculations finish, the EA generates an answer as an ACL message, which is then transformed by the environment into a SOAP Response message (a result from a web service call). In the answer there is a parameter representing the calculated amount of points, which is extracted afterwards by the Exam Engine. A comparison of the scores given by the two methods and by the teacher is presented in Figure 8.

Fig. 8. Comparison of WM, OP and the teacher for 18 tests

The FraudDetector will try to recognize attempts to cheat in the answer given by the student. Such attempts would be to guess the keywords or to copy/paste results from Internet search engines. This assistant cooperates with the Evaluator agent, and if its receptors detect a probable cheating attempt, it informs the Evaluator agent, which in its turn informs the assessing teacher that this answer requires special attention because it is suspicious. The Statistician stores information about all processed answers with a full history of the details from all calculating methods used by the Evaluator agent. This assistant needs feedback on how many points are finally given by the teacher for each answer. Thus it accumulates a knowledge base for each teacher and is able to decide which of the methods best suits the assessment style of the current assessing teacher. Upon returning the results of the Evaluator assistant, information from this agent determines which result will be presented to the teacher as the main result; the results of the other methods will be presented as alternatives. Another feature of this agent will be to provide actual statistics on the performance of each of the calculating methods, so that the "weakest" of them goes out of service until new and better performing methods are added to the Evaluator agent. This monitoring of the methods' behavior becomes really significant when the so-called genetic algorithms are added, which we are still working on - as is known, they can be "trained" and thus their effectiveness can change. In this process a knowledge base is developed for each specific subject, which supports the methods in their work.
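
The Statistician's choice of a "main" method can be illustrated with a short sketch, under our own assumption that it stores (method, method points, teacher points) records and prefers the method that deviates least from the teacher:

```python
def best_method(history):
    """history: iterable of (method_name, method_points, teacher_points) records."""
    deviations = {}
    for method, points, teacher in history:
        deviations.setdefault(method, []).append(abs(points - teacher))
    # pick the method whose scores deviate least, on average, from the teacher's
    return min(deviations, key=lambda m: sum(deviations[m]) / len(deviations[m]))

history = [("WM", 5.0, 4.0), ("OP", 4.85, 4.0),
           ("WM", 3.0, 3.0), ("OP", 2.4, 3.0)]
print(best_method(history))  # presented to the teacher as the main result
```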


#### **5. Calculus of context aware systems - CCA**

Context-awareness requires applications to be able to adapt themselves to the environment in which they are being used, such as the user, the location, nearby people and devices, and the user's social situation. In this section we use small examples to illustrate the ability of CCA to model applications that are context-aware.


#### **5.1 Syntax of processes and capabilities**

This section introduces the syntax of the language of CCA. As in the π-calculus [Milner,1999], [Sangiorgi,2001], the simplest entities of the calculus are *names*. These are used to name, for example, ambients, locations, resources and sensor data. We assume a countably infinite set of names, elements of which are written in lower-case letters, e.g. *n*, *x* and *y*. We let ỹ denote a list of names and |ỹ| the arity of such a list. We sometimes use ỹ as a set of names where it is appropriate. We distinguish three main syntactic categories: processes *P*, capabilities *M* and context expressions *κ*.

The syntax of processes and capabilities is given in Table 1, where *P*, *Q* and *R* stand for processes, and *M* for capabilities. The first five process primitives (inactivity, parallel composition, name restriction, ambient and replication) are inherited from MA [Cardelli,2000]. The process **0** does nothing and terminates immediately. The process *P* | *Q* denotes the process *P* and the process *Q* running in parallel. The process (*νn*) *P* states that the scope of the name *n* is limited to the process *P*. The replication !*P* denotes a process which can always create a new copy of *P*. Replication was first introduced by Milner in the π-calculus [Milner,1999]. The process *n*[*P*] denotes an ambient named *n* whose behaviours are described by the process *P*. The pair of square brackets '[' and ']' outlines the boundary of that ambient. This is the textual representation of an ambient. The graphical representation of that ambient is:

$$\begin{array}{|c|}\hline n \\ \hline P \\ \hline \end{array}$$

The graphical representation highlights the nested structure of ambients.

CCA departs from MA and other process calculi such as [Zimmer,2005], [Bucur,2008], [Bugliesi,2004] with the notion of *context-guarded capabilities*, whereby a capability is guarded by a context expression which specifies the condition that must be met by the environment of the executing process. A process prefixed with a context-guarded capability is called a *context-guarded prefix* and has the form κ?M.P. Such a process waits until the environment satisfies the context expression *κ*, then performs the capability *M* and continues like the process *P*. The process learns about its context (i.e. its environment) by evaluating the guard. The use of context-guarded capabilities is one of the two main mechanisms for context acquisition in CCA (the second is the call to a process abstraction, discussed below). The syntax and the semantics of context expressions are given below. We let *M.P* denote the process **True**?*M.P*, where **True** is a context expression satisfied by all contexts.
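
As a small illustration (our own example, using the **has** context expression that appears in the specifications of section 6), the guarded prefix below blocks until an ambient named printer is present in the environment and only then moves into it:

$$\mathbf{has}(printer)\,?\,\mathbf{in}\ printer.\,\mathbf{0}$$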



**Processes:**

| Syntax | Description |
|---|---|
| **0** | inactivity |
| P \| Q | parallel composition |
| (νn) P | name restriction |
| n[P] | ambient |
| !P | replication |
| κ?M.P | context-guarded action |
| x ⊳ (ỹ).P | process abstraction |

**Capabilities:**

| Syntax | Description |
|---|---|
| del n | delete n |
| in n | move in n |
| out | move out |
| α x〈z̃〉 | process call |
| α (ỹ) | input |
| α 〈z̃〉 | output |

**Locations α:**

| Syntax | Description |
|---|---|
| ↑ | any parent |
| n ↑ | parent n |
| ↓ | any child |
| n ↓ | child n |
| ∷ | any sibling |
| n ∷ | sibling n |
| ε | locally |

Table 1. Syntax of CCA processes and capabilities

A process abstraction x ⊳ (ỹ).P denotes the linking of the name *x* to the process *P*, where ỹ is a list of *formal parameters*. This linking is local to the ambient where the process abstraction is defined, so a name *x* can be linked to a process *P* in one ambient and to a different process *Q* in another ambient. A call to a process abstraction named *x* is done by a capability of the form α x〈z̃〉, where α specifies the location where the process abstraction is defined and z̃ is the list of *actual parameters*. There must be as many actual parameters as there are formal parameters to the process abstraction being called. The location α can be '↑' for any parent, 'n ↑' for a specific parent *n*, '↓' for any child, 'n ↓' for a specific child *n*, '∷' for any sibling, 'n ∷' for a specific sibling *n*, or ε (the empty string) for the calling ambient itself.


A process call α x〈z̃〉 behaves like the process linked to *x* at location α, in which each actual parameter in z̃ is substituted for each occurrence of the corresponding formal parameter. A process call can only take place if the corresponding process abstraction is available at the specified location.

In CCA, an ambient provides context by (re)defining process abstractions to account for its specific functionality. Ambients can interact with each other by making process calls. Because ambients are mobile, the same process call, e.g. ↑ �〈�̃〉, may lead to different behaviours depending on the location of the calling ambient. So process abstraction is used as a mechanism for context provision while process call is a mechanism for context acquisition.

Ambients exchange messages using the capability α〈z̃〉 to send a list of names z̃ to a location α, and the capability α(ỹ) to receive a list of names from a location α. Similarly to a process call, an ambient can send a message to any parent, i.e. ↑〈z̃〉; a specific parent *n*, i.e. n↑〈z̃〉; any child, i.e. ↓〈z̃〉; a specific child *n*, i.e. n↓〈z̃〉; any sibling, i.e. ∷〈z̃〉; a specific sibling *n*, i.e. n∷〈z̃〉; or itself, i.e. 〈z̃〉.

An *input prefix* is a process of the form α(ỹ).P, where ỹ is a list of variable symbols and *P* is a continuation process. It receives a list of names z̃ from the location α and continues like the process P{ỹ ← z̃}, where P{ỹ ← z̃} is the process *P* in which each name in the list z̃ is substituted for each occurrence of the corresponding variable symbol in the list ỹ.

The mobility capabilities in and out are defined as in MA [Cardelli,2000], with the exception that the capability out has no explicit parameter in CCA, the implicit parameter being the current parent (if any) of the ambient performing the action. An ambient that performs the capability in *n* moves into the sibling ambient *n*. The capability out moves the ambient that performs it out of that ambient's parent. Obviously, a root ambient, i.e. an ambient with no parents, cannot perform the capability out. The capability del *n* deletes an ambient of the form n[**0**] situated at the same level as that capability, i.e. the process del n.P | n[**0**] reduces to *P*. The capability del acts as a garbage collector that deletes ambients which have completed their computations. It is a constrained version of the capability open used in MA to unleash the content of an ambient. As mentioned in [Bugliesi,2004], the open capability brings about serious security concerns in distributed applications, e.g. it might open an ambient that contains malicious code. Unlike the capability open, the capability del is secure because it only opens ambients that are empty, so there is no risk of opening a virus or a malicious ambient.
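
The nested-ambient structure and the guarded del capability can be mirrored in a few lines of Python. This is our own toy model (the chapter gives no such implementation), showing only that del n succeeds once the ambient n has finished its computation:

```python
from dataclasses import dataclass, field

@dataclass
class Ambient:
    """A named boundary containing local processes and child ambients."""
    name: str
    processes: list = field(default_factory=list)  # local (opaque) processes
    children: list = field(default_factory=list)   # nested ambients

def delete(parent, n):
    """The 'del n' capability: remove a child ambient n only if it is empty."""
    for child in parent.children:
        if child.name == n and not child.processes and not child.children:
            parent.children.remove(child)
            return True
    return False  # n is absent or not yet empty, so 'del n' must wait

root = Ambient("conf", children=[Ambient("bob"), Ambient("alice", processes=["Q"])])
print(delete(root, "bob"))    # True: bob[0] is garbage-collected
print(delete(root, "alice"))  # False: alice is still running Q
```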

#### **5.2 Context model**

In CCA the notion of ambient, inherited from MA, is the basic structure used to model entities of a context-aware system, such as a user, a location, a computing device, a software agent or a sensor. As described in Table 1, an ambient has a name, a boundary, a collection of local processes and can contain other ambients. Meanwhile, an ambient can move from one location to another by performing the mobility capabilities in and out. So the structure of a CCA process, at any time, is a hierarchy of nested ambients. This hierarchical structure changes as the process executes. In such a structure, the context of a sub-process is obtained by replacing that sub-process in the structure by the placeholder '⊙'. For example, suppose a system is modelled by the process P | n[Q | m[R | S]]. Then the context of the process R in that system is P | n[Q | m[⊙ | S]], and that of the ambient m is P | n[Q | ⊙]. The following are examples of contexts:

$$e_1 \triangleq \mathrm{conf}\,[\,P \mid \mathrm{bob}[\odot] \mid \mathrm{alice}[Q]\,]$$

$$e_2 \triangleq \mathrm{alice}[Q] \mid \mathrm{conf}\,[\,P \mid \mathrm{bob}[\odot]\,]$$

$$e_3 \triangleq \mathrm{conf}\,[\,P' \mid \mathrm{bob}[\odot] \mid \mathrm{pada}[R]\,]$$


$$E_1(E_2) = \begin{cases} E_1 & \text{if } E_1 \text{ is a ground context} \\ E_1\{\odot \leftarrow E_2\} & \text{otherwise} \end{cases}$$

$$e_3(\mathbf{0}) \triangleq \mathrm{conf}\,[\,P' \mid \mathrm{bob}[\mathbf{0}] \mid \mathrm{pada}[R]\,]$$



x ⊳ (ỹ). P

where ỹ is the list of formal parameters. A process call to this process abstraction has the following syntax:

x〈z̃〉

where z̃ is the list of the actual parameters. This process call behaves exactly like the process *P* where each actual parameter in z̃ is substituted for each occurrence of the corresponding formal parameter in ỹ. In the smart phone example presented above, switchto is a process abstraction.

Suppose a software agent *agt* (here modelled as an ambient) is willing to edit a text file *foo*. This is done by calling a process abstraction named edit, say, as follows:

agt[↑ edit〈foo〉. **0**]

where the symbol ↑ indicates that the edit process called here is the one that is defined in the parent ambient of the calling ambient *agt*. Now suppose agent *agt* has migrated to a computing device win running Microsoft Windows operating system:

On this machine, the process abstraction edit is defined to launch the text editor notepad as follows:

```
edit ⊳ (y). notepad〈y〉. 0
```
So the request of the agent *agt* to edit the file *foo* on this machine will open that file in notepad, according to the following reduction:

Note that the command notepad has replaced the command edit in the calling ambient *agt*.

Now assume the agent *agt* first moved to a computer *lin* running the Linux operating system:


On this computer, the command edit is configured to launch emacs. So in this context, the file *foo* will be opened in emacs as illustrated by the following reduction:

Our agent *agt* might have even moved to a site where the command edit is not *available* because no process abstraction of that name is defined. In this case the agent *agt* will not be able to edit the file *foo* at this site and might consider moving to a nearby computer to do so.

#### **6. InfoStation-based mLearning system**

As mentioned earlier, eLearning is becoming a genuine alternative educational approach, as the technologies in this area are developing very fast and there is recognisable growth in a great variety of wide-band telecommunication delivery technologies. The infostation paradigm was first proposed by Frenkiel et al. [Frenkiel,1996] and used in [Ganchev,2007] to devise an infostation-based mLearning system which allows mobile devices such as cellular phones, laptops and personal digital assistants (PDAs) to communicate with each other and with a number of services within a university campus. This mLearning system provides a number of services, among which are mLecture, mTutorial, mTest and communication services (private chat, intelligent message notification and phone calls). This section presents the architecture of the infostation-based mLearning system and describes the policies of the mLecture service.

#### **6.1 mServices**

This section introduces at a glance each of the mServices provided by the infostation-based mLearning system.

- *AAA*: in order for any user to use any mService in the system, the user device should be registered. The AAA service (Authentication, Authorisation and Accounting) allows the users to register their devices with the system to gain the ability to use the mLearning services offered by the system.
- *mLecture*: this service allows the users to gain access to the lecture material through their mobile devices. The users can request a specific lecture, which is adapted according to the capabilities of the user devices and then delivered to their mobile devices.
- *mTest*: this service is crucial to the learning process. The mTest service allows the users to gain access to test materials that provide the means of an evaluation process. A user can request, as with the mLecture service, a specific test, which is also adapted to the capabilities of the user device and then delivered to the user's mobile device. The mTest service may only run individually on a user device, unaccompanied by any other service whatsoever.
- *mTutorial*: this service allows the users to gain access to a self-assessment test. It is a combination of the mLecture and the mTest services. A user can request a self-assessment test in a similar way to requesting a mLecture. After the user submits their answers, he receives feedback on his performance and the correct answers to the questions he got wrong.
- *Intelligent Message Notification (IMN in short)*: this service allows the users to communicate with each other by exchanging messages via their mobile devices.
- *VoIP*: this service allows the users to communicate with each other via phone calls throughout the infostation-based mLearning system.


#### **6.2 Policies**

The InfoStationCentre (ISC) provides the User Authentication, Authorisation and Accounting (AAA) service, which identifies each mobile user and provides him with a list of services the user is authorised to access. This service is regulated by the following policies:

- When a user is within the range of an IS, the intelligent agent (PA) of the user's device and the IS mutually discover each other. The PA sends a request to the IS for user Authentication, Authorisation and Accounting (AAA). This request also includes a description of the mobile device currently being used and any updates of the user profile and user service profile.
- The IS forwards this AAA request to the ISC along with the profile updates. If the user is successfully authenticated and authorised to utilise the services by the AAA module within the ISC, a new account record is created for the user and a positive acknowledgement is sent back to the IS. Then the IS compiles a list of applicable services and sends this to the PA along with the acknowledgement. The PA displays the information regarding these services to the user, who then makes a request for the service he wishes to use.

If the user chooses the mLecture service, then the following policies of the mLecture service apply (a code sketch of these policies follows the list):

- The PA forwards the mLecture service request to the InfoStation, which instantiates the service. If the IS is unable to satisfy fully the user service request, it is forwarded to the ISC, which is better equipped to deal with it. In either case, the lecture is adapted and customised to suit the capabilities of the user devices and the user's own preferences, and then delivered to their mobile devices.
- During the execution of the service, the user is free to move into a different infostation, to switch between devices, or to do both.
- A user cannot use the mLecture and mTest services simultaneously; the mTest service should operate unaccompanied at all occasions.
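
Before the formal model, the following is a rough sketch of how these mLecture policies compose at run time. All class and function names are our own assumptions, loosely mirroring the CCA specification in Eqs. (6)-(14) below:

```python
class ISC:
    """InfoStationCentre: global lecture store and user state."""
    def __init__(self):
        self.lectures = {("Lect001", "PC"): "<lecture content>"}
        self.users_in_mtest = set()

class InfoStation:
    """A local IS keeping a cache of lectures per device type."""
    def __init__(self):
        self.cache = {}

def request_lecture(ist, isc, lectid, uid, dtype):
    if uid in isc.users_in_mtest:           # mTest must run unaccompanied: deny
        return ("DENIED", None)
    reply = ist.cache.get((lectid, dtype))  # try the local cache first
    if reply is None:                       # miss: the ISC is better equipped
        reply = isc.lectures.get((lectid, dtype))
        ist.cache[(lectid, dtype)] = reply  # keep a copy for later requests
    return ("OK", reply)                    # adapted lecture goes to the device

print(request_lecture(InfoStation(), ISC(), "Lect001", 303, "PC"))
```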


This section presents the formalisation of the policies of the infostation-based context-aware mLearning system using CCA. We first introduce some naming conventions (sect. 6.2.1) which are used in the specification of the system. Then we give the specification of two mServices, the AAA and the *mLecture* services (sect. 6.3).

#### **6.2.1 Notations**


The following naming conventions are used to differentiate between variable names and constants. A variable name begins with a lowercase letter, while a constant begins with a number or an uppercase letter. The list of the constant names that are used in the formalisation process is given in Table 4, and the list of variable names is given in Table 5.


Table 4. Constants

#### **6.3 A Model of the InfoStation-based mLearning system**

The system consists mainly of one central ISC, multiple ISs and multiple user devices. Each component of the system is modelled as an ambient. That is, the ISC, each IS and each user device is modelled as an ambient. In particular, a device, *PC* say, being used by a user, *303* say, is modelled by an ambient named *PC303*. The ISC ambient runs in parallel with the IS ambients, and all the user devices within the range of an IS are child ambients of that IS ambient.


$$\begin{array}{l}
ISC[P_{ISC}] \;\mid\; IS_1\big[\, PDA303[P_{PDA303}] \mid PC401[P_{PC401}] \mid P_{IS_1} \,\big] \\
\;\mid\; IS_2\big[\, PDA301[P_{PDA301}] \mid Phon402[P_{Phon402}] \mid P_{IS_2} \,\big] \\
\;\mid\; IS_3\big[\, Phon309[P_{Phon309}] \mid P_{IS_3} \,\big] \\
\;\mid\; IS_4\big[\, Phon403[P_{Phon403}] \mid PC302[P_{PC302}] \mid P_{IS_4} \,\big]
\end{array} \tag{1}$$

$$\begin{array}{l}
!\, AAAreq_i \downarrow (uid, dtype, aname).\; ISC :: \langle AAAreq, uid, dtype, aname, IS_i \rangle.\, \mathbf{0} \;\mid \\
!\, ISC :: (ack, aname, slist).\; \mathbf{has}(aname)\,?\; AAAreq_i \downarrow \langle ack, slist, aname \rangle.\, \mathbf{0}
\end{array} \tag{2}$$

$$!::(lectid, uid, dtype, aname).\; IS_i \uparrow \langle lectid, uid, dtype, aname \rangle.\, \mathbf{0} \tag{3}$$

$$!\uparrow (lectid, reply, aname).\; aname :: \langle lectid, reply \rangle.\, \mathbf{0} \tag{4}$$

$$P_{Lectreq_i} \triangleq \text{Eq. (3)} \mid \text{Eq. (4)} \tag{5}$$

$$P_{Cache_i} \triangleq\; !\uparrow (lectid, uid, dtype, aname).\left(\begin{array}{l}
\mathbf{has}(lectid)\,?\, lectid \downarrow \langle dtype, aname \rangle.\; lectid \downarrow (reply, aname). \\
\quad \uparrow \langle lectid, uid, dtype, reply, aname \rangle.\, \mathbf{0} \;\mid \\
\neg\,\mathbf{has}(lectid)\,?\uparrow \langle lectid, uid, dtype, NULL, aname \rangle. \\
\quad \uparrow (content, dtype).\; lectid \downarrow \langle content, dtype \rangle.\; lectid \downarrow (ack).\, \mathbf{0}
\end{array}\right) \tag{6}$$

$$P_{Lect001} \triangleq\; !\uparrow (dtype, aname).\left(\begin{array}{l}
\mathbf{has}(dtype)\,?\, dtype \downarrow \langle\rangle.\; dtype \downarrow (reply).\; \uparrow \langle reply, aname \rangle.\, \mathbf{0} \;\mid \\
\neg\,\mathbf{has}(dtype)\,?\uparrow \langle NULL, aname \rangle.\; \uparrow (content, dtype). \\
\quad dtype \downarrow \langle content \rangle.\; dtype \downarrow \langle\rangle.\; \uparrow \langle ACK \rangle.\, \mathbf{0}
\end{array}\right) \tag{7}$$

$$!\, Lectreq_i \downarrow (lectid, uid, dtype, aname).\; Cache_i \downarrow \langle lectid, uid, dtype, aname \rangle.\, \mathbf{0} \tag{8}$$

$$!\, Cache_i \downarrow (lectid, uid, dtype, creply, aname).\left(\begin{array}{l}
\neg(creply = NULL)\,?\, ISC :: \langle lectid, uid, dtype, aname, 1 \rangle.\, \mathbf{C} \;\mid \\
(creply = NULL)\,?\, ISC :: \langle lectid, uid, dtype, aname, 0 \rangle.\, \mathbf{N}
\end{array}\right) \tag{9}$$

$$\mathbf{C} \triangleq ISC :: (lectid, reply, aname).\left(\begin{array}{l}
(reply = OK \wedge \mathbf{has}(aname))\,?\, Lectreq_i \downarrow \langle lectid, reply, aname \rangle.\, \mathbf{0} \;\mid \\
(reply = OK \wedge \neg\,\mathbf{has}(aname))\,?\, ISC :: \langle lectid, uid, dtype, aname, 0 \rangle.\, \mathbf{0} \;\mid \\
(reply = DENIED \wedge \mathbf{has}(aname))\,?\, Lectreq_i \downarrow \langle lectid, reply, aname \rangle.\, \mathbf{0}
\end{array}\right)$$

$$\mathbf{N} \triangleq ISC :: (lectid, reply, aname).\left(\begin{array}{l}
(\neg(reply = DENIED) \wedge \mathbf{has}(aname))\,?\, Cache_i \downarrow \langle lectid, reply, dtype \rangle. \\
\quad Lectreq_i \downarrow \langle lectid, reply, aname \rangle.\; Cache_i \downarrow (ack).\, \mathbf{0} \;\mid \\
(\neg(reply = DENIED) \wedge \neg\,\mathbf{has}(aname))\,?\, Cache_i \downarrow \langle lectid, reply, dtype \rangle. \\
\quad ISC :: \langle lectid, reply, aname, 0 \rangle.\; Cache_i \downarrow (ack).\, \mathbf{0} \;\mid \\
(reply = DENIED \wedge \mathbf{has}(aname))\,?\, Lectreq_i \downarrow \langle lectid, reply, aname \rangle.\, \mathbf{0}
\end{array}\right)$$

$$P_{IS_i} \triangleq \text{Eq. (2)} \mid \text{Eq. (8)} \mid \text{Eq. (9)} \tag{10}$$

$$!\, ISC \uparrow (x).\; x \downarrow \langle\rangle.\; x \downarrow (y).\; ISC \uparrow \langle y \rangle.\, \mathbf{0} \tag{11}$$

$$!\, ISC \uparrow (x, n).\; x \downarrow \langle n \rangle.\; x \downarrow ().\; ISC \uparrow \langle x, ACK \rangle.\, \mathbf{0} \tag{12}$$

$$P_{uid} \triangleq \text{Eq. (11)} \mid \text{Eq. (12)}$$

$$!\, IS_i :: (AAAreq, uid, dtype, aname).\; uid \downarrow \langle IS_i \rangle.\; uid \downarrow (ack).\; IS_i :: \langle ack, aname, SLIST \rangle.\, \mathbf{0} \tag{13}$$

$$!\, IS_i :: (lectid, uid, dtype, aname, flag).\; uid \downarrow \langle Utest \rangle.\; uid \downarrow (y).\; (P_{ISC_1} \mid P_{ISC_2}) \tag{14}$$

$$P_{ISC_1} \triangleq (y = 1)\,?\, uid \downarrow \langle Loc \rangle.\; uid \downarrow (z).\; z :: \langle lectid, DENIED, aname \rangle.\, \mathbf{0}$$

$$P_{ISC_2} \triangleq \left(\begin{array}{l}
(y = 0 \wedge flag = 1)\,?\, uid \downarrow \langle Loc \rangle.\; uid \downarrow (z).\; z :: \langle lectid, OK, aname \rangle.\, \mathbf{0} \;\mid \\
(y = 0 \wedge flag = 0)\,?\, Lectures \downarrow \langle lectid, uid, dtype, aname \rangle. \\
\quad Lectures \downarrow (lectid, uid, dtype, reply, aname).\; uid \downarrow \langle Loc \rangle.\; uid \downarrow (z). \\
\quad z :: \langle lectid, reply, aname \rangle.\, \mathbf{0}
\end{array}\right)$$

$$P_{ISC} \triangleq \text{Eq. (13)} \mid \text{Eq. (14)}$$



$$\rightarrow \quad \{\,\text{Rule (Red Com R6) in Table 10; the } Lectreq_i \text{ ambient forwards the request to the infostation}\,\}$$

At this stage, the infostation *IS_i* has received a lecture request from the *Lectreq_i* ambient and is willing to check with the *Cache_i* ambient whether it has the requested lecture for the specified type of device. The behaviour of the *Cache_i* ambient is specified by the process *P_{Cache_i}* in Eq. (6). If the requested lecture *Lect001* exists in the cache for the specified type of device, then the *Cache_i* ambient gets a copy of the lecture for that type of device by interacting with the child ambient named *Lect001*, whose behaviour is specified by the process *P_{Lect001}* as in Eq. (7). It can also be seen from Eq. (6) and Eq. (7) that if the requested lecture *Lect001* does not exist in the cache for the specified type of device, then a reply message 'NULL' is forwarded to the infostation *IS_i*. So in either situation, the infostation *IS_i* receives a reply from the cache.

Once a reply is received from the *Cache_i* ambient, the infostation *IS_i* contacts the infostation centre *ISC* as specified by the process in Eq. (9). How the *ISC* reacts is modelled by Eq. (14); it replies with a 'DENIED' message if the user requesting the lecture is currently using the mTest service, otherwise it replies with an 'OK' message and possibly a copy of the requested lecture if it is not available locally in *IS_i*'s cache. How each of these types of reply is handled by *IS_i* is modelled by the components **C** and **N** in Eq. (9). One can see that in every case where the user is still in the range of the infostation *IS_i* (i.e. the context expression 'has(*aname*)' holds), the infostation *IS_i* sends a reply to the *Lectreq_i* ambient, which subsequently forwards the reply to the user device as specified in Eq. (4). This completes the proof of Case 1.

The proof of Case 2 can be done in a similar manner as in Case 1, with the user behaviours specified as in Eq. (16), where i ≠ j, i.e. the request is sent from one infostation and the reply to that request is received after the user has moved to a different infostation.

$$PC303\left[\begin{array}{l}
Lectreq_i :: \langle Lect001, 303, PC, PC303 \rangle.\; \mathbf{out}. \\
\mathbf{in}\; IS_j.\; AAAreq_j :: \langle 303, PC, PC303 \rangle.\, \mathbf{0} \;\mid \\
\mathbf{at}(IS_j)\,?\, AAAreq_j :: (ack, slist). \\
Lectreq_j :: (lectid, reply).\, \mathbf{0}
\end{array}\right] \tag{16}$$

This ambient sends a lecture request from the infostation *IS_i*, moves to a different infostation *IS_j*, and registers with this infostation by sending an AAA request, then waits for the acknowledgement of its registration. Once its registration has been confirmed, it prompts to receive the reply to the lecture request and then terminates.

#### **8. Acknowledgment**

The authors wish to acknowledge the support of the National Science Fund (Research Project Ref. No. DO02-149/2008) and the Science Fund of the University of Plovdiv "Paisij Hilendarski" (Research Project Ref. No. NI11-FMI-004).

#### **9. References**

[Barker,2000] P. Barker, Designing Teaching Webs: Advantages, Problems and Pitfalls, in Proc. of ED-MEDIA 2000 World Conference on Educational Multimedia, Hypermedia & Telecommunication, Association for the Advancement of Computing in Education, Charlottesville, VA, 2000, pp. 54-59.

[Bellifemine,2007] F. Bellifemine, G. Caire, D. Greenwood, Developing Multi-Agent Systems with JADE, John Wiley & Sons Ltd., 2007.

[Bucur,2008] D. Bucur, M. Nielsen, Secure Data Flow in a Calculus for Context Awareness, in: Concurrency, Graphs and Models, Vol. 5065 of Lecture Notes in Computer Science, Springer, 2008, pp. 439-456.

[Bugliesi,2004] M. Bugliesi, G. Castagna, S. Crafa, Access Control for Mobile Agents: The Calculus of Boxed Ambients, ACM Trans. on Programming Languages and Systems, 26 (1), 2004, pp. 57-124.

[Cardelli,2000] L. Cardelli, A. Gordon, Mobile Ambients, Theoretical Computer Science, 240, 2000, pp. 177-213.

[Dey,2000] A. K. Dey, G. D. Abowd, Towards a Better Understanding of Context and Context-Awareness, Proceedings of the Workshop on the What, Who, Where, When and How of Context-Awareness, New York, ACM Press, 2000.

[FIPA,2002] FIPA, ACL Message Structure Specification, Foundation for Intelligent Physical Agents, Geneva, Switzerland, SC00061G, 3rd December 2002.

[Frenkiel,1996] R. Frenkiel, T. Imielinski, Infostations: The Joy of 'Many-time, Many-where' Communications, WINLAB Technical Report, 1996.

[Ganchev,2005] I. Ganchev, S. Stojanov, M. O'Droma, Mobile Distributed e-Learning Center, in Proc. of the 5th IEEE International Conference on Advanced Learning Technologies (IEEE ICALT'05), Kaohsiung, Taiwan, 5-8 July 2005, pp. 593-594. DOI 10.1109/ICALT.2005.199. ISBN 0-7695-2338-2.

[Ganchev,2007] I. Ganchev, et al., An InfoStation-Based Multi-Agent System Supporting Intelligent Mobile Services Across a University Campus, Journal of Computers, vol. 2, pp. 21-33, May 2007.

[Ganchev,2008a] I. Ganchev, et al., On InfoStation-Based Mobile Services Support for Library Information Systems, in 8th IEEE International Conference on Advanced Learning Technologies (IEEE ICALT-08), Santander, Cantabria, Spain, 2008, pp. 679.

[Ganchev,2008b] I. Ganchev, et al., InfoStation-Based Adaptable Provision of m-Learning Services: Main Scenarios, International Journal Information Technologies and Knowledge (IJ ITK), vol. 2, pp. 475-482, 2008.

[Ganchev,2008c] I. Ganchev, et al., InfoStation-based mLearning System Architectures: Some Development Aspects, in 8th IEEE International Conference on Advanced Learning Technologies (ICALT'08), Santander, Spain, 2008, pp. 504-505.

[Maurer,2001] H. Maurer, M. Sapper, E-Learning Has to be Seen as Part of General Knowledge Management, in Proc. of ED-MEDIA 2001 World Conference on Educational Multimedia, Hypermedia & Telecommunications, Tampere, AACE, 2001, pp. 1249-1253.

[Milner,1999] R. Milner, Communication and Mobile Systems: The π-Calculus, Cambridge University Press, 1999.


**10**

**Analysis of Fuzzy Logic Models**

Beloslav Riečan
*M. Bel University, Banská Bystrica, Matematický Ústav SAV, Bratislava,*
*Slovakia*

#### **1. Introduction**

One of the most important results of mathematics in the 20th century is the Kolmogorov model of probability and statistics. It gave many impulses for research and development in the theoretical area as well as in applications in a large scale of subjects.

It is reasonable to ask why the Kolmogorov approach has played so important a role in probability theory and in mathematical statistics, disciplines which have been very successful for many centuries. Of course, Kolmogorov placed probability and statistics on a new and very effective foundation - set theory. For the first time in history, the basic notions of probability theory were defined precisely but simply. So a random event has been defined as a subset of a space, a random variable as a measurable function and its mean value as an integral - more precisely, the abstract Lebesgue integral. It is hopeful to expect some new stimuli from the fuzzy generalization of the classical set theory. The aim of the chapter is a presentation of some results of this type.

Any subset *A* of a given space Ω can be identified with its characteristic function

$$\chi_A : \Omega \to \{0, 1\},$$

where χ_A(ω) = 1 if ω ∈ *A*, and χ_A(ω) = 0 if ω ∉ *A*. From the mathematical point of view a fuzzy set is a natural generalization of χ_A (see [73]). It is a function (a multi-valued function)

$$\phi_A : \Omega \to [0, 1].$$

Evidently any set (i.e. a two-valued function on Ω, χ_A : Ω → {0, 1}) is a special case of a fuzzy set.

#### **2. Fuzzy systems and their algebraizations**

