#### **2.2 Linking by interaction**

This kind of linking between knowledge items is established by analysing how the items are related to one another. In our approach we have considered how the documents and topics are organised within the knowledge tree, and how users relate to the documents they provide to the system. The analysis follows a three-stage process.

Firstly, the AM determines which knowledge items in the tree need to be processed. The first time the process is run on a node, all the knowledge items in the tree must be processed, but in subsequent runs only the items that have changed, or those linked to them, need to be processed. In our approach, for instance, a change in a document affects the topics in the branch of the knowledge tree where the document is located, the node itself and the items related to any of these, but it does not affect all the elements in the repository. This selective processing of items seems essential in systems with large knowledge bases or intense activity.
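The selection step above can be sketched as follows. This is a minimal illustration under stated assumptions, not the AM's actual implementation: the dictionaries `doc_topic`, `topic_parent` and `related`, and the function `items_to_reprocess`, are hypothetical, and "items related to any of these" is interpreted as items one link away from an already affected item.

```python
def items_to_reprocess(changed_docs, doc_topic, topic_parent, related, node_id):
    """Return the ids of the items affected by a change to `changed_docs`.

    Hypothetical data model (not taken from the KnowCat code base):
      doc_topic    : document id -> id of the topic that contains it
      topic_parent : topic id -> parent topic id (None at the top of the tree)
      related      : item id -> iterable of item ids linked to it
      node_id      : id of the KC node that owns the knowledge tree
    """
    affected = set(changed_docs)
    for doc in changed_docs:
        # Walk up the branch of the knowledge tree that holds the document.
        topic = doc_topic.get(doc)
        while topic is not None:
            affected.add(topic)
            topic = topic_parent.get(topic)
    affected.add(node_id)
    # Assumption: add only items directly linked to an affected item (one hop).
    for item in list(affected):
        affected.update(related.get(item, ()))
    return affected
```

Documents in other branches, and their topics, are left out of the result, which is what keeps later runs cheap on large nodes.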

Secondly, the AM identifies the users responsible for the valid knowledge items in the knowledge tree. In this approach, only the links between users and the valid documents they have provided to the system are considered: the text associated with each user is the combination of the descriptive texts of all the documents that user has provided. Other kinds of links could be treated in a similar way.
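A minimal sketch of how a user's text could be assembled from the valid documents they provided is shown below. The input names and the function `user_texts` are hypothetical, and plain concatenation is assumed as the way the texts are combined.

```python
def user_texts(authorship, doc_text, valid_docs):
    """Build the text associated with each user from the valid documents they provided.

    Hypothetical inputs:
      authorship : iterable of (user id, document id) pairs
      doc_text   : document id -> descriptive text of that document
      valid_docs : set of ids of the documents currently considered valid
    """
    parts = {}
    for user, doc in authorship:
        if doc in valid_docs:  # only links to valid documents are considered
            parts.setdefault(user, []).append(doc_text[doc])
    # Assumption: a user's text is the plain concatenation of their document texts.
    return {user: "\n".join(texts) for user, texts in parts.items()}
```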

Lastly, the AM retrieves through the Web the textual components that make up the texts associated with the knowledge items. In our approach, we started from the text documents, which are intrinsically linked to the items, and used the relationships mentioned above to establish the texts associated with the other items. We used the GNU Wget program (GNU Wget, 2011) to retrieve the files containing the textual information for the different knowledge items; where an item has more than one file, these are combined to form its descriptive text. These texts are usually a combination of several files; for instance, the text associated with a topic is made up of the texts associated with each document and subtopic it contains.
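The recursive composition of a topic's text from its documents and subtopics might look like the sketch below. It assumes the document texts have already been retrieved (for example, with GNU Wget) and are held in a dictionary; `topic_text` and its inputs are hypothetical, and concatenation is again assumed.

```python
def topic_text(topic, subtopics, topic_docs, doc_text):
    """Compose the descriptive text of `topic` from its documents and subtopics.

    Hypothetical inputs:
      subtopics  : topic id -> list of child topic ids
      topic_docs : topic id -> list of document ids placed directly under that topic
      doc_text   : document id -> text already retrieved for that document
    """
    parts = [doc_text[d] for d in topic_docs.get(topic, [])]
    for child in subtopics.get(topic, []):
        # Each subtopic contributes its own (recursively composed) text.
        parts.append(topic_text(child, subtopics, topic_docs, doc_text))
    # Assumption: pieces are concatenated in order; empty pieces are skipped.
    return "\n".join(p for p in parts if p)
```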

#### **2.3 Knowledge enrichment and its exploitation**

As mentioned before, in our approach the knowledge produced by the analysis process is incorporated into the system explicitly, either as descriptors of the pre-existing knowledge items or in the form of new knowledge items.

The level of similarity between two vectors is a coefficient between zero and one: the closer the value is to one, the more similar the vectors are, and the closer it is to zero, the less similar they are. This coefficient describes the similarity relation established between knowledge items. In our approach, two knowledge items are considered to be related when the similarity coefficient of the relation between them exceeds a specific threshold. Unfortunately, this threshold can be set neither as a fixed value nor as a general value valid for all cases, since its choice may vary greatly depending on circumstances such as the theme of the nodes or the nature of the documents taken into account.

The similarity between two sets of DWW is summarised in a CDF file. Fig. 6 shows an example of a CDF file, where the rows $f_i$ and columns $c_i$ represent DWW files (of documents and topics respectively, in this case) and the numbers give the similarity $\mathrm{sim}(f_i, c_i)$ between them. In Fig. 6, for instance, $\mathrm{sim}(f_6, c_5)$ (the similarity between document D113 and topic T107) is 0.85, which is much higher than $\mathrm{sim}(f_6, c_{10})$ (the similarity between document D113 and topic T53), which is 0.03.
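This passage does not name the similarity measure, so the sketch below assumes cosine similarity over sparse weight-of-words vectors, which stays within [0, 1] when all weights are non-negative. A dictionary keyed by (row, column) stands in for the row/column table of the CDF file; the actual CDF layout is not reproduced. The function names are hypothetical, and the threshold is a parameter because, as noted above, it has to be chosen case by case.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse term -> weight vectors.

    With non-negative weights the result lies in [0, 1]; 0.0 is returned
    if either vector is empty or all-zero.
    """
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def similarity_table(rows, cols):
    """rows, cols: item id -> vector. Returns {(row id, column id): similarity}."""
    return {(r, c): cosine(rv, cv) for r, rv in rows.items() for c, cv in cols.items()}


def related_pairs(table, threshold):
    """Pairs whose similarity exceeds the (case-specific) threshold, sorted by id."""
    return sorted(pair for pair, s in table.items() if s > threshold)
```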

The descriptors added to the knowledge items provide new data that reveal hidden aspects of the elements they describe. For instance, the interest a particular item arouses may be used to make it stand out among the other items or to rank all of them. In addition, the most characteristic terms of an item may serve as a useful reference when searching for related information in other information repositories.
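As a small illustration of the last point, the most characteristic terms of an item can be read off its weight-of-words vector by taking the highest-weighted entries. This sketch assumes the vector is a term-to-weight dictionary; `characteristic_terms` is a hypothetical helper.

```python
def characteristic_terms(wwv, k=5):
    """Return the k terms with the highest weight in a weight-of-words vector.

    Ties are broken alphabetically so that the result is deterministic.
    """
    ranked = sorted(wwv.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]
```

The selected terms could then be joined into a query string for an external repository, e.g. `" ".join(characteristic_terms(wwv))`.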

In our approach, the knowledge obtained by the analysis process is incorporated into the system as a new category of knowledge element that represents the relations among items of all the kinds previously considered (documents, topics, users and nodes). The links incorporated into the repository in this way provide the basis for offering users new multidimensional views of the knowledge and new services that facilitate its exploitation. In particular, to demonstrate this proposal we have implemented an interactive view of the graph of relations among knowledge items in the system (see Fig. 7 left), as well as a context-sensitive recommendation service that provides reference items of different kinds related to the item the user is working with at each moment (see Fig. 7 right).
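One way to model the new relation element category is sketched below. The `Relation` record and `RelationIndex` class are hypothetical; they assume only what the text states, namely that a relation links two items of the kinds listed above and carries a similarity coefficient in [0, 1]. The index lets a view or a service look up all the relations of a given item.

```python
from collections import defaultdict
from dataclasses import dataclass

KINDS = {"document", "topic", "user", "node"}


@dataclass(frozen=True)
class Relation:
    """Hypothetical record for a relation knowledge element."""
    source: str
    source_kind: str
    target: str
    target_kind: str
    similarity: float  # coefficient in [0, 1]

    def __post_init__(self):
        if self.source_kind not in KINDS or self.target_kind not in KINDS:
            raise ValueError("unknown item kind")
        if not 0.0 <= self.similarity <= 1.0:
            raise ValueError("similarity must be in [0, 1]")


class RelationIndex:
    """Undirected index from each item to the relations that touch it."""

    def __init__(self):
        self._by_item = defaultdict(list)

    def add(self, rel):
        self._by_item[rel.source].append(rel)
        self._by_item[rel.target].append(rel)

    def neighbours(self, item):
        """(other item id, other item kind, similarity) for each relation of `item`."""
        out = []
        for r in self._by_item.get(item, []):
            if r.source == item:
                out.append((r.target, r.target_kind, r.similarity))
            else:
                out.append((r.source, r.source_kind, r.similarity))
        return out
```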

Fig. 7. Interactive view of the knowledge as a graph of related items (left window) and a context-sensitive recommendation (lower centre window and right window).

The graph view integrates the static relations established in the system with dynamic relations that evolve over time. The static relations include the hierarchical links that join the topics of the knowledge tree and the authorship links that connect users to the documents they provide to the knowledge base. The dynamic relations include those derived from the character of the items present in the repository at each moment and those arising from the interactions established among them as a result of system activity. An example of this kind of view can be seen in Fig. 7 (left), where topics are represented by orange circles, documents by light-coloured squares and static relations by black lines, while the dynamic relations between the items taken into account are represented by lines coloured according to the level of similarity between the corresponding item vectors, which is also shown as a label.

Digestion of Knowledge in a KM System to Reveal Implicit Knowledge 111

On the other hand, the recommendation service illustrates how the new knowledge can be used to make the system easier to use and interaction with it more dynamic and attractive. Fig. 7 (right) shows an example of the system window displaying a document with a recommendation panel at the bottom, where icons representing different kinds of knowledge items appear in warmer colours the higher the similarity between the vectors associated with the corresponding items. In addition, the window at the top shows, in the form of a graph, the most important relations between the current document and other items in the system. In this graph, as in the view described above, topics are represented by circles and documents by parallelograms; here, however, the colours of the figures represent the similarity coefficient of the relations that link them with the central knowledge item. Other services could be built using a similar approach, such as an assistant for placing documents in the most appropriate topic within the knowledge tree, or a service for finding experts on a topic or other users interested in the same subjects.
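The core of such a recommendation panel can be sketched as a ranking of related items by their similarity to the current one, with each coefficient mapped to a colour that gets warmer as it approaches 1. The function names, the blue-to-red colour ramp and the `min_sim` cut-off are assumptions made for illustration, not details of the RS prototype.

```python
def warm_colour(s):
    """Map a coefficient in [0, 1] to a hex colour from blue (0, cold) to red (1, warm)."""
    s = min(max(s, 0.0), 1.0)
    red = round(255 * s)
    blue = 255 - red
    return f"#{red:02x}00{blue:02x}"


def recommend(current, similarities, k=5, min_sim=0.0):
    """Top-k items related to `current`, most similar first.

    similarities: item id -> similarity with `current` (hypothetical input,
    e.g. read from the stored relation elements).
    Returns (item id, similarity, colour) triples.
    """
    candidates = [(item, s) for item, s in similarities.items()
                  if item != current and s > min_sim]
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return [(item, s, warm_colour(s)) for item, s in candidates[:k]]
```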

Both the graph view and the recommendation service make use of the latent knowledge in the system to let users navigate the knowledge in ways the system did not previously support. In both cases, the graphs were generated with Graphviz (Gansner & North, 2000).
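Since both graphs were drawn with Graphviz, the sketch below shows how a relation graph could be serialised in Graphviz's DOT language: topics as filled orange circles, documents as boxes, static links in black, and dynamic links coloured and labelled by their similarity. The function `to_dot` and its inputs are hypothetical; only standard DOT attributes (`shape`, `style`, `fillcolor`, `color`, `label`) are used, with edge colours given in Graphviz's "H S V" form.

```python
def to_dot(topics, documents, static_edges, dynamic_edges):
    """Emit an undirected Graphviz DOT description of a relation graph.

    topics, documents : iterables of item ids
    static_edges      : (a, b) pairs, e.g. topic hierarchy or authorship links
    dynamic_edges     : (a, b, similarity) triples with similarity in [0, 1]
    """
    lines = ["graph knowledge {"]
    for t in topics:
        lines.append(f'  "{t}" [shape=circle, style=filled, fillcolor=orange];')
    for d in documents:
        lines.append(f'  "{d}" [shape=box];')
    for a, b in static_edges:
        lines.append(f'  "{a}" -- "{b}" [color=black];')
    for a, b, s in dynamic_edges:
        # Hue goes from 0.66 (blue) at similarity 0 to 0 (red) at similarity 1;
        # the coefficient itself is shown as the edge label.
        hue = 0.66 * (1.0 - s)
        lines.append(f'  "{a}" -- "{b}" [color="{hue:.3f} 1.000 1.000", label="{s:.2f}"];')
    lines.append("}")
    return "\n".join(lines)
```

Saving the returned string as `graph.dot` and running `dot -Tpng graph.dot -o graph.png` would render it.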

### **3. Experiments performed**

To check the viability of these approaches, we developed a prototype of the three elements described, which are part of the SKC system: the analysis module (AM), the graph visualiser for relations among knowledge items (KV) and the context recommendation service (RS). All of them have been incorporated into a KnowCat (KC) system.

The prototype has made it possible to carry out several experiments with KC nodes that had been built up over several years of teaching activities at the Escuela Politécnica Superior de la Universidad Autónoma de Madrid. In particular, four KC nodes have been used: one on Operating Systems (OOSS), two on Formal Languages and Automata Theory (FLAT) and one on Computer Systems (CS).

The Operating Systems node is the result of students developing a list of topics on this subject over four consecutive years; it consists of a two-level knowledge tree with over 20 topics and 350 documents.

The FLAT nodes organise different documents, provided by the students during the academic year, into two knowledge trees. Both nodes deal with the same subject, but the documents and the structure of the list of topics differ between them. Both trees have two levels; one node has 6 topics and 24 documents, and the other 12 topics and 50 documents.

Lastly, in the CS node the corresponding knowledge area was developed by the students of one subject, who provided over 180 documents on around 40 different topics, structured within a knowledge tree, during an academic year.

The experiments were aimed at checking the viability of automatically grouping knowledge items using the weight of words vectors assigned by the proposed procedures. To this end, the three groups of experiments described in the following paragraphs were performed.

#### **3.1 Automatic grouping and classification experiments of knowledge items**

Three experiments were carried out to test the automatic grouping and classification of knowledge items.

The first experiment starts from the KC node on OOSS, where the two most successful documents of each topic were used to establish the weight of words vector (WWV) of that topic. Next, the levels of similarity between the WWVs of the remaining documents and the WWVs of those topics were calculated. From these data a graph was obtained (see Fig. 8) in which each row of points corresponds to a topic and each column to a document. The levels of similarity between documents and topics are represented by coloured points: the higher the coefficient, the lighter the colour. The documents are organised according to the topics in which the students had originally classified them manually.

Fig. 8. Automatic classification of documents (rows) by topics (columns) in node KC on OOSS

As a result of the first experiment, the graph shows that the highest similarity values (lighter points) are aligned mainly in the rows corresponding to the topics in which the documents were manually classified. This indicates that the automatic classification matches the manual classification in most cases. A posteriori analysis of the anomalies shows that they correspond to ambiguous topics in which heterogeneous documents had been classified.
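The first experiment can be approximated as follows: build one vector per topic from its two most successful documents, assign each remaining document to the topic with the most similar vector, and measure how often this agrees with the manual classification. The sketch assumes raw term counts and cosine similarity rather than the weighting actually used by the AM, and that each topic's documents are listed in decreasing order of success; all names are hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two sparse term -> weight vectors (0.0 if either is empty)."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def topic_vectors(topic_docs, doc_terms, n=2):
    """One vector per topic, summing the term counts of its first n documents.

    topic_docs: topic id -> document ids, most successful first (assumed ordering).
    doc_terms:  document id -> list of tokens.
    """
    vectors = {}
    for topic, docs in topic_docs.items():
        counts = Counter()
        for d in docs[:n]:
            counts.update(doc_terms[d])
        vectors[topic] = counts
    return vectors


def classify(doc_vec, topic_vecs):
    """Topic whose vector is most similar to doc_vec (ties broken by topic id)."""
    return max(topic_vecs, key=lambda t: (cosine(doc_vec, topic_vecs[t]), t))


def agreement(topic_docs, doc_terms, n=2):
    """Fraction of the remaining documents assigned to their manual topic."""
    topic_vecs = topic_vectors(topic_docs, doc_terms, n)
    hits = total = 0
    for topic, docs in topic_docs.items():
        for d in docs[n:]:  # documents not used to build the topic vectors
            total += 1
            hits += classify(Counter(doc_terms[d]), topic_vecs) == topic
    return hits / total if total else float("nan")
```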

The second experiment compares the WWVs of the documents of the original OOSS node with one another. In the graph (see Fig. 9 left), each row of points, vertical and horizontal, corresponds to a document, and the documents are organised into the topics in which they were manually classified. As in the previous case, the levels of similarity among the elements are represented by coloured points: the higher the coefficient, the lighter the colour.
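The block structure around the diagonal can be summarised by comparing the mean similarity of document pairs within the same manual topic with that of pairs across topics; a clearly higher intra-topic mean corresponds to light blocks on the diagonal. This is an illustrative summary statistic, not one reported in the experiments, and it assumes cosine similarity over term-to-weight dictionaries; the function names are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two sparse term -> weight vectors (0.0 if either is empty)."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def _mean(xs):
    return sum(xs) / len(xs) if xs else float("nan")


def intra_inter_similarity(doc_vecs, doc_topic):
    """Mean similarity of same-topic document pairs and of cross-topic pairs.

    doc_vecs:  document id -> term -> weight dictionary.
    doc_topic: document id -> topic in which the document was manually classified.
    """
    intra, inter = [], []
    for a, b in combinations(sorted(doc_vecs), 2):
        s = cosine(doc_vecs[a], doc_vecs[b])
        (intra if doc_topic[a] == doc_topic[b] else inter).append(s)
    return _mean(intra), _mean(inter)
```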

As the graph for this second experiment shows, the highest degrees of similarity are grouped into blocks around the diagonal. Under the conditions shown, this indicates that most documents are more similar to each other when they deal with the same subjects. It is also interesting to see that the light points outside the diagonal blocks appear in bands, which reveal the relationships established among documents on similar topics. The third experiment, which compared the WWVs of the documents of a KC node on CS under the same conditions, gave results similar to the previous one, as can be seen in the corresponding graph (see Fig. 9 right). In this case, as the number of topics is higher and the number of documents per topic is lower, the groupings appear as smaller light blocks.

Fig. 9. Automatic grouping of documents by topics of knowledge area: (a) OOSS KC node documents similarity; (b) CS KC node documents similarity

[...] ones, so that every pair of new homologous topics has a similar number of documents, which are different but relevantly similar. The WWVs of the topics of the new nodes were then calculated and compared with each other. A graph (see Fig. 10 left) was produced with the resulting similarity values, where the rows of coloured blocks correspond to the topics of one node and the columns to those of the other. The topics are ordered along both dimensions so that homologous topics occupy the same position in the corresponding entries of the table. As in the previous graphs, the highest similarity values are represented by the lighter colours.

As can be seen in the image of this first experiment, the highest degrees of similarity (light-coloured blocks) lie on the diagonal in almost every case. With the proposed approach, this means that it is possible to identify the branches of the knowledge trees that contain documents dealing with the same topics.

For the second experiment, two KC nodes on FLAT that organise the knowledge with different trees were used. Again, the WWVs of the topics were calculated from the documents they contain, and the degrees of similarity between topics of the different nodes were obtained by comparing their corresponding vectors. The result is shown in a graph (see Fig. 10 right) where the topics of one node are on the abscissa and those of the other on the ordinate. As before, the levels of similarity are shown as coloured blocks: the higher the coefficient, the lighter the colour. In this case, the pairs of topics that an expert on the subject, through manual analysis, considered to be linked by their contents have been marked with a cross.

As a result of this second experiment, it can be seen that most of the associations made by the expert fall on light-coloured blocks and that every light block corresponds to a topic pair associated by the expert. Therefore, the proposed procedure makes it possible to identify automatically the topics that deal with related issues in different knowledge trees.

#### **3.3 Automatic association experiment among knowledge nodes**

Starting from the documents included in five KC nodes (the CS node used in the first group of experiments, the two OOSS nodes prepared for the previous group and the two FLAT nodes used in that same group), a WWV was established for each node. In every case, the documents included in the nodes are different. By comparing these WWVs a graph was obtained (see Fig. 11), in which each line of blocks, vertical and horizontal, [...]

Fig. 11. Grouping of KC nodes per topics
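The node-level comparison described in this section can be sketched by building one vector per node from all of its documents and, for each node, finding the most similar other node. Raw term counts, cosine similarity and the function names are assumptions; the sketch needs at least two nodes and does not reproduce the reported results.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two sparse term -> weight vectors (0.0 if either is empty)."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def node_vectors(node_docs):
    """node id -> summed term counts of all its documents (each a list of tokens)."""
    return {node: sum((Counter(doc) for doc in docs), Counter())
            for node, docs in node_docs.items()}


def nearest_nodes(node_docs):
    """Map each node to the other node with the most similar vector (ties by id)."""
    vecs = node_vectors(node_docs)
    return {n: max((m for m in vecs if m != n),
                   key=lambda m: (cosine(vecs[n], vecs[m]), m))
            for n in vecs}
```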
