…defined model that has "TrueTrace\_Fitness\_(TP)" and "FalseTrace\_Fitness\_(TN)": "TestLog\_forSpecifiedClass and hasTraceFitness some 'TrueTrace\_Fitness\_(TP)'" and "TestLog\_forSpecifiedClass and hasTraceFitness some 'FalseTrace\_Fitness\_(TN)'", respectively.

Thus, as reported in **Table 1**, each result of the classification process for the discovered models, i.e., the true positive and true negative traces, was determined. From the results of the classification method (**Table 1**), we note for each run set of parameters retrieved from the model that the commission error, otherwise referred to as the error rate (false positives (FP) and false negatives (FN)), was null, i.e., equal to 0. This means that the reasoner (classifier) did not make critical mistakes, for instance, a case whereby a trace is considered an instance of one class while it is categorically an instance of another. In the same vein, the accuracy rates (true positives (TP) and true negatives (TN)) obtained when determining the different traces and classifications were very high, thus correct, and were consistently observed for all the test sets.

**6. Discussion and conclusion**

LOD systems, frameworks, and algorithms fundamentally aim to provide a standard platform for integrating and analyzing different datasets or models so as to extract the snippets of information that are relevant to users, independently of the underlying formats or syntax. In other words, LOD stands as the bridge between the different data formats and sources on the one hand, and knowledge acquisition or information retrieval on the other. For example, Cunningham [31] notes that the process of extracting information from several sources may simply mean taking text documents, speech, graphics, etc., as input and producing fixed-format, unambiguous data (or information) as output. In turn, the discovered information may be displayed directly to users, stored in a database or spreadsheet for later analysis, or used for indexing purposes in IR-supported applications such as web search engines like Google or Bing.

Studies have shown that IE technologies are distinct from IR systems or functions. Whereas an IR system, as Cunningham [31] notes, aims to find relevant documents (e.g. texts) and present them to the user, an IE application analyses the texts and presents only the specific information from the text that the user is interested in. This kind of tailored information analysis is precisely where ontology-based information extraction systems such as the SBLODF framework described in this chapter find their motivation.

For example, a user of an IR-supported system wanting information on higher educational institutions that offer a particular course would enter a list of relevant keywords in the search module and receive in return a set of documents (e.g. various university prospectuses, course guidelines, etc.) that contain likely matches for those keywords. The user would then read through the matched documents and extract the requisite information themselves, or store the documents for future reference. By contrast, an IE system would automatically populate a list of tables or spreadsheets directly with the names of the relevant universities and their course offerings, making it easier for users to extract or learn the specific information they seek.
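The contrast can be sketched in a few lines. The toy corpus, the document-naming pattern, and the course regex below are invented purely for illustration:

```python
import re

# Toy corpus standing in for university prospectuses (invented examples).
DOCS = [
    "Ashford University prospectus: we offer BSc Computer Science and BA History.",
    "Riverdale College guide: courses include BSc Biology and BSc Computer Science.",
    "Hilltop Institute brochure: we offer BA Music.",
]

def ir_search(query, docs):
    """IR-style: return the whole documents that contain all query terms."""
    terms = query.lower().split()
    return [d for d in docs if all(t in d.lower() for t in terms)]

def ie_extract(docs):
    """IE-style: populate a structured table of institution -> courses."""
    table = {}
    for doc in docs:
        header = doc.split(":")[0]            # e.g. "Ashford University prospectus"
        name = " ".join(header.split()[:-1])  # drop "prospectus"/"guide"/"brochure"
        table[name] = re.findall(r"(?:BSc|BA) [A-Z][a-z]+(?: [A-Z][a-z]+)*", doc)
    return table

# IR hands back documents; IE hands back the answer itself.
hits = ir_search("computer science", DOCS)  # two matching documents to read through
table = ie_extract(DOCS)                    # institution -> course list, ready to use
```

Here the user of the IR function still has to read the returned prospectuses, while the IE function delivers the university-to-course table directly, mirroring the scenario above.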

*Linked Open Data: State-of-the-Art Mechanisms and Conceptual Framework*
*DOI: http://dx.doi.org/10.5772/intechopen.94504*

However, IE-supported LOD frameworks or systems also have some limitations when compared to IR alone. One limitation is that IE systems are more difficult and knowledge-intensive to build and are, to a certain extent, tied to particular domains or case scenarios. IE systems are also more computationally intensive than IR systems. On the other hand, in applications with large text or document volumes, IE is potentially much more efficient than IR because it can dramatically reduce the amount of time people spend reading through documents to find the relevant information. This benefit is possible largely as a result of applying the ontological (semantic) schema to represent and manipulate the underlying information, as described in Section 3 of this chapter.

Moreover, in settings where the results need to be presented in several languages, the fixed-format and unambiguous nature of IE outputs makes the retrieval process relatively direct, compared with the full translation facilities needed to interpret the multilingual texts found by IR. In effect, IE presents only the specific information, in a form the user is interested in, and this is where ontology-based IE systems are particularly powerful, given that an ontology is one of the tools capable of providing information in a structured format. For instance, the automatic population of the different class hierarchies in ontologies within OBIE [9] applications can formally identify process instances or elements within a text file that belong to, or reference, certain concepts in the pre-defined ontologies, and then add those instances to the model in the right locations.
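A minimal sketch of such ontology population, using a hand-made class lexicon. All class names, terms, and the sample sentence are invented; a real OBIE pipeline would draw on an ontology store and proper NLP machinery rather than substring matching:

```python
# Tiny "ontology": each class holds a seed lexicon and a set of instances.
ONTOLOGY = {
    "University": {"lexicon": {"mit", "oxford university"}, "instances": set()},
    "Course": {"lexicon": {"computer science", "biology"}, "instances": set()},
}

def populate(text, ontology):
    """Add every mention found in the text as an instance of its class."""
    low = text.lower()
    for spec in ontology.values():
        for term in spec["lexicon"]:
            if term in low:
                spec["instances"].add(term)  # instance assertion under the class
    return ontology

populate("MIT offers Computer Science and Biology degrees.", ONTOLOGY)
```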

Having said that, we note that OBIE systems such as the SBLODF attempt to classify the several entities in a more scalar (graded) way: there may be different categories to which an entity can belong, and cataloging the discrepancies between those classifications is more or less straightforward when using the OBIE framework [17].

Furthermore, to explain the application of the OBIE concept in the context of information retrieval and extraction, or semantic-based knowledge representation, Yankova [20] refers to an identity resolution method for deciding whether an instance extracted from a text by an IE application refers to a known entity within a target domain ontology. Technically, the authors [20] developed a customizable rule-based framework for identity resolution and merging that uses ontologies for knowledge representation, with customizable identity criteria put in place to decide on the similarity between two process instances or entities. The criteria utilize ontological operations and a weighted similarity computation between extracted and stored values. The weighting criteria are routinely specified according to the type of entities and the application domain.
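The weighted-similarity idea can be sketched as below. The property weights, the 0.8 threshold, and plain string matching are illustrative stand-ins, not the criteria actually used in [20]:

```python
def same_entity(extracted, stored, weights, threshold=0.8):
    """Decide whether two instances co-refer via a weighted property match."""
    score = 0.0
    for prop, weight in weights.items():
        a, b = extracted.get(prop), stored.get(prop)
        # Only properties present on both sides can contribute to the score.
        if a is not None and b is not None and a.lower() == b.lower():
            score += weight
    return score / sum(weights.values()) >= threshold

# Illustrative criteria: the name matters most, the founding year least.
weights = {"name": 0.6, "city": 0.3, "founded": 0.1}
new = {"name": "Riverdale College", "city": "Springfield"}
known = {"name": "riverdale college", "city": "Springfield", "founded": "1901"}
```

With these weights, the matching name and city give a score of 0.9, so the two instances are merged; a record agreeing only on the city would fall below the threshold and be kept separate.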

Accordingly, studies have also shown that aggregating the extracted information from different data sources has clear advantages, e.g. complementing partial information from one source with another, increasing the confidence of the extracted information, and storing updated information within the knowledge bases [11, 14, 15, 17, 20, 23, 53]. Indeed, the resultant methods provide standard structures for resolving the identities or property descriptions of the different classes of entities (process instances) by using ontologies as the core (fundamental) knowledge representation tools, which help to provide formal descriptions complemented with semantics.
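A sketch of that aggregation step; the record shapes and the first-value-wins policy are invented for illustration:

```python
from collections import Counter

def aggregate(records):
    """Merge partial records about one entity from several sources.

    Missing properties are complemented from other sources, and the number
    of agreeing sources is kept as a crude per-property confidence count.
    """
    merged, support = {}, Counter()
    for record in records:
        for prop, value in record.items():
            support[(prop, value)] += 1
            merged.setdefault(prop, value)  # first reported value wins
    confidence = {prop: support[(prop, value)] for prop, value in merged.items()}
    return merged, confidence

sources = [
    {"name": "Riverdale College", "city": "Springfield"},
    {"name": "Riverdale College", "founded": "1901"},
    {"city": "Springfield"},
]
merged, confidence = aggregate(sources)
```

The merged record carries all three properties even though no single source had them, and the confidence counts show which values are corroborated by more than one source.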

*Linked Open Data - Applications, Trends and Future Developments*

Interestingly, Yankova [20] reveals that one fundamental problem to be addressed when providing a structure for distributing conceptual knowledge, such as with OBIE systems, is that of identifying and merging the instances extracted from multiple sources. Basically, the process should aim at identifying newly extracted facts, e.g. from the derived models, and linking them to their previous references or mentions. To this effect, we note that ontology-based systems, in general, pose two main challenges that are directed towards [31]:


Perhaps it is also important to mention that once the ontologies are populated with the process instances or concept assertions, the ultimate function of the resultant (OBIE) systems is simply to manipulate the process elements, for example, by uncovering the relationships that exist amongst the process instances and revealing those to the users or search initiators based on the query modules [6, 9, 16, 31, 44, 54]. Moreover, for rule-based systems like OBIE, such procedures are relatively straightforward. For learning-based IE systems, however, this appears to be more problematic, because training data are most often required to train the models, and collecting the necessary training data is likely to become a bottleneck [31]. To resolve such issues, new training datasets may need to be created either manually or semi-automatically, which is often a time-consuming and burdensome task.

However, new and emerging systems and methods are being developed to help address such *metadata creation* problems for knowledge management or data analysis, and to support IE and LOD at large [1, 11–15, 23, 33, 55, 56]. Moreover, unlike traditional IE systems, where the extracted facts (or information) are only classified as belonging to pre-defined types, an ontology-based (semantic) IE system such as the SBLODF seeks to identify, analyze, and represent information at the conceptual (abstraction) level by establishing links (references) between the entities residing in the underlying system's knowledge bases and their mentions within the contextual domain. Hence, semantically based LOD systems should not only support the formal representation of the different domains but should also provide information about the several known entities and their property descriptions. Thus, ontology-based LOD systems such as the SBLODF introduced in this chapter must integrate well-defined entities with their semantic descriptions for efficient explicit and implicit information extraction and/or analysis, i.e., a machine-readable and machine-understandable system.
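The mention-to-entity linking described above can be sketched as a normalised alias lookup against the knowledge base; the KB entries and identifiers below are invented:

```python
# Invented knowledge base: entity id -> canonical label plus known aliases.
KB = {
    "ent-001": {"label": "Riverdale College", "aliases": {"riverdale", "riverdale college"}},
    "ent-002": {"label": "Hilltop Institute", "aliases": {"hilltop", "hilltop institute"}},
}

def link_mention(mention, kb):
    """Return the id of the known entity a textual mention refers to, if any."""
    m = mention.strip().lower()
    for entity_id, entry in kb.items():
        if m == entry["label"].lower() or m in entry["aliases"]:
            return entity_id
    return None  # unknown mention: a candidate new entity for the KB

link_mention("Riverdale", KB)
```

A real system would add fuzzy matching and disambiguation by context, but even this sketch shows the key move: a surface mention in text is resolved to a formally described entity rather than left as a string.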
