**2. Preliminaries**

### **2.1 Semantics: the missing link in LOD systems**

The question of why "domain knowledge" is useful in bridging the semantic gap in systems or applications that store and/or process data has long been discussed in the literature [1, 6, 8–11]. Declerck et al. [1] note that one of the main aims of LOD-supported systems is to develop new methods for construing data values (interlinks) that are applicable to a broad range of applications or platforms (based on language technologies or resource descriptions) through semantic technologies. Wang [12] observes that contemporary studies on LOD methods and tools are mainly directed towards ascertaining different levels or types of process instances (entities), with the result that the central task of finding schema-level relationships or links among the LOD datasets or models in question is ignored.

According to Wang [12], ontological representations (mappings) are a crucial means of resolving data heterogeneity and missing links. Moreover, ontologies have been described as an essential tool for establishing semantic-level links in LOD [6, 13–18]. For example, Selvan et al. [19] proposed an ontology-based recommender system, built on cloud services, that stores and retrieves data for further analysis using Type-2 fuzzy logic.

*Linked Open Data: State-of-the-Art Mechanisms and Conceptual Framework*

Studies have shown that there exists a (semantic) gap between different datasets and the various tools/algorithms that are applied to analyze or understand the data, including the results of the analysis, in all stages of data processing: from data pre-processing, through implementation of the algorithms, to interpretation of the results [6, 8, 11, 16]. For instance, data pre-processing usually involves filtering and cleaning the data, standardizing it by defining formats for integration, transforming it and extracting its properties, and then retrieving and selecting the defined formats/structures for analysis. Nevertheless, in many settings, semantic gaps arise in several of these pre-processing phases. For example, in the absence of the formal structure (semantics) of the data models, most resulting systems resort to empirical or ad-hoc methods to determine the quality of the underlying datasets or concepts. Yet data semantics is necessary for understanding the relations that exist among the different process elements in the models, especially during the standardization and transformation steps. It is therefore important to determine the correlation between the different data elements, taking into account the underlying properties/attributes of the data, when performing data standardization or processing at large. In particular, tightly (closely) correlated attributes can be generalized into one combined attribute or classification for the purpose of tractability and conceptualized analysis.
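The idea of generalizing tightly correlated attributes into one combined attribute can be sketched as follows. The 0.95 threshold, the averaging rule, and the toy column names are illustrative assumptions, not prescribed by the text:

```python
# Sketch: collapse pairs of highly correlated attributes into a single
# combined attribute before analysis. Threshold and merge rule are assumptions.
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences.
    (No guard for constant columns; this is only a sketch.)"""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def merge_correlated(table, threshold=0.95):
    """Replace each attribute pair correlated above `threshold`
    with one averaged attribute named 'a+b'."""
    merged = dict(table)
    names = list(table)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if a in merged and b in merged and \
                    abs(pearson(table[a], table[b])) >= threshold:
                merged[a + "+" + b] = [(x + y) / 2
                                       for x, y in zip(table[a], table[b])]
                merged.pop(a)
                merged.pop(b)
    return merged

# Toy dataset: "height_cm" and "height_in" are near-duplicate attributes.
data = {
    "height_cm": [170.0, 180.0, 165.0, 175.0],
    "height_in": [66.9, 70.9, 65.0, 68.9],
    "age":       [30.0, 25.0, 42.0, 37.0],
}
print(sorted(merge_correlated(data)))  # → ['age', 'height_cm+height_in']
```

The two height columns correlate almost perfectly and are merged; `age` is left untouched, which is exactly the "combined attribute" generalization described above.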

In the context of different application domains and rule-based information extraction systems, Yankova [20] conducted a semantic-based identity resolution experiment that aimed to identify conceptual information expressed within a domain ontology. The experiment was based on a generic and adaptable human language technology: they extracted company information from several sources and updated the existing ontologies with the resolved entities. The method for information extraction is a rule-based system they refer to as the Identity Resolution Framework (IdRF), built using Proton [20], which provides a general solution for identifying known and new facts in a given domain and can also be applied to other domains regardless of the type of entities that need to be resolved. The input to the IdRF comprises different entities together with their associated properties and values, and the expected output is an integrated representation of the entities, which are consequently resolved to have new properties or values within the ontology.
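The input/output contract described for the IdRF (entities with properties in, one integrated representation out) can be illustrated with a toy sketch. The exact-name matching rule and the field names are illustrative assumptions; the actual IdRF applies rule-based matching over a domain ontology:

```python
# Toy sketch of identity resolution: records from different sources are
# matched on a shared key and merged into one integrated entity.
# Matching rule (case-insensitive name equality) is an assumption.

def resolve(records, key="name"):
    """Group records by `key` and merge each group's properties."""
    resolved = {}
    for record in records:
        entity = resolved.setdefault(record[key].lower(), {})
        for prop, value in record.items():
            entity.setdefault(prop, value)  # first source wins on conflicts
    return resolved

source_a = {"name": "Acme Corp", "hq": "Berlin"}
source_b = {"name": "acme corp", "founded": 1999}

merged = resolve([source_a, source_b])
print(merged["acme corp"])
# → {'name': 'Acme Corp', 'hq': 'Berlin', 'founded': 1999}
```

The two source records are resolved to one entity that has gained a new property (`founded`) from the second source, mirroring the "integrated representation with new properties or values" the framework is said to produce.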

On the one hand, ontologies have been shown to be beneficial in such data processing or conceptualization scenarios [21–23]. Ontologies are formal structures used to capture knowledge about some specific domain or process of interest [24, 25]. Technically, the "ontologies" or formal expressions (taxonomies) per se are used to describe concepts within process domains as well as the relationships that hold between those concepts. Ontologies range from the tools or mechanisms used to create taxonomies, through the population of classified elements or database schemas, to fully axiomatized theories [11]. Practically, ontologies are used by domain experts to (manually, semi-automatically, or automatically) fill the semantic gaps that are allied to the data analysis procedures and models.

On the other hand, it is noteworthy that ontologies are now central to many applications, such as scientific knowledge portals, information management and integration systems, electronic commerce, and web services, all of which are grounded or built on the LOD scheme.

### **2.2 State-of-the-art: semantic schema for data integration and processing**


Indeed, several areas of application and definitions of ontologies (schemas) have been noted in the current literature, especially concerning the varied domains of interest. For example, Hashim [26] notes that the term "ontology" is borrowed from the field of philosophy, where it concerns being or existence, and further mentions that in the context of computer and information science it denotes an "artefact that is designed to model any domain knowledge of interest". Ontology has also been broadly used in many sub-fields of computer science and AI, particularly in data pre-processing, management, and LOD-related areas such as intelligent information integration and analysis [27], cooperative information management systems [28], knowledge engineering and representation [29], information retrieval [30], information extraction [31], ontology-based information extraction systems [13, 15, 32–34], database management systems [35–37], and semantic-based process mining and analysis [10, 16, 18, 38–40].

Gruber [25] describes the ontological concept or notion as "a formal explicit specification of a conceptualization". To date, this has been the most widely applied and cited definition of ontologies within the computer science field. The description means that ontologies are able to explicitly define (i.e. specify) the concepts and relationships that are paramount for modeling any given process or domain of interest. Moreover, with such expressive application

*DOI: http://dx.doi.org/10.5772/intechopen.94504*


