*Linked Open Data - Applications, Trends and Future Developments*

semantic web or schema (e.g. using ontologies) [4–6] of interconnected data [7]. According to Snyder et al. [7], LOD has since been epitomized as a way of improving the process of discovering useful information or resources by creating a series of robust links between related concepts or items.

The work done in this chapter notes that one of the main challenges with LOD has been how to create systems or methods capable of providing a format that is both machine-readable and machine-understandable for datasets that may come from different sources, as well as making the derived formats or standards explicable across several platforms. To this end, the work proposes a semantic-based LOD framework (SBLODF) that provides an additional function to LOD, allowing for the formal integration of process elements or concepts through metadata creation (process description) using semantic technologies or schemas. This is called Semantic-based Linked Open Data.

**2. Preliminaries**

**2.1 Semantics: the missing link in LOD systems**

Why "domain knowledge" is useful in bridging the semantic gap in systems or applications that store and/or process data has long been discussed in the literature [1, 6, 8–11]. Declerck et al. [1] note that one of the main aims of LOD-supported systems is to develop new ways or methods for construing data values (interlinks) that are applicable to a broad range of applications or platforms (based on language technologies or resource descriptions) through semantic technologies. Wang [12] notes that contemporary studies on LOD methods and tools are mainly directed towards ascertaining different levels or types of process instances (entities), with the result that the central task of finding the (schema-level) relationships or links that exist amongst the LOD datasets or models in question is being ignored.

According to Wang [12], ontological representations (mappings) are a crucial means of resolving data heterogeneity or missing links. Moreover, ontologies can be described as an essential tool for establishing semantic-level links in LOD [6, 13–18]. For example, Selvan et al. [19] proposed an ontology-based recommender system, built on cloud services, to store and retrieve data for further analysis using Type-2 fuzzy logic.

Studies have shown that there exists a (semantic) gap between the different datasets and the various tools/algorithms applied to analyze or understand the data, including the results of the analysis, at all stages of data processing: from data pre-processing, through implementation of the algorithms, to interpretation of the results [6, 8, 11, 16]. For instance, data pre-processing usually involves filtering and cleaning the data, standardizing it by defining formats for its integration and transformation, extracting properties and retrieving the defined formats/structures, and then selecting data for analysis. Nevertheless, in many settings semantic gaps arise in each of these phases. For example, in the absence of the formal structure (semantics) of the data models, most of the resulting systems resort to empirical or ad-hoc methods to determine the quality of the underlying datasets or concepts, even though data semantics is necessary for understanding the relations that exist amongst the different process elements in the models, especially during the standardization and transformation steps. It is therefore important to determine the correlation between the different data elements, taking into account the underlying properties/attributes of the data, when performing

**2.2 State-of-the-art: semantic schema for data integration and processing**

Indeed, several areas of application and definitions of ontologies (schemas) have been noted in the current literature, especially as concerns the varied domains of interest. For example, Hashim [26] notes that the term "ontology" is borrowed from the field of philosophy concerned with being or existence, and further mentions that in the context of computer and information science it denotes an "artefact that is designed to model any domain knowledge of interest". Ontologies have also been broadly used in many sub-fields of computer science and AI, particularly in data pre-processing, management, and LOD-related areas such as intelligent information integration and analysis [27], cooperative information management systems [28], knowledge engineering and representation [29], information retrieval [30], information extraction [31], ontology-based information extraction systems [13, 15, 32–34], database management systems [35–37], and semantic-based process mining and analysis [10, 16, 18, 38–40].

Gruber [25] describes the ontological concept or notion as "a formal explicit specification of a conceptualization". To date, this has been the most widely applied and cited definition of ontologies within the computer science field. It means that ontologies are able to explicitly define (i.e. specify) the concepts and relationships that are paramount for modeling any given process or domain of interest. Moreover, with such an expressive application

or schema, it means that processes can be represented in the form of classes, relations, individuals, and axioms (C, R, I, A). Thus, the structural layer of an ontology can be defined as a *quadruple*, construed on connected sets of taxonomies (RDF + Axioms) or, equivalently, a formal structure (Triples + Facts), whereby the *subjects* comprise the represented class(es), *C*; the *objects* comprise the individual process elements or instances, *I*; the *predicates* express the relationships, *R*, that exist amongst the subjects and objects; and the sets of axioms state facts, *A* [11]. Thus;

$$\mathbf{Ont} = \{\mathbf{C}, \mathbf{R}, \mathbf{I}, \mathbf{A}\} \tag{1}$$
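To make the quadruple in Eq. (1) concrete, the sketch below represents an ontology as its four component sets and applies a subClassOf axiom to derive a new fact over the triples. This is an illustrative sketch only; the class, instance, and predicate names (Process, Activity, reviewOrder, isA) are hypothetical examples, not taken from the chapter or any particular ontology language.

```python
# Minimal sketch of the ontology quadruple Ont = {C, R, I, A} from Eq. (1).
# All names used here (Process, Activity, reviewOrder, ...) are hypothetical.

C = {"Process", "Activity"}                   # classes (subjects)
I = {"reviewOrder"}                           # individuals / instances (objects)
R = {("reviewOrder", "isA", "Activity")}      # relations (predicates) as triples
A = {("Activity", "subClassOf", "Process")}   # axioms stating facts

Ont = {"C": C, "R": R, "I": I, "A": A}

def infer(ont):
    """Apply subClassOf axioms to the triples until no new 'isA' fact is derived."""
    derived = set(ont["R"])
    changed = True
    while changed:
        changed = False
        for (inst, pred, cls) in set(derived):
            if pred != "isA":
                continue
            for (sub, ax, sup) in ont["A"]:
                if ax == "subClassOf" and sub == cls and (inst, "isA", sup) not in derived:
                    derived.add((inst, "isA", sup))
                    changed = True
    return derived

facts = infer(Ont)
print(("reviewOrder", "isA", "Process") in facts)  # prints True: derived via the axiom
```

In RDF terms, each element of `R` is a (subject, predicate, object) triple, and the axiom plays the role of an rdfs:subClassOf entailment rule: because reviewOrder isA Activity and Activity subClassOf Process, the fact reviewOrder isA Process follows.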

Following the aforementioned definition of the ontological concept or schema, this work notes that ontologies are built to serve as the main functional mechanisms for integrating the data models of various systems (e.g. LOD), as follows:

