Preface

Modern tools and techniques for the collection and analysis of data across all fields of science and technology are becoming increasingly complex. This growing complexity is evidenced by the need for a more generalized and standardized description (integration) of the various data sources and formats to allow for flexible exploration of different data types. In theory, the challenge has been how to create automated systems capable of representing the different datasets in an understandable format, as well as making the derived formats and/or standards applicable across different platforms. One technology that has proved indispensable in this area in recent years is Linked Open Data (LOD). LOD systems consist of machine-readable datasets expressed as Resource Description Framework (RDF) triples, which describe data classes and their underlying properties. Moreover, early research indicates that one of the shortcomings of existing data and information processing systems is the need not only to represent the data (or information) in formats that can be easily understood by humans, but also to build intelligent systems that process the information they contain or support; in other words, "machine-understandable" systems. By a machine-understandable system, we mean that the extracted information or models are either semantically labeled (annotated) to ease the analysis process, or represented in a formal structure (an ontology) that allows a computer (a reasoning engine) to infer new facts by making use of the underlying relations. Indeed, the main idea behind such data and information processing systems, including those that aggregate data and compute the hierarchy of several process elements, is that they should be not only machine-readable but also machine-understandable. An adequate knowledge base system is, on the one hand, understandable by people and, on the other, understandable by machines.

As devices become smarter and produce data about themselves, it has become increasingly important for data scientists to take advantage of more powerful tools and data integration techniques that can provide a common standard for information dissemination across different platforms. To this end, the content of this book demonstrates that technologies such as the semantic web, machine learning, deep learning, natural language processing, the internet of things, knowledge graphs, process mining, and artificial intelligence, which together encompass the wider spectrum of LOD, are of paramount importance. The book therefore presents two main drivers for LOD technologies: (i) encoding knowledge about specific data and process domains, and (ii) advanced reasoning over and analysis of big datasets at a more conceptual level.
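To make these two notions concrete, the following minimal sketch (Python with the rdflib library, over hypothetical example data) stores machine-readable RDF triples that describe a small class hierarchy, and then uses a SPARQL property path so that the query engine can infer a fact that is never stated directly:

```python
from rdflib import Graph

# Hypothetical example data: machine-readable RDF triples describing
# data classes (a small hierarchy) and the properties of one resource.
TURTLE = """
@prefix ex:   <http://example.org/> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

ex:Sensor      rdfs:subClassOf ex:Device .
ex:Thermometer rdfs:subClassOf ex:Sensor .
ex:t1 rdf:type ex:Thermometer ;
      ex:locatedIn ex:Lab42 .
"""

g = Graph()
g.parse(data=TURTLE, format="turtle")

# "Machine-understandable": ex:t1 is never declared an ex:Device, but the
# rdfs:subClassOf* property path lets the engine infer it from the hierarchy.
QUERY = """
PREFIX ex:   <http://example.org/>
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?thing WHERE { ?thing rdf:type/rdfs:subClassOf* ex:Device . }
"""
for row in g.query(QUERY):
    print(row.thing)  # prints: http://example.org/t1
```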

This book intends to provide the reader with a comprehensive overview of the latest developments within the LOD framework and the benefits of the supported methods, ranging from semantics-aware techniques that exploit the knowledge kept in big data to improved data reasoning (big analysis) beyond the possibilities offered by most traditional data mining techniques. Fundamentally, the book covers the entire spectrum of "Linked Open Data - Applications, Trends and Future Developments". It consists of six chapters selected after a rigorous review by the academic editor, reviewers, and the IntechOpen book editorial team.

Technically, each of the individual chapters provides a comprehensive conceptualisation of the LOD framework and its main application components. The authors of the different chapters are reputable scholars and researchers from across the world, with research interests ranging from the computational fields of computer science, data science, information science, software engineering, knowledge graphs and library linked data, the internet of things, and semantic web technologies, to the engineering and manufacturing fields of process modelling and process intelligence, and on to knowledge and data management, e-commerce and financial analytics, and educational innovation. Overall, the rich contents of this book, conveyed by the authors through the various chapters, explore the different topics of interest in relation to LOD, ranging from the latest in LOD clouds and systems, to research problems and challenges with LOD, and on to its application in real-time, real-world settings. This includes a detailed discussion of the gaps in the existing literature, suitable methodologies applied to address the identified gaps, the design and development of LOD frameworks, and case studies.

To this end, *Chapter 1* looks at the extent to which the Bibliographic Framework Initiative (BIBFRAME) has been used to integrate library data from the silos of online catalogues, and then discusses some of the challenges that need to be addressed in order to optimize the potential capabilities that the BIBFRAME model holds. *Chapter 2* reviews the several attempts within the Linked Data research area to transform table and list datasets into machine-readable formats, and proposes a data model named TULIP. The method focuses on transforming tables and lists into RDF while faithfully preserving their essential structure, in support of the future development of the Semantic Web. *Chapter 3* presents the latest mechanisms and conceptual framework of LOD by proposing a Semantic-Based Linked Open Data Framework (SBLODF) that integrates the different elements or entities in information systems or models with semantics (metadata descriptions) to produce explicit and implicit information in response to users' search queries. The SBLODF framework is a machine-readable and machine-understandable system that proves useful for encoding knowledge about different process domains and for representing the discovered information or models at a more conceptual level. *Chapter 4* discusses the need for, and issues involved in, the analysis of effective load-balancing techniques in a distributed environment. It looks at the heterogeneous nature of distributed computing, interoperability, fault occurrence, resource selection, and task scheduling for the performance optimization of web resources through various balancing algorithms and scheduling methods, and then provides a concise narrative of the problems encountered and directions for future extension. *Chapter 5* studies the internet of things (IoT) and big data analysis for advanced processes. It discusses a semiconductor manufacturing process built on a cloud database for big data analysis and decision-making through a continuous monitoring system. *Chapter 6* assesses the implications of the backtesting approach in financial time series analysis when choosing a reliable Generalized Auto-Regressive Conditional Heteroscedastic (GARCH) model for analysing stock returns, in a case study set in financial institutions.

Overall, this book is a reference and educational resource intended to benefit data scientists, software developers, semantic web engineers, information system designers, process managers, teachers, researchers, and general consumers in the application and implementation of the LOD framework and research in its various contexts. With this book, the editor and authors have focused on bridging the practical and theoretical gap in the methodological use and commercial application of LOD concepts in computer science and engineering.

I would like to thank the IntechOpen publishers and the publishing team, especially the book project Process Manager, Ms. Jasna Bozić, for their professional commitment to ensuring the successful completion of this book. Most importantly, I would like to specially thank the authors for their dedicated hard work, wonderful research, and informative contributions that form the rich content of this book.

> **Dr. Kingsley Okoye**, Data Architect, Tecnologico de Monterrey, Monterrey, Mexico

**Chapter 1**

**BIBFRAME Linked Data: A Conceptual Study on the Prevailing Content Standards and Data Model**

*Jung-Ran Park, Andrew Brenza and Lori Richards*

**Abstract**

The BIBFRAME model is designed with a high degree of flexibility in that it can accommodate any number of existing models, as well as models yet to be developed within the Web environment. The model's flexibility is intended to foster extensibility. This study discusses the relationship of BIBFRAME to the prevailing content standards and models employed, or in the process of being adopted, by cultural heritage institutions such as museums, archives, libraries, historical societies, and community centers. The aim is to determine the degree to which BIBFRAME, as it is currently understood, can be a viable and extensible framework for bibliographic description and exchange in the Web environment. We highlight the areas of compatibility as well as the areas of incompatibility. BIBFRAME holds the promise of freeing library data from the silos of online catalogs, permitting library data to interact with data both within and outside the library community. We discuss some of the challenges that need to be addressed in order to optimize the potential capabilities that the BIBFRAME model holds.

**Keywords:** linked data, functional requirements for bibliographic records (FRBR), resource description and access (RDA), semantic web, machine readable cataloging

**1. Introduction**

Over the last several decades, the library community has been faced with the challenge of remaining relevant as an authoritative source of bibliographic data within the larger networked environment of the Web. This relevance has particularly been tested by what a number of information professionals see as the library community's reliance on resource description standards such as Machine Readable Cataloging (MARC), which do not fully support the establishment of relationships between resources across the Web at large, nor optimize library data for machine readability. As a result, the vast majority of bibliographic data held in libraries has been locked in library catalogs which, although automated, essentially function as electronic equivalents of the physical card catalogs of a hundred years ago [1].
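To illustrate the contrast, the following minimal sketch (Python with the rdflib library) restates a flat, MARC-style record as BIBFRAME-flavored RDF triples. The BIBFRAME namespace is the real one published by the Library of Congress; the record, identifiers, and field values are hypothetical, and the mapping is deliberately simplified rather than the chapter's own method.

```python
from rdflib import RDF, Graph, Literal, Namespace

# The BIBFRAME vocabulary namespace is real; everything under ex: is hypothetical.
BF = Namespace("http://id.loc.gov/ontologies/bibframe/")
EX = Namespace("http://example.org/bib/")

# A flat, catalog-style record: its field tag (MARC 245$a, the title proper)
# is meaningful only to systems that already know the MARC convention.
marc_like_record = {"245a": "Linked Data for Libraries"}

g = Graph()
g.bind("bf", BF)

work = EX["work1"]          # hypothetical web-addressable identifiers
instance = EX["instance1"]

# As RDF triples, the same fact becomes a set of statements that any
# linked-data client can follow, query, and combine with outside data.
g.add((work, RDF.type, BF.Work))
g.add((instance, RDF.type, BF.Instance))
g.add((instance, BF.instanceOf, work))
# Simplified: full BIBFRAME models titles as bf:Title nodes, not literals.
g.add((instance, BF.title, Literal(marc_like_record["245a"])))

print(g.serialize(format="turtle"))
```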

However, due to the rapidly changing technology environment, there is now the opportunity for the library community to expose the data created by cataloging
