contents into a searchable format by the Semantic Web. This chapter discusses open data standards and the making of machine-understandable data.

Many forms of conversion have already been proposed by researchers (we review them in Section 3). However, the conversion of tables and lists is still problematic. We propose a novel method to convert tables and lists to five-star open data with a data model called TULIP. The following sections discuss the TULIP vocabulary as well as brief examples of its application.

*Linked Open Data - Applications, Trends and Future Developments*

**2. Background**

Before discussing the Semantic Web, it is useful to describe the development of the Web in a nutshell. Nova Spivack has explained Web 3.0, the latest development of the Web [1]. The first era is Web 1.0, which consists of content that rarely changes, most of which is generated by research institutions and business organizations. The next era, Web 2.0, has content that changes frequently, most of which comes from users creating and updating their own information, such as weblogs (blogs), wikis, and social networks. Now we are in the era of Web 3.0 and the Semantic Web, which focuses on linking data between computers and having computers process it directly.

**2.1 Structured, semi-structured, and unstructured data**

In terms of how easily computers can process it, data can be classified into three types:

1. Structured data is data that has a definite structure, such as data contained in relational databases. This type of data can be directly processed.

2. Semi-structured data is data whose structure cannot be wholly identified, such as a table, list, chart, etc. Although humans can see these data as "structured" and can easily understand them, computers cannot manipulate them directly because of uncertainty and ambiguity in structure and meaning. They must be converted by various methods before further processing.

3. Unstructured data is data that has no simple structure, such as text in the form of essays, pictures, audio, video, etc. They must be preprocessed by specific methods, such as natural language processing (NLP), to convert them into a format that computers can manipulate. This type of data has the highest uncertainty and ambiguity.

**2.2 What is the semantic web**

The Semantic Web is not entirely new technology. Tim Berners-Lee, who invented the World Wide Web in 1990 [2], announced the concept of the Semantic Web in 2001 in a *Scientific American* article [3]. The Semantic Web is an extension of the Web we currently use, in which information is given well-defined meaning. In other words, the Semantic Web is a Web of data that can be processed directly or indirectly and "understood" by computers. Steve Bratt, CEO of the World Wide Web Consortium (W3C) [4], contrasts the World Wide Web, which uses hyperlinks to link various resources between computers connected by the Internet, with the Semantic Web, which uses relationships or "meaning" to link resources or "objects" together.

**2.3 Elements of the semantic web**

As with other services on the Internet, the Semantic Web is largely an integration of standard or commonly used components. It consists of various components such as Unicode, Uniform Resource Identifier (URI), Extensible Markup Language (XML), and other standards. Some frameworks have been developed, improved, or modified from existing ones, such as the Resource Description Framework (RDF), RDF Schema (RDFS), Web Ontology Language (OWL), and SPARQL Protocol and RDF Query Language (SPARQL). In this chapter, we mainly focus on RDF and SPARQL.

Resource Description Framework (RDF) is the main structure for storing the smallest components of facts in the knowledge base linked within the Semantic Web. Basically, an RDF statement is a "sentence" that has three parts: a subject, a predicate (or verb), and an object. The subject and the object are identities, i.e., names of resources in the form of URIs (the object can alternatively be a literal or constant). The predicate (also in the form of a URI) describes the relationship between them. These sentences are called RDF triples. The triples are linked together in a graph structure called the RDF graph, which is sometimes referred to as the semantic graph or knowledge graph.
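To make the triple structure concrete, the following is a minimal sketch in plain Python, not an actual RDF library. It assumes a hypothetical namespace `http://example.org/` and invented predicate names (`invented`, `name`, `extendedBy`); the `URI` class and the `match` helper are likewise illustrative only. The wildcard lookup is only loosely analogous to a single SPARQL triple pattern.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple, Union


@dataclass(frozen=True)
class URI:
    """A resource name; kept distinct from plain literals such as strings."""
    value: str


Literal = Union[str, int, float]
Node = Union[URI, Literal]
Triple = Tuple[URI, URI, Node]  # (subject, predicate, object)

EX = "http://example.org/"  # hypothetical namespace, for illustration only

TIM = URI(EX + "TimBernersLee")
WWW = URI(EX + "WorldWideWeb")
SEMWEB = URI(EX + "SemanticWeb")
INVENTED = URI(EX + "invented")      # hypothetical predicate
NAME = URI(EX + "name")              # hypothetical predicate
EXTENDED_BY = URI(EX + "extendedBy")  # hypothetical predicate

# Each element is one "sentence". The object of the first triple (WWW) is the
# subject of the third, which is what links the triples into a graph.
graph: Set[Triple] = {
    (TIM, INVENTED, WWW),
    (TIM, NAME, "Tim Berners-Lee"),  # the object here is a literal, not a URI
    (WWW, EXTENDED_BY, SEMWEB),
}


def match(
    triples: Set[Triple],
    subject: Optional[URI] = None,
    predicate: Optional[URI] = None,
    obj: Optional[Node] = None,
) -> Set[Triple]:
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return {
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    }


# Everything stated about one subject.
about_tim = match(graph, subject=TIM)

# Follow an edge: what did that subject invent, and what extends it?
invented = {o for (_, _, o) in match(graph, subject=TIM, predicate=INVENTED)}
extensions = {
    o for thing in invented
    for (_, _, o) in match(graph, subject=thing, predicate=EXTENDED_BY)
}
```

Storing the graph as a set of tuples reflects the fact that an RDF graph is a set of triples with no inherent order; a real triple store would add indexing, namespaces, and typed literals on top of this.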
