**3. Towards machine-understandable data**

There are many works related to transforming data from tables and lists to Linked Data. Some research involves extracting tabular data in various formats, such as spreadsheets and relational databases, and then converting it to RDF data; a minimal sketch of such a conversion appears below. This research can be divided into groups as follows.
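Concretely, converting tabular data to RDF usually means mapping each row to a subject, each column to a property, and each cell value to an object. The following sketch uses Python's rdflib library; the namespace, table, and column-to-property mapping are hypothetical and chosen only for illustration, not taken from any of the systems surveyed below.

```python
# A minimal sketch of converting one table row into RDF triples with rdflib.
# The http://example.org/ namespace and the "Cities" table are hypothetical.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS, XSD

EX = Namespace("http://example.org/")  # hypothetical vocabulary namespace

# One row of a hypothetical "Cities" table: (name, country, population)
row = ("Berlin", "Germany", 3645000)

g = Graph()
subject = EX[row[0]]                     # row -> subject URI
g.add((subject, RDF.type, EX.City))      # table name -> class
g.add((subject, RDFS.label, Literal(row[0])))
g.add((subject, EX.country, EX[row[1]]))                          # column -> property
g.add((subject, EX.population, Literal(row[2], datatype=XSD.integer)))

print(g.serialize(format="turtle"))      # emit the triples as Turtle
```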

### **3.1 Research related to creating facts in Linked Data**

There are many research works related to populating Linked Data with facts. The most discussed projects [8–11] are DBpedia, YAGO, Freebase, Wikidata, and OpenCyc. There are also several related studies, which can be divided into groups as follows:

### *3.1.1 Extracting facts from various parts of Wikipedia*

DBpedia is a joint research project of the Free University of Berlin and Leipzig University in Germany [12]. Its objective is to extract structured data from Wikipedia, such as infoboxes and categories, as well as some unstructured data such as abstracts [13]. DBpedia supports Wikipedia content in many languages [14]. The goal of the project is to serve as a core that links the other datasets of Linked Data together [15]. The resulting dataset contains more than 3 billion facts about 4.58 million topics: nearly 600 million facts extracted from English Wikipedia articles, and more than 2.5 billion facts from Wikipedia editions in other languages.

Because of the difficulties of extracting data from tables, such as classifying different table types and assigning names to them, DBpedia chose to extract only structured data [13]. A later version of the framework added the ability to extract table data from Wikipedia, but only as HTML tag blocks, not as RDF triples that can be queried directly using SPARQL [12] (a query sketch follows this paragraph). DBpedia has encouraged the development of algorithms for extracting data from Wikipedia tables and lists with the DBpedia framework by proposing a Google Summer of Code (GSoC) project that started in mid-2016 and continued into 2017.<sup>3</sup> The project has yet to publish the relevant academic work, and its results have not been incorporated into the latest version of the DBpedia framework.
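To make the distinction concrete: RDF triples can be queried directly with SPARQL, whereas an opaque HTML tag block cannot. The following sketch again uses rdflib over hypothetical data in the same example namespace as above; it illustrates the querying capability, not DBpedia's own extraction code.

```python
# A minimal sketch of querying RDF triples with SPARQL via rdflib.
# The data and the http://example.org/ namespace are hypothetical.
from rdflib import Graph

g = Graph()
g.parse(data="""
@prefix ex: <http://example.org/> .
ex:Berlin  a ex:City ; ex:country ex:Germany ; ex:population 3645000 .
ex:Hamburg a ex:City ; ex:country ex:Germany ; ex:population 1841000 .
""", format="turtle")

# Select all German cities with a population above one million.
results = g.query("""
    PREFIX ex: <http://example.org/>
    SELECT ?city ?pop WHERE {
        ?city a ex:City ; ex:country ex:Germany ; ex:population ?pop .
        FILTER (?pop > 1000000)
    }
""")
for city, pop in results:
    print(city, pop)
```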

Isbell and Butler, of HP's Digital Media Systems Laboratory, published a study of converting data from Wikipedia's structured infoboxes, with some coverage of semi-structured and unstructured data [16].

YAGO is a research project from the Max Planck Institute for Informatics in Germany [17]. Its objective is to extract structured data from Wikipedia categories by applying synset data from Princeton University's WordNet project. The project contains 120 million facts about 10 million topics.
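For context, a WordNet synset groups the words that share one sense, which is what lets a Wikipedia category be mapped to a typed class. The short sketch below uses NLTK's WordNet interface purely as a convenient way to inspect synsets; it is an assumption of convenience and not part of YAGO's actual pipeline.

```python
# A minimal sketch of looking up WordNet synsets with NLTK; this only
# illustrates what a synset is, not YAGO's extraction method.
# Requires: pip install nltk, then nltk.download("wordnet") once.
from nltk.corpus import wordnet as wn

for synset in wn.synsets("singer"):
    # Each synset groups lemmas that share one word sense.
    print(synset.name(), "-", synset.definition())
    print("  lemmas:", [lemma.name() for lemma in synset.lemmas()])
```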

<sup>3</sup> DBpedia @ GSoC 2016, https://wiki.dbpedia.org/blog/dbpedia-google-summer-code-2016
