**3.2. Publishing linked data of the citation information**

Using the citation ontology, we can publish the citation information as linked data in the form of RDF triples. We used D2R as the linked data publishing software. D2R is a widely used tool that converts large volumes of relational database data into RDF triples. We then imported the resulting linked data into the Virtuoso semantic repository.
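The conversion step can be illustrated with a minimal sketch. It does not use D2R or its mapping language; it only shows, in plain Python, the kind of row-to-triple mapping such a tool performs. The table layout (`paper`, `citation`), the base URI `http://example.org/paper/`, and the choice of `dcterms:title`, `dcterms:issued`, and `cito:cites` as target properties are assumptions made for illustration, not the schema used in this chapter.

```python
import sqlite3

BASE = "http://example.org/paper/"      # hypothetical base URI
DCT = "http://purl.org/dc/terms/"       # Dublin Core terms namespace
CITO = "http://purl.org/spar/cito/"     # CiTO namespace


def literal(value):
    """Serialize a value as a plain N-Triples string literal (minimal escaping)."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{text}"'


def rows_to_ntriples(conn):
    """Yield one N-Triples line per mapped column value or citation row."""
    for pid, title, year in conn.execute(
        "SELECT id, title, year FROM paper ORDER BY id"
    ):
        subject = f"<{BASE}{pid}>"
        yield f"{subject} <{DCT}title> {literal(title)} ."
        # The year is emitted as a plain literal; a real mapping might type it.
        yield f"{subject} <{DCT}issued> {literal(year)} ."
    for citing, cited in conn.execute(
        "SELECT citing, cited FROM citation ORDER BY citing, cited"
    ):
        yield f"<{BASE}{citing}> <{CITO}cites> <{BASE}{cited}> ."


def demo():
    """Build a tiny in-memory database (toy data) and convert it to triples."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE paper (id TEXT PRIMARY KEY, title TEXT, year INTEGER);
        CREATE TABLE citation (citing TEXT, cited TEXT);
        INSERT INTO paper VALUES ('A', 'Citing paper A', 2015),
                                 ('r4', 'Reference 4', 2008);
        INSERT INTO citation VALUES ('A', 'r4');
        """
    )
    return list(rows_to_ntriples(conn))
```

Calling `demo()` returns the serialized lines, which could then be written to a file and bulk-loaded into a triple store.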


The Impact on Citation Analysis Based on Ontology and Linked Data

http://dx.doi.org/10.5772/intechopen.76377


For the bibliographic citation data, we used the Library, Information Science & Technology Abstracts (LISTA) database as the data source. LISTA is a citation and abstract database containing structured data on more than 600 core journals and 5000 core authors [35]. We have published these data as linked data, which forms the foundation for the subsequent citation analysis.

For the full-text citation information set, the most frequently cited papers in the specific field were first selected as the citing work subset, and their references were extracted as the cited work subset. On this basis, the quoted sentences in the citing and cited literature were extracted, and the citation function, citation sentiment, and citation position of each were annotated by two trained coders. The full-text citation information was then organized into RDF triples, as shown in **Figure 3**.


**Figure 3.** Marking the full-text citation information.


**Figure 2.** Classes and properties of full-text citation ontology.

Scientometrics

**3.3. Citation analysis method implementation**


The essence of the linked-data-based citation analysis method is to write a SPARQL query that extracts the citation information for a specific dimension. The query results are then aggregated and visualized to analyze the citations along that dimension. In this chapter, we initially plan to implement 11 dimensions of citation analysis. Eight of them (citation quantity, citation strength, citation type, citation language, citation country, citation age, citation journal, and co-citation analysis) are based on the bibliographic data of traditional citation analysis, while the remaining three (citation function, citation sentiment, and citation position analysis) are based on a full-text citation analysis perspective.
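To make the "extract, then aggregate" pattern concrete, the following minimal sketch computes two of the bibliographic dimensions, citation quantity and co-citation, from a mapping of citing papers to their references. It assumes the citation pairs have already been retrieved from the repository; the toy data in `CITES` and the function names are hypothetical and are not taken from the chapter's dataset.

```python
from collections import Counter
from itertools import combinations


def citation_counts(cites):
    """Citation quantity: number of citing papers that reference each cited paper."""
    return Counter(ref for refs in cites.values() for ref in set(refs))


def cocitation_counts(cites):
    """Co-citation: number of citing papers that reference both papers of a pair."""
    pairs = Counter()
    for refs in cites.values():
        # Sort so that each unordered pair has one canonical representation.
        pairs.update(combinations(sorted(set(refs)), 2))
    return pairs


# Hypothetical toy data: citing paper -> list of cited references.
CITES = {
    "A": ["r1", "r2", "r3", "r4"],
    "B": ["r3", "r4", "r5"],
}
```

With this toy data, `r3` and `r4` are each cited twice and are co-cited by both A and B, whereas a pair such as `r1` and `r5` never appears together.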

The citation analysis process (using age and function as examples) is shown in **Figure 7**. Citing papers A and B constitute the citing subset, while references [1–7] form the cited subset. The relationships between them are complex and involve many factors. As mentioned above, the citation functions between them have been marked with "cito:extends," and the age information has been published as linked data. These citation relationships can thus be transformed into RDF triples, as shown in **Figure 4**.

Once the triples are complete, a SPARQL query is written to extract the required citation information, as shown in **Figure 5**.
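Since Figures 4 and 5 are not reproduced here, the sketch below only approximates this kind of query. The SPARQL strings are plausible formulations given as assumptions (the `ex:` namespace and the use of `dcterms:issued` for the publication year are hypothetical), and they are not executed. Instead, a small pure-Python pattern matcher over hand-written toy triples stands in for the SPARQL endpoint.

```python
# Assumed SPARQL formulations (not executed in this sketch).
QUERY_AGE = """
PREFIX ex: <http://example.org/paper/>
PREFIX cito: <http://purl.org/spar/cito/>
PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?ref ?year WHERE { ex:A cito:cites ?ref . ?ref dcterms:issued ?year . }
"""
QUERY_FUNCTION = """
PREFIX ex: <http://example.org/paper/>
PREFIX cito: <http://purl.org/spar/cito/>
SELECT ?paper WHERE { ?paper cito:extends ex:r4 . }
"""

CITES, EXTENDS, ISSUED = "cito:cites", "cito:extends", "dcterms:issued"

# Hypothetical toy triples; not the data behind Figure 4.
TRIPLES = [
    ("ex:A", ISSUED, 2015),
    ("ex:A", CITES, "ex:r1"), ("ex:A", CITES, "ex:r4"),
    ("ex:A", EXTENDS, "ex:r4"),
    ("ex:B", CITES, "ex:r4"), ("ex:B", CITES, "ex:r6"),
    ("ex:r1", ISSUED, 2003), ("ex:r4", ISSUED, 2008), ("ex:r6", ISSUED, 2011),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def reference_years(triples, citing):
    """Publication year of every reference cited by `citing`."""
    years = {}
    for _, _, ref in match(triples, s=citing, p=CITES):
        for _, _, year in match(triples, s=ref, p=ISSUED):
            years[ref] = year
    return years


def extenders_of(triples, cited):
    """Papers whose citation of `cited` is marked with cito:extends."""
    return sorted(s for s, _, _ in match(triples, p=EXTENDS, o=cited))


def citation_ages(triples, citing):
    """Citation age per reference: citing year minus cited year."""
    citing_year = match(triples, s=citing, p=ISSUED)[0][2]
    return {ref: citing_year - year
            for ref, year in reference_years(triples, citing).items()}
```

The last function shows the "calculate" step that follows retrieval: the years returned by the age query are turned into citation ages, which can then be tabulated or plotted.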

The first SPARQL query retrieves the publication years of all references cited by paper A, and the second retrieves all citations of reference [4] whose function is "extends," that is, the papers that extend document 4. The search results are then calculated and displayed as the final results. Visualization software (e.g., Power BI, Tableau) could also be applied to simplify the display of results, and other dimensions of citation analysis can be implemented according to the same principle. As data quality continues to improve, more dimensions of citation analysis can be achieved in follow-up experiments.

**Figure 4.** Example citation network for citation function analysis and citation age analysis.

**Figure 5.** SPARQL queries and the corresponding citation analysis type.

**4. Framework for a citation knowledge service system**

The ontology-based citation knowledge service system introduces ontology-related theory and technology into citation knowledge organization and knowledge retrieval. This system introduces a lightweight cube ontology to organize, store, and query citation knowledge data in a machine-readable mode. It uses domain ontology to express the semantic representation of the citation knowledge base and to associate the citation knowledge data with the domain knowledge. According to user registration information and a user needs survey, a user log flow provides users with targeted knowledge to ensure the effectiveness of the knowledge services.

In the process of creating a citation knowledge service, we construct the citation knowledge base, the lightweight citation ontology base, and the domain ontology base by using ontology and other technologies. We use the ontology to reorganize citation knowledge units and to organize, store, and query citation data in a machine-readable mode. Based on users' search habits, which capture their behavior preferences and knowledge preferences, the system is able to understand user needs and establish a matching knowledge discovery mechanism [36].

This chapter presents a framework for an ontology-based citation knowledge service system, which contains four core layers: a data resource layer, an ontology layer, a semantic association layer, and a functional layer, as shown in **Figure 6**.

**Figure 6.** Citation knowledge service system framework.

**4.1. Data resource layer**

The data resource layer is at the bottom of the knowledge base and contains the citation knowledge base and the user database.
