*BIBFRAME Linked Data: A Conceptual Study on the Prevailing Content Standards and Data…*

*DOI: http://dx.doi.org/10.5772/intechopen.91849*

*Linked Open Data - Applications, Trends and Future Developments*

Thus, this intentional under-specification is designed to maximize the extensibility of the model and to help ensure its usefulness in a wide range of extant and future information management contexts and use scenarios, as well as for the widest variety of current and future resource types [4].

However, since the BIBFRAME initiative has positioned the model as the replacement for MARC as the primary method of bibliographic description and data exchange between libraries, the initiative is doing more than simply keeping the model open to RDA and other content standards. According to the initiative, the designers plan to take an active look at the elements in RDA and other content standards, including the *Anglo-American Cataloguing Rules, Second Edition* (AACR2). As a number of researchers have noted, BIBFRAME also appears to be designed specifically to accommodate RDA [1, 13, 20], which suggests that this particular content standard may play a stronger role in the design of the model than was initially suggested. As BIBFRAME is still under development, it remains to be seen exactly to what degree RDA shapes the design of the model and what effects this might have on the model's extensibility.

Nevertheless, BIBFRAME designers suggest that profiles will be another way to accommodate a variety of content standards within the model. A BIBFRAME profile is "a document, or set of documents, that puts a Profile (e.g., local cataloguing practices) into a broader context of functional requirements, domain models, guidelines on syntax and usage, and possibly data formats" [10]. According to the initiative, such profiles can be used to define constraints on the creation of BIBFRAME records, such as those required by any content standard, including RDA.

As other researchers have noted, RDA may not have gone far enough in distinguishing the content from the carrier of information resources [1, 14]. This potential fundamental flaw in the content standard may pose further difficulties in mapping RDA to BIBFRAME. The study in [21] illustrates such difficulties, showing an uneven mapping between existing RDA classes and BIBFRAME 2.0, particularly for the RDA Expression class, as well as many-to-many relationships in the mapping between RDA and BIBFRAME. Nevertheless, as BIBFRAME is at a relatively early stage of development, the nature and magnitude of these difficulties remain to be seen.

**3.4 Semantic web**

The current Web environment is structured in such a way that machines, and thus users, are unable to take full advantage of the links established among resources. The Web is composed of Web pages and hypertext links, but those links describe neither the nature of the connection between pages nor the nature of the data (content) the pages contain. As many researchers note, the current Web is a "Web of Documents" rather than a "Web of Data" [22, 23]. As a result, current search mechanisms, such as the major search engines, are limited in their ability to use information on the Web: they rely almost solely on harvesting algorithms to index the content of Web pages and then match this indexed information against the search terms entered by users. As one researcher notes, this method has served the Web well, permitting users to locate needed resources within the vast sea of online information, but it cannot lead users to related content, even when complex and intelligent relevancy algorithms are employed [14]. Within the library community, this means that most library data remains relatively difficult to locate online and relatively static with respect to other online resources relevant to library holdings. In other words, library data, in its current form, remains in the proverbial silo of its online catalogs.

However, Semantic Web technologies have the potential to expand the uses of library data in the Web environment and thereby to enhance users' experience of this data. As is currently common on the Web, a typical hyperlink connects resources but leaves the nature of the connection unexplained. Through Semantic Web and Linked Data principles, such as using URIs to identify resources and embedding those URIs in RDF statements, the nature of these connections can be made explicit. A link can then be typed in almost any way a user can imagine, indicating that it points to a reference, an author, a subject, an authority, and so on. Machines can then use this data to "infer" other resources that have been described similarly, such as resources with the same subject heading as the one in question, and permit users to explore these relationships more readily.

At the heart of the Semantic Web are four principles that Tim Berners-Lee, inventor of the World Wide Web and founder and director of the W3C, set forth in his paper entitled "Linked Data" [24]. These principles define the nature of Linked Data as it can be implemented in the current Web environment. They also serve as a framework and guide for those interested in making their Web content viable within the Semantic Web, since successful implementation requires some conformance to a standard model. These principles are as follows:

1. Use URIs as names for things [24].
2. Use Hypertext Transfer Protocol (HTTP) URIs so that people can look up those names [24].
3. When someone looks up a URI, provide useful information, using the standards (RDF\*, SPARQL) [24].
4. Include links to other URIs, so that they can discover more things [24].

Perhaps most significantly, the conception of Linked Data requires the use of URIs to identify resources or, more specifically, the data elements of resources (Principle 1). As was mentioned in the discussion of MARC above, using text strings to identify resources makes machine processing difficult. The shift to URIs as identifiers means that machines can better establish the identity of a resource, both when it is known by different names and when different resources share the same name. The shift to URIs also reflects the changed understanding of the nature of information resources described in the FRBR section above: it emphasizes the identification of discrete data elements within information resources rather than the identification of the resource as a whole. In other words, it emphasizes the atomization of resources into their relevant components.

Principle 2 emphasizes the need for a common scheme for defining URIs. Since HTTP is already the foundation of data transfer on the Web and appears to serve that function well, Berners-Lee suggests that using this common protocol for URIs will increase the usefulness of data described in Semantic Web compliant ways. Furthermore, as the BIBFRAME initiative notes, these URI schemes should not be obscure, even when they are expressed in HTTP, in order to facilitate data interaction and reuse [4].

Principle 3 emphasizes the need for a common framework for exchanging information described with URIs. Typically this means using RDF to model data, which, as the BIBFRAME initiative notes, is the most common framework within the LOD community [4]. As a conceptual framework for representing resources on the Web [15], RDF can be understood as a kind of syntax for structuring data so that it is machine-readable, through the use of URIs and the explicit delineation of relationships between data elements. RDF is typically serialized in XML, but other syntaxes, such as N3, Turtle, and N-Triples, are also used [22]. In its basic form, RDF consists of statements, called triples, which, like sentences, contain a subject, a predicate, and an object. A basic RDF statement might read "Book A (subject), Written By (predicate), Author A (object)," where Book A, Written By, and Author A are all identified by URIs, with the possible exception of the object, which could instead be a text string [22]. The power of this model is that the type of relationship between the resources (Book A and Author A) is defined explicitly (Written By). **Figure 3** illustrates this statement graphically. Because relationships between data elements are delineated in this way, tools called "reasoners" can make inferences about the data [19].
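The principles and the triple structure described above can be made concrete with a small sketch. The Python below is a minimal illustration, assuming hypothetical URIs under the reserved `example.org` domain (not BIBFRAME or any real vocabulary). It names Book A, the Written By relationship, and Author A with HTTP URIs (Principles 1 and 2), records the statement as an RDF triple, and serializes it as N-Triples. The helper names `IRI`, `Literal`, `is_http_uri`, and `to_ntriples` are our own; a production system would more likely rely on an established RDF library.

```python
from dataclasses import dataclass
from typing import List, Union
from urllib.parse import urlparse

# Hypothetical namespace for illustration only; example.org is a reserved domain.
EX = "http://example.org/"


@dataclass(frozen=True)
class IRI:
    """A resource or property named by a URI (Principle 1)."""
    value: str


@dataclass(frozen=True)
class Literal:
    """A plain text string, used here only in the object position."""
    value: str


@dataclass(frozen=True)
class Triple:
    subject: IRI
    predicate: IRI
    obj: Union[IRI, Literal]


def is_http_uri(term: IRI) -> bool:
    """Principle 2: the name can be looked up over HTTP(S)."""
    return urlparse(term.value).scheme in ("http", "https")


def term_to_nt(term: Union[IRI, Literal]) -> str:
    """Render one term in N-Triples syntax."""
    if isinstance(term, IRI):
        return f"<{term.value}>"
    # Escape characters that may not appear raw inside an N-Triples string literal.
    escaped = (
        term.value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def to_ntriples(triples: List[Triple]) -> str:
    """Serialize triples as N-Triples: subject, predicate, object, then ' .'."""
    return "\n".join(
        f"{term_to_nt(t.subject)} {term_to_nt(t.predicate)} {term_to_nt(t.obj)} ."
        for t in triples
    )


book_a = IRI(EX + "book/A")
written_by = IRI(EX + "vocab/writtenBy")
author_a = IRI(EX + "person/A")

statements = [
    # "Book A (subject), Written By (predicate), Author A (object)"
    Triple(book_a, written_by, author_a),
    # The object may instead be a text string rather than a URI.
    Triple(author_a, IRI(EX + "vocab/name"), Literal("Author A")),
]

# Every subject and predicate should be a resolvable HTTP(S) name.
assert all(is_http_uri(t.subject) and is_http_uri(t.predicate) for t in statements)
print(to_ntriples(statements))
```

It prints one N-Triples line per statement, with URIs in angle brackets and the author's name as a quoted string literal. The same two triples could equally be written in RDF/XML or Turtle; only the serialization changes, not the underlying statements.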

A reasoner is a software application that can make logical inferences based on a set of statements, or axioms, provided to it through queries. Although many query languages can be used to access and manipulate data modeled in RDF, the SPARQL Protocol and RDF Query Language (SPARQL) has emerged as the most popular [23]. For instance, a reasoner, beginning with a SPARQL query to a database containing the above RDF statement, could use that statement to infer other books written by Author A and present them to users without the user specifically asking the system to do so (**Figure 4**). Furthermore, there is no restriction on the number of RDF triples that can be created for a particular resource, which fosters the development of rich data graphs, or decentralized interconnections between data elements, within the Web environment. RDF is not a data format but a model for representing data elements on the Web, and it has been serialized in a number of ways. BIBFRAME, for instance, has been modeled in RDF/XML, but other serializations, such as N-Triples, ATOM, and JSON, also exist. The initiative claims that any data format that conforms to the standard model of URIs embedded in triples should be compliant with the BIBFRAME model [4].
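To make the "other books by Author A" example more concrete, the sketch below pairs a SPARQL query with a small Python function that evaluates the same basic graph pattern over an in-memory list of triples. It is a hedged illustration rather than a reasoner in the RDFS/OWL sense: it performs only the pattern matching a SPARQL engine would do for this query, not logical entailment. The graph, the `example.org` URIs, and the helper names (`match_pattern`, `evaluate`, `related_books`) are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

EX = "http://example.org/"  # hypothetical namespace

GRAPH: List[Triple] = [
    (EX + "book/A", EX + "vocab/writtenBy", EX + "person/A"),
    (EX + "book/B", EX + "vocab/writtenBy", EX + "person/A"),
    (EX + "book/C", EX + "vocab/writtenBy", EX + "person/B"),
]

# The SPARQL query emulated below (shown for reference; not executed here).
QUERY = """
PREFIX ex: <http://example.org/vocab/>
SELECT ?other WHERE {
  <http://example.org/book/A> ex:writtenBy ?author .
  ?other ex:writtenBy ?author .
  FILTER (?other != <http://example.org/book/A>)
}
"""


def is_var(term: str) -> bool:
    return term.startswith("?")


def match_pattern(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Unify one triple pattern with one triple, extending an existing binding."""
    extended = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in extended and extended[p] != t:
                return None  # variable already bound to a different value
            extended[p] = t
        elif p != t:
            return None  # constant term does not match
    return extended


def evaluate(patterns: List[Triple], graph: List[Triple]) -> List[Binding]:
    """Evaluate a basic graph pattern by joining each pattern against the graph in turn."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings: List[Binding] = []
        for b in bindings:
            for triple in graph:
                extended = match_pattern(pattern, triple, b)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


def related_books(book: str, graph: List[Triple]) -> List[str]:
    """Books sharing an author with `book`, excluding `book` itself (the FILTER)."""
    patterns = [
        (book, EX + "vocab/writtenBy", "?author"),
        ("?other", EX + "vocab/writtenBy", "?author"),
    ]
    return sorted({b["?other"] for b in evaluate(patterns, graph) if b["?other"] != book})


print(related_books(EX + "book/A", GRAPH))  # ['http://example.org/book/B']
```

Book B is surfaced only because it shares an author with Book A, even though no one asked for Book B by name. In practice, `QUERY` would be sent to a SPARQL endpoint rather than matched by hand; the hand-written join is there only to show what the query asks of the data.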

Principle 4 encourages broad use of the connections established through the first three principles [4]. Data described in conformance with these principles can thus be considered Linked Data and Semantic Web compliant. Moreover, if the URIs expose, point to, or otherwise include information that is made freely available for reuse on the Web, for example under a Creative Commons license, the data can be considered Linked Open Data, not just Linked Data.
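Principle 4's link following can be sketched as a breadth-first traversal: look up a URI, collect the triples it returns, and queue every object that is itself a URI. The sketch below is an offline approximation under stated assumptions: a dictionary named `WEB` stands in for HTTP lookups (Principle 3), and all URIs are hypothetical, with one under the reserved `example.net` domain standing in for a dataset outside the library. A real client would dereference each URI over HTTP and parse the RDF it receives.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical URIs; example.net stands in for an external, non-library dataset.
BOOK_A = "http://example.org/book/A"
AUTHOR_A = "http://example.org/person/A"
TOPIC = "http://example.net/topic/linked-data"
V = "http://example.org/vocab/"

# Stand-in for HTTP dereferencing: the triples a server might return for each URI.
WEB: Dict[str, List[Triple]] = {
    BOOK_A: [
        (BOOK_A, V + "writtenBy", AUTHOR_A),
        (BOOK_A, V + "subject", TOPIC),  # a link out of the library dataset
    ],
    AUTHOR_A: [(AUTHOR_A, V + "name", "Author A")],
    TOPIC: [(TOPIC, V + "label", "Linked Data")],
}


def is_uri(term: str) -> bool:
    return term.startswith(("http://", "https://"))


def discover(start: str, max_depth: int = 2) -> Set[str]:
    """Breadth-first link following: look up a URI, then every URI it links to."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        uri, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not expand beyond the depth limit
        for _subject, _predicate, obj in WEB.get(uri, []):  # unknown URI: nothing returned
            if is_uri(obj) and obj not in seen:
                seen.add(obj)
                queue.append((obj, depth + 1))
    return seen


print(sorted(discover(BOOK_A)))
```

Starting from Book A, the traversal also reaches the external topic URI, which is the kind of cross-dataset discovery Principle 4 is meant to enable. The `seen` set and the depth limit are design choices that keep the traversal finite when links form cycles.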

As stated earlier, a number of prominent libraries have published library data in compliance with Semantic Web principles [2]. Even though these projects are not BIBFRAME projects, they are generally in line with FRBR principles of bibliographic description. It is therefore worth examining the degree to which BIBFRAME conforms to the current understanding of Linked Data and the Semantic Web. To begin, BIBFRAME has defined URIs for all BIBFRAME entities and properties within the BIBFRAME namespace. This is particularly important because some properties that belong to different classes have identical names; the URIs serve as a clear means of disambiguating these properties. Second, as has been noted, BIBFRAME has been modeled in RDF/XML [25].

In addition to these two factors, the BIBFRAME model, like FRBR, deconstructs bibliographic records into their component pieces through the entity-relationship conception of bibliographic description. Taken together, these elements suggest that BIBFRAME conforms well to the current understanding of Linked Data and the Semantic Web. Furthermore, even though the initiative has rendered the model in RDF/XML, BIBFRAME is also designed to be compliant with other data formats that conform to the structured use of URIs within triple statements. Thus, BIBFRAME appears, at least in principle, poised to integrate library data with data produced in contexts outside the library community, which further suggests that BIBFRAME is Semantic Web friendly.

**Figure 3.** *Graphical depiction of a basic RDF statement.*

**Figure 4.** *Graphical depiction of a reasoner using RDF statements to infer additional resources.*

**4. Discussion**

There are challenges that may hinder the widespread adoption of BIBFRAME within the library community. In addition to the modeling difficulties and potential conceptual misalignment of BIBFRAME in relation to MARC, FRBR, RDA, Linked Data, and RDF, complex resource types such as audiovisual materials, manuscripts, and serial publications pose further difficulties [26]. Additionally, although MARC is in essence an exchange format for bibliographic data, it has become so intertwined with the content standards applied to it, first AACR2 and now RDA, that this union may further entrench it within the library community. Without consensus regarding the fate of MARC, it may be difficult to persuade MARC's adherents, even if BIBFRAME proves to offer more capabilities to catalogers.

There may be significant conceptual difficulties with mapping RDA to BIBFRAME. For instance, RDA was developed within the context of the FRBR entity-relationship model. As such, RDA separates resources into FRBR's four main entity classes: Work, Expression, Manifestation, and Item. However, as has already been noted, BIBFRAME's main entity classes do not align exactly with FRBR's classes [20]. This lack of alignment may make the mapping between RDA and BIBFRAME difficult.

Although it appears that BIBFRAME conforms to current conceptions of Linked Data and the Semantic Web, there are still a number of issues worth considering. First, since the usefulness of the relationships delineated through the RDF triples depends on the quality and stability of the resources to which they are linked, the BIBFRAME initiative will have to determine the degree to which it will maintain

