**2.1 Qualitative approach**

In the qualitative approach, specific attention is drawn to defining data quality in terms of its different aspects, also termed dimensions. In 1996, Wang and Strong developed a data quality framework based on a two-stage survey of the data quality aspects important to data consumers, and captured these dimensions in a hierarchical manner [2]. This model clusters 20 different data quality dimensions into four major categories: intrinsic, contextual, representational and access data quality. Although the basis of this model still stands, some minor changes have been made over the years, resulting in the model depicted in **Table 1** [3].

In brief, the *intrinsic* category comprises dimensions that define the accuracy of the data, that is, the extent to which the data are certified, error-free and reliable, as well as their objectivity, that is, whether they are based on facts and free of bias, and their reputation, based on their sources or content. The *contextual* data quality category comprises dimensions that must be considered within the context of a specific objective for which one holds the data: the data should be relevant, up to date, of an appropriate amount, yet complete, and ready for use for the stated objective. The *representational* category contains dimensions that reflect how the data are presented within a data system. Dimensions concerning the format of the data, that is, concise and consistent representation, as well as their compatibility, their interpretability and whether they are easy to understand, are considered. The last category, *accessibility*, also defines aspects of data quality. Although this category is not always considered in the literature [4], it is an important aspect of overall data quality. The related dimensions include the accessibility of the data in terms of their availability or easily retrievable character, the security measures taken to restrict data appropriately, and the traceability of the data to their source.


| **Category** | **DQ dimension** |
| --- | --- |
| Intrinsic | Accuracy, Objectivity, Reputation |
| Contextual | Completeness, Value added, Relevance, Timeliness, Actionable, Appropriate amount |
| Representational | Interpretable, Easily understandable, Concisely represented, Consistent, Alignment |
| Access | Accessibility, Security, Traceability |

**Table 1.** *Data quality dimensions.*

These dimensions can also be grouped into an internal and an external group of dimensions. The internal group contains the dimensions that can be measured purely in terms of the data themselves, and are generally more objective. Examples include the accuracy of the data, which can be examined by calculating a score based on the magnitude of the errors in the data, or the data correctness, which can be measured through the number of errors in the data. The external group of dimensions, on the other hand, evaluates how the data relate to their environment, and these dimensions are hence somewhat more subjective in nature. Examples include the relevancy of the data with regard to a stated objective, or their ease of understanding by the consumers of the data.
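
As a minimal illustration, the sketch below groups a few of these dimensions into the internal and external sets and computes two purely data-driven (internal) measures against a trusted reference: a correctness score based on the number of errors and an accuracy score based on their magnitude. The reference data, field values and rescaling constant are hypothetical choices made for the example, not part of the framework itself.

```python
# Minimal sketch (hypothetical data and scoring choices): measuring two internal
# data quality dimensions purely from the data, by comparison against a reference.

# Internal dimensions can be computed from the data alone; external dimensions
# need the context of the data consumer and are therefore more subjective.
internal_dimensions = ["accuracy", "correctness"]
external_dimensions = ["relevancy", "ease of understanding"]

reference = {"alice": 21.0, "bob": 34.0, "carol": 29.0}   # assumed ground truth
observed = {"alice": 21.0, "bob": 37.0, "carol": 29.5}    # data under assessment

# Correctness: measured through the number of errors (here, the share of values
# that match the reference exactly).
errors = sum(1 for key in reference if observed.get(key) != reference[key])
correctness = 1 - errors / len(reference)

# Accuracy: a score based on the magnitude of the errors (here, the mean absolute
# deviation rescaled to [0, 1]; the divisor of 10.0 is an arbitrary illustrative choice).
mad = sum(abs(observed[key] - reference[key]) for key in reference) / len(reference)
accuracy = max(0.0, 1 - mad / 10.0)

print(f"errors={errors}, correctness={correctness:.2f}, accuracy={accuracy:.2f}")
```

External dimensions such as relevancy have no such purely data-driven score and call for the more user-oriented, subjective methods discussed later.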

**2.2 Quantitative approach**

In the quantitative approach, data quality has been defined by J. M. Juran as the fitness of the data to serve a purpose in a given context, that is, in operations, decision making and/or planning, as perceived by their users [1]. This concept is denoted as 'fitness for use' and is based on Juran's five principles: who uses the data, how the data are used, whether there is a danger to human safety, what the economic resources of the producers and users of the data are, and which characteristics users take into account when determining the fitness for use. This definition is widely accepted in both academic and industrial settings. However, in practice the fitness for use is a rather subjective measure, as it highly depends on the users' judgement of the degree to which the data conform to their intended use.

For example, consider the score of a student on an exam. If scores are rounded to integers, this can potentially influence the final grade that the student receives. The rounding procedure might therefore be accurate enough for the professors, yet a student might miss out on a higher final grade because of the rounding, so the same data might not be accurate enough from the student's perspective.
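
A small numeric sketch of this situation (the three component scores, the 20-point scale and the pass mark of 10 are purely hypothetical) shows how rounding each score to an integer can, by itself, decide whether the student reaches the passing grade:

```python
# Hypothetical grading scheme: the final grade is the average of three exam
# scores on a 20-point scale, and an average of at least 10 is needed to pass.
raw_scores = [9.4, 9.4, 11.4]
pass_mark = 10.0

exact_avg = sum(raw_scores) / len(raw_scores)                       # ~10.07 -> passes
rounded_avg = sum(round(s) for s in raw_scores) / len(raw_scores)   # (9 + 9 + 11) / 3 ~ 9.67 -> fails

print(f"exact average:   {exact_avg:.2f} -> pass: {exact_avg >= pass_mark}")
print(f"rounded average: {rounded_avg:.2f} -> pass: {rounded_avg >= pass_mark}")
```

The rounded records look harmless for the professor's bookkeeping, yet for this particular use of the data they are not accurate enough.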

On the other hand, it might well be that not all uses of the data are known, nor are their potential future purposes. Therefore, data quality (DQ) might be hard to evaluate using this definition.

Some definitions of data quality use the notion of zero defects, which aims to reduce defects by motivating people to prevent mistakes by developing a constant, conscious desire to do the job right the first time [5]. This zero-defect concept has been incorporated by P. Crosby in his Absolutes of Quality Management [6]. According to Crosby's Absolutes, data quality should conform to its requirements, and prevention should be used as a means to guarantee zero defects, which sets the performance standard. Consequently, data quality can be measured as the price of nonconformance. Although this zero-defect concept is not widely used in the data quality literature, it again emphasizes the necessity to measure data quality.

**3. Measuring data quality**

Based on the definitions of data quality, several DQ measurement methods have been developed, which can generally be divided into objective and subjective methods. While objective methods tend to evaluate data quality from the perspective of the data producer based on hard criteria, subjective methods rather take the user's perspectives and beliefs into account.

**3.1 Objective DQ measurement methods**

Measurements of data quality are generally intended to assess the dimensions of data quality as defined in the previous section. As a first step, a framework must
