**2.1 Visual analysis of inconsistencies in patterns of a dataset**

Inconsistent data associated with patterns in a large dataset can be difficult to visualise, because they are not explicitly marked as inconsistent in the dataset. For example, missing data can appear as "unavailable", "forthcoming", "-", "not existing", or even as empty spaces. Contradictions, on the other hand, differ from one dataset to another, depending on the semantic definition of the data in the dataset. Dedicated applications such as CUBIST [19], ConTra [20] and the R package VIM [21] enable the visualisation of the amount or pattern of contradiction and missingness in a noisy dataset. ConTra depicts inconsistent data whose pattern involves mutually exclusive types of contradiction. Nwagwu explains in [20] how ConTra detects the contradictory attribute values in the gene "TSPAN6" for the tissue "Pancreas" and visualises them in a pie chart. ConTra applies colour coding to charts to enable the visualisation of inconsistencies in a large dataset. It also visualises the pattern of distribution of contradictions across the dataset. It is further discussed in Section 3.11.
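Placeholders such as these are not flagged as missing. They usually have to be mapped to one explicit marker before missingness can be counted or plotted. The Python sketch below is a minimal illustration under several assumptions:

- the token list contains only the examples quoted above;
- rows are plain dictionaries;
- `MISSING_TOKENS`, `is_missing` and `normalise_row` are hypothetical names, not part of CUBIST, ConTra or VIM.

```python
# Placeholder tokens taken from the examples in the text; this list is an
# assumption and would have to be extended for any real dataset.
MISSING_TOKENS = {"unavailable", "forthcoming", "-", "not existing", ""}


def is_missing(value):
    """Return True if a raw cell value should be treated as missing."""
    if value is None:
        return True
    # Strip whitespace so cells holding only spaces count as empty, and
    # compare case-insensitively so "Unavailable" matches "unavailable".
    return str(value).strip().lower() in MISSING_TOKENS


def normalise_row(row):
    """Return a copy of a dict-shaped row with every placeholder set to None."""
    return {key: (None if is_missing(val) else val) for key, val in row.items()}


if __name__ == "__main__":
    # Hypothetical rows: object identifiers and values are illustrative only.
    rows = [
        {"object": "g1", "colour": "black", "size": "forthcoming"},
        {"object": "g2", "colour": "   ", "size": "-"},
    ]
    for row in rows:
        print(normalise_row(row))
```

Matching ignores case and surrounding whitespace, so a cell holding only spaces is treated like an empty one. A real dataset would need its own, longer token list.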

The R package VIM is a good analytical tool that focuses on the visual presentation and analysis of missingness. It plots aggregates of missingness in variables as bar plots. It also shows missing data in matrix plots, histograms, spine plots, parallel coordinate plots and maps [21]. For example, it uses a bar plot to show the number and distribution of missing values for a sub-sample of the EU-SILC data from Statistics. Notwithstanding VIM's comprehensive collection of visualisation methods for exploring missing data, its environment requires considerable R skills to access these methods. Also, apart from missingness, the VIM package does not support the analysis of other types of inconsistency, such as contradictions in a dataset.
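The aggregation behind such a bar plot amounts to counting missing cells per variable. VIM's aggregation plot (`aggr`) also reports which combinations of variables are missing together. The Python sketch below illustrates that counting only; it is not VIM's implementation or API. It assumes missing cells have already been normalised to `None`, and the function names and sample rows are hypothetical.

```python
from collections import Counter


def missing_counts(rows, columns):
    """Count missing cells per variable: the bar heights of a missingness bar plot."""
    counts = {col: 0 for col in columns}
    for row in rows:
        for col in columns:
            # An absent key is treated the same as an explicit None.
            if row.get(col) is None:
                counts[col] += 1
    return counts


def missing_patterns(rows, columns):
    """Count how often each combination of missing variables occurs in a row."""
    patterns = Counter()
    for row in rows:
        pattern = tuple(col for col in columns if row.get(col) is None)
        patterns[pattern] += 1
    return patterns


def text_barplot(counts, total, width=20):
    """Render the share of missing cells per variable as a crude text bar plot."""
    lines = []
    for col, n in counts.items():
        share = n / total if total else 0.0
        bar = "#" * round(share * width)
        lines.append(f"{col:<10} {bar:<{width}} {n}/{total}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical rows with missing cells already set to None.
    rows = [
        {"income": None, "region": "A"},
        {"income": None, "region": None},
        {"income": 1200, "region": "B"},
    ]
    cols = ["income", "region"]
    print(text_barplot(missing_counts(rows, cols), len(rows)))
    print(missing_patterns(rows, cols))
```

The counts are kept separate from the rendering. The same numbers could therefore feed any plotting library instead of the text bars used here.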

Other tools that enable the visualisation of inconsistencies are described in [19, 22, 23]. A graphical tool proposed in [22] highlights inconsistent instances in a network. Examples include the direct comparisons that strongly drive other treatment effect estimates, and hot spots of network inconsistency. The same work also proposes a clustering approach that automatically groups comparisons to highlight these hot spots. CUBIST [19] is an example of an application that applies colour coding and fault tolerance to traditional visualisation tools such as pie and bar charts to enable easy visual analysis of inconsistencies. Even so, these applications are not holistic in exploring inconsistencies in patterns, and most of them are designed for a particular domain of data analysis.
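How CUBIST combines these two techniques is not detailed here. The Python sketch below therefore only illustrates the general idea of colour coding inconsistency rates with a tolerance. The following are all hypothetical choices, and no function name comes from CUBIST or any other tool:

- the colour bands;
- the green "within tolerance" colour;
- the mean-plus-k-standard-deviations rule used as a statistically defined tolerance.

```python
import statistics

# Hypothetical colour bands: (upper bound of the inconsistency rate, colour).
COLOUR_BANDS = [(0.25, "yellow"), (0.5, "orange"), (1.0, "red")]


def statistical_tolerance(rates, k=1.0):
    """One possible statistically defined tolerance: mean + k * population std dev."""
    if not rates:
        return 0.0
    return statistics.mean(rates) + k * statistics.pstdev(rates)


def colour_code(rate, tolerance, bands=COLOUR_BANDS):
    """Map an inconsistency rate in [0, 1] to a colour.

    Rates at or below `tolerance` are absorbed as acceptable noise
    (the softness introduced by fault tolerance) and drawn in green.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must lie in [0, 1]")
    if rate <= tolerance:
        return "green"
    for upper, colour in bands:
        if rate <= upper:
            return colour
    return bands[-1][1]


def colour_slices(inconsistent, totals, tolerance):
    """Return {category: (rate, colour)} ready for a pie or bar chart."""
    slices = {}
    for category, total in totals.items():
        rate = inconsistent.get(category, 0) / total if total else 0.0
        slices[category] = (rate, colour_code(rate, tolerance))
    return slices


if __name__ == "__main__":
    # Hypothetical counts of inconsistent records per category.
    totals = {"A": 40, "B": 50, "C": 20}
    inconsistent = {"A": 1, "B": 20, "C": 4}
    rates = [inconsistent[c] / totals[c] for c in totals]
    tolerance = statistical_tolerance(rates, k=0.5)
    for category, (rate, colour) in colour_slices(inconsistent, totals, tolerance).items():
        print(f"{category}: {rate:.2f} -> {colour}")
```

Rates within the tolerance are treated as noise rather than highlighted. Raising `k` widens the tolerance and makes the chart more forgiving.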

*Visual Identification of Inconsistency in Pattern DOI: http://dx.doi.org/10.5772/intechopen.95506*

**3. Our approaches**

The analysis of inconsistencies in patterns of a dataset can be enhanced by adapting computational techniques such as fault tolerance and colour coding to traditional visualisation tools such as graphs. Fault tolerance introduces softness (a statistically defined tolerance) when retrieving the inconsistencies in a dataset. Colour coding identifies different ranges of inconsistency with different colours. This section presents how these techniques are used to compute inconsistencies in patterns, as integrated in the approaches presented in this chapter.

This section presents two approaches for visualising inconsistencies in patterns:

- visualising inconsistencies in objects with many attribute values;
- visual comparison of an investigated dataset with a case control dataset.

The authors developed associated tools for these approaches, and both the approaches and the tools are discussed below.

**3.1 Visualising inconsistencies in objects with many attribute values pattern**

A dataset contains data about real-world objects. These objects are associated with attributes, and each attribute can be associated with a single value or with many values. Real-world objects 'G' such as a house, book, car or television are associated with different attributes 'M', which may have many values 'V'. A book (object), for example, can have a colour (attribute) which can be black, white or brown (values). In general, a particular object (g ∈ G) can be associated with an attribute (m ∈ M) that contains many values. For example, a name (object) has a marital status (attribute) such as married or single (values).

Contradictory data exist in a dataset when there is conflicting information. That is, an object (g ∈ G) associated with an attribute (m ∈ M) contains contradictory values, so that m is associated with both A and ¬A. An experiment (object), for example, can be associated with an outcome (attribute) such as neutral, high or low (values). Likewise, a student (object) takes a course (attribute) whose values can be absent, pass or fail. Some many-valued attributes are likely to be mutually exclusive and should conform to the mutual exclusion rule. This rule can be stated simply: when the values of an attribute are mutually exclusive (meaning that no more than one of them can be associated with an object at once), an object associated with more than one of those values is contradictory. Also, any attribute that does not contain the expected values is said to contain missing data.

Two open source tools, ConTra and Datax, are presented in this chapter to explain how to visualise inconsistencies in the objects with many attribute values pattern. ConTra was discussed in an earlier publication [20] by some of the authors of this chapter and is also discussed herein. Datax, another tool for highlighting inconsistency in patterns by mining and depicting missing data, is presented in Section 3.12.

**3.2 ConTra**

ConTra<sup>1</sup> is an open source App developed by some of the authors of this chapter. It is used for mining contradictory data from attributes with many values.

<sup>1</sup> https://github.com/ncjoes/contra
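The mutual exclusion rule and the definition of missing data in Section 3.1 can be expressed as two small checks. The Python sketch below is a hypothetical illustration, not ConTra's implementation. It assumes:

- the data are stored as (object, attribute, value) triples;
- the analyst supplies, for each attribute, the set of mutually exclusive (and expected) values.

```python
from collections import defaultdict


def find_contradictions(records, exclusive_values):
    """Apply the mutual exclusion rule to (object, attribute, value) triples.

    exclusive_values maps an attribute to its set of mutually exclusive values.
    Returns {(object, attribute): values} for every pair associated with more
    than one of those values at once, i.e. with both A and not-A.
    """
    seen = defaultdict(set)
    for obj, attr, value in records:
        if value in exclusive_values.get(attr, ()):
            seen[(obj, attr)].add(value)
    return {key: vals for key, vals in seen.items() if len(vals) > 1}


def find_missing(records, objects, expected_values):
    """Return (object, attribute) pairs carrying none of the expected values."""
    present = defaultdict(set)
    for obj, attr, value in records:
        present[(obj, attr)].add(value)
    return [
        (obj, attr)
        for obj in objects
        for attr, expected in expected_values.items()
        if not present[(obj, attr)] & expected
    ]


if __name__ == "__main__":
    # Hypothetical student/course records modelled on the example in the text.
    exclusive = {"course": {"absent", "pass", "fail"}}
    records = [
        ("s1", "course", "pass"),
        ("s1", "course", "fail"),         # conflicts with the line above
        ("s2", "course", "pass"),
        ("s3", "course", "unavailable"),  # placeholder, not an expected value
    ]
    print(find_contradictions(records, exclusive))
    print(find_missing(records, ["s1", "s2", "s3"], exclusive))
```

With these records, "s1" is reported as contradictory because it carries both "pass" and "fail". "s3" is reported as missing because its only value, "unavailable", is not one of the expected values.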
