**Abstract**

The visual identification of inconsistencies in patterns is an understudied area of computing. While pattern visualisation exposes the relationships among identified regularities, it is equally important to identify the inconsistencies (irregularities) in those patterns. For example, identifying inconsistencies in the growth pattern of children of a particular age can enable early intervention, such as dietary modification for stunted children. This chapter describes the need for a system that identifies inconsistencies in the identified patterns of a dataset. It also describes techniques that enable the visual identification of inconsistencies in patterns, such as fault tolerance and colour coding. Two approaches for visualising inconsistencies in patterns are presented: visualising inconsistencies in objects with many attribute values, and visually comparing an investigated dataset with a case control dataset. These approaches are associated with tools developed by the authors of this chapter. First, ConTra allows its users to mine and analyse contradictions in attribute values whose data do not abide by the mutual exclusion rule of the dataset. Second, Datax mines missing data and enables the visualisation of the missingness and the identification of the associated patterns. Finally, WellGrowth explores children's growth datasets by comparing an investigated dataset (data obtained from a Primary Health Centre) with a case control dataset (data from the website of the World Health Organisation). Instances of inconsistencies discovered in the explored datasets are discussed.

**Keywords:** missing data, contradictory, inconsistent, pattern, ConTra, visualisation, bad data

## **1. Introduction**

It is often said that data is the lifeline of research. Because of the importance of data, several research areas, such as machine learning, data science, data mining, data analytics and big data, have been devoted to the study and understanding of data. The use of data-driven marketing (DDM) as an effective tool in determining a strategic part of business management is proposed in [1], while the development of data-driven planning for management decision making is advocated in [2]. There is also a need for data-driven research through open data sources [3], and it is noted in [4] that effectively planning an experiment requires preliminary data as a starting point. Even so, the need for valid data in research cannot be overemphasised. Invalid and inconsistent data can inadvertently impact negatively on the results of research. The authors in [5] pointed out the importance of data validation for systematic software development. Similarly, the authors in [6] explained the importance of health records for diagnosis and treatment purposes. In general, the need for valid data is a concern that cuts across every research area. The study of big data has been found to have great impact on scientific discoveries and value creation [7]. The study continues to gain recognition as the quest for tools and measures for validating data continues. In addition, [8] explains that the presence of noise hampers the induction of machine learning models from data and can also lengthen the training time. According to [9], noisy data cannot be avoided and must instead be dealt with. Data, whether structured, semi-structured or unstructured, must be scrutinised with utmost care. The rigour of validating data can be demanding and is usually left in the hands of data scientists.

Data scientists acquire datasets from different environments, and in most cases these datasets can be noisy. A noisy dataset contains uncertain and inconsistent data that can arise from missing values, imprecise data and contradictory values, among others. The work of a data scientist includes, among other things, exploring big datasets in order to find interesting patterns and build supporting arguments for decision making. Such interesting patterns are likely to exclude noise, in the form of conflicting or missing data, that does not support the arguments presented by the analyst. Data that are inconsistent with the decision-supporting facts should also be analysed. One approach to this analysis is to visually explore the patterns of the decision support data and the associated inconsistencies.
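As a minimal sketch of two of these noise categories, assuming a dataset held as a list of records and assuming that a missing value is marked with `None`, the following Python fragment flags missing and contradictory values instead of discarding them. The function name `flag_noise`, the field names and the sample values are hypothetical illustrations and are not taken from the tools described in this chapter.

```python
from collections import defaultdict

MISSING = None  # assumed marker for a missing value


def flag_noise(records, key, attribute):
    """Split records into missing, contradictory and consistent groups.

    A record is 'missing' when its attribute value is MISSING. Records that
    share the same key but report different non-missing values for the
    attribute are 'contradictory'. All remaining records are 'consistent'.
    """
    # Collect the distinct non-missing values reported for each object.
    values_by_key = defaultdict(set)
    for rec in records:
        if rec[attribute] is not MISSING:
            values_by_key[rec[key]].add(rec[attribute])

    flagged = {"missing": [], "contradictory": [], "consistent": []}
    for rec in records:
        if rec[attribute] is MISSING:
            flagged["missing"].append(rec)
        elif len(values_by_key[rec[key]]) > 1:
            flagged["contradictory"].append(rec)
        else:
            flagged["consistent"].append(rec)
    return flagged


# Hypothetical records; blood group is used because it should not change
# between two records of the same person.
records = [
    {"person_id": "p1", "blood_group": "O"},
    {"person_id": "p1", "blood_group": "O"},
    {"person_id": "p2", "blood_group": "A"},
    {"person_id": "p2", "blood_group": "B"},
    {"person_id": "p3", "blood_group": None},
]

flagged = flag_noise(records, key="person_id", attribute="blood_group")
for label, group in flagged.items():
    print(label, len(group))
```

Every record ends up in exactly one group, so the noisy records remain available for later visual inspection alongside the records that support the pattern.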

Visual analytics is defined in [10] as the science of analytical reasoning facilitated by interactive visual interfaces. Data visualisation is useful for data cleaning, exploring data structure, detecting outliers and unusual groups, identifying trends and clusters, and spotting local patterns, among others [11]. The visual platform and representations enable better understanding and facilitate analytic as well as deductive reasoning. Similarly, visual analysis of data is important in understanding data and has been found to yield fruitful results in research. According to [12], visual analysis of data enables grasping the multidimensional "information reality" from the perspective of users. Visual analytics entails more than mere visualisation. It can rather be seen as an integral approach to decision-making, combining visualisation, human factors and data analysis [13]. From another perspective, visual analytics is a data representation approach that employs interactive visualisation to integrate human judgement into algorithmic data-analysis processes [14]. Thus, visual representation of data plays a vital role in data interpretation and analysis.

It is important to analyse interesting patterns and the associated noise in big datasets so as to identify the hidden patterns and knowledge in them. Unfortunately, some data scientists advocate deleting or excluding noisy data instead of visually depicting the noise and reporting the analyses. Deleting inconsistent data from a noisy dataset increases the incompleteness of the dataset, thereby reducing the soundness of the information retrieved from it. Consequently, the noise in a dataset should be tolerated, since tolerating it avoids the loss of interesting information about the dataset. The analysis of incomplete biological data of an organism, for example, enhances the understanding of abnormalities in the organism. Incomplete biological data in datasets from laboratory investigations, such as data about genes and proteins, provide clues to genetic disorders.

The importance of identifying inconsistencies in patterns is also evident in survey datasets. A survey on menstruation patterns can reveal a pattern in which ladies between the ages of 20 and 30 years who have not seen their menstruation for more than two months are pregnant. This pattern does not mean that all ladies in this age bracket in the survey data who have not seen their menstruation for more than two months are pregnant. Some of them may be suffering from hormonal aberrations, for instance.

Also, respondents to survey questions may provide inaccurate responses, such as giving many consecutive items a response of "4" or repeating a pattern of "1, 2, 3, 4, 5…" as explained in [15]. Such purposefully deceptive or even contradictory responses are herein assessed as inconsistencies in patterns and should be portrayed visually as the wrong side of analysis. For example, where the identified survey pattern is that many consecutive items receive a response of "4", the inconsistency is the set of responses that do not give many consecutive items a response of "4". It is therefore important to identify such inconsistencies in patterns of interest in order to properly provide analytical reports that expose the issues.

The importance of identifying and assessing inconsistent data is explained in works such as [15–17], but very few publications exist on visually identifying inconsistencies in patterns of interest [18]. There is therefore a need for a system that enables the visual analysis of inconsistencies in patterns of interest in a dataset, so as to provide data users with a holistic understanding of the data of interest. It is stated in [18]: "Of 612 data visualizations from 121 articles published online in February 2019 by a set of leading purveyors of data journalism, social science surveys, and economic estimates, 449 (73%) presented data intended for inference, but only 14 (3%) portrayed uncertainty visually, either by depicting explicit quantifications like intervals or conveying variance through raw data".

Consequently, the authors of this chapter emphasise the need for visualising inconsistencies in identified data patterns by explaining existing approaches and implementing novel approaches for the visual analysis of inconsistencies in patterns. Section 2 gives a detailed explanation of the concept of inconsistencies in patterns. Section 3 presents two approaches for visualising inconsistencies in patterns: the visual analysis of inconsistencies in objects with many attribute values, and the visual comparison of an investigated dataset with a case control dataset. These approaches and their associated tools, which were developed by the authors, are discussed in the same section, as is the WellGrowth application. The WellGrowth app integrates fault tolerance and colour coding to visualise inconsistent patterns, using data curated from Nsukka Medical Centre (NMC), with data from the website of the World Health Organisation (WHO) as the control. A comparison of the ConTra, Datax and WellGrowth apps is presented in Section 4, while Section 5 presents the conclusion and the research focus for future work.

*Applications of Pattern Recognition*

*Visual Identification of Inconsistency in Pattern DOI: http://dx.doi.org/10.5772/intechopen.95506*

## **2. Inconsistencies in patterns**

Any inconsistent data associated with a pattern reduces the quality of the findings presented by the analyst about that pattern. An assessment of such inconsistent data can increase the trustworthiness of the analyst's findings. There are everyday instances of inconsistent data in identified patterns which are likely to mar the patterns. Meade and Craig [15] explain how inconsistent data from careless respondents to a students' survey can be identified among the data patterns common to respondents of the survey. Patterns derived from survey data can be associated with contradictory or incomplete responses. Likewise, patterns discovered in biological investigations can be associated with inconsistent and incomplete data. A gene expression dataset whose columns include gene name, tissue name, expression and experiment ID can contain inconsistent data in an identified pattern where many experiments are performed for a particular gene in a particular tissue. An expression can be detected,

