*3.3.1 Evaluation of Datax*

Datax was used to analyse the Amazon product review dataset<sup>3</sup> provided by Datafiniti's Product Database<sup>4</sup>. **Figure 2** depicts the evaluation pane, which previews the first five rows of the investigated dataset while its right side displays the statistical analysis of the dataset. This analysis includes the names of the investigated columns, the total number of missing values per column, and the percentage of missing data per column (**Figure 3**).

<sup>2</sup> https://github.com/marioJoker/Datax

<sup>3</sup> https://data.world/datafiniti/consumer-reviews-of-amazon-products

<sup>4</sup> https://datafiniti.co/products/product-data/

*Applications of Pattern Recognition*
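The per-column statistics described above (missing count and missing percentage) can be illustrated with a minimal sketch. This is not Datax's actual implementation; the function name `missing_summary` and the sample data are hypothetical, and only empty cells are treated as missing, as the text describes for Datax.

```python
import csv
import io

# Hypothetical sample data; empty cells stand for missing values.
SAMPLE_CSV = """id,rating,title,username
1,5,Great,alice
2,,Okay,
3,4,,
4,,Fine,dave
"""


def missing_summary(csv_text):
    """Return {column: (missing_count, missing_percentage)} for CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    columns = reader.fieldnames or []
    total = len(rows)
    summary = {}
    for column in columns:
        # A cell counts as missing when it is absent or contains only whitespace.
        missing = sum(1 for row in rows if (row.get(column) or "").strip() == "")
        percent = 100.0 * missing / total if total else 0.0
        summary[column] = (missing, percent)
    return summary


if __name__ == "__main__":
    for name, (count, percent) in missing_summary(SAMPLE_CSV).items():
        print(f"{name}: {count} missing ({percent:.1f}%)")
```

The sketch reports one entry per column, mirroring the three items listed for the evaluation pane: column name, number of missing values, and percentage of missing data.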

*Visual Identification of Inconsistency in Pattern DOI: http://dx.doi.org/10.5772/intechopen.95506*

**Figure 2.** *Datax evaluation pane depicting the number of missing data per investigated column.*

**Figure 3.** *Datax evaluation pane depicting the percentage of Missingness per investigated column.*

As **Figure 4** shows, the height of each bar in a Datax Bar plot indicates the amount of missing data in a column, and bars of equal height suggest joint missingness in the corresponding attributes. The Bar plot thus reveals both the amount of missing data and the commonalities of such instances across the dataset. In **Figure 4**, columns V, W and Y have the same amount of missingness, as do columns U and X. Any analysis that includes columns with a significant amount of missingness should acknowledge that missingness in its reports. Columns without missing data are also revealed in **Figure 4**: for example, columns A, B, C, D, F, G, J, M, N, S, and T have no missing values. Any assertion made by a data analyst about a column should first be evaluated for the relevance of missing data. Datax's Bar plot does not, however, show the distribution of missingness among the investigated columns. This is explored by the Matrix plot depicted in **Figure 5**.

**Figure 4.** *Datax Bar plot depicting the amount of Missingness per investigated column.*

**Figure 5.** *Datax Matrix plot depicting the amount and distribution of Missingness and available data per column.*

A Matrix plot of missing data (**Figure 5**) reveals both the amount and the distribution of missingness in the dataset. White is used for missing values and black for available values. **Figure 5** shows that columns V, W and Y have 1597 missing values in common. Columns U and X have an equal number of missing instances, implying that each reviewer who did not fill in U also did not fill in X. The same observation holds for columns I and Z, which have the same distribution of missingness. The data analyst should make an effort to understand the relationships among columns with joint and identically distributed missingness in order to present a robust report about the missingness in any discovered pattern.
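The distinction between an equal amount of missingness and the same distribution of missingness can be made concrete with a short sketch. It groups columns whose missing cells fall on exactly the same rows. The function `joint_missingness` and the sample rows are hypothetical and are not taken from Datax; the column letters are borrowed from the discussion above purely for illustration.

```python
from collections import defaultdict

# Hypothetical rows; empty strings stand for missing values.
ROWS = [
    {"T": "a", "U": "",  "V": "1", "W": "p", "X": ""},
    {"T": "b", "U": "u", "V": "",  "W": "q", "X": "x"},
    {"T": "c", "U": "",  "V": "3", "W": "r", "X": ""},
    {"T": "d", "U": "u", "V": "4", "W": "",  "X": "x"},
]


def joint_missingness(rows, columns):
    """Group columns whose missing cells occur on exactly the same rows."""
    groups = defaultdict(list)
    for column in columns:
        # The pattern is the tuple of row indices where the column is empty.
        pattern = tuple(
            index for index, row in enumerate(rows)
            if row.get(column) in ("", None)
        )
        if pattern:  # columns with no missing values are skipped
            groups[pattern].append(column)
    # Only groups with at least two columns show joint missingness.
    return [cols for cols in groups.values() if len(cols) > 1]


if __name__ == "__main__":
    print(joint_missingness(ROWS, ["T", "U", "V", "W", "X"]))
```

In this sample, V and W each have one missing value, so their bars would be equally high, but the gaps fall on different rows and the two columns are not grouped. U and X are missing on the same rows and are reported together, which is the kind of relationship a matrix view makes visible.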

Datax has also been used to evaluate cell phone reviews from the Amazon online shopping store. This dataset is deposited alongside the Datax open-source code<sup>5</sup>. It contains 11 columns and 1,048,576 records. Datax was evaluated by a team of software developers at the University of Nigeria, Nsukka, who described its efficiency in mining missing data and visualising the associated patterns as excellent. Even so, it does not visualise the different forms of missing data: it mines only empty cells and does not treat representations such as "-", "not existing", or "not available", among others, as missing data. The authors hope to integrate this ability in the next update of the application.
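One possible way to close the gap described above is to treat a configurable set of placeholder strings as missing before counting. The sketch below is an assumption about how this could be done, not the authors' planned implementation. The token set is taken from the examples in the text, and the case-insensitive, whitespace-trimmed matching is an added assumption.

```python
# Placeholder strings treated as missing; taken from the examples in the text.
MISSING_TOKENS = {"", "-", "not existing", "not available"}


def is_missing(value, tokens=MISSING_TOKENS):
    """Return True if the value is None or matches a missing-data token."""
    if value is None:
        return True
    # Assumption: matching ignores case and surrounding whitespace.
    return str(value).strip().lower() in tokens


def count_missing(values, tokens=MISSING_TOKENS):
    """Count the values in a column that are considered missing."""
    return sum(1 for value in values if is_missing(value, tokens))


if __name__ == "__main__":
    column = ["5", "-", "", "Not Available", "ok", None]
    print(count_missing(column))                # placeholders count as missing
    print(count_missing(column, tokens={""}))   # only empty cells and None
```

Passing `tokens={""}` reproduces the behaviour the text attributes to the current version of Datax, where only empty cells are mined.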
