**3. Experimental results**

To evaluate the imputation performance of the above-mentioned methods, experiments on open datasets were conducted. The datasets were Abalone (Aba), Scene (SCN), White Wine (WW), and Indian Pines (IP), with 4177, 2407, 4898, and 21025 samples, respectively, and dimensionalities of 561, 294, 11, and 200, respectively. The imputation approaches were KNN Regression Imputation (KNRImpute), KNNImpute with *K* = 5, Regression Tree Imputation (RTImpute), Random Forest Imputation (RFImpute), and NIPALS-PCA Imputation (PCAImpute) with only one component. All implementations were obtained from open sources.
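The chapter only states that the implementations came from open sources; as one illustration, scikit-learn offers close analogues of four of the five methods. The names and configurations below are assumptions for the sketch, not the authors' code:

```python
# Illustrative configuration of the imputers using scikit-learn.
# The exact open-source implementations used in the chapter are not
# specified, so treat this as a sketch, not the authors' setup.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import KNNImputer, IterativeImputer
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor

imputers = {
    # KNNImpute: distance-based imputation with K = 5, as in the experiments.
    "KNNImpute": KNNImputer(n_neighbors=5),
    # KNRImpute: iterative imputation with a KNN regression model.
    "KNRImpute": IterativeImputer(estimator=KNeighborsRegressor()),
    # RTImpute: iterative imputation with a regression tree.
    "RTImpute": IterativeImputer(estimator=DecisionTreeRegressor()),
    # RFImpute: iterative imputation with a random forest.
    "RFImpute": IterativeImputer(estimator=RandomForestRegressor(n_estimators=100)),
}
# PCAImpute (NIPALS-PCA with one component) has no direct scikit-learn
# equivalent; it requires a separate package or a hand-rolled NIPALS loop.
```

Each of these objects exposes the usual `fit_transform(X)` interface, where `X` contains `NaN` entries to be filled in.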

#### *Applications of Pattern Recognition*

To generate missing values for each dataset, this study used a random generator to select the entries to be removed. KNRImpute, KNNImpute, RTImpute, and RFImpute required that missing values not be spread uniformly over all variables; otherwise, imputation could not be performed. Thus, not all of the independent variables were chosen to contain missing values. Missing-value rates ranged from 3.00% to 9.00% in steps of 2.00%. When a dataset was recovered, the difference between the substituted values and the ground truth was compared. The criteria for examining the quality of imputation were the root-mean-squared error (RMSE) and the coefficient of determination (R<sup>2</sup>). For the coefficient of determination, this study reshaped (i.e., vectorized) a dataset into a vector and then used the following definition to compute R<sup>2</sup>. Assume that *x<sub>g</sub>* represents an element of a ground-truth dataset (*g* = 1, …, *MN*), *x̂<sub>g</sub>* denotes the corresponding recovered value, and *x̄<sub>g</sub>* denotes the mean of all the ground-truth values in the same dataset; R<sup>2</sup> is then computed by Eq. (12).
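The corruption step can be sketched as follows. This is a NumPy illustration; the chapter's actual random generator, and which columns were kept fully observed, are assumptions here:

```python
import numpy as np

def add_missing(data, rate, protected_cols=(0,), seed=0):
    """Blank out `rate` of the entries of `data` (as NaN), keeping the
    columns in `protected_cols` fully observed so that model-based
    imputers always have complete predictors available."""
    rng = np.random.default_rng(seed)
    corrupted = data.astype(float)          # float copy so NaN fits
    rows, cols = corrupted.shape
    candidates = [c for c in range(cols) if c not in protected_cols]
    n_missing = int(round(rate * rows * cols))
    # Draw entry indices without replacement from the eligible columns.
    flat = rng.choice(rows * len(candidates), size=n_missing, replace=False)
    for f in flat:
        r, c = divmod(int(f), len(candidates))
        corrupted[r, candidates[c]] = np.nan
    return corrupted

# The experiments swept missing rates of 3.00%, 5.00%, 7.00%, and 9.00%.
corrupted = add_missing(np.ones((100, 5)), 0.03)
# → exactly 15 of the 500 entries (3.00%) are NaN, none in column 0
```

Sampling the indices without replacement guarantees the nominal missing rate exactly (up to rounding), which keeps the sweep over 3.00%–9.00% comparable across datasets.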

*DOI: http://dx.doi.org/10.5772/intechopen.94068*

*Incomplete Data Analysis*

**Figure 1.**

*Average and the standard deviation of the RMSEs and R<sup>2</sup> based on dataset Aba. The top ones are RMSEs, and the bottom are R<sup>2</sup>. The standard deviation was divided by 10,000 for better resolution.*

**Figure 2.**

*Average and the standard deviation of the RMSEs and R<sup>2</sup> based on dataset SCN. The top ones are RMSEs, and the bottom are R<sup>2</sup>.*


$$R^{2} = 1 - \sum\_{g} \left( x\_{g} - \hat{x}\_{g} \right)^{2} \Big/ \sum\_{g} \left( x\_{g} - \overline{x}\_{g} \right)^{2}. \tag{12}$$

When R<sup>2</sup> was close to one, the substituted values approached the ground truth; that is, the difference between the substituted values and the ground truth was small.
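Applied to the vectorized datasets, definition (12) amounts to the following helper (an illustrative NumPy sketch, not the chapter's code; the RMSE over the same flattened vectors is included for completeness):

```python
import numpy as np

def imputation_scores(ground_truth, recovered):
    """RMSE and R^2 between a ground-truth dataset and its recovered
    version, both computed over the vectorized (flattened) arrays."""
    x = np.asarray(ground_truth, dtype=float).ravel()
    x_hat = np.asarray(recovered, dtype=float).ravel()
    rmse = np.sqrt(np.mean((x - x_hat) ** 2))
    # Eq. (12): R^2 = 1 - residual sum of squares / total sum of squares.
    r2 = 1.0 - np.sum((x - x_hat) ** 2) / np.sum((x - x.mean()) ** 2)
    return rmse, r2

rmse, r2 = imputation_scores([[1, 2], [3, 4]], [[1, 2], [3, 4]])
# → (0.0, 1.0): perfect recovery gives RMSE 0 and R^2 of one
```

Because both arrays are flattened before scoring, a single R<sup>2</sup> value summarizes the whole dataset rather than one value per variable.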

**Figures 1**–**4** display the average and the standard deviation of the RMSEs and R<sup>2</sup>, where the horizontal axis denotes the missing rate, and the vertical axis is the evaluation result. The left subplots are line plots, and the right ones are bar charts with standard deviations. As shown in the figures, the standard deviations were quite small.

**Figure 3.**


*Average and the standard deviation of the RMSEs and R<sup>2</sup> based on dataset WW. The top ones are RMSEs, and the bottom are R<sup>2</sup>. The standard deviation was divided by 10,000 for better resolution.*

**Figure 4.**

*Average and the standard deviation of the RMSEs and R<sup>2</sup> based on dataset IP. The top ones are RMSEs, and the bottom are R<sup>2</sup>.*

In addition, the RMSEs became higher as the missing rates increased. Observations showed that KNRImpute, RTImpute, and RFImpute generated similar RMSEs. Overall, KNNImpute and PCAImpute were the methods most affected by their hyperparameters (the number of neighbors *K* and the number of components, respectively).
