4. Experimental results

In this section, we demonstrate the performance of our proposed approaches on real-world data sets using two external effectiveness evaluations.

We select three data sets from the UCI machine-learning repository [60] and three face recognition data sets [61–63], which are well known in the machine learning and data mining research community. Table 1 lists a summary of these data sets.

In the Yale and ORL face data sets, each image is resized to a 32 × 32 pixel configuration using the Matlab Image Processing Toolbox. The Yale B data set has 10 clusters, each containing 585 images; since the original size (5850 images) makes clustering too time-consuming, we randomly select 60 images from the 585 in each cluster. In all data sets, each image is normalized to have unit norm.
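For readers reproducing this preprocessing outside Matlab, the following is a minimal Python sketch, with Pillow and NumPy standing in for the Image Processing Toolbox; the function names and file handling are illustrative assumptions, not the chapter's actual pipeline.

```python
import numpy as np
from PIL import Image

def preprocess(path):
    """Load an image, resize to 32 x 32 and flatten to a unit-norm vector."""
    img = Image.open(path).convert("L")            # grayscale
    img = img.resize((32, 32))                     # 32 x 32 pixel configuration
    x = np.asarray(img, dtype=np.float64).ravel()
    return x / np.linalg.norm(x)                   # unit norm

# Yale B: randomly keep 60 of the 585 images in each cluster
rng = np.random.default_rng(0)
def subsample(paths, k=60):
    return list(rng.choice(paths, size=k, replace=False))
```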

We use the interior-point method-based l1\_ls\_matlab tool [64] to solve Eq. (6) for each data object, and then implement the different weight matrices for Algorithm 1 in Matlab to cluster each data set. Table 2 shows a summary of the proposed and baseline algorithms.
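Since Eq. (6) is an l1-regularized least-squares (lasso) problem, a rough stand-in for the l1\_ls\_matlab solver can be sketched with scikit-learn's Lasso: each object is represented over the dictionary of all remaining objects, the usual l1-graph construction. The exact objective scaling of Eq. (6) is an assumption here, and `lam` plays the role of the lasso parameter λ used in the experiments below.

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_codes(X, lam=0.01):
    """X: (n, d) array of unit-norm objects. Returns an (n, n) matrix C whose
    i-th row holds the sparse coefficients that reconstruct x_i from all the
    other objects (the diagonal stays zero)."""
    n = X.shape[0]
    C = np.zeros((n, n))
    for i in range(n):
        idx = np.arange(n) != i                          # exclude x_i itself
        model = Lasso(alpha=lam, fit_intercept=False, max_iter=10000)
        model.fit(X[idx].T, X[i])                        # columns = other objects
        C[i, idx] = model.coef_
    return C
```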

Since the true class labels of each data set are known, two commonly used external cluster validation metrics [66–68] are employed to evaluate the clustering results, namely clustering accuracy (CA) and normalized mutual information (NMI)<sup>1</sup>.
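For concreteness, a hedged sketch of the two metrics: NMI comes directly from scikit-learn, while CA is computed, as is standard, by finding the best one-to-one mapping between predicted and true labels with the Hungarian algorithm. The sketch assumes labels are integers 0..k-1.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score

def clustering_accuracy(y_true, y_pred):
    """CA: fraction of points correctly labeled under the best one-to-one
    mapping between predicted clusters and true classes."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    k = int(max(y_true.max(), y_pred.max())) + 1
    count = np.zeros((k, k), dtype=int)
    for t, p in zip(y_true, y_pred):
        count[p, t] += 1
    rows, cols = linear_sum_assignment(-count)   # maximize matched points
    return count[rows, cols].sum() / len(y_true)

# NMI: nmi = normalized_mutual_info_score(y_true, y_pred)
```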


To illustrate the weight matrices obtained from the different approaches, we compare the visual properties of the proposed graph weight matrices with those of traditional ones in Figure 1, taking the Yale data set as an example. Each subfigure in Figure 1 is an N × N weight matrix (entries larger than the threshold are shown in white, otherwise black), and images from the same cluster are arranged together. The graphs include the sparse representation-based consistent sign set (CSS), cosine similarity of coefficient vectors (COS), sparsity induced similarity measure (SIS), l1 directed graph construction (DGC) and nonnegative sparsity induced similarity measure (NN), together with the Gaussian RBF (RBF) baseline.
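As one concrete example among these graphs, here is a minimal sketch of the COS weight matrix, assuming (per Section 3.2) that the weight between two objects is the cosine similarity of their sparse coefficient vectors. Zeroing the diagonal and clipping negative similarities to zero are assumptions made so the matrix can serve as a spectral-clustering affinity.

```python
import numpy as np

def cos_weight_matrix(C, eps=1e-12):
    """COS graph: W[i, j] = cosine similarity between the sparse coefficient
    vectors (rows of C) of objects i and j."""
    norms = np.linalg.norm(C, axis=1) + eps      # guard against zero rows
    W = (C @ C.T) / np.outer(norms, norms)
    np.fill_diagonal(W, 0.0)                     # no self-loops
    return np.clip(W, 0.0, 1.0)                  # keep weights nonnegative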

Figure 1. Visualization of the graph weight matrices of the Yale data set, where images from the same subject are arranged together. (a) SIS, (b) DGC, (c) NN, (d) RBF, (e) CSS, (f) COS.


Since none of the original weight matrices constructed by the six approaches is sparse, we set threshold values (0.2 for COS; 0.388 for CSS; 0 for RBF, with σ set to 4; 0.02 for the other three matrices), chosen to give the best sparse matrices in Figure 1. A value larger than the threshold is shown in white, otherwise black. Normally, clustering performance is good if the weights between objects from different clusters are small while the weights between objects of the same cluster are large. Under the above arrangement, this means that a good matrix should be compact along the diagonal and sparse everywhere else.
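The thresholded rendering itself is simple; a sketch of how one panel of Figure 1 could be produced (threshold values as listed above; matplotlib is an assumption here, as the chapter used Matlab):

```python
import matplotlib.pyplot as plt

def show_binary(W, threshold, title=""):
    """Entries above the threshold are white, the rest black; objects are
    assumed to be pre-sorted so that each cluster is contiguous."""
    plt.imshow(W > threshold, cmap="gray", interpolation="nearest")
    plt.title(title)
    plt.axis("off")
    plt.show()

# e.g. show_binary(W_cos, 0.2, "COS")   # 0.2 is the COS threshold above
```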

From Figure 1, we have the following observations: (1) the matrices in all subfigures are compact along the diagonal; (2) the COS matrix is sparser than the others in the lower-left and upper-right parts, meaning that COS has fewer inter-cluster adjacency connections than the other graphs, so it encodes more discriminating information and is more effective for spectral clustering than the traditional graphs; (3) CSS performs similarly to the SIS, DGC and NN graphs on the Yale data set.


| Data set | # Instances | # Attributes | # Classes | Source |
|---|---|---|---|---|
| Heart | 270 | 13 | 2 | UCI |
| Image (image segmentation) | 2310 | 18 | 7 | UCI |
| Yale | 165 | 1024 | 15 | [61] |
| Yale B | 600 | 1200 | 10 | [62] |
| ORL Face | 400 | 1024 | 40 | [63] |
| Movement | 360 | 90 | 15 | UCI |

Table 1. Summary of data sets.


| Name | Description | Source | Role |
|---|---|---|---|
| KM | k-means clustering | [65] | Baseline |
| RBF | Spectral clustering with weight matrix from Gaussian RBF | [12] | Baseline |
| SIS | Spectral clustering with weight matrix from sparsity induced similarity measure | [59] | Baseline |
| DGC | Spectral clustering with weight matrix from l1 directed graph construction | [58] | Baseline |
| NN | Spectral clustering with weight matrix from non-negative sparsity induced similarity measure | [57] | Baseline |
| CSS | Spectral clustering with weight matrix from consistent sign set | Section 3.1 | Proposed |
| COS | Spectral clustering with weight matrix from cosine similarity of sparse coefficients | Section 3.2 | Proposed |

Table 2. Summary of algorithms to be compared.

<sup>1</sup> We use the Matlab toolbox from: http://www.cad.zju.edu.cn/home/dengcai/Data/data.html

The clustering results obtained from the seven clustering algorithms are reported in Tables 3 and 4, each of which corresponds to one evaluation metric. For each data set, the best results are in bold. All the numbers, except those in the last two rows of each table, represent the best clustering results over different lasso parameters (λ). The last two rows of each table present the average performance of each algorithm over all six data sets. Since the k-means step within spectral clustering is sensitive to the initial centroids, we run spectral clustering 50 times for each case and report the mean and standard deviation (std).
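A hedged sketch of this protocol, using scikit-learn's SpectralClustering with a precomputed affinity as a stand-in for Algorithm 1; the weight matrix W is assumed to be symmetric and nonnegative.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics import normalized_mutual_info_score

def evaluate(W, y_true, n_clusters, runs=50):
    """Run spectral clustering `runs` times on a precomputed weight matrix
    and report the mean and std of NMI (CA is handled the same way)."""
    scores = []
    for seed in range(runs):
        sc = SpectralClustering(n_clusters=n_clusters,
                                affinity="precomputed", random_state=seed)
        y_pred = sc.fit_predict(W)
        scores.append(normalized_mutual_info_score(y_true, y_pred))
    return float(np.mean(scores)), float(np.std(scores))
```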

From Tables 3 and 4, we can clearly see that CSS or COS generally achieves the best clustering performance under both evaluation metrics. However, there are some cases where neither CSS nor COS gets the best result. For example, although NN gets the best CA on the Yale B data set, COS gets almost the same CA result as NN (0.8937 vs. 0.8940); and although NN gets the best NMI on the Movement data set, COS gets the best result on the other metric (CA) for this data set. In particular, COS performs better than CSS in terms of the mean values of the evaluation metrics, while CSS has the lowest average standard deviation of all algorithms on NMI over the 50 random runs.

Overall, for most data sets, CSS and COS perform better than the baselines and are robust across the external validation metrics. Note that COS outperforms CSS in terms of the average mean values, while CSS yields the lower average standard deviation on NMI. This can be explained as follows: CSS is more stable because its discretization may lower the variance of the pairwise similarities, while COS retains more fine-grained information about the pairwise similarities, which leads to better average metrics but higher variance. Therefore, in practice, the trade-off between stability and quality should be taken into account when choosing between these approaches for a clustering problem.

| Data set | | CSS | COS | DGC | SIS | NN | RBF | KM |
|---|---|---|---|---|---|---|---|---|
| Heart | Mean | 0.7704 | **0.8174** | 0.5852 | 0.7889 | 0.7519 | 0.7963 | 0.7320 |
| | (Std) | (0.0000) | (0.0000) | (0.0000) | (0.0000) | (0.0000) | (0.0000) | (0.0882) |
| Image | Mean | 0.7631 | **0.7921** | 0.7020 | 0.7820 | 0.7360 | 0.5335 | 0.6215 |
| | (Std) | (0.0148) | (0.0323) | (0.0339) | (0.0341) | (0.0379) | (0.0305) | (0.0355) |
| Yale | Mean | 0.6823 | **0.7408** | 0.7178 | 0.7023 | 0.6417 | 0.6635 | 0.5482 |
| | (Std) | (0.0432) | (0.0345) | (0.0414) | (0.0314) | (0.0412) | (0.0395) | (0.0529) |
| Yale B | Mean | 0.8572 | 0.8937 | 0.8320 | 0.8620 | **0.8940** | 0.6918 | 0.6862 |
| | (Std) | (0.0791) | (0.0713) | (0.0768) | (0.0635) | (0.0767) | (0.0226) | (0.0721) |
| ORL Face | Mean | 0.7315 | **0.7570** | 0.7225 | 0.7243 | 0.6903 | 0.7314 | 0.7196 |
| | (Std) | (0.0247) | (0.0218) | (0.0222) | (0.0244) | (0.0207) | (0.0252) | (0.0311) |
| Movement | Mean | 0.5241 | **0.5472** | 0.5009 | 0.5183 | 0.5304 | 0.4874 | 0.4653 |
| | (Std) | (0.0222) | (0.0187) | (0.0248) | (0.0193) | (0.0271) | (0.0232) | (0.0203) |
| Average | Mean | 0.7214 | **0.7580** | 0.6767 | 0.7296 | 0.7074 | 0.6506 | 0.6288 |
| | (Std) | (0.0307) | (0.0298) | (0.0332) | (0.0288) | (0.0339) | (0.0235) | (0.0500) |

Table 3. Evaluation of all algorithms with CA as metric.

| Data set | | CSS | COS | DGC | SIS | NN | RBF | KM |
|---|---|---|---|---|---|---|---|---|
| Heart | Mean | 0.2208 | **0.3149** | 0.0511 | 0.1791 | 0.0331 | 0.2712 | 0.2028 |
| | (Std) | (0.0000) | (0.0000) | (0.0000) | (0.0000) | (0.0312) | (0.0000) | (0.1013) |
| Image | Mean | 0.7088 | **0.7451** | 0.5921 | 0.7319 | 0.6637 | **0.7451** | 0.6122 |
| | (Std) | (0.0071) | (0.0184) | (0.0171) | (0.0176) | (0.0357) | (0.0284) | (0.0437) |
| Yale | Mean | 0.7137 | **0.7815** | 0.7513 | 0.7641 | 0.6926 | 0.6989 | 0.6484 |
| | (Std) | (0.0211) | (0.0183) | (0.0203) | (0.0188) | (0.0198) | (0.0271) | (0.0342) |
| Yale B | Mean | 0.9008 | **0.9526** | 0.9202 | 0.9379 | 0.9510 | 0.7768 | 0.7858 |
| | (Std) | (0.0302) | (0.0360) | (0.0422) | (0.0326) | (0.0398) | (0.0224) | (0.0621) |
| ORL Face | Mean | 0.8477 | **0.8688** | 0.8492 | 0.8547 | 0.8426 | 0.8512 | 0.8620 |
| | (Std) | (0.0116) | (0.0108) | (0.0105) | (0.0118) | (0.0109) | (0.0140) | (0.0157) |
| Movement | Mean | 0.5891 | 0.5914 | 0.5933 | 0.6000 | **0.6306** | 0.5741 | 0.5818 |
| | (Std) | (0.0128) | (0.0124) | (0.0140) | (0.0130) | (0.0173) | (0.0169) | (0.0180) |
| Average | Mean | 0.6635 | **0.7090** | 0.6262 | 0.6779 | 0.6356 | 0.6529 | 0.6155 |
| | (Std) | (0.0138) | (0.0160) | (0.0174) | (0.0156) | (0.0258) | (0.0181) | (0.0458) |

Table 4. Evaluation of all algorithms with NMI as metric.

Finally, to compare the clustering algorithms, we plot the averages of the mean values and standard deviations (from the last two rows of the two tables), as shown in Figure 2.
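A minimal matplotlib sketch of one panel of Figure 2, using the averages from the last two rows of Table 3; the plotting style is an assumption.

```python
import matplotlib.pyplot as plt

# CA panel: average mean with average std as the error bar (Table 3)
algos   = ["CSS", "COS", "DGC", "SIS", "NN", "RBF", "KM"]
ca_mean = [0.7214, 0.7580, 0.6767, 0.7296, 0.7074, 0.6506, 0.6288]
ca_std  = [0.0307, 0.0298, 0.0332, 0.0288, 0.0339, 0.0235, 0.0500]

xs = range(len(algos))
plt.errorbar(xs, ca_mean, yerr=ca_std, fmt="o", capsize=4)
plt.xticks(xs, algos)
plt.ylabel("CA")
plt.title("Error bars of different algorithms (CA)")
plt.show()
```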


Figure 2. Error bars of different algorithms: (a) CA, (b) NMI.

