#### 2.1. Techniques for high-dimensional data

There are many techniques for dealing with high-dimensional signals (or data); among the most popular are non-negative matrix factorization (NMF), manifold learning, compressed sensing and combinations of them.

Non-negative matrix factorization (NMF) is a powerful dimensionality reduction technique that approximates a non-negative matrix X by the product of two non-negative low-rank factor matrices W and H, and it has been widely applied in image processing and pattern recognition [17]. It has attracted much attention since it was first proposed by Paatero and Tapper [18] and has been proven equivalent, in terms of the optimization problem, to K-means and spectral clustering under certain constraints [19]. Research on NMF can be broadly categorized into the following groups. The first group focuses on the distance measure between the original matrix and its approximation, including Kullback–Leibler divergence (KLNMF) [17], Euclidean distance (EucNMF) [20], the earth mover's distance metric [21] and Manhattan distance-based NMF (MahNMF) [22]. A second line of work addresses how to solve the NMF optimization efficiently and how to scale NMF algorithms to large data sets, for example, fast Newton-type methods (FNMA) [23], online NMF with robust stochastic approximation (OR-NMF) [24] and large-scale graph-regularized NMF [25]. Moreover, improving the performance of NMF by imposing constraints or by exploiting additional information about the data is also popular, such as sparseness-constrained NMF (NMFsc) [26], a convex model for NMF using l1,∞ regularization [27], discriminant NMF (DNMF) [28], graph-regularized NMF (GNMF) [29], manifold regularized discriminative NMF (MD-NMF) [30] and constrained NMF (CNMF) [31], which incorporates label information.
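As a quick illustration of the factorization X ≈ WH described above, the following sketch applies scikit-learn's NMF to a random non-negative matrix; the component count, initialization and data are arbitrary assumptions made for the example, not values taken from the cited works.

```python
import numpy as np
from sklearn.decomposition import NMF

# Random non-negative data matrix standing in for X (e.g. vectorized images as rows).
X = np.abs(np.random.default_rng(0).standard_normal((100, 40)))

model = NMF(n_components=5, init="nndsvd", max_iter=500)
W = model.fit_transform(X)   # 100 x 5 non-negative factor
H = model.components_        # 5 x 40 non-negative factor, so X is approximated by W @ H
print(np.linalg.norm(X - W @ H) / np.linalg.norm(X))  # relative reconstruction error
```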

Manifold learning is another family of methods for processing high-dimensional data, built on the assumption that the data distribution is supported on a low-dimensional sub-manifold [32]. The key idea of manifold learning is that the locality structure of the high-dimensional data should be preserved in the low-dimensional space after dimension reduction; this structure is exploited as a regularization term [33–35] or as a constraint [36, 37] added to the original problem. It has been widely applied in machine learning and computer vision, for example, image classification [38], semi-supervised multiview distance metric learning [39], human action recognition [40] and complex object correspondence construction [41].

Besides the two approaches mentioned above, in recent years sparse representation, which originates from compressed sensing, has also attracted a great deal of attention and has proven to be an extremely powerful tool for acquiring, representing and compressing high-dimensional data. The following section briefly reviews sparse representation.

#### 2.2. Brief review of sparse representation

Given a sufficiently large high-dimensional training data set X = [x1, x2, …, xn] ∈ R^{m×n}, where xi = [xi1, xi2, …, xim]^T ∈ R^m is the column vector of the i-th object, research on manifold learning [32] has shown that any new test datum y lies on a lower-dimensional manifold and can be approximately represented by a linear combination of the training objects:

$$\mathbf{y} = \alpha\_1\mathbf{x}\_1 + \dots + \alpha\_i\mathbf{x}\_i + \dots + \alpha\_n\mathbf{x}\_n = \mathbf{X}\boldsymbol{\alpha} \in \mathbf{R}^m. \tag{1}$$


Obviously, if m ≫ n, Eq. (1) is overdetermined and α can usually be found as its unique solution. Typically, however, the number of attributes is much smaller than the number of training objects (i.e. m ≪ n), so Eq. (1) is underdetermined and its solution is not unique.

However, if we add the constraint that the best solution of Eq. (1) should be as sparse as possible, which means that the number of non-zero elements is minimized, the solution becomes unique. Such a sparse representation can be obtained by solving the optimization problem:

$$\alpha^\* = \underset{\alpha}{\text{arg min}} \quad \|\alpha\|\_0 \text{ subject to } \mathbf{y} = \mathbf{X}\alpha,\tag{2}$$

where ||. ||0 denotes the l0-norm of a vector, counting the number of non-zero entries in the vector. Donoho [42] proved that if the matrix X satisfies the restricted isometry property (RIP) [43], Eq. (2) has a unique solution α.

However, it is NP-hard to find the sparsest solution of an underdetermined system: no known approach finds the sparsest solution significantly more efficiently than exhausting all subsets of the entries of α. Researchers in the emerging theory of compressed sensing [44] have revealed that the non-convex optimization in Eq. (2) is equivalent to the following convex l<sup>1</sup> optimization problem if the solution α is sparse enough:

$$\alpha^\* = \underset{\alpha}{\text{arg min}} \quad \|\alpha\|\_1 \text{ subject to } \mathbf{y} = \mathbf{X}\alpha,\tag{3}$$

where ||. ||1 denotes the l1-norm of a vector, summing the absolute value of each entry in the vector. This problem can be solved in polynomial time by standard linear programming methods [45].
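As an illustration of solving Eq. (3) by linear programming, the following sketch splits α into its positive and negative parts and calls scipy.optimize.linprog; the toy data and the recovery tolerance are assumptions made for the example.

```python
import numpy as np
from scipy.optimize import linprog

def l1_min(X, y):
    """min ||alpha||_1 subject to y = X @ alpha (Eq. 3), solved as a linear program.

    Split alpha = u - v with u, v >= 0, so that ||alpha||_1 = sum(u + v).
    """
    m, n = X.shape
    c = np.ones(2 * n)                          # objective: sum(u) + sum(v)
    A_eq = np.hstack([X, -X])                   # equality constraint X @ (u - v) = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * n))
    u, v = res.x[:n], res.x[n:]
    return u - v

# Toy check: a y built from two columns of X should be recovered sparsely.
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 50))
alpha_true = np.zeros(50)
alpha_true[[3, 17]] = [1.5, -2.0]
alpha = l1_min(X, X @ alpha_true)
print(np.flatnonzero(np.abs(alpha) > 1e-6))     # expect indices [3, 17]
```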

Since real data contain noise, it may not be possible to express the test sample exactly as a sparse representation of the training data. A sparse solution α can still be obtained approximately by solving the following stable l<sup>1</sup> optimization problem:

$$\alpha^\* = \underset{\alpha}{\text{arg min}} \quad \|\alpha\|\_1 \text{ subject to } \|\mathbf{y} - \mathbf{X}\alpha\|\_2 \le \varepsilon,\tag{4}$$

where ε is the maximum residual error; ||. ||2 denotes the l2-norm of a vector.

In many situations, we do not know the noise level ε beforehand. Then we can use the Lasso (least absolute shrinkage and selection operator) [46] optimization algorithm to recover the sparse solution from the following l<sup>1</sup> optimization:

$$\alpha^\* = \underset{\alpha}{\text{arg min}} \ \lambda \|\alpha\|\_1 + \|\mathbf{y} - \mathbf{X}\alpha\|\_2, \tag{5}$$

where λ is a scalar regularization parameter of the Lasso penalty, which directly determines how sparse α will be and balances the trade-off between reconstruction error and sparsity.
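A minimal sketch of recovering the sparse coefficients with an off-the-shelf Lasso solver is shown below. Note that scikit-learn's Lasso penalizes the squared residual (1/(2m))·||y − Xα||² rather than the unsquared residual of Eq. (5), so its alpha parameter corresponds to λ only up to rescaling; the default value used here is an arbitrary assumption.

```python
from sklearn.linear_model import Lasso

def sparse_code(X, y, lam=0.1):
    """Approximate the sparse coefficient vector of Eq. (5) for a test vector y."""
    model = Lasso(alpha=lam, fit_intercept=False, max_iter=10000)
    model.fit(X, y)                 # columns of X are the training objects
    return model.coef_              # sparse coefficient vector alpha*

# Usage: alpha = sparse_code(X, y); most entries of alpha are expected to be zero.
```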

In addition to Lasso, other sparse learning models have been developed. Adding the l2-norm of α to Eq. (5) as another penalty term yields the elastic net model [47]. The double shrinking algorithm (DSA) [48] compresses image data in both dimensionality and cardinality by building either sparse low-dimensional representations or a sparse projection matrix for dimension reduction. Go decomposition (GoDec) [49] efficiently and robustly decomposes a matrix into a low-rank part L and a sparse part S. The locality structure of the manifold can also be combined with sparse representation, as in manifold elastic net (MEN) [50], graph-regularized sparse coding (GraphSC) [51], Laplacian sparse coding (LSc) [35] and hypergraph Laplacian sparse coding (HLSc) [35].

Learning tasks such as classification and clustering usually perform better and cost less (in time and space) on compressed representations than on the original data [48]. Therefore, supervised learning and pattern recognition methods based on the sparse representation coefficients of these sparse learning models have been proposed, such as sparse representation-based classification (SRC) [52], Local\_SRC [53] and Kernel\_SRC [54], and these methods outperform traditional classifiers such as SVM, nearest neighbor (NN) and nearest subspace (NS).

#### 2.3. Sparse representation for clustering


Inspired by the successful application of sparse representation in the supervised learning approaches above, researchers have also exploited sparse representation in unsupervised [55–57] and semi-supervised clustering [58, 59]. The main idea of clustering via sparse representation is to build the weight matrix directly from the normalized and symmetrized coefficients of the sparse representation, which is called the sparsity-induced similarity (SIS) measure [59]. To a certain extent, weight measures derived from sparse representation can reveal the neighborhood structure without computing Euclidean distances, which suggests great potential for clustering high-dimensional data.

Some significant work applying SIS to spectral clustering is reviewed as follows. Sparse subspace clustering [55] directly uses the sparse representation of vectors lying in a single low-dimensional linear subspace to cluster the data into separate subspaces, followed by spectral clustering; it has also been extended to data contaminated by noise, missing entries or outliers. Experiments show that its performance for clustering motion trajectories surpasses state-of-the-art methods such as power factorization and principal component analysis. Image clustering via sparse representation [56] characterizes the graph adjacency structure and graph weights by sparse linear coefficients, which is more effective than the Gaussian RBF kernel [12] for clustering an image data set. In semi-supervised learning by sparse representation [18], the graph adjacency structure and the graph weights of the directed graph are derived simultaneously and in a parameter-free manner to utilize both labeled and unlabeled data; experiments on semi-supervised face recognition and image classification demonstrate its superiority over counterparts based on traditional graphs (e.g. ε-ball neighborhood, k-nearest neighbors). Compared to approaches using SIS of real numbers, the non-negative SIS measure [57] exploits the symmetric coefficients of a non-negative sparse representation as the weight matrix, which outperforms similarity measures such as SIS and Euclidean distance (with the Gaussian RBF baseline [12]) in cluster analysis of spam images.

However, all the existing approaches above treat the raw or merely normalized coefficients of sparse representation directly as the weight matrix. Such weights cannot exactly reflect the similarity between objects, because the coefficients of sparse representation capture only a local notion of similarity and are sensitive to outliers. Our approach is expected to provide a more effective weight matrix construction by using more global content from the solution coefficients of sparse representation.

#### 2.4. Graph construction with sparse representation

In clustering analysis, given a high-dimensional object data set X = [x1, x2, …, xn] ∈ R^{m×n}, with xi = [xi1, xi2, …, xim]^T ∈ R^m, we can use Eq. (5) to represent each object xi as a linear combination of the other objects. The coefficient vector αi of xi can be calculated by solving the following Lasso optimization:

$$\alpha\_i^\* = \underset{\alpha\_i}{\text{arg min}} \ \lambda \|\alpha\_i\|\_1 + \|\mathbf{x}\_i - \mathbf{X}\_i \alpha\_i\|\_2, \tag{6}$$


where Xi = X∖xi = [x1, …, xi−1, xi+1, …, xn] and αi = [αi,1, …, αi,i−1, αi,i+1, …, αi,n]^T.
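The per-object solutions of Eq. (6) can be collected row by row into a coefficient matrix, which is the structure used in the rest of this section. The following sketch uses scikit-learn's squared-residual Lasso as a stand-in solver for Eq. (6); the regularization value and the column-wise data layout are assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import Lasso

def coefficient_matrix(X, lam=0.1):
    """Solve Eq. (6) for every object and stack the results.

    X is m x n with objects as columns. Returns an n x n matrix A whose entry
    A[i, j] is the coefficient of x_j in the sparse representation of x_i;
    the diagonal is zero because an object never reconstructs itself.
    """
    m, n = X.shape
    A = np.zeros((n, n))
    for i in range(n):
        others = [j for j in range(n) if j != i]
        model = Lasso(alpha=lam, fit_intercept=False, max_iter=10000)
        model.fit(X[:, others], X[:, i])
        A[i, others] = model.coef_
    return A
```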

Once we get the coefficient vector αi for each object xi (i = 1, 2, …, n) as a sparse representation over all other data objects by solving the l<sup>1</sup> optimization in Eq. (6), we can construct the weight matrix using different approaches.

Existing weight matrix constructions via sparse representation are based on the assumption that coefficients in the sparse representation reflect the closeness or similarity between two data objects. For example, the SIS measure [20] is computed as:

$$w\_{ij} = \frac{\max\{\alpha\_{i,j}, 0\}}{\sum\_{k=1, k\neq i}^{n} \max\{\alpha\_{i,k}, 0\}};\quad SIS\_{ij} = \frac{w\_{ij} + w\_{ji}}{2}.\tag{7}$$

The l<sup>1</sup> Directed Graph Construction (DGC) measure [19] is computed as:

$$DGC\_{ij} = \frac{|\alpha\_{i,j}| + |\alpha\_{j,i}|}{2}.\tag{8}$$

Obviously, the similarity calculation using absolute coefficients in Eq. (8) mistakes a large negative coefficient for high similarity, which can result in clustering together two objects whose attribute values are clearly opposite.

The non-negative SIS measure [22] adds a non-negative constraint to the l<sup>1</sup> optimization of Eq. (6):

$$\alpha\_i^\* = \underset{\alpha\_i}{\text{arg min}} \ \lambda \|\alpha\_i\|\_1 + \|\mathbf{x}\_i - \mathbf{X}\_i \alpha\_i\|\_2 \quad \text{s.t. } \alpha\_{i,j} \ge 0. \tag{9}$$

Then the non-negative SIS measure is computed as:


$$NN\_{ij} = \frac{\alpha\_{i,j}}{\sum\_{k=1, k \neq i}^{n} \alpha\_{i,k}}. \tag{10}$$
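To make these constructions concrete, the sketch below computes the SIS, DGC and non-negative SIS weight matrices from a coefficient matrix A with zero diagonal (for the non-negative variant, A is assumed to come from the constrained problem in Eq. (9)); the small epsilon guarding against division by zero is an assumption of the example.

```python
import numpy as np

def sis_weights(A, eps=1e-12):
    """SIS measure, Eq. (7): row-normalize the positive coefficients, then symmetrize."""
    P = np.maximum(A, 0.0)
    W = P / np.maximum(P.sum(axis=1, keepdims=True), eps)
    return (W + W.T) / 2.0

def dgc_weights(A):
    """DGC measure, Eq. (8): symmetrized absolute coefficients."""
    return (np.abs(A) + np.abs(A).T) / 2.0

def nn_sis_weights(A_nonneg, eps=1e-12):
    """Non-negative SIS, Eq. (10): row-normalize the non-negative coefficients of Eq. (9)."""
    return A_nonneg / np.maximum(A_nonneg.sum(axis=1, keepdims=True), eps)
```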

#### 3. Sparse representation for spectral clustering

Our proposed clustering algorithm consists of three steps: (1) solving the l<sup>1</sup> optimization of sparse representation to obtain the coefficients of each object; (2) constructing the weight matrix between objects from those coefficients, using more global content of the solution of the sparse representation; and (3) applying the spectral clustering algorithm to the weight matrix to obtain a partition of the graph.

In contrast to direct construction methods that use the individual solutions of Eq. (6) independently, we assume that for any two objects xi and xj, the more similar they are, the more similar their whole coefficient vectors αi and αj are, not only a particular coefficient (αi,j or αj,i). Based on this assumption, we propose the following two constructions of the graph adjacency structure and weight matrix, which are expected to use the global information of the solution coefficients.

#### 3.1. Proximity based on a consistent sign set

To assess the similarity of two different objects xi and xj clearly and logically, we first find, within the data set X, the set of objects to whose sparse reconstruction both xi and xj contribute positively, called the consistent sign set (CSS). The definition rests on the assumption that the more objects there are to whose reconstruction a pair of objects both contribute positively, the more similar that pair of objects is. Formally, the sparse reconstruction coefficients corresponding to xi and xj are both positive for every object in this set:

$$\text{CSS}(\mathbf{x}\_i, \mathbf{x}\_j) = \left\{ \mathbf{x}\_k \mid \alpha\_{k,i} > 0 \land \alpha\_{k,j} > 0,\; k \neq i,\; k \neq j \right\}, \quad \forall i \neq j. \tag{11}$$

Furthermore, we can construct the graph adjacency structure and weight matrix as follows. A directed edge is placed between objects xi and xj if CSS(xi, xj) ≠ ∅, and the weight between xi and xj is defined as the ratio of the cardinality of CSS(xi, xj) to the total number of objects:

$$w\_{ij} = \begin{cases} \frac{|\text{CSS}(\mathbf{x}\_i, \mathbf{x}\_j)|}{n} & \text{if } i \neq j \\ 0 & \text{if } i = j \end{cases} \tag{12}$$

where n is the total number of objects in X. Obviously, the weight is between 0 and 1. Equivalently, letting DA be the indicator matrix of the positive entries of A, wij = (DAi · DAj)/n for i ≠ j and wij = 0 for i = j, where DAi denotes the i-th column vector of DA; the inner product (DAi · DAj) between DAi and DAj is equal to CSS(xi, xj)'s cardinality.
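A sketch of this construction in matrix form is given below, reusing a coefficient matrix A (rows indexed by the reconstructed object, zero diagonal) such as the one built in the earlier sketch; the indicator-matrix shortcut mirrors the equivalence noted above.

```python
import numpy as np

def css_weights(A):
    """CSS-based weights, Eqs. (11)-(12).

    A[k, i] is the coefficient of x_i in the sparse representation of x_k and
    the diagonal of A is zero. DA marks positive coefficients, so the inner
    product of columns i and j of DA counts |CSS(x_i, x_j)|.
    """
    n = A.shape[0]
    DA = (A > 0).astype(float)       # indicator of positive contributions
    W = (DA.T @ DA) / n              # W[i, j] = |CSS(x_i, x_j)| / n
    np.fill_diagonal(W, 0.0)         # w_ii = 0 by Eq. (12)
    return W
```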

#### 3.2. Proximity based on cosine similarity of coefficient vector

We can construct the coefficient matrix A of the data set X by arranging the solution coefficients of Eq. (6) as:

$$\mathbf{A}(i, j) = \alpha\_{i,j}^{'} = \begin{cases} \alpha\_{i,j} & i \neq j \\ 0 & i = j \end{cases} \tag{13}$$

A directed edge is placed from object xi to xj if the cosine of the angle between the two corresponding coefficient vectors is greater than 0, that is:

$$\frac{\alpha\_i^{'} \cdot \alpha\_j^{'}}{\|\alpha\_i^{'}\|\_2 \times \|\alpha\_j^{'}\|\_2} > 0,\tag{14}$$

where αi' denotes the i-th row vector of A.

The weight between objects xi and xj is defined as the cosine similarity of αi' and αj':

$$w\_{ij} = \begin{cases} \max\left(0, \dfrac{\alpha\_i^{'} \cdot \alpha\_j^{'}}{\|\alpha\_i^{'}\|\_2 \times \|\alpha\_j^{'}\|\_2}\right) & i \neq j \\ 0 & i = j \end{cases} \tag{15}$$

From the above similarity formula, two objects receive a large weight only when their whole solution coefficient vectors from Eq. (6) are similar, so the construction is expected to use the whole solution coefficients.
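A corresponding sketch of the cosine-similarity weights of Eqs. (14)-(15), again computed over the rows of the coefficient matrix A, is shown below; the epsilon that guards against zero rows is an assumption of the example.

```python
import numpy as np

def cos_weights(A, eps=1e-12):
    """Cosine-similarity weights over the rows of A, Eqs. (14)-(15)."""
    norms = np.linalg.norm(A, axis=1, keepdims=True)
    U = A / np.maximum(norms, eps)     # each row alpha_i' scaled to unit length
    W = np.maximum(U @ U.T, 0.0)       # keep only positive cosine similarities
    np.fill_diagonal(W, 0.0)           # w_ii = 0
    return W
```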

To illustrate the differences between our weight constructions and other constructions that also use sparse representation, consider the following example. Assume that the coefficient matrix A of a data set with five objects, obtained from the solution coefficients of Eq. (6), is the following 5 × 5 matrix:

$$\mathbf{A} = \begin{bmatrix} 0 & 0.3 & 0.6 & 0.6 & -0.7 \\ 0.4 & 0 & 0.5 & 0.6 & -0.6 \\ 0.4 & 0.4 & 0 & -0.1 & -0.2 \\ -0.6 & -0.3 & 0.2 & 0 & 0.7 \\ -0.5 & 0.3 & 0.2 & 0.4 & 0 \end{bmatrix}$$

According to the above introduction of the different weight constructions:

1. SIS13 = 0.4, SIS12 = 0.2, DGC13 = 0.5 and DGC12 = 0.35; these numbers indicate that the similarity between x1 and x3 is larger than that between x1 and x2. However, in our approaches, which use more entries of A, CSS13 = 1/5 = 0.2, CSS12 = 2/5 = 0.4, COS13 = 0.24 and COS12 = 0.98; these numbers give different weights compared with the first group, where CSS and COS abbreviate the above two proximity approaches, respectively.

2. DGC25 = 0.45, CSS25 = 0 and COS25 = 0.16; thus, DGC mistakes the big negative coefficient (α'2,5 = −0.6) for high similarity, while CSS and COS both give lower similarity.

#### 3.4. Algorithm description

Algorithm 1 describes the general procedure for spectral clustering of high-dimensional data using sparse representation. The basic idea is to extract the coefficients of sparse representation (Lines 1–4), construct a weight matrix using the coefficients (Line 5) and feed the weight matrix into a spectral clustering algorithm (Line 6) to find the best partitioning efficiently.

Algorithm 1. General procedure for spectral clustering of high-dimensional data.

Input: high-dimensional training data set X = [x1, x2, …, xn] ∈ R^{m×n}, where xi = [xi1, xi2, …, xim]^T ∈ R^m represents the i-th data object; the number of clusters K. Parameter: penalty coefficient λ for Lasso optimization.

Output: cluster labels corresponding to each data object: c = [c1, c2, …, cn].

//standardize the input data for Lasso optimization
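The listing below is a compact sketch of this procedure (not the chapter's exact Algorithm 1), reusing the coefficient_matrix, css_weights and cos_weights helpers sketched earlier and delegating the final step to scikit-learn's spectral clustering with a precomputed affinity; the standardization step and the parameter defaults are assumptions.

```python
from sklearn.cluster import SpectralClustering
from sklearn.preprocessing import scale

def sparse_spectral_clustering(X, K, lam=0.1, weight="css"):
    """Spectral clustering of high-dimensional data via sparse representation.

    X is m x n with objects as columns; K is the number of clusters.
    """
    Xs = scale(X, axis=1)                        # standardize each attribute for Lasso
    A = coefficient_matrix(Xs, lam)              # Lines 1-4: sparse representation coefficients
    W = css_weights(A) if weight == "css" else cos_weights(A)   # Line 5: weight matrix
    labels = SpectralClustering(n_clusters=K, affinity="precomputed").fit_predict(W)  # Line 6
    return labels                                # c = [c1, c2, ..., cn]
```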
