**5.3 Data clustering**

K-means clustering is a vector quantization method that aims to partition n observations into k clusters such that each observation belongs to the cluster with the nearest mean according to a similarity metric [20].

Let $S = \{f_1, f_2, \ldots, f_n\}$ be the set of observations and $K = \{C_1, C_2, \ldots, C_K\}$ the set of clusters; then $C_i \neq \phi$, $C_i \cap C_j = \phi$, and $\cup_{i=1}^{K} C_i = S$, where $i, j = 1, 2, \ldots, K$ and $i \neq j$.

The clustering method works as follows:

1. Randomly initialize K cluster centroids.
2. Assign each observation to the cluster with the nearest centroid.
3. Recompute each centroid as the mean of the observations assigned to it.
4. Repeat steps 2 and 3 until the assignments no longer change.
When deploying k-means to cluster IoT traffic, we reduce the data dimension so as to quantize the data into single-dimensional data. We classify traffic flows by their shared characteristics, identify clusters of homogeneous traffic flows, and define their borders. The clustering needs high intra-cluster homogeneity and high inter-cluster heterogeneity.
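As a minimal sketch of this step (assuming scikit-learn is available; the flow features and the choice of four clusters are illustrative, not prescribed by the method), the traffic flows can be clustered as follows:

```python
# Sketch: k-means over hypothetical IoT traffic-flow features.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Hypothetical flow-feature matrix: one row per flow,
# columns e.g. [packet_size, inter_arrival_time, flow_duration].
rng = np.random.default_rng(0)
flows = rng.random((500, 3))

# Standardize so no single feature dominates the similarity metric.
X = StandardScaler().fit_transform(flows)

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
labels = kmeans.labels_            # cluster assignment of each flow
centers = kmeans.cluster_centers_  # cluster means (centroids)
```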

To determine the optimal (best-fitting) value of K, we use the elbow method: we repeatedly apply the clustering with different values of K and plot the resulting heterogeneity. The value at which the curve begins to flatten is taken as the optimal K.
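A corresponding sketch of the elbow method sweeps K and plots the within-cluster heterogeneity (scikit-learn's inertia); the point where the curve flattens suggests the value of K. The data matrix below is a stand-in for the scaled flow features from the previous sketch.

```python
# Sketch: elbow method for choosing K.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

X = np.random.default_rng(0).random((500, 3))  # stand-in for scaled flow features

inertias = []
k_values = range(1, 11)
for k in k_values:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(km.inertia_)   # within-cluster sum of squares for this K

plt.plot(list(k_values), inertias, marker="o")
plt.xlabel("Number of clusters K")
plt.ylabel("Within-cluster heterogeneity (inertia)")
plt.show()
```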

#### **5.4 Dimensionality reduction**

After clustering the data, we initially have n states $(S_1, S_2, \ldots, S_n)$; applying the dimensionality reduction technique yields a new set of m states $(s_1, s_2, \ldots, s_m)$, where $m < n$, $s_i = f_i(S_1, S_2, \ldots, S_n)$, and $f_i$ represents a mapping function. The idea is thus to transform massive data from a high-dimensional space, such as IoT data, into a k-dimensional sub-space by partitioning the data space into fully connected states. The low-dimensional form holds the top eigenvector v, which carries the meaningful features of the real data, ideally close to its natural dimension [22]. The new set of features is extracted by some functional mapping. In this model, we consider principal component analysis (PCA) and calculate it using the singular value decomposition (SVD), as PCA provides a framework for reducing data dimensionality by capturing the maximum variation in the data.

To achieve dimensionality reduction with PCA, the eigenvalues must be ordered from highest to lowest. This ordering ranks the components by their contribution to the variance of the initial data matrix and allows us to drop the less important ones: we keep most of the information and lose only a little noise, thereby reducing the dimension of the original data. For instance, for data of d dimensions, we take only the first r eigenvectors:

$$\frac{\sum_{i=1}^{r} a_i}{\sum_{i=1}^{d} a_i} = \frac{a_1 + a_2 + \dots + a_r}{a_1 + a_2 + \dots + a_d} \tag{1}$$

$$a = a_1, a_2, \dots, a_r \tag{2}$$
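As an illustration of Eq. (1), the following sketch selects the smallest r whose leading eigenvalues retain a target fraction of the total variance; the 95% threshold and the helper name `choose_r` are assumptions made only for the example.

```python
# Sketch: pick r so that the first r eigenvalues explain >= `target` of the variance.
import numpy as np

def choose_r(eigenvalues: np.ndarray, target: float = 0.95) -> int:
    a = np.sort(eigenvalues)[::-1]       # order eigenvalues from highest to lowest
    ratio = np.cumsum(a) / np.sum(a)     # (a_1 + ... + a_r) / (a_1 + ... + a_d), Eq. (1)
    return int(np.searchsorted(ratio, target) + 1)
```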

**Definition 1** [23]: Any matrix $Y = (y_1, y_2, \ldots, y_n)$ of size $K \times d$ can be rewritten as $Y = USV^T$, where U is an orthonormal matrix of size $K \times r$, S is a diagonal matrix of size $r \times r$, and V is a matrix of eigenvectors of size $r \times d$ (each column is an eigenvector) (see **Figure 2**).


**Figure 2.** *Singular value decomposition of Y [23].*
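As a concrete illustration of Definition 1, the following numpy sketch factors a small $K \times d$ matrix and truncates it to rank r; the sizes K, d, and r are arbitrary values chosen for the example.

```python
# Sketch: SVD of a K x d matrix Y and its rank-r truncation (Definition 1).
import numpy as np

K, d, r = 6, 4, 2
Y = np.random.default_rng(1).random((K, d))

U, s, Vt = np.linalg.svd(Y, full_matrices=False)       # Y = U diag(s) Vt
U_r, S_r, Vt_r = U[:, :r], np.diag(s[:r]), Vt[:r, :]   # keep the top r components

Y_r = U_r @ S_r @ Vt_r   # best rank-r approximation of Y
```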

Assume that the data matrix Y is centered, i.e., the column means have been subtracted so that each column mean equals zero. The covariance matrix C is calculated by:

$$C = \frac{\mathbf{Y}^T \mathbf{Y}}{K - 1} \tag{3}$$

Because the covariance matrix is symmetric, it can be diagonalized by:

$$\mathbf{C} = \mathbf{V} \mathbf{L} \mathbf{V}^T \tag{4}$$

where V is a matrix of eigenvectors and L is a diagonal matrix with eigenvalues $\lambda_i$. The eigenvectors are called the principal axes of the data, and the projections of the data onto the principal axes are called the principal components [24]. Substituting the singular value decomposition of Y, C becomes:

$$\mathbf{C} = \frac{\mathbf{V} \mathbf{S} \mathbf{U}^T \mathbf{U} \mathbf{S} \mathbf{V}^T}{K - 1} \tag{5}$$

$$\mathbf{C} = \mathbf{V} \frac{\mathbf{S}^2}{K - 1} \mathbf{V}^T \tag{6}$$

As a result, the eigenvectors of C are the columns of the matrix V (the right singular vectors of Y), and the eigenvalues of C can be obtained from the singular values $s_i$:

$$\lambda_i = \frac{s_i^2}{K - 1} \tag{7}$$

The principal components are defined by:

$$\mathbf{Y}\mathbf{V} = \mathbf{U}\mathbf{S}\mathbf{V}^T\mathbf{V} = \mathbf{U}\mathbf{S} \tag{8}$$

In short, the PCA is calculated as follows:

1. Center the data matrix Y by subtracting the column means.
2. Compute the singular value decomposition $Y = USV^T$.
3. Obtain the eigenvalues of the covariance matrix from the singular values, $\lambda_i = s_i^2/(K-1)$, and order them from highest to lowest.
4. Keep the first r components and project the data onto the principal axes, $YV = US$.
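These steps can be condensed into a short numerical sketch. The function below (our own illustration, not the chapter's reference implementation; the name `pca_svd` and the use of numpy's SVD routine are assumptions) centers a $K \times d$ data matrix, recovers the covariance eigenvalues from the singular values as in Eq. (7), and forms the principal components as in Eq. (8).

```python
# Sketch: PCA of a K x d data matrix Y computed via SVD.
import numpy as np

def pca_svd(Y: np.ndarray, r: int):
    K = Y.shape[0]
    Yc = Y - Y.mean(axis=0)                            # center: column means become zero
    U, s, Vt = np.linalg.svd(Yc, full_matrices=False)  # Yc = U diag(s) Vt
    eigvals = s**2 / (K - 1)                           # eigenvalues of C, Eq. (7)
    components = U * s                                 # principal components Yc V = U S, Eq. (8)
    return components[:, :r], Vt[:r, :], eigvals[:r]
```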
Based on Oja's algorithm for stochastic PCA optimization [25], the primary concept of our algorithm is to perform m stochastic updates by sampling the columns $y_i$ uniformly at random and to reduce the variance of these updates.

$$X_t' = X_{t-1} + \eta\, y_{i_t} y_{i_t}^T X_{t-1} \tag{9}$$

$$X_t = \frac{1}{\|X_t'\|} X_t' \tag{10}$$

We use variance-reduced stochastic schemes for convex optimization [23] to reduce the stochastic variance. Let $B = \frac{1}{n} YY^T$; then the updates of our algorithm in each iteration can be rewritten as

$$X_t' = (I + \eta B) X_{t-1} + \eta \left( y_{i_t} y_{i_t}^T - B \right) \left( X_{t-1} - \overline{X}_{f-1} \right) \tag{11}$$

$$X_t = \frac{1}{\|X_t'\|} X_t' \tag{12}$$

The algorithm is divided into periods $f = 1, 2, 3, \ldots$, and in each period we perform a single exact power iteration by computing $\overline{U}$. The steps to solve the problem are given as pseudo-code in Algorithm 1.

**Algorithm 1**: Dimensionality Reduction Algorithm.

**Input**: Matrix $Y = (y_1, \ldots, y_n)$.
**Output**: Matrix $\overline{X}_f$.

1: Initialize an orthonormal matrix $\overline{X}_0$ of size $k \times d$.
2: **for** $f = 1, 2, 3, \ldots, K$ **do**
3: &nbsp;&nbsp;&nbsp;&nbsp;$\overline{U} = \frac{1}{n} \sum_{i} y_i \left( y_i^T \overline{X}_{f-1} \right)$
4: &nbsp;&nbsp;&nbsp;&nbsp;$X_0 = \overline{X}_{f-1}$
5: &nbsp;&nbsp;&nbsp;&nbsp;**for** $t = 1, 2, 3, \ldots, m$ **do**
6: &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$B_{t-1} = V U^T$, where $U S V^T$ is an SVD decomposition of $X_{t-1}^T \overline{X}_{f-1}$
7: &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Pick $i_t \in \{1, 2, 3, \ldots, n\}$ uniformly at random
8: &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$X_t' = X_{t-1} + \eta \left( y_{i_t} \left( y_{i_t}^T X_{t-1} - y_{i_t}^T \overline{X}_{f-1} B_{t-1} \right) + \overline{U} B_{t-1} \right)$
9: &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$X_t = X_t' \left( X_t'^T X_t' \right)^{-1/2}$, which ensures that $X_t$ has orthonormal columns
10: &nbsp;&nbsp;&nbsp;&nbsp;**end for**
11: &nbsp;&nbsp;&nbsp;&nbsp;$\overline{X}_f = X_m$
12: **end for**
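For illustration, the following numpy sketch implements Algorithm 1. It is not the authors' implementation: the data layout (observations as columns $y_i$ of a $d \times n$ matrix), the step size $\eta$, the epoch length m, and the number of periods are assumed values chosen only to make the example runnable; in practice the returned basis would be used to project new observations onto the reduced space.

```python
# Sketch: variance-reduced stochastic PCA in the style of Algorithm 1.
import numpy as np

def dim_reduction(Y: np.ndarray, k: int, eta: float = 0.05, m: int = 50,
                  periods: int = 10, seed: int = 0) -> np.ndarray:
    """Y: d x n matrix whose columns y_i are observations; returns a d x k basis."""
    rng = np.random.default_rng(seed)
    d, n = Y.shape
    # Step 1: initialize an orthonormal matrix X_bar.
    X_bar, _ = np.linalg.qr(rng.standard_normal((d, k)))
    for _ in range(periods):                       # Step 2: loop over periods f
        U_bar = (Y @ (Y.T @ X_bar)) / n            # Step 3: exact power-iteration term
        X = X_bar.copy()                           # Step 4: X_0 = X_bar_{f-1}
        for _ in range(m):                         # Step 5: inner stochastic loop
            # Step 6: rotation aligning X_bar_{f-1} with the current iterate.
            U, _, Vt = np.linalg.svd(X.T @ X_bar)
            B = Vt.T @ U.T
            i = rng.integers(n)                    # Step 7: sample one column uniformly
            y = Y[:, [i]]
            # Step 8: variance-reduced stochastic update.
            X_prime = X + eta * (y @ (y.T @ X - y.T @ X_bar @ B) + U_bar @ B)
            # Step 9: X_t = X'_t (X'_t^T X'_t)^(-1/2), keeping the columns orthonormal.
            w, V = np.linalg.eigh(X_prime.T @ X_prime)
            X = X_prime @ (V * (1.0 / np.sqrt(w))) @ V.T
        X_bar = X                                  # Step 11: X_bar_f = X_m
    return X_bar
```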
