**3. Unsupervised clustering**

Unsupervised machine learning refers to learning from input data without reference to known or labelled outputs. Unlike supervised machine learning, unsupervised algorithms cannot be applied directly to a classification problem, since there is no prior knowledge of the number of object classes or of the threshold of each class. Instead, unsupervised learning can be used to discover the underlying structure or patterns of the data. In this context, the term "*clustering*" refers to the process of grouping similar things together [14], and unsupervised learning can therefore be used to discover such clusters in the input data.

K-means is an unsupervised algorithm for clustering m objects into k clusters, in which each observation belongs to the cluster with the nearest mean [15]. Each centroid is a point in a 2- or N-dimensional space that represents the centre of a cluster. **Figure 1** shows an example of the K-means clustering algorithm [17]. The algorithm begins with k randomly placed centroids and assigns every object to the nearest one. After this initial assignment, each centroid is moved to the average location of all the objects assigned to it, and the objects are then reassigned to the nearest centroids. The process repeats until the centroids no longer move [18].
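As a brief illustration of this grouping of m objects into k clusters, the following Python sketch uses the scikit-learn `KMeans` estimator on synthetic 2D data; the data, the number of clusters and the random seed are assumptions chosen purely for demonstration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic example: m = 300 two-dimensional points drawn around three centres
rng = np.random.default_rng(seed=0)
points = np.vstack([
    rng.normal(loc=centre, scale=0.5, size=(100, 2))
    for centre in [(0, 0), (5, 5), (0, 5)]
])

# Cluster the points into k = 3 clusters
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(points)      # cluster index of every point
centroids = kmeans.cluster_centers_      # final centroid coordinates

print(centroids)
print(labels[:10])
```

The fitted `cluster_centers_` correspond to the centroids described above, and `labels` records which cluster each observation was assigned to.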

Bennet et al. [19] proposed a method for unsupervised classification of multitemporal optical RSIs based on discrete wavelet transform (DWT) feature extraction and K-means clustering. After pre-processing the optical image, they applied the DWT to extract features and build the input vectors. The authors then performed feature reduction, selecting the most discriminative features with an energy-based criterion. Finally, they used K-means clustering for unsupervised learning of the input data clusters and evaluated the results by labelling the clusters using ground truth data. Shulei Wu et al. [20] introduced a classification method based on K-means using hue, saturation, value (HSV) colour features. When tested on Landsat satellite data, their method with HSV features produced higher classification accuracy than K-means with RGB data. Abbas et al. [21] compared the K-means unsupervised clustering method with the Iterative Self-Organising Data Analysis Technique Algorithm (ISODATA) unsupervised method for automatically grouping pixels with similar spectral features in remote sensing images.
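To make the HSV-based pixel clustering idea more concrete, the sketch below converts an RGB image to HSV features and clusters the pixels with K-means; this is only a minimal illustration of the general approach, not the authors' implementation, and the random stand-in image, the number of classes and the use of matplotlib's `rgb_to_hsv` conversion are assumptions.

```python
import numpy as np
from matplotlib.colors import rgb_to_hsv
from sklearn.cluster import KMeans

# Assumed input: an RGB image as an (H, W, 3) float array scaled to [0, 1].
# A random array stands in here for a real Landsat RGB composite.
rng = np.random.default_rng(seed=1)
rgb_image = rng.random((64, 64, 3))

# Convert RGB pixels to HSV features and flatten to an (H*W, 3) feature matrix
hsv_pixels = rgb_to_hsv(rgb_image).reshape(-1, 3)

# Unsupervised grouping of pixels into an assumed number of land-cover classes
n_classes = 4
labels = KMeans(n_clusters=n_classes, n_init=10, random_state=0).fit_predict(hsv_pixels)

# Reshape the cluster indices back to image form to obtain an unsupervised class map
class_map = labels.reshape(rgb_image.shape[:2])
print(class_map.shape, np.unique(class_map))
```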

#### **Figure 1.**

*Example of the K-means clustering algorithm. Initially, k centroids are assigned in the M-dimensional object space. As the algorithm iterates through its recursive steps, the centroids are re-assigned and the cluster boundaries move around the object space (adapted from [16]).*


Vishwanath et al. [22] combined K-means unsupervised clustering with Laplacian-of-Gaussian and Prewitt filters to improve classification and road edge detection in RSIs. Yin et al. [23] applied the K-means clustering algorithm to Lidar-based 3D object detection and classification tasks in automated driving (AD), using K-means for 3D point cloud segmentation; the authors reported high-speed 3D object recognition when running on a GPU-enabled platform. Huu Thu Nguyen et al. [24] combined deep learning algorithms with K-means clustering to achieve multiple object detection in both sonar images and 3D point cloud Lidar data.

**Figure 2** shows the K-means algorithm flowchart [25]. The algorithm is recursive: earlier steps in the flowchart are called again in later steps. The first step is to determine the number of clusters K and to assume initial centroids for these clusters, which can be any randomly chosen objects. Alternatively, the first K objects in sequence can be assigned as the initial centroids. As shown in **Figure 2**, the algorithm then consists of the following three recursive steps:

1. Determine the centroid coordinates.

2. Determine the distance of each object to the centroids.

3. Group the objects based on the minimum distance.

The Euclidean distance d between two points $p_1(x_1, y_1)$ and $p_2(x_2, y_2)$ in the X-Y two-dimensional (2D) space is given by:

$$\mathbf{d}(\mathbf{p}\_1, \mathbf{p}\_2) = \sqrt{\left(\mathbf{x}\_2 - \mathbf{x}\_1\right)^2 + \left(\mathbf{y}\_2 - \mathbf{y}\_1\right)^2} \tag{1}$$
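For reference, Eq. (1) amounts to a single NumPy expression; the point coordinates in the sketch below are purely illustrative.

```python
import numpy as np

def euclidean_2d(p1, p2):
    """Euclidean distance between two 2D points, as in Eq. (1)."""
    p1, p2 = np.asarray(p1, dtype=float), np.asarray(p2, dtype=float)
    return np.sqrt((p2[0] - p1[0]) ** 2 + (p2[1] - p1[1]) ** 2)

print(euclidean_2d((1.0, 2.0), (4.0, 6.0)))  # prints 5.0
```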

**Figure 2.** *K-means clustering algorithm flowchart and its recursive steps (adapted from [25]).*

We can rewrite the above equation for the points $p_n$, where $n = 1 \ldots M$ is the index of the cluster points, $M$ is the total number of points and $P$ is the vector of all the points, and for a centroid point $\tau_k$, where $k = 1 \ldots C$ is the index of the centroid points, $C$ is the total number of centroid points and $T$ is the vector containing all the centroid points:

$$\mathbf{d}\left(\mathbf{p}\_{\mathbf{n}},\,\tau\_{\mathbf{k}}\right) = \sqrt{\left(\tau\_{\mathbf{x}\_{\mathbf{k}}} - \mathbf{p}\_{\mathbf{x}\_{\mathbf{n}}}\right)^{2} + \left(\tau\_{\mathbf{y}\_{\mathbf{k}}} - \mathbf{p}\_{\mathbf{y}\_{\mathbf{n}}}\right)^{2}}\tag{2}$$

Each cluster point $p_n$ is then assigned to a cluster by finding the centroid $\tau_k$ that minimises the distance, $\arg\min \mathrm{dist}(\cdot)$, which is given by:

$$\arg\min\_{\tau\_{\mathbf{k}} \in \mathcal{T}} \text{dist}(\tau\_{\mathbf{k}}, \mathbf{p}\_{\mathbf{n}})^2 \tag{3}$$
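The assignment step of Eqs. (2) and (3) can be written compactly with NumPy broadcasting; the arrays `P` (all points) and `T` (all centroids) follow the notation above and are filled with illustrative values.

```python
import numpy as np

# P: M x 2 array of points p_n;  T: C x 2 array of centroids tau_k
P = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.1], [4.8, 5.3]])
T = np.array([[0.0, 0.0], [5.0, 5.0]])

# Eq. (2): squared distance of every point to every centroid (M x C matrix)
sq_dist = ((P[:, None, :] - T[None, :, :]) ** 2).sum(axis=2)

# Eq. (3): each point is assigned to the centroid with the minimum distance
assignments = np.argmin(sq_dist, axis=1)
print(assignments)  # [0 0 1 1]
```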

Then, we can compute the new centroid $\tau_k$ from the clustered group of points by the equation:

$$\tau\_{\mathbf{k}} = \frac{1}{|\mathbf{S}\_{\mathbf{k}}|} \sum\_{\mathbf{p}\_{\mathbf{n}} \in \mathbf{S}\_{\mathbf{k}}} \mathbf{p}\_{\mathbf{n}} \tag{4}$$

where $S_k$ is the set of all 2D data points assigned to the kth cluster.
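Putting Eqs. (2)-(4) together yields the recursive loop of **Figure 2**. The sketch below is a minimal from-scratch 2D implementation, assuming randomly chosen initial centroids and a simple convergence test on centroid movement; the input points are illustrative.

```python
import numpy as np

def kmeans_2d(P, k, max_iter=100, tol=1e-6, seed=0):
    """Minimal K-means on an (M, 2) array P, following Eqs. (2)-(4)."""
    rng = np.random.default_rng(seed)
    # Initial centroids: k randomly chosen input points
    T = P[rng.choice(len(P), size=k, replace=False)].astype(float)
    for _ in range(max_iter):
        # Eqs. (2)-(3): assign every point to its nearest centroid
        sq_dist = ((P[:, None, :] - T[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(sq_dist, axis=1)
        # Eq. (4): new centroid = mean of the points in each cluster S_k
        new_T = np.array([
            P[labels == j].mean(axis=0) if np.any(labels == j) else T[j]
            for j in range(k)
        ])
        # Stop when the centroids no longer move
        if np.linalg.norm(new_T - T) < tol:
            break
        T = new_T
    return T, labels

points = np.array([[0.0, 0.1], [0.2, 0.0], [5.0, 5.2], [5.1, 4.9]])
centroids, labels = kmeans_2d(points, k=2)
print(centroids, labels)
```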

Assume now that we have two points in the X-Y-Z three-dimensional (3D) space. The Euclidean distance between those two 3D points $p_{3D_1}(x_1, y_1, z_1)$ and $p_{3D_2}(x_2, y_2, z_2)$ is given by:

$$\mathbf{d}\left(p\_{\text{3D}\_1}, p\_{\text{3D}\_2}\right)\_{\text{3D}} = \sqrt{\left(\mathbf{x}\_2 - \mathbf{x}\_1\right)^2 + \left(\mathbf{y}\_2 - \mathbf{y}\_1\right)^2 + \left(\mathbf{z}\_2 - \mathbf{z}\_1\right)^2} \tag{5}$$

We can rewrite Eq. (2) for a 3D data point $p_{3D_n}$ and a 3D centroid point $\tau_{3D_k}$, where $k = 1 \ldots C$ is the index of the 3D centroid points, $C$ is the total number of 3D centroid points and $T_{3D}$ is the vector containing all the 3D centroid points:

$$\mathbf{d}\left(\mathbf{p}\_{\text{3D}\_{\mathbf{n}}}, \tau\_{\text{3D}\_{\mathbf{k}}}\right)\_{\text{3D}} = \sqrt{\left(\tau\_{\mathbf{x}\_{\mathbf{k}}}^{\text{3D}} - \mathbf{p}\_{\mathbf{x}\_{\mathbf{n}}}^{\text{3D}}\right)^{2} + \left(\tau\_{\mathbf{y}\_{\mathbf{k}}}^{\text{3D}} - \mathbf{p}\_{\mathbf{y}\_{\mathbf{n}}}^{\text{3D}}\right)^{2} + \left(\tau\_{\mathbf{z}\_{\mathbf{k}}}^{\text{3D}} - \mathbf{p}\_{\mathbf{z}\_{\mathbf{n}}}^{\text{3D}}\right)^{2}}\tag{6}$$

where $p_{3D_n}$ are the 3D cluster points, $n = 1 \ldots M$ is the index of the 3D cluster points, $M$ is the total number of 3D points and $P_{3D}$ is the vector of all the 3D points; $\tau_{3D_k}$ is a 3D centroid point, where $k = 1 \ldots C$ is the index of the 3D centroid points, $C$ is the total number of centroid points and $T_{3D}$ is the vector containing all the 3D centroid points.

Eq. (3) can then be rewritten for each 3D data point, estimating the minimum of the distance, $\arg\min \mathrm{dist}_{3D}(\cdot)$, to a centroid $\tau_{3D_k}$ as follows:

$$\arg\min\_{\tau\_{\text{3D}\_k}\in T\_{\text{3D}}} \text{dist}\left(\tau\_{\text{3D}\_k}, \mathbf{p}\_{\text{3D}\_n}\right)\_{\text{3D}}^2\tag{7}$$

Then, Eq. (4) can be re-written for a 3D centroid as:

$$\tau\_{\text{3D}\_{\mathbf{k}}} = \frac{1}{|\mathbf{S}\_{\text{3D}\_{\mathbf{k}}}|} \sum\_{\mathbf{p}\_{\text{3D}\_{\mathbf{n}}} \in \mathbf{S}\_{\text{3D}\_{\mathbf{k}}}} \mathbf{p}\_{\text{3D}\_{\mathbf{n}}} \tag{8}$$

where $S_{3D_k}$ is the set of all 3D data points assigned to the kth cluster.
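Since Eqs. (5)-(8) only add a z coordinate to Eqs. (1)-(4), the vectorised 2D code above carries over to 3D point clouds unchanged apart from the extra column. The following sketch clusters a synthetic stand-in for a Lidar point cloud; the cluster centres, point counts and number of centroids are assumptions chosen for illustration.

```python
import numpy as np

# Synthetic stand-in for a small Lidar point cloud: M x 3 array of (x, y, z) points
rng = np.random.default_rng(seed=2)
P3D = np.vstack([
    rng.normal(loc=centre, scale=0.3, size=(50, 3))
    for centre in [(0, 0, 0), (4, 4, 1), (8, 0, 2)]
])

# Initial 3D centroids: C randomly chosen points
C = 3
T3D = P3D[rng.choice(len(P3D), size=C, replace=False)].copy()

for _ in range(100):
    # Eqs. (6)-(7): squared 3D distances and nearest-centroid assignment
    labels = np.argmin(((P3D[:, None, :] - T3D[None, :, :]) ** 2).sum(axis=2), axis=1)
    # Eq. (8): recompute each centroid as the mean of its assigned 3D points
    new_T3D = np.array([
        P3D[labels == j].mean(axis=0) if np.any(labels == j) else T3D[j]
        for j in range(C)
    ])
    if np.allclose(new_T3D, T3D):
        break
    T3D = new_T3D

print(np.round(T3D, 2))  # recovered 3D centroids
```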
