**5. Neighbor smoothing integrated into SSK**

The lack of labeled data in our semi-supervised SSK methodology limits the classifier's ability. To achieve a more accurate classification, we apply a smoothing process that exploits geographic neighbor similarity. Cells located close to each other in geographic space have a greater chance of sharing the same land use, because lands with a unified social function are arbitrarily divided into cells and, thereby, neighboring cells tend to share similar social functions. To prevent confusion, we emphasize that there are two different types of neighbors in the context of SSK: feature-space neighbors and geographic neighbors. Until this point in the paper, we have discussed feature-space neighbors; two cells are considered feature-space neighbors if the Euclidean distance between their feature representations is relatively small. In the SSK without smoothing, only feature-space neighbors were considered. Geographic-space neighbors are cells located close to each other on the geographical map, and we use them for geographical smoothing.

Smoothing makes the results more homogeneous in geographical space. It makes the algorithm more accurate overall but less sensitive to island land uses, that is, relatively small lands whose social function differs from that of their surrounding areas. Because geographical smoothing diminishes the chance of identifying these lands, we evaluated different degrees of smoothing, thus controlling the trade-off between accuracy and sensitivity to island land uses.

The smoothing is integrated into the SSK process: in each iteration, before a class is assigned, the geographic neighbors are also considered. The land-use array *Aq*, computed from the feature-space neighbors of *xq*, is weighted with the geographic neighbors' land-use arrays (computed from their feature-space neighbors) to create an integrated array that is used for classification and confidence estimation. The rest remains the same: in every iteration, 5% of the samples are added to the training set *G*, proportionally to the number of samples assigned to each class, and the process ends when all samples are labeled (or earlier, depending on the user/application).

To weigh the query cell's land-use array against its geographic neighbors' land-use arrays, we first need to define a neighbor. *xi* is considered *xq*'s geographic neighbor if the geographical distance between them is smaller than a distance denoted *radiusq*. The distance between two cells is defined as the distance between their geographical centers. The neighbors' radius of query cell *xq* is given by:

$$radius_q = 3\sqrt{\left(width_q/2\right)^2 + \left(height_q/2\right)^2},\tag{8}$$

where *widthq* and *heightq* are *xq*'s width and height in meters.

The square root expression in Eq. (8) is the length of half of the cell's diagonal; in that way, the radius is fitted to the size and shape of the cell. Half the diagonal is multiplied by 3 because, in a preliminary study, this factor was found to fit the problem. **Figure 7(left)** demonstrates the query cell's neighbor radius. Cell *xq* is the default square cell of 200 × 200 m²; therefore, $radius_q = 3\sqrt{\left(200/2\right)^2 + \left(200/2\right)^2} = 424.3$ m. In the example in **Figure 7**, six cells' centers fall inside the circle formed by the neighbors' radius and, thus, those six cells, numbered 1 to 6, are considered *xq*'s neighbors.

**Figure 7.**

*(left) The neighbors' radius for the query cell q. (right) An example in which query cell xq has two equally close neighbors, xa and xb.*

In **Figure 7(left)**, the cells within the neighbors' radius of *xq* lie at different geographical distances from the center of *xq*. For example, the centers of cells *x3*, *x1*, and *x6* are 200, 283, and 400 meters away, respectively. We weigh the contribution of a neighbor according to its distance from the query cell because the closer the neighbor is, the greater the chance that it shares the same land use as the query cell. The weights are given by:

$$W_q^{(i)} = \frac{1}{D\left(x_q, x_i\right)^2} \quad \forall i \in nbrs_q,\tag{9}$$

where *nbrsq* is the set of *xq*'s neighbors, and $D\left(x_q, x_i\right)$ is the geographical distance between query cell *xq* and its neighbor *xi*.

In the example demonstrated in **Figure 7(left)**, the weights of cells *x3*, *x1*, and *x6* are $W_q^{(3)} = 1/200^2$, $W_q^{(1)} = 1/283^2$, and $W_q^{(6)} = 1/400^2$. Notice that among these three cells, cell *x3* is the closest to cell *xq*; thus, its weight is the highest.
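To make Eqs. (8) and (9) concrete, the following minimal Python sketch computes the neighbor radius for the default 200 × 200 m² cell and the inverse-squared-distance weights of the three example neighbors in **Figure 7(left)**; the function names are ours and only illustrate the formulas.

```python
import math

def neighbor_radius(width_m, height_m):
    # Eq. (8): three times half of the cell's diagonal (meters).
    return 3 * math.sqrt((width_m / 2) ** 2 + (height_m / 2) ** 2)

def geographic_weight(distance_m):
    # Eq. (9): inverse squared geographic distance between cell centers.
    return 1.0 / distance_m ** 2

print(round(neighbor_radius(200, 200), 1))   # 424.3 m for the default cell
for d in (200, 283, 400):                     # distances of x3, x1, and x6
    print(d, geographic_weight(d))            # closer neighbors get larger weights
```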

Note that we denote distances differently in the feature space and in the geographical space: lower case *d* is a distance in the feature space (Eq. (1)), and upper case *D* is a distance in the geographical space (used in Eq. (9)).

We then compute an array *NAq* that combines the land-use arrays of *xq*'s neighbors by weighting every neighbor according to its distance from *xq*:

$$NA_q = \frac{\sum_{i \in nbrs_q} W_q^{(i)} A_i}{\sum_{i \in nbrs_q} W_q^{(i)}}.\tag{10}$$

To demonstrate the mathematical equations used for integrating neighbor smoothing into SSK, we use the example illustrated in **Figure 7(right)**. *xq* has only two neighbors, *xa* and *xb*. Since *xa* and *xb* are located at the same distance from *xq*, their weights are equal: $W_q^{(a)} = W_q^{(b)} = 1/268^2$.

Let us assume the land-use arrays are *Aa* = (0, 0, 1, 0) and *Ab* = (0, 0.8, 0, 0.2). Then *NAq* is calculated as the weighted average of *Aa* and *Ab*:

$$NA_q = \frac{W_q^{(a)} A_a + W_q^{(b)} A_b}{W_q^{(a)} + W_q^{(b)}} = \frac{\left(1/268^2\right) A_a + \left(1/268^2\right) A_b}{\left(1/268^2\right) + \left(1/268^2\right)} = \frac{A_a + A_b}{2} = \frac{(0, 0, 1, 0) + (0, 0.8, 0, 0.2)}{2} = \frac{(0, 0.8, 1, 0.2)}{2} = (0, 0.4, 0.5, 0.1).$$


As can be seen, the value in entry 3 (0.5) is the highest in the array, indicating that *xq*'s neighbors tend to be attributed to class 3; this is because *xa* and its corresponding land-use array *Aa* are 100% attributed to class 3. However, *xq*'s neighbors also tend to be attributed to class 2, because *xb* is most likely attributed to class 2.
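The computation of *NAq* for the two-neighbor example above can be sketched in a few lines of Python (a minimal illustration of Eq. (10); the helper name is ours):

```python
import numpy as np

def neighbors_land_use_array(arrays, distances_m):
    # Eq. (10): distance-weighted average of the geographic neighbors' land-use
    # arrays, using the inverse-squared-distance weights of Eq. (9).
    weights = 1.0 / np.asarray(distances_m, dtype=float) ** 2
    arrays = np.asarray(arrays, dtype=float)
    return (weights[:, None] * arrays).sum(axis=0) / weights.sum()

A_a = [0.0, 0.0, 1.0, 0.0]
A_b = [0.0, 0.8, 0.0, 0.2]
print(neighbors_land_use_array([A_a, A_b], [268, 268]))
# -> [0.  0.4 0.5 0.1]; equal distances reduce the weighted average to a simple mean
```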

*Aq* and *NAq*, the query cell's land-use array and its neighbors' land-use array, are integrated into *IAq* by calculating their weighted average:

$$IA_q = P \cdot NA_q + (1 - P) \cdot A_q,\tag{11}$$

where *P* is the weight of *NAq*; it is, therefore, the weight given to all of *xq*'s neighbors together. We denote *P* as the neighbor weight. For example, consider again the example in **Figure 7(right)** and assume *P* = 0.3 and *Aq* = (0.1, 0.8, 0.1, 0). Then,

$$IA_q = 0.3 \cdot (0, 0.4, 0.5, 0.1) + 0.7 \cdot (0.1, 0.8, 0.1, 0) = (0.07, 0.68, 0.22, 0.03).\tag{12}$$

Examining *Aq*, derived from *xq*'s feature-space neighbors, it seems that *xq* has the highest chance of being attributed to class 2, but examining *NAq*, derived from *xq*'s geographic-space neighbors, it seems most likely that it belongs to class 3. However, after incorporating both spaces, *xq* is most likely attributed to class 2.
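Eq. (11) and the numerical example of Eq. (12) translate directly into code (a minimal sketch; the function name is ours):

```python
import numpy as np

def integrated_land_use_array(A_q, NA_q, P):
    # Eq. (11): weighted average of the query cell's own land-use array (A_q)
    # and its geographic neighbors' array (NA_q), with neighbor weight P.
    return P * np.asarray(NA_q, dtype=float) + (1.0 - P) * np.asarray(A_q, dtype=float)

IA_q = integrated_land_use_array(A_q=[0.1, 0.8, 0.1, 0.0],
                                 NA_q=[0.0, 0.4, 0.5, 0.1],
                                 P=0.3)
print(IA_q)   # -> [0.07 0.68 0.22 0.03], so class 2 remains the most likely
```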

The neighbor weight *P* depends on the number of geographic neighbors *xq* has. The more neighbors it has, the more reliable their weighted array is, and we want it to have a more significant role in determining *xq*'s class. The formula for computing *P* is:

$$P\left(\left|nbrs_q\right|, \sigma\right) = \begin{cases} \sigma + \sigma \dfrac{\left|nbrs_q\right| - 1}{11} & \left|nbrs_q\right| > 0 \\ 0 & \left|nbrs_q\right| = 0 \end{cases},\tag{13}$$

where $\left|nbrs_q\right|$ is the number of neighbors that *xq* has, and *σ* is the smoothing parameter that determines the degree of influence that the neighbors have on the classification of the query cell. Setting a low *σ*, for example, will cause the neighbors of the query cell to be less significant in the classification.

In the example above, *P* = 0.3 because the number of neighbors is $\left|nbrs_q\right| = 2$ (as can be seen in **Figure 7(right)**) and *σ* = 0.275. Therefore, $P\left(\left|nbrs_q\right|, \sigma\right) = 0.275 + 0.275 \cdot \frac{2 - 1}{11} = 0.3$.

Eq. (13) is designed such that when *xq* has only one neighbor, its neighbor weight is $P\left(\left|nbrs_q\right| = 1, \sigma\right) = \sigma$, whereas if *xq* has 12 neighbors (the maximal number, because more neighbors cannot fit inside the neighbors' radius given the shape and size of the cells), then $P\left(\left|nbrs_q\right| = 12, \sigma\right) = 2\sigma$. The value of *P* grows linearly between the case of one neighbor and the case of 12 neighbors. If the query cell does not have any neighbors, then $P\left(\left|nbrs_q\right| = 0, \sigma\right) = 0$ and $IA_q = 0 \cdot NA_q + (1 - 0) \cdot A_q = A_q$; because there are no neighbors to consider, *NAq* has no influence on *IAq*.
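A short sketch of Eq. (13) reproduces the worked example (two neighbors, σ = 0.275) and the boundary cases just discussed (the function name is ours):

```python
def neighbor_weight(num_neighbors, sigma):
    # Eq. (13): grows linearly from sigma (one neighbor) to 2 * sigma (12 neighbors).
    if num_neighbors == 0:
        return 0.0
    return sigma + sigma * (num_neighbors - 1) / 11

print(neighbor_weight(2, 0.275))    # 0.3, as in the example above
print(neighbor_weight(1, 0.275))    # sigma = 0.275
print(neighbor_weight(12, 0.275))   # 2 * sigma = 0.55
print(neighbor_weight(0, 0.275))    # 0.0, so IA_q falls back to A_q
```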

The classification confidence is calculated as in Eq. (3), but here it is calculated over *IAq* instead of *Aq*:

$$confidence_q = \max\left(IA_q\right),\tag{14}$$

where, in the example, *confidenceq* = max(0.07, 0.68, 0.22, 0.03) = 0.68.

Again, in each iteration, the number of samples added to *G* from each class is proportional to the number of cells assigned to that class in that iteration. If *confidenceq* is high enough, then *xq* is classified as the class with the highest value in *IAq*. The algorithm ends when all samples are added to *G* (or earlier, depending on the user/application).

The procedure of the SSK algorithm with neighbor smoothing is as follows:


4. For each *xq* ∈ *Q* (for each yet-unlabeled sample):

a. $A_q = \frac{\sum_{i=1}^{k} w_q^{(i)} A_i}{\sum_{i=1}^{k} w_q^{(i)}}$ (land-use array) (Eq. (2))

b. $radius_q = 3\sqrt{\left(width_q/2\right)^2 + \left(height_q/2\right)^2}$ (neighbor radius) (Eq. (8))

c. $nbrs_q \leftarrow \varnothing$

d. For each *xi* ∈ *G*

If $D\left(x_q, x_i\right) < radius_q$ then $nbrs_q \leftarrow nbrs_q \cup \left\{x_i\right\}$ (add *xi* to *xq*'s set of neighbors)

e. $W_q^{(i)} = \frac{1}{D\left(x_q, x_i\right)^2} \ \forall i \in nbrs_q$ (geographic weights) (Eq. (9))

f. $NA_q = \frac{\sum_{i \in nbrs_q} W_q^{(i)} A_i}{\sum_{i \in nbrs_q} W_q^{(i)}}$ (neighbors' land-use array) (Eq. (10))

g. $P = P\left(\left|nbrs_q\right|, \sigma\right)$ (neighbor weight) (Eq. (13))

h. $IA_q = P \cdot NA_q + (1 - P) \cdot A_q$ (integrated land-use array) (Eq. (11))

i. $confidence_q = \max\left(IA_q\right)$ (Eq. (14))

5. For each land-use class *c*, add to *G* the cells assigned to *c* with the highest confidence, where the number of added cells is proportional to the number of cells assigned to *c* in this iteration (so that 5% of the samples are added in total).

6. Repeat from step 4 until all samples are added to *G* (or stop earlier, depending on the user/application).
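To tie the steps together, the following Python sketch implements one iteration of the smoothing-aware SSK loop as we read the procedure above. It assumes a helper `feature_space_land_use_array` implementing Eq. (2), cell objects with `center`, `width`, `height`, and (for labeled cells) `land_use_array` attributes, and a confidence threshold; all of these names are ours, and the per-class selection of roughly 5% of the cells is our interpretation of step 5, not the authors' code.

```python
import math
import numpy as np

def geo_distance(c1, c2):
    # Upper-case D: distance between the cells' geographic centers (meters).
    return math.hypot(c1.center[0] - c2.center[0], c1.center[1] - c2.center[1])

def smoothing_iteration(Q, G, sigma, conf_threshold, feature_space_land_use_array):
    """One iteration of SSK with neighbor smoothing (steps 4-5 of the procedure)."""
    candidates = []
    for x_q in Q:                                                      # step 4
        A_q = np.asarray(feature_space_land_use_array(x_q, G), float)  # a, Eq. (2)
        radius_q = 3 * math.hypot(x_q.width / 2, x_q.height / 2)       # b, Eq. (8)
        nbrs = [x_i for x_i in G if geo_distance(x_q, x_i) < radius_q]   # c-d
        if nbrs:
            W = np.array([1.0 / geo_distance(x_q, x_i) ** 2 for x_i in nbrs])  # e, Eq. (9)
            A_nbrs = np.array([x_i.land_use_array for x_i in nbrs], float)
            NA_q = (W[:, None] * A_nbrs).sum(axis=0) / W.sum()         # f, Eq. (10)
            P = sigma + sigma * (len(nbrs) - 1) / 11                   # g, Eq. (13)
        else:
            NA_q, P = np.zeros_like(A_q), 0.0                          # no neighbors
        IA_q = P * NA_q + (1 - P) * A_q                                # h, Eq. (11)
        candidates.append((x_q, IA_q, float(IA_q.max()), int(IA_q.argmax())))  # i, Eq. (14)
    # Step 5: per class, move the most confident ~5% of the cells assigned to that
    # class into G, so the additions are proportional to the class assignments.
    for c in {cls for *_, cls in candidates}:
        of_class = sorted((t for t in candidates if t[3] == c), key=lambda t: -t[2])
        for x_q, IA_q, conf, cls in of_class[:max(1, round(0.05 * len(of_class)))]:
            if conf >= conf_threshold:
                x_q.land_use_array, x_q.label = IA_q, cls
                G.append(x_q)
                Q.remove(x_q)
    return Q, G   # repeat (step 6) until Q is empty
```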


