**6. Empirical evaluation of neighbor smoothing integrated into SSK**

In this section, we evaluate the effect of integrating neighbor smoothing into SSK. **Figure 9** compares SSK accuracy for different values of the neighbor smoothing parameter *σ*, ranging from 0 (no smoothing) to 0.25. Accuracy rises with *σ*, from 74% when no smoothing is performed to 80% when *σ* is 0.25.

Recall that the accuracy rate of RF is 84%. Although SSK does not reach RF's accuracy rate, smoothing brings it to within four percentage points of RF, even though the latter is a supervised paradigm that uses a much bigger training set (87.5% of the cells are labeled and used as ground truth for training the RF in each of the eight cross-validation folds, compared with only 5% of the cells that are used by the SSK). However, the effectiveness of the smoothing process is overestimated because the neighbor similarity property that neighbor smoothing relies on is exaggerated in our dataset. In the process of selecting the areas, we chose ones that are homogenous in land use and whose "real" land-use label is relatively easy for locals to determine. This means that most areas include only one land use in a specific hour. Homogenous areas have some advantages: they are practical for labeling, and they can serve to assess the feasibility of the process. However, they are less representative of normal urban behavior. Because the areas we selected are overly homogenous, the chance of neighboring cells sharing the same land use is higher than in normal urban behavior.

*Mapping of Social Functions in a Smart City When Considering Sparse Knowledge DOI: http://dx.doi.org/10.5772/intechopen.104901*

#### **Figure 9.** *Effect of smoothing parameter σ on the accuracy rate (Acc).*

Island land uses located in the heart of other land uses, to which the neighbor-smoothed SSK is less sensitive, occur less frequently in our data. We do expect this process to also perform well on a less homogenous dataset, although to a more limited extent. We expect the algorithm to perform better as the smoothing parameter value increases, up to a point where the results become too homogenous and cause too many errors in identifying island land uses.

**Figure 10** compares the geographical confusion maps of SSK classification without (**Figure 10a** and **b**) and with (**Figure 10c** and **d**) neighbor smoothing with *σ* = 0.25 for the work hours 8 a.m. to 5 p.m. in Ra'anana (**Figure 10a** and **c**) and Kiryat Arye, an industrial area of Petach Tikva (**Figure 10b** and **d**). Recall that the colors in each cell accumulate the classification results over the different hours and over the different random choices of cells used as the initial set of labeled cells.

The smoothing makes the classification assignment more consistent and less influenced by the randomness introduced by randomly choosing the cells with predefined land use. Considering more factors in the cell class assignment, that is, considering the cell's neighbors, diminishes the effect of randomness and lowers the classification variance. For example, see the classification of the industrial cells in Kiryat Arye. This is an area of homogenous social function, and the smoothing makes classification there more consistent. The cells are more uniformly colored in the same color (yellow), indicating that they were assigned to the same class in more of the iterations. The smoothing also lowers SSK's bias. Without smoothing, 35 out of the 42 cells in Kiryat Arye are correctly classified as Industrial in most of the runs, while with smoothing, all 42 cells are correctly classified in most of them. For example, the bottom-right cell in Kiryat Arye without smoothing (**Figure 10b**) is incorrectly classified in most runs (note the small yellow area indicating "Industrial" compared to the other colors), whereas with smoothing (**Figure 10d**), this cell is mostly correctly classified as "Industrial."
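The "correctly classified in most of the runs" count used above can be sketched as follows. This is a minimal illustration and not the chapter's code: the function name `majority_correct`, the integer label encoding, and the toy data are all assumptions made for the example.

```python
import numpy as np

def majority_correct(pred_runs, truth):
    """Per-cell share of runs with the correct label, and the number of
    cells that are correct in more than half of the runs.

    pred_runs: (n_runs, n_cells) predicted labels, one row per run
    truth:     (n_cells,) ground-truth labels
    """
    pred_runs = np.asarray(pred_runs)
    truth = np.asarray(truth)
    frac_correct = (pred_runs == truth).mean(axis=0)
    return frac_correct, int((frac_correct > 0.5).sum())

# Hypothetical toy data: 3 cells, 4 runs; 0 = Industrial, 1 = any other class.
truth = [0, 0, 0]
runs = [[0, 0, 1],
        [0, 1, 1],
        [0, 0, 1],
        [0, 0, 0]]
frac, n_ok = majority_correct(runs, truth)
print(frac, n_ok)  # third cell is right in only 1 of 4 runs, so 2 cells pass
```

A low per-cell share corresponds to a cell that is mostly misclassified across runs (bias), while shares far from both 0 and 1 indicate run-to-run inconsistency (variance).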

On the downside, neighbor smoothing diminishes the ability to identify "island" land uses. For example, see the commercial island street in Ra'anana located in the heart of several neighborhoods. Notice that even before smoothing (**Figure 10a**), SSK mostly classified it as Residential, as it is affected by nearby residential cells (as described above). Because the signal-strength triangulation technology used for location estimation in this work suffers from inaccuracy, the extent of the problem is not negligible. Small and narrow ("island") streets that are surrounded by a "sea" of residential neighborhoods are especially affected by this inaccuracy. Smoothing complicates the task of identifying island land uses because it makes the results more homogenous; the classifier becomes more decisive and mistakenly assigns more cells to Residential (in the case of Ra'anana; **Figure 10c**).

#### **Figure 10.**

*Geographical confusion maps of SSK without (a, b) and with (c, d) smoothing (σ* = 0.25*).*

The influence of smoothing depends on the geographical structure of the land use. We distinguish between geographically wide-stretching land uses, such as Residential, and island land uses, which are usually either located in the heart of a wide-stretching land use, such as commercial streets or shopping malls, or located at the borders between wide-stretching land uses, such as highways.

Neighbor smoothing causes the wide-stretching land uses to expand over island land uses and, consequently, more cells are classified as wide-stretching. Therefore, the recall of wide-stretching land uses increases (more cells are classified as wide-stretching, so more of them are identified correctly), but precision declines because some of the "new" wide-stretching cells belong to the neighboring island land use; thus, the percentage of correctly classified cells declines. The recall of island land uses decreases because fewer islands are identified, whereas precision increases because the cells still classified as islands are the most unambiguous ones and are therefore mostly classified correctly.

#### **Figure 11.**

*Smoothing effect (σ) on the precision and recall performance measures in classifying (a) wide-stretching residential land uses and (b) narrow commercial island land uses.*
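The expected trade-off between wide-stretching and island land uses can be illustrated with a toy example. The sketch below is not the chapter's smoothing procedure: it assumes, purely for illustration, that each cell's class scores are blended with the mean score of its immediate neighbors using weight *σ*, on a hypothetical one-dimensional row of nine cells with made-up scores.

```python
import numpy as np

RES, COM = 0, 1  # hypothetical class indices: Residential, Commercial

def smooth_row(scores, sigma):
    """Assumed smoothing rule: (1 - sigma) * own scores
    + sigma * mean scores of the left/right neighbors."""
    scores = np.asarray(scores, dtype=float)
    out = np.empty_like(scores)
    n = len(scores)
    for i in range(n):
        nbrs = [j for j in (i - 1, i + 1) if 0 <= j < n]
        out[i] = (1 - sigma) * scores[i] + sigma * scores[nbrs].mean(axis=0)
    return out

def precision_recall(pred, truth, cls):
    tp = np.sum((pred == cls) & (truth == cls))
    fp = np.sum((pred == cls) & (truth != cls))
    fn = np.sum((pred != cls) & (truth == cls))
    return float(tp / (tp + fp)), float(tp / (tp + fn))

# Made-up scores [Residential, Commercial] for a row of nine cells.
truth = np.array([RES, RES, RES, COM, RES, RES, COM, RES, RES])
scores = np.array([
    [0.90, 0.10],
    [0.45, 0.55],  # noisy Residential cell, misclassified on its own
    [0.90, 0.10],
    [0.20, 0.80],  # clear Commercial island
    [0.90, 0.10],
    [0.90, 0.10],
    [0.40, 0.60],  # weak Commercial island
    [0.90, 0.10],
    [0.90, 0.10],
])

results = {}
for sigma in (0.0, 0.25):
    pred = smooth_row(scores, sigma).argmax(axis=1)
    results[sigma] = {cls: precision_recall(pred, truth, cls) for cls in (RES, COM)}
    print(sigma, results[sigma])
```

In this toy row, smoothing recovers the noisy Residential cell but absorbs the weak Commercial island: Residential recall rises while its precision falls, and Commercial recall falls while its precision rises, which is the pattern described above for a non-homogenous setting.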

However, because our dataset is homogenous, both precision and recall improve in all land uses. **Figure 11** demonstrates the effect of the smoothing parameter on the recall and precision of the wide-stretching Residential land use (**Figure 11a**) and the Commercial island land use (**Figure 11b**).

In the wide-stretching Residential land use, recall rises from 0.92 to 0.96; thus, 50% of the previously unidentified Residential cells are identified thanks to the smoothing. In the Commercial island land use, the rise in recall is less prominent, from 0.52 to 0.54; thus, only about 4% of the previously unidentified Commercial cells are identified thanks to the smoothing. As we would expect, the recall improvement is considerably larger in the wide-stretching land use. Precision in the wide-stretching Residential land use rises from 0.73 to 0.76; thus, the percentage of cells incorrectly assigned as Residential drops slightly, from 27% to 24%. In the Commercial island land use, precision rises substantially, from 0.70 to 0.82; thus, the percentage of cells incorrectly assigned as Commercial drops from 30% to 18%. As we would expect, the precision improvement is considerably larger in the island land use.
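The percentages quoted above follow from simple arithmetic on the reported recall and precision values. The short sketch below reproduces them; the helper name `share_of_misses_recovered` is hypothetical and introduced only for this illustration.

```python
def share_of_misses_recovered(before, after):
    """Fraction of the cells missed before smoothing (1 - before)
    that are identified after smoothing."""
    return (after - before) / (1.0 - before)

# Recall values as reported in the text (before, after smoothing).
recall = {"Residential": (0.92, 0.96), "Commercial": (0.52, 0.54)}
# Precision values as reported in the text (before, after smoothing).
precision = {"Residential": (0.73, 0.76), "Commercial": (0.70, 0.82)}

recovered = {name: share_of_misses_recovered(b, a) for name, (b, a) in recall.items()}
wrong_share = {name: (round(1 - b, 2), round(1 - a, 2)) for name, (b, a) in precision.items()}

for name in recall:
    print(f"{name}: {recovered[name]:.1%} of previously missed cells recovered; "
          f"incorrectly assigned share {wrong_share[name][0]:.0%} -> {wrong_share[name][1]:.0%}")
```

The recovered share is roughly 50% for Residential and roughly 4% for Commercial, and the incorrectly assigned share is one minus precision in each case.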
