### **1. Introduction**

The self-organizing map (SOM) is a neural network widely used for visualization and data analysis. Urban design is one promising application area: SOM has been used for the analysis of growth factors in urban design proposals [1], the study of urban spatial structure [2], the analysis of city systems [3], city data mining [4], and the prediction of accessibility demand for healthcare infrastructure [5]. For SOM results to be more accurate, however, the quality of the feature map must be improved.

SOM maps input data from a multi-dimensional space onto a lower-dimensional space, usually two-dimensional, called the feature map of the data. The quality of a feature map is mainly evaluated with two indicators: learning quality and projection quality [6–9]. Learning quality is measured by the quantization error (QE) [10, 11], and projection quality is measured by the topographical error (TE) [12–14]. A feature map is considered to be of good quality when both QE and TE are small.

Many studies have shown that the quality of the feature map is strongly affected by the initial parameters of the network, including the map size, the number of training iterations and the neighborhood radius [11, 15–18]. Moreover, a feature map obtained with a suitable set of parameters is not necessarily the best-quality map. Improving the feature map quality of SOM has therefore attracted the attention of many researchers.

The traditional way to obtain a good-quality map for a given dataset is "trial and error" with different network parameters: the parameters that produce the map with the smallest error measurements are considered suitable for the dataset [11]. According to Chattopadhyay et al. [19], for a specific dataset the map size is selected by trial and error until QE and TE are small enough. Polzlbauer [20] points out the correlation between QE and TE, namely that TE often rises when QE falls. Increasing the size of the Kohonen layer may reduce QE but increase TE (i.e., a large Kohonen layer can distort the shape of the map); conversely, when the Kohonen layer is too small, TE is not reliable. Using a small neighborhood radius reduces QE, and QE reaches its minimum when the neighborhood radius takes its smallest value [21].


*Improving Feature Map Quality of SOM Based on Adjusting the Neighborhood Function*


*DOI: http://dx.doi.org/10.5772/intechopen.89233*


Besides trial and error to determine a suitable network configuration, researchers have also studied improvements to the SOM algorithm itself to enhance the quality of the feature map. Germen [22, 23] optimized QE by integrating a "hit" parameter into the update of the neurons' weight vectors, where "hit" is the number of times a neuron has been excited (a counter of how often it was the best matching unit, BMU). The "hit" parameter determines how much a neuron's weight vector is adjusted: neurons that represent many samples are adjusted less (to avoid losing information) than neurons that represent few samples.

Neme [24, 25] proposed the SOMSR model (SOM with selective refractoriness), which reduces TE. In this model, the neighborhood radius of the BMU does not shrink gradually during learning. Instead, at each training iteration, every neuron within the neighborhood radius of the BMU decides for itself whether it will be affected by the BMU in the next iteration.

Kamimura [26] integrated a "firing" rate into the distance function to maximize the input information. The "firing" rate identifies the importance of each feature relative to the remaining features. This method can reduce both QE and TE; however, for each dataset several rounds of trial and error are needed to determine an appropriate "firing" value.

Lopez-Rubio [27] describes the topographical error of the map as a state of "self-intersections." If a self-intersection between neurons is detected after a learning iteration, that iteration is redone. This solution can reduce TE, but it increases QE.

Another approach is to adjust the scope and the learning rate of the neighboring neurons. Kohonen [11] set the learning rate of all neurons within the neighborhood radius equal to that of the BMU by using the "bubble" neighborhood function, and concluded that the bubble function is less effective than the Gaussian function.

Aoki and Aoyagi [28] and Ota et al. [29] published an asymmetric neighborhood function, which extends the neighborhood radius in one direction and shrinks it in the opposite direction. Theoretically, this could "slide" the topographical error out of the map. However, their experiments were limited to certain situations and are not fully convincing.

Lee and Verleysen [30] replaced the neighborhood function with the "fisherman" rule, which updates the neurons within the neighborhood radius recursively: the BMU is adjusted toward the input sample, the neurons adjacent to the BMU (adjacency level 1) are adjusted toward the BMU (not toward the input sample), each level-2 neuron is adjusted toward its preceding level-1 neuron, and the remaining neurons in the neighborhood radius follow the same rule. However, the article does not show how to determine the order of the adjacent neurons when they are organized in a rectangular or hexagonal grid. In addition, the authors concluded that the Gaussian function gives better results than the "fisherman" rule.


Achieving a feature map that is good according to several criteria is clearly a difficult problem. So far, no solution reduces both QE and TE simultaneously while working well for every dataset.

In this chapter, we improve the Gaussian neighborhood function by adding an adjustable parameter in order to reduce the QE and TE of the map simultaneously. The rest of the chapter is organized as follows: Section 2 gives an overview of SOM and of the measures used to assess the quality of the feature map; Section 3 presents our study on adjusting the parameters of the Gaussian neighborhood function; Section 4 reports the empirical results and the conclusion of the proposed method.

### **2. Self-organizing map neural network**

#### **2.1 Structure and the algorithm**


*Sustainability in Urban Planning and Design*


SOM consists of an input layer and an output (Kohonen) layer. The Kohonen layer is usually organized as a two-dimensional matrix of neurons. Each unit *i* (neuron) in the Kohonen layer has a weight vector *w<sub>i</sub>* = [*w<sub>i,1</sub>*, *w<sub>i,2</sub>*, …, *w<sub>i,n</sub>*], where *n* is the size of the input vector and *w<sub>i,j</sub>* is the weight of neuron *i* associated with input *j* (**Figure 1**). SOM is trained by an unsupervised algorithm. The process is repeated many times; at iteration *t*, three steps are performed:

• Step 1. Finding the BMU: randomly select a sample *x*(*t*) from the dataset (where *t* is the training iteration) and search the Kohonen matrix for the neuron *c* with the minimum distance *dist* (the Euclidean distance, the Manhattan distance or the vector dot product are frequently used). Neuron *c* is called the *Best Matching Unit* (BMU).

$$\text{dist} = \left\| \mathbf{x}(t) - \boldsymbol{w}\_{c} \right\| = \min\_{i} \left\{ \left\| \mathbf{x}(t) - \boldsymbol{w}\_{i} \right\| \right\} \tag{1}$$

• Step 2. Calculating the neighborhood radius of the BMU with an interpolation function that decreases gradually with the number of iterations:

$$N\_c(t) = N\_0 \exp\left[-\frac{t}{\lambda}\right] \tag{2}$$

where *N<sub>c</sub>*(*t*) is the neighborhood radius at training iteration *t*; *N*<sub>0</sub> is the initial neighborhood radius; *λ* = *K* / log(*N*<sub>0</sub>) is the time constant, where *K* is the total number of iterations.

**Figure 1.** *Illustrates the structure of SOM.*

• Step 3. Update the weight vectors of the neurons within the neighborhood radius of the BMU so that they move closer to the sample *x*(*t*):

$$w\_i(t+1) = w\_i(t) + L(t)h\_{ci}(t)[\mathbf{x}(t) - w\_i(t)] \tag{3}$$

where *L*(*t*) is the learning rate at iteration *t* (like the neighborhood radius, the learning rate decreases monotonically over time, with 0 < *L*(*t*) < 1; *L*(*t*) can be, for example, a linear or an exponential function), and *h<sub>ci</sub>*(*t*) is the neighborhood function, which expresses the effect of distance on the learning process and is calculated by formula (4):

$$h\_{ci}(t) = \exp\left[-\frac{\|r\_c - r\_i\|^2}{2N\_c^2(t)}\right] \tag{4}$$

where *r<sub>c</sub>* and *r<sub>i</sub>* are the positions of neuron *c* and neuron *i* in the Kohonen matrix.
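The three steps above can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: the dictionary-based grid, the function names, the linear learning-rate schedule and the default `lr0` are assumptions made here; only formulas (1) to (4) come from the text.

```python
import math
import random


def find_bmu(weights, x):
    """Step 1, formula (1): grid position of the neuron closest to x."""
    return min(weights, key=lambda pos: math.dist(weights[pos], x))


def neighborhood_radius(t, n0, total_iters):
    """Step 2, formula (2): N_c(t) = N_0 * exp(-t / lambda), lambda = K / log(N_0).

    Requires n0 > 1 so that log(n0) is positive.
    """
    lam = total_iters / math.log(n0)
    return n0 * math.exp(-t / lam)


def gaussian_neighborhood(pos_c, pos_i, radius):
    """Formula (4): exp(-||r_c - r_i||^2 / (2 * N_c(t)^2))."""
    return math.exp(-math.dist(pos_c, pos_i) ** 2 / (2.0 * radius ** 2))


def train_step(weights, x, t, n0, total_iters, lr0=0.5):
    """Run Steps 1-3 once for sample x, updating `weights` in place.

    `weights` maps a grid position (row, col) to a weight vector (list of floats).
    Returns the grid position of the BMU.
    """
    bmu = find_bmu(weights, x)
    radius = neighborhood_radius(t, n0, total_iters)
    # Assumption: linear decay of the learning rate; the text allows other schedules.
    lr = lr0 * (1.0 - t / total_iters)
    for pos, w in weights.items():
        if math.dist(pos, bmu) <= radius:  # only neurons inside the neighborhood
            h = gaussian_neighborhood(bmu, pos, radius)
            # Step 3, formula (3): move the weight vector toward the sample
            weights[pos] = [wi + lr * h * (xi - wi) for wi, xi in zip(w, x)]
    return bmu


if __name__ == "__main__":
    random.seed(0)
    size, total_iters = 5, 200  # small hypothetical map and iteration count
    weights = {(r, c): [random.random(), random.random()]
               for r in range(size) for c in range(size)}
    data = [[random.random(), random.random()] for _ in range(50)]
    for t in range(total_iters):
        train_step(weights, random.choice(data), t, n0=size, total_iters=total_iters)
    print("BMU of first sample:", find_bmu(weights, data[0]))
```

Storing the grid as a dictionary keyed by position keeps the sketch free of external libraries; an array-based layout would be the more usual choice for larger maps.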

#### **2.2 The quality of feature map**

Quantization error and topographical error are the main measures used to assess the quality of SOM. Quantization error is the average distance between the input samples and their corresponding winning neurons (BMUs). It assesses how accurately the data are represented; the smaller the value, the better [11].

$$QE = \frac{1}{T} \sum\_{t=1}^{T} ||\mathbf{x}(t) - \mathbf{w}\_c(t)||\tag{5}$$

where *x*(*t*) is the input sample at training iteration *t*; *w<sub>c</sub>*(*t*) is the weight vector of the BMU of sample *x*(*t*); and *T* is the total number of training iterations.

Topographical error assesses topology preservation [13, 14]. It indicates the proportion of data samples whose first best matching unit (BMU1) and second best matching unit (BMU2) are not adjacent. Therefore, a smaller value is better.

$$TE = \frac{1}{T} \sum\_{t=1}^{T} d(\mathbf{x}(t)) \tag{6}$$

where *x*(*t*) is the input sample at training iteration *t*; *d*(*x*(*t*)) = 1 if BMU1 and BMU2 of *x*(*t*) are not adjacent, and *d*(*x*(*t*)) = 0 otherwise; and *T* is the total number of training iterations.

### **3. Adding an adjustable parameter to the Gaussian neighborhood function**

Formula (3) shows that the learning ability of SOM depends on two components: the learning rate *L*(*t*) and the neighborhood function *h<sub>ci</sub>*(*t*).

Because the learning rate simply decreases over time, it defines the general learning rate of SOM over the whole training period. The quality of the feature map is therefore influenced mainly by the neighborhood function *h<sub>ci</sub>*(*t*). Adjusting the neighborhood function directly affects the learning process and the quality of the feature map of SOM.

The neighborhood function *h<sub>ci</sub>*(*t*) defines how strongly the input sample influences the neurons within the neighborhood radius *N<sub>c</sub>*(*t*) of the BMU (**Figure 2**).

**Figure 2.** *Illustrates the influence of the input sample on the neurons in the neighborhood radius at training iteration t.*

Formula (4) can be rewritten in the following general form:

$$h\_{ci}(t) = \exp\left[-q\frac{\|r\_c - r\_i\|^p}{N\_c^p(t)}\right] \tag{7}$$

where *q* and *p* are two adjustable parameters, with *q* ≥ 0 and *p* ≥ 0.

This shows that the value of *h<sub>ci</sub>*(*t*) depends on the distance from the position *r<sub>i</sub>* of the neuron being assessed (neuron *i*) to the position *r<sub>c</sub>* of the BMU, and on the parameters *q* and *p*. Specifically:

• If ‖*r<sub>c</sub>* − *r<sub>i</sub>*‖ = 0 (the neuron being assessed is the BMU), *h<sub>ci</sub>*(*t*) = 1.

• If ‖*r<sub>c</sub>* − *r<sub>i</sub>*‖ = *N<sub>c</sub>*(*t*) (the neuron being assessed is at the farthest position within the neighborhood radius *N<sub>c</sub>*(*t*)), the value of the neighborhood function depends on the parameter *q*, with:

$$h\_{ci}(t) = \exp\left[-q\right] \tag{8}$$

Formula (8) shows that the minimum value of *h<sub>ci</sub>*(*t*) within the neighborhood radius depends on the parameter *q*.

**Figure 3** illustrates the neighborhood function *h<sub>ci</sub>*(*t*) for a neighborhood radius *N<sub>c</sub>*(*t*) = 10, with *p* = 2 and *q* = 0.5, 1, 2, 4, 8, 12.

**Figure 3.** *Illustrates function h<sub>ci</sub>(t) after changing the value of q.*
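The behavior of formulas (7) and (8) can be checked numerically with a few lines of Python. The sketch below is only an illustration: the function name `neighborhood` and its argument names are chosen here, and the values of *q* and the radius of 10 are the ones quoted for Figure 3.

```python
import math


def neighborhood(distance, radius, q=0.5, p=2.0):
    """Formula (7): h_ci(t) = exp(-q * ||r_c - r_i||^p / N_c(t)^p).

    With q = 0.5 and p = 2 this reduces to the Gaussian of formula (4).
    """
    return math.exp(-q * (distance ** p) / (radius ** p))


if __name__ == "__main__":
    radius = 10.0  # N_c(t) = 10, as in Figure 3
    for q in (0.5, 1, 2, 4, 8, 12):
        at_bmu = neighborhood(0.0, radius, q=q)       # always 1
        at_edge = neighborhood(radius, radius, q=q)   # exp(-q), formula (8)
        print(f"q={q:>4}: h at BMU = {at_bmu:.3f}, h at edge = {at_edge:.6f}")
```

Keeping `q` and `p` as keyword arguments with the original values as defaults makes it easy to compare the adjusted function against the standard Gaussian on the same distances.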

#### **3.1 Parameter** *q*

In principle, the more strongly the weight vectors of the neurons are adjusted in the current training iteration, the more they differ from the input patterns presented in other iterations, which increases the quantization error. Therefore, to reduce QE, the level and scope of the influence of the input sample must be reduced, i.e., increasing the value of *q* reduces *QE*.
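To make the "scope of the influence" more concrete, one can ask at what grid distance the general neighborhood function of formula (7) falls to a small threshold. The short derivation below is an illustration added here, not part of the original analysis; the threshold *ε*, the distance *d* = ‖*r<sub>c</sub>* − *r<sub>i</sub>*‖ and the resulting *d<sub>ε</sub>* are symbols introduced only for this purpose.

$$\exp\left[-q\,\frac{d^p}{N\_c^p(t)}\right] = \varepsilon \;\Longrightarrow\; d\_{\varepsilon} = N\_c(t)\left(\frac{\ln(1/\varepsilon)}{q}\right)^{1/p}$$

For *p* = 2 and *ε* = 0.01, this gives *d<sub>ε</sub>* ≈ 3.03 *N<sub>c</sub>*(*t*) for *q* = 0.5 (the function never drops to the threshold inside the neighborhood radius) but only about 0.76 *N<sub>c</sub>*(*t*) for *q* = 8, so a larger *q* narrows the region in which neurons are noticeably adjusted.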

However, if *q* is too large, the learning ability of the map is restricted, i.e., the topography of the map changes little and partly depends on the initialization of the neurons' weight vectors. In addition, the neighborhood radius *N<sub>c</sub>*(*t*) is effectively shrunk, because *h<sub>ci</sub>*(*t*) ≈ 0 for neurons at remote positions within the neighborhood radius (i.e., those neurons are not adjusted, or are adjusted only negligibly, by the input sample). Therefore, to ensure that all neurons within the neighborhood radius *N<sub>c</sub>*(*t*) are adjusted by the input sample, *q* must not be too large. For example, for *q* = 8 and *q* = 12, *h<sub>ci</sub>*(*t*) ≈ 0 as ‖*r<sub>c</sub>* − *r<sub>i</sub>*‖ approaches *N<sub>c</sub>*(*t*).

When *q* ≈ 0, the Gaussian function behaves like the bubble function, i.e., *h<sub>ci</sub>*(*t*) ≈ 1 for all neurons within the neighborhood radius *N<sub>c</sub>*(*t*). As a result, the larger the neighborhood radius *N<sub>c</sub>*(*t*), the more likely the feature map is to change locally to follow the input sample *x*(*t*). This reduces the network's ability to remember previous training iterations.

Therefore, TE may depend on the initialization of the neurons' weight vectors if *q* is too large, or on the order of the input samples if *q* is too small. Note that both the initial weight vectors of the neurons and the order of the input samples are selected randomly. The topographic learning ability of the network is therefore best when *q* is neither too small nor too large.

#### **3.2 Parameter** *p*

When the parameter *q* is fixed and the parameter *p* increases, the value of *h<sub>ci</sub>*(*t*) for the neurons near the BMU gradually increases to 1, i.e., the number of neighbors around the BMU that are adjusted similarly to the BMU grows. This causes QE to increase. If *p* is too large, the feature map tends to change locally according to the input sample from the most recent training iteration (similar to the case where *q* is too small). However, TE may vary only slightly, because TE is determined by BMU1 and BMU2.

**Figure 4** illustrates the original neighborhood function *h<sub>ci</sub>*(*t*) (with *q* = 0.5 and *p* = 2) and the adjusted neighborhood function *h<sub>ci</sub>*(*t*) (with *q* = 4 and *p* = 1, 2, 3, 4, 5, 6) for *N<sub>c</sub>*(*t*) = 10.

When *p* = 1, the graph of *h<sub>ci</sub>*(*t*) is similar to the cases *q* = 8, 12 in **Figure 3**, i.e., QE is smallest compared with the cases *p* > 1, but *TE* is unreliable because it depends on the initialization of the neurons' weight vectors.

**Figure 4.** *Illustrates function h<sub>ci</sub>(t) after changing the value of p.*

Therefore, adjusting the parameter *p* has no significant impact on improving the quality of the feature map of SOM, whereas the parameter *q* does. The larger *q* is, the smaller *QE* is; however, the most appropriate value of *q* is the one for which *TE* is smallest. We therefore recommend the neighborhood function *h′<sub>ci</sub>*(*t*) with one adjustable parameter:

$$h'\_{ci}(t) = \exp\left[-q\frac{\|r\_c - r\_i\|^2}{N\_c^2(t)}\right] \tag{9}$$

where the parameter *q* can be adjusted for each dataset to achieve a better-quality feature map.

### **4. Experiments**

We conducted experiments on 12 published datasets: XOR (data samples distributed according to the XOR operation), Aggregation, Flame, Pathbased, Spiral, Jain, Compound, R15, D31, Iris, Vowel and Zoo. The following parameters were used in the experiments: network size: 10 × 10; initial neighborhood radius: 10; initial learning rate: 1; number of training iterations: 20,000.

| Dataset | Measure | *q* = 0.5 | *q* = 1 | *q* = 2 | *q* = 4 | *q* = 8 | *q* = 12 |
|---|---|---|---|---|---|---|---|
| XOR | QE | 0.1890 | 0.1585 | 0.1299 | 0.1129 | 0.0902 | 0.0810 |
| | TE | 0.0318 | **0.0223** | 0.0273 | 0.0427 | 0.0705 | 0.0925 |
| Aggregation | QE | 5.9702 | 5.0643 | 4.0276 | 2.9340 | 2.2819 | 1.8472 |
| | TE | 0.0549 | 0.0362 | 0.0294 | **0.0245** | 0.0424 | 0.0678 |
| Flame | QE | 2.1839 | 1.9512 | 1.5194 | 1.1822 | 0.9129 | 0.8206 |
| | TE | 0.0700 | 0.0567 | 0.0407 | **0.0393** | 0.0479 | 0.0833 |
| Pathbased | QE | 4.5859 | 4.0427 | 3.2618 | **2.4779** | 1.9392 | 1.7401 |
| | TE | 0.0561 | 0.0433 | 0.0373 | **0.0315** | 0.0434 | 0.0794 |
| Spiral | QE | 4.7595 | 4.1719 | 3.4675 | 2.9239 | 2.2975 | 2.0085 |
| | TE | 0.0543 | 0.0404 | **0.0284** | 0.0364 | 0.0413 | 0.0564 |
| Jain | QE | 5.2745 | 4.4829 | 3.5726 | 2.3559 | 1.6236 | 1.5234 |
| | TE | 0.0513 | 0.0395 | 0.0313 | **0.0269** | 0.0443 | 0.0637 |
| Compound | QE | 4.4205 | 3.7595 | 3.1508 | 2.5672 | 1.8323 | 1.7744 |
| | TE | 0.0624 | **0.0299** | 0.0349 | 0.0400 | 0.0630 | 0.0690 |
| R15 | QE | 2.2226 | 2.0212 | 1.8005 | 1.4606 | 1.0730 | 0.9562 |
| | TE | 0.0722 | 0.0631 | 0.0368 | **0.0274** | 0.0613 | 0.1162 |
| D31 | QE | 4.7676 | 4.1204 | 3.3943 | 2.4569 | 2.0055 | 1.6793 |
| | TE | 0.0479 | 0.0352 | 0.0284 | **0.0207** | 0.0332 | 0.0394 |
| Iris | QE | 0.7709 | 0.6430 | 0.5353 | 0.4403 | 0.3773 | 0.3494 |
| | TE | 0.0739 | **0.0548** | 0.0689 | 0.0940 | 0.1196 | 0.1566 |
| Vowel | QE | 2.7459 | 2.5736 | 2.3755 | 2.2005 | 1.9150 | 1.7468 |
| | TE | 0.0537 | 0.0436 | **0.0412** | 0.0448 | 0.0494 | 0.0497 |
| Zoo | QE | 1.5841 | 1.4421 | 1.2468 | 1.0912 | 0.9790 | 0.9156 |
| | TE | 0.0343 | 0.0254 | 0.0169 | **0.0104** | 0.0162 | 0.0208 |

#### **Table 1.**

*Experimental results with parameter p = 2 fixed and parameter q varied.*
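The two quality measures reported in these experiments follow formulas (5) and (6). A minimal sketch of how they could be computed for a trained map is given below; it is not the authors' code. It assumes the map is stored as a dictionary from grid position `(row, col)` to weight vector, that the errors are averaged over a list of samples, and that "adjacent" means neighboring cells on an 8-connected rectangular grid (the text does not state which adjacency rule was used).

```python
import math


def two_best_units(weights, x):
    """Grid positions of the best and second-best matching units for x."""
    ranked = sorted(weights, key=lambda pos: math.dist(weights[pos], x))
    return ranked[0], ranked[1]


def quantization_error(weights, samples):
    """Formula (5): mean distance between each sample and its BMU weight vector."""
    total = 0.0
    for x in samples:
        bmu, _ = two_best_units(weights, x)
        total += math.dist(weights[bmu], x)
    return total / len(samples)


def topographic_error(weights, samples):
    """Formula (6): fraction of samples whose BMU1 and BMU2 are not adjacent."""
    errors = 0
    for x in samples:
        (r1, c1), (r2, c2) = two_best_units(weights, x)
        # Assumption: cells are adjacent if they touch, including diagonally.
        if max(abs(r1 - r2), abs(c1 - c2)) > 1:
            errors += 1
    return errors / len(samples)
```

A hexagonal grid, which the text also mentions, would need a different adjacency test in `topographic_error`; the rest of the sketch would stay the same.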


The experiments were conducted in two cases: in case 1, parameter *p* is fixed and parameter *q* is varied; in case 2, parameter *q* is fixed and parameter *p* is varied.
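Each configuration in these two cases requires training one map under the setup described above. The following is only a minimal sketch of such a training run, not the authors' implementation. It assumes random initial weights, linear decay of the learning rate and of the neighborhood radius, and an adjusted neighborhood function of the form exp(−*q*·(*d*/*N*<sub>*c*</sub>(*t*))<sup>*p*</sup>), which reduces to the standard Gaussian for *q* = 0.5 and *p* = 2. The function name `train_som`, its arguments and the decay schedule are all hypothetical.

```python
import math
import random


def train_som(data, rows=10, cols=10, radius0=10.0, lr0=1.0,
              n_iter=20000, q=0.5, p=2, seed=0):
    """Train a rectangular SOM and return {(row, col): weight_vector}.

    Assumptions (not taken from the chapter): random initial weights,
    linear decay of learning rate and radius, and a neighbourhood of the
    form exp(-q * (d / radius) ** p).
    """
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for t in range(n_iter):
        remaining = 1.0 - t / n_iter            # goes from 1 towards 0
        lr = lr0 * remaining                    # assumed linear decay
        radius = max(radius0 * remaining, 1.0)  # assumed floor of one grid step
        x = rng.choice(data)
        # Best matching unit: the neuron whose weights are closest to x.
        bmu = min(weights, key=lambda pos: sum(
            (w - v) ** 2 for w, v in zip(weights[pos], x)))
        for pos, w in weights.items():
            d = math.hypot(pos[0] - bmu[0], pos[1] - bmu[1])
            if d <= radius:
                h = math.exp(-q * (d / radius) ** p)
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
    return weights


# Small illustrative run (far fewer iterations than the 20,000 used in the text).
toy_data = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
som = train_som(toy_data, rows=4, cols=4, radius0=4.0, n_iter=500, q=2)
```

The defaults mirror the stated setup (10 × 10 map, initial radius 10, initial learning rate 1, 20,000 iterations); only *q* and *p* would change between runs of case 1 and case 2.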

Note: The results in **Tables 1** and **2** are averages over 10 experimental runs. The result for each dataset is presented in two rows: the first row shows *QE* and the second row shows *TE*.
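The two measures reported in each row can be computed along the following lines. This is a minimal sketch under the common definitions: *QE* is the mean Euclidean distance between each sample and the weight vector of its best matching unit (BMU1), and *TE* is the fraction of samples whose first and second best matching units (BMU1, BMU2) are not adjacent on the map grid. The function names and the adjacency rule (8-neighborhood on a rectangular grid) are assumptions and may differ from the definitions used in the experiments.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def two_best_units(weights, sample):
    """Return the grid positions of the best and second-best matching units.

    `weights` maps a grid position (row, col) to a weight vector.
    """
    ranked = sorted(weights, key=lambda pos: dist(weights[pos], sample))
    return ranked[0], ranked[1]


def quantization_error(weights, data):
    """Mean distance from each sample to the weight vector of its BMU."""
    total = 0.0
    for sample in data:
        bmu1, _ = two_best_units(weights, sample)
        total += dist(weights[bmu1], sample)
    return total / len(data)


def topographic_error(weights, data):
    """Fraction of samples whose BMU1 and BMU2 are not grid neighbours.

    Assumption: two units are neighbours when they differ by at most one
    row and one column (8-neighbourhood on a rectangular grid).
    """
    errors = 0
    for sample in data:
        (r1, c1), (r2, c2) = two_best_units(weights, sample)
        if max(abs(r1 - r2), abs(c1 - c2)) > 1:
            errors += 1
    return errors / len(data)


# Tiny 1 x 3 map with 1-D weights, for illustration only.
weights = {(0, 0): [0.0], (0, 1): [1.0], (0, 2): [0.1]}
data = [[0.0], [1.0]]
qe = quantization_error(weights, data)
te = topographic_error(weights, data)
```

In the toy map, the sample `[0.0]` has its two closest units at opposite ends of the grid, so it counts as a topographic error, while the sample `[1.0]` does not.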

**Case 1:** Parameter *p* is fixed, parameter *q* changes.

**Table 1** shows the experimental results with parameter *p* = 2 and parameter *q* taking the values 0.5, 1, 2, 4, 8, 12.

We can see that *QE* is inversely related to *q*: the larger *q* is, the smaller *QE* becomes, while *TE* reaches its minimum at *q* = 1, 2 or 4. This is consistent with the analysis in Section 3.

The values in bold are the best results: *TE* is the smallest, and *QE* is also smaller than with the original neighborhood function (*q* = 0.5) (column 2, **Table 1**).
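To make the role of *q* concrete, the sketch below evaluates a neighborhood function at a few grid distances for the *q* values discussed here. It assumes the adjusted function has the form exp(−*q*·(*d*/*N*<sub>*c*</sub>(*t*))<sup>*p*</sup>); this form is inferred from the statement that *q* = 0.5 (with *p* = 2) corresponds to the original Gaussian function, and the exact definition given earlier in the chapter takes precedence. The function names are hypothetical.

```python
import math


def adjusted_neighborhood(distance, radius, q=0.5, p=2):
    """Assumed adjusted neighbourhood: exp(-q * (distance / radius) ** p)."""
    return math.exp(-q * (distance / radius) ** p)


def gaussian_neighborhood(distance, radius):
    """Standard Gaussian neighbourhood: exp(-distance^2 / (2 * radius^2))."""
    return math.exp(-distance ** 2 / (2 * radius ** 2))


radius = 10.0  # initial neighbourhood radius used in the experiments
for q in (0.5, 1, 2, 4, 8, 12):
    row = [round(adjusted_neighborhood(d, radius, q=q), 3)
           for d in (0, 2.5, 5, 10)]
    print(f"q={q}: {row}")
```

Under this assumed form, a larger *q* makes the function fall off faster with distance from the winning neuron, so fewer neighbors receive a strong update.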



**Case 2:** Parameter *q* is fixed, parameter *p* changes.

**Table 2** shows the experimental results when parameter *q* is fixed for each dataset at the value that gave the best *TE* in **Table 1**, and parameter *p* takes the values 1, 2, 3, 4, 5, 6.

When *p* = 1: both *QE* and *TE* are high.

| Dataset | Measure | *p* = 1 | *p* = 2 | *p* = 3 | *p* = 4 | *p* = 5 | *p* = 6 |
|---|---|---|---|---|---|---|---|
| XOR (*q* = 1) | *QE* | 0.1754 | 0.1587 | 0.1546 | 0.1518 | 0.1525 | 0.1513 |
| | *TE* | 0.0534 | 0.0203 | 0.0225 | 0.0244 | 0.0238 | 0.0255 |
| Aggregation (*q* = 4) | *QE* | 2.7895 | 3.0003 | 3.2722 | 3.6436 | 3.6100 | 3.8718 |
| | *TE* | 0.0850 | 0.0300 | 0.0277 | 0.0273 | 0.0316 | 0.0282 |
| Flame (*q* = 4) | *QE* | 1.1858 | 1.2105 | 1.2306 | 1.3158 | 1.4010 | 1.4209 |
| | *TE* | 0.1438 | 0.0405 | 0.0284 | 0.0304 | 0.0331 | 0.0330 |
| Pathbased (*q* = 4) | *QE* | 2.5458 | 2.4759 | 2.7586 | 2.8462 | 2.9400 | 2.9928 |
| | *TE* | 0.1300 | 0.0313 | 0.0363 | 0.0351 | 0.0349 | 0.0304 |
| Spiral (*q* = 2) | *QE* | 3.5976 | 3.4319 | 3.4334 | 3.4603 | 3.4926 | 3.5797 |
| | *TE* | 0.0690 | 0.0290 | 0.0265 | 0.0290 | 0.0261 | 0.0264 |
| Jain (*q* = 4) | *QE* | 2.3664 | 2.3519 | 2.7136 | 2.9018 | 3.1494 | 3.3035 |
| | *TE* | 0.0896 | 0.0263 | 0.0270 | 0.0306 | 0.0402 | 0.0403 |
| Compound (*q* = 1) | *QE* | 4.2063 | 3.7575 | 3.6224 | 3.4969 | 3.5082 | 3.4913 |
| | *TE* | 0.0666 | 0.0291 | 0.0337 | 0.0340 | 0.0373 | 0.0398 |
| R15 (*q* = 4) | *QE* | 1.3161 | 1.4406 | 1.5544 | 1.6498 | 1.6972 | 1.7376 |
| | *TE* | 0.1055 | 0.0294 | 0.0367 | 0.0390 | 0.0454 | 0.0548 |
| D31 (*q* = 4) | *QE* | 2.3832 | 2.4769 | 2.8137 | 2.9886 | 3.0686 | 3.1960 |
| | *TE* | 0.0803 | 0.0199 | 0.0227 | 0.0238 | 0.0259 | 0.0284 |
| Iris (*q* = 1) | *QE* | 0.7140 | 0.6382 | 0.6166 | 0.6002 | 0.5880 | 0.5849 |
| | *TE* | 0.0665 | 0.0518 | 0.0555 | 0.0560 | 0.0572 | 0.0598 |
| Vowel (*q* = 2) | *QE* | 2.3938 | 2.3715 | 2.4186 | 2.4310 | 2.4529 | 2.4627 |
| | *TE* | 0.0635 | 0.0410 | 0.0416 | 0.0414 | 0.0429 | 0.0455 |
| Zoo (*q* = 4) | *QE* | 1.1817 | 1.0912 | 1.1780 | 1.1954 | 1.2015 | 1.2131 |
| | *TE* | 0.0366 | 0.0104 | 0.0182 | 0.0188 | 0.0176 | 0.0180 |

#### **Table 2.**

*Experimental results with parameter q fixed and parameter p varied.*

When *p* ≥ 2: *TE* tends to remain stable or to increase slightly as *p* rises. This shows that, once a suitable parameter *q* has been identified, the parameter *p* has a negligible effect on improving the topographical quality. *QE* tends to increase with *p* for the majority of datasets (except for XOR, Compound and Iris, where *QE* tends to decrease but *TE* tends to increase). This suggests that *p* = 2 is the best value.
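The behavior of *TE* across *p* can be checked directly against the *TE* rows of **Table 2**. The sketch below hard-codes those rows and reports, for each dataset, the value of *p* with the highest and the lowest *TE*. It assumes that the *TE* rows are listed in the same dataset order as the *QE* rows; the variable and function names are illustrative.

```python
# TE rows of Table 2 (columns correspond to p = 1, 2, 3, 4, 5, 6).
# Assumption: the TE rows appear in the same dataset order as the QE rows.
TE_BY_P = {
    "XOR":         [0.0534, 0.0203, 0.0225, 0.0244, 0.0238, 0.0255],
    "Aggregation": [0.0850, 0.0300, 0.0277, 0.0273, 0.0316, 0.0282],
    "Flame":       [0.1438, 0.0405, 0.0284, 0.0304, 0.0331, 0.0330],
    "Pathbased":   [0.1300, 0.0313, 0.0363, 0.0351, 0.0349, 0.0304],
    "Spiral":      [0.0690, 0.0290, 0.0265, 0.0290, 0.0261, 0.0264],
    "Jain":        [0.0896, 0.0263, 0.0270, 0.0306, 0.0402, 0.0403],
    "Compound":    [0.0666, 0.0291, 0.0337, 0.0340, 0.0373, 0.0398],
    "R15":         [0.1055, 0.0294, 0.0367, 0.0390, 0.0454, 0.0548],
    "D31":         [0.0803, 0.0199, 0.0227, 0.0238, 0.0259, 0.0284],
    "Iris":        [0.0665, 0.0518, 0.0555, 0.0560, 0.0572, 0.0598],
    "Vowel":       [0.0635, 0.0410, 0.0416, 0.0414, 0.0429, 0.0455],
    "Zoo":         [0.0366, 0.0104, 0.0182, 0.0188, 0.0176, 0.0180],
}


def p_with_extreme(values, pick):
    """Return the value of p (1-based column) selected by `pick` (min or max)."""
    return values.index(pick(values)) + 1


highest = {name: p_with_extreme(v, max) for name, v in TE_BY_P.items()}
lowest = {name: p_with_extreme(v, min) for name, v in TE_BY_P.items()}

for name in TE_BY_P:
    print(f"{name:12s} highest TE at p={highest[name]}, "
          f"lowest TE at p={lowest[name]}")
```

On these rows, *TE* is highest at *p* = 1 for all 12 datasets and lowest at *p* = 2 for 8 of them, which is in line with choosing *p* = 2 as the default.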

**Figures 5** to **16** compare the values of *QE* and *TE* as the parameters *q* and *p* change: the panels on the left (a) show the results when *p* = 2 is fixed and *q* is varied; the panels on the right (b) show the results when *q* is fixed and *p* is varied. The value of *q* used in panel (b) is the one that achieves the smallest *TE* in panel (a).

**Figure 5.** *XOR dataset. (a) p = 2 and q changes and (b) q = 1 and p changes.*

**Figure 6.** *Aggregation dataset. (a) p = 2 and q changes and (b) q = 4 and p changes.*


**Figure 7.** *Flame dataset. (a) p = 2 and q changes and (b) q = 4 and p changes.*

**Figure 8.** *Pathbased dataset. (a) p = 2 and q changes and (b) q = 4 and p changes.*

**Figure 9.** *Spiral dataset. (a) p = 2 and q changes and (b) q = 2 and p changes.*


**Figure 10.** *Jain dataset. (a) p = 2 and q changes and (b) q = 4 and p changes.*

**Figure 11.** *Compound dataset. (a) p = 2 and q changes and (b) q = 1 and p changes.*

**Figure 12.** *R15 dataset. (a) p = 2 and q changes and (b) q = 4 and p changes.*


**Figure 13.** *D31 dataset. (a) p = 2 and q changes and (b) q = 4 and p changes.*

**Figure 14.** *Iris dataset. (a) p = 2 and q changes and (b) q = 1 and p changes.*

When parameter *p* = 2 is fixed and parameter *q* is varied, the charts are similar across datasets (panel (a), on the left): *QE* decreases gradually as *q* increases, while *TE* decreases at first and then increases, moving in the opposite direction to *QE*. *TE* reaches the lowest value when *q* ∈ [10, 28].

When parameter *q* is fixed and parameter *p* is varied, the charts are also similar (panel (b), on the right): *TE* is highest when *p* = 1, and both *QE* and *TE* tend to stabilize or increase gradually for *p* ≥ 2.

**Conclusion:** With *p* = 2 (the default value), adjusting the parameter *q* has a significant impact on the quality of the feature map. The larger *q* is, the smaller *QE* becomes. However, *TE* is lowest when *q* is neither too small nor too large. Therefore, with *p* = 2, the most suitable value of *q* is one that is large enough to achieve the lowest value of *TE*. Conversely, once the most appropriate value of the parameter *q* has been identified, the parameter *p* has little impact on improving the quality of the feature map.
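The selection rule in this conclusion can be written down as a small helper. The sketch below is one possible reading of the rule, not the authors' procedure: among the tested values of *q*, keep those whose *QE* does not exceed that of the original function (*q* = 0.5), then take the one with the lowest *TE*, preferring the larger *q* on ties. The function name `select_q` and all numbers in `sweep` are hypothetical and serve only to illustrate the rule.

```python
def select_q(results, baseline_q=0.5):
    """Pick q from a sweep, given results[q] = (QE, TE).

    Rule sketched from the text: keep only q values whose QE does not exceed
    the QE of the original function (baseline_q), then take the lowest TE.
    Ties are broken in favour of the larger q, since larger q gave smaller QE.
    """
    baseline_qe = results[baseline_q][0]
    candidates = [q for q, (qe, _) in results.items() if qe <= baseline_qe]
    return min(candidates, key=lambda q: (results[q][1], -q))


# Hypothetical sweep values, invented purely to illustrate the rule.
sweep = {
    0.5: (3.90, 0.045),
    1:   (3.60, 0.031),
    2:   (3.40, 0.029),
    4:   (3.10, 0.029),
    8:   (2.80, 0.052),
    12:  (2.60, 0.080),
}
chosen = select_q(sweep)
```

With these made-up numbers, *q* = 2 and *q* = 4 share the lowest *TE*, and the tie-break selects *q* = 4 because it also gives the smaller *QE*.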

**Table 3** shows the *QE* and *TE* results obtained with the neighborhood function *h*′<sub>*ci*</sub>(*t*) (with parameter *p* = 2 and *q* set for each dataset as shown in **Table 2**) and with several other neighborhood functions. The results show that *h*′<sub>*ci*</sub>(*t*) achieved smaller *QE* and *TE* than the original Gaussian function, the bubble function and the asymmetric neighborhood function.

Note: The results in **Table 3** are averages over 10 experimental runs. The result for each dataset is presented in two rows: the first row shows *QE* and the second row shows *TE*.

**Figure 15.** *Vowel dataset. (a) p = 2 and q changes and (b) q = 2 and p changes.*

**Figure 16.** *Zoo dataset. (a) p = 2 and q changes and (b) q = 4 and p changes.*



**Table 3.**

*Comparison of the QE and TE measures for several neighborhood functions.*
