2.2. Neural gas method

Vector quantization techniques encode a data space $V \subseteq R^m$ utilizing only a finite set $C = \{\mathbf{c}_i \mid i \in Z_r\}$ of reference vectors [18].

For any vector $\mathbf{v} \in V$, the winner vector $\mathbf{c}_{i(\mathbf{v})}$ is defined by the index

$$i(\mathbf{v}) = \arg\min_{i \in Z_r} \|\mathbf{v} - \mathbf{c}_i\|. \tag{9}$$

By using the finite set C, the space V is partitioned as

$$V_i = \{\mathbf{v} \in V \mid \|\mathbf{v} - \mathbf{c}_i\| \le \|\mathbf{v} - \mathbf{c}_j\| \text{ for } j \in Z_r\}, \tag{10}$$

where $V = \cup_{i \in Z_r} V_i$ and $V_i \cap V_j = \emptyset$ for $i \neq j$.

The evaluation function for the partition is defined by

$$E = \sum_{i=1}^{r} \sum_{\mathbf{v} \in V_i} \frac{1}{n_i} \|\mathbf{v} - \mathbf{c}_i\|^2, \tag{11}$$

where ni = |Vi|.
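To make these definitions concrete, here is a minimal NumPy sketch (the function and variable names are illustrative, not from the chapter) that computes the winner indices of Eq. (9), the partition of Eq. (10), and the evaluation function of Eq. (11):

```python
import numpy as np

def partition_and_evaluate(V, C):
    """Assign each data vector to its nearest reference vector (Eqs. 9-10)
    and compute the evaluation function E of Eq. (11)."""
    # distances[p, i] = ||v_p - c_i||
    distances = np.linalg.norm(V[:, None, :] - C[None, :, :], axis=2)
    winners = distances.argmin(axis=1)              # index i(v) of Eq. (9)
    E = 0.0
    for i in range(len(C)):
        V_i = V[winners == i]                       # cell V_i of Eq. (10)
        if len(V_i) > 0:                            # n_i = |V_i|
            E += np.sum(np.linalg.norm(V_i - C[i], axis=1) ** 2) / len(V_i)
    return winners, E
```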

The learning algorithm for the conventional fuzzy inference model is shown as follows:

Learning Algorithm A

Step A1: The threshold θ of the inference error and the maximum number of learning iterations Tmax are set. Let n = n0, where n0 is the initial number of rules, and let t = 1.

Step A2: The parameters bij, cij, and wi are set randomly.

Step A3: Let p = 1.

Step A4: A datum $(x_1^p, \ldots, x_m^p, y_r^p) \in D$ is given.

Step A5: From Eqs. (2) and (3), μi and y∗ are computed.

Step A6: Parameters wi, cij, and bij are updated by Eqs. (6), (7), and (8).

Step A7: If p = P, then go to Step A8; if p < P, then go to Step A4 with p ← p + 1.

Step A8: Let E(t) be the inference error at step t calculated by Eq. (5). If E(t) > θ and t < Tmax, then go to Step A3 with t ← t + 1; if E(t) ≤ θ and t ≤ Tmax, then the algorithm terminates.

Step A9: If t > Tmax and E(t) > θ, then go to Step A2 with n ← n + 1 and t = 1.

In particular, Algorithm SDM is defined as follows:

Algorithm SDM (c, b, w)

input: current parameters

θ1: threshold of inference error

Tmax1: the maximum number of learning iterations

n: the number of rules

output: parameters c, b, and w after learning

Steps A3 to A8 of Algorithm A are performed.
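Eqs. (2), (3), and (5)-(8) referenced by Algorithm A are defined earlier in the chapter and do not appear in this excerpt. The following Python sketch therefore only illustrates the control flow of Steps A3 to A8 (i.e., Algorithm SDM) under stated assumptions: a simplified fuzzy inference model with Gaussian membership functions and a weighted-average output, squared-error steepest-descent updates, and an assumed learning rate eta. The function names and gradient forms are assumptions, not the chapter's exact equations.

```python
import numpy as np

def mu(x, c, b):
    # Rule memberships for input x; Gaussian form assumed for Eq. (2).
    return np.exp(-np.sum((x - c) ** 2 / b, axis=1))

def infer(x, c, b, w):
    # Weighted-average output y*; assumed form of Eq. (3).
    m = mu(x, c, b)
    return float(np.dot(m, w) / np.sum(m)), m

def sdm(X, Y, c, b, w, theta1, Tmax1, eta=0.01):
    # Algorithm SDM: Steps A3 to A8 of Algorithm A, with steepest-descent
    # updates standing in for Eqs. (6)-(8) (assumed gradient forms).
    for t in range(1, Tmax1 + 1):
        for x, yr in zip(X, Y):                  # Steps A3-A4: p = 1, ..., P
            y, m = infer(x, c, b, w)             # Step A5
            err, s = y - yr, np.sum(m)
            g = (w - y) * m / s                  # shared gradient factor
            w -= eta * err * m / s               # Step A6: update w_i
            c -= eta * err * g[:, None] * 2 * (x - c) / b        # update c_ij
            b -= eta * err * g[:, None] * (x - c) ** 2 / b ** 2  # update b_ij
        E_t = np.mean([(infer(x, c, b, w)[0] - yr) ** 2
                       for x, yr in zip(X, Y)])  # Step A8: inference error E(t)
        if E_t <= theta1:                        # terminate once E(t) <= theta1
            break
    return c, b, w
```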

Let us introduce the neural gas method as follows [18]:

For any input data vector v, the neighborhood ranking $(\mathbf{c}_{i_0}, \mathbf{c}_{i_1}, \ldots, \mathbf{c}_{i_{r-1}})$ is determined, where $\mathbf{c}_{i_k}$, $k \in Z_{r-1}^{*}$, is the reference vector for which there are $k$ vectors $\mathbf{c}_j$ with

$$\|\mathbf{v} - \mathbf{c}_j\| < \|\mathbf{v} - \mathbf{c}_{i_k}\|. \tag{12}$$

Let the number k associated with each vector $\mathbf{c}_i$ be denoted by $k_i(\mathbf{v}, \mathbf{c}_i)$. Then, the adaptation step for adjusting the parameters is given by

$$\Delta \mathbf{c}_i = \varepsilon \cdot h_{\lambda}(k_i(\mathbf{v}, \mathbf{c}_i)) \cdot (\mathbf{v} - \mathbf{c}_i), \tag{13}$$

$$h_{\lambda}(k_i(\mathbf{v}, \mathbf{c}_i)) = \exp(-k_i(\mathbf{v}, \mathbf{c}_i)/\lambda), \tag{14}$$

where ε ∈ [0, 1] and λ > 0.

Let the probability of v selected from V be denoted by p(v).

The flowchart of the conventional neural gas algorithm is shown in Figure 1 [18], where εint and εfin are learning constants and Tmax2 is the maximum number of learning iterations. The method is called learning algorithm NG.

Figure 1. Neural gas method [18].
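A single adaptation step of Eqs. (12)-(14) can be sketched in NumPy as follows; the names are illustrative, and fixed ε and λ are used here, whereas algorithm NG of Figure 1 anneals them from εint toward εfin over Tmax2 presentations.

```python
import numpy as np

def ng_step(v, C, eps, lam):
    """One neural gas adaptation step for a single input vector v."""
    d = np.linalg.norm(C - v, axis=1)     # distances ||v - c_i||
    k = np.argsort(np.argsort(d))         # neighborhood ranks k_i(v, c_i), Eq. (12)
    h = np.exp(-k / lam)                  # h_lambda of Eq. (14)
    C += eps * h[:, None] * (v - C)       # Delta c_i of Eq. (13)
    return C
```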

Using the set D∗, a decision procedure for the center and width parameters is given as follows:

Algorithm Center (c)

$$D^{*} = \{(x_1^p, \ldots, x_m^p) \mid p \in Z_P\}.$$

p(x): the probability that x is selected, for x ∈ D∗.

Step 1: By using p(x) for x ∈ D∗, the NG method of Figure 1 [16, 18] is performed. As a result, the set C of reference vectors for D∗ is determined, where |C| = n.

Step 2: Each value of the center parameters is assigned to a reference vector. Let

$$b_{ij} = \frac{1}{n_i} \sum_{\mathbf{x}_k \in C_i} (c_{ij} - x_{kj})^2, \tag{15}$$

where $C_i$ and $n_i$ are the set and the number of learning data belonging to the $i$th cluster $C_i$, with $C = \cup_{i=1}^{r} C_i$ and $n = \sum_{i=1}^{r} n_i$.
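Under the same assumptions, and reusing the hypothetical ng_step function from the sketch above, Algorithm Center might look as follows: Step 1 runs the NG method over D∗ with selection probability p(x), and Step 2 takes the resulting reference vectors as the centers and computes the widths by Eq. (15). Initialization by sampling and the fixed ε and λ are simplifications.

```python
import numpy as np

def center(X_star, p_x, n, Tmax2, eps=0.5, lam=1.0, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: NG method over D* with selection probability p(x).
    C = X_star[rng.choice(len(X_star), size=n, replace=False)].copy()
    for _ in range(Tmax2):
        v = X_star[rng.choice(len(X_star), p=p_x)]
        C = ng_step(v, C, eps, lam)       # adaptation step, Eqs. (12)-(14)
    # Step 2: centers c_ij are the reference vectors; widths b_ij via Eq. (15).
    d = np.linalg.norm(X_star[:, None, :] - C[None, :, :], axis=2)
    labels = d.argmin(axis=1)             # cluster C_i of each learning datum
    b = np.array([np.mean((C[i] - X_star[labels == i]) ** 2, axis=0)
                  for i in range(n)])     # assumes every cluster is non-empty
    return C, b
```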

As a result, the center and width parameters are determined by Algorithm Center (c).

Learning Algorithm B using Algorithm Center (c) is introduced as follows [16, 17]:

2.3. The probability distribution of input data based on the rate of change of output

Based on the literature [13], the probability (distribution) is defined as follows:

Algorithm Prob (pM(x))

Input: $D = \{(x_1^p, \ldots, x_m^p, y_r^p) \mid p \in Z_P\}$ and $D^{*} = \{(x_1^p, \ldots, x_m^p) \mid p \in Z_P\}$

Output: pM(x)

M: the size of ranges

Step 1: Give an input data $\mathbf{x}^i \in D^{*}$. We determine the neighborhood ranking $(\mathbf{x}^{i_0} = \mathbf{x}^i, \mathbf{x}^{i_1}, \ldots, \mathbf{x}^{i_k}, \ldots, \mathbf{x}^{i_{P-1}})$ of the vector $\mathbf{x}^i$, with $\mathbf{x}^{i_1}$ being closest to $\mathbf{x}^i$ and $\mathbf{x}^{i_k}$ (k = 0, …, P − 1) being the vector for which there are $k$ vectors $\mathbf{x}^j$ with $\|\mathbf{x}^i - \mathbf{x}^j\| < \|\mathbf{x}^i - \mathbf{x}^{i_k}\|$.

Step 2: Determine $H(\mathbf{x}^i)$, which shows the rate of output change for the input data $\mathbf{x}^i$, by the following equation:

$$H(\mathbf{x}^i) = \sum_{l=1}^{M} \frac{|y^i - y^{i_l}|}{\|\mathbf{x}^i - \mathbf{x}^{i_l}\|}, \tag{16}$$

where $\mathbf{x}^{i_l}$ for $l \in Z_M$ means the $l$th neighborhood ranking of $\mathbf{x}^i$, $i \in Z_P$, and $y^i$ and $y^{i_l}$ are the outputs for the inputs $\mathbf{x}^i$ and $\mathbf{x}^{i_l}$, respectively. The number M means the range considered in H(x).

Step 3: Determine the probability $p_M(\mathbf{x}^i)$ for $\mathbf{x}^i$ by normalizing $H(\mathbf{x}^i)$ as follows:

$$p_M(\mathbf{x}^i) = \frac{H(\mathbf{x}^i)}{\sum_{j=1}^{P} H(\mathbf{x}^j)}, \tag{17}$$

where $\sum_{i=1}^{P} p_M(\mathbf{x}^i) = 1$.
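A minimal sketch of Algorithm Prob under the same illustrative naming: H(x^i) is accumulated over the M nearest neighbors of x^i per Eq. (16) and normalized per Eq. (17). Distinct input vectors are assumed so that the distances in the denominator are nonzero.

```python
import numpy as np

def prob_pM(X, y, M):
    """Probability p_M(x^i) from the rate of output change, Eqs. (16)-(17).
    X: (P, m) inputs of D*, y: (P,) outputs, M: the size of ranges."""
    P = len(X)
    H = np.zeros(P)
    for i in range(P):
        d = np.linalg.norm(X - X[i], axis=1)  # distances to x^i
        order = np.argsort(d)                 # neighborhood ranking, x^{i_0} = x^i
        for l in order[1:M + 1]:              # the M nearest neighbors, l in Z_M
            H[i] += abs(y[i] - y[l]) / d[l]   # Eq. (16)
    return H / H.sum()                        # Eq. (17): probabilities sum to 1
```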

It is known that many rules are needed at or near the places where the output data change quickly in fuzzy modeling. Then, how can we find the rate of output change? The probability pM(x) is one method to perform it. As shown in Eqs. (16) and (17), any input data where the output changes quickly is selected with high probability, and any input data where the output changes slowly is selected with low probability, where M is the size of the range considered in the output change.

See Ref. [19] for a detailed explanation using an example of pM(x). Using pM(x), Kishida has proposed the following learning algorithm [13]:

Learning Algorithm C

θ: threshold of MSE

T0max: the maximum number of learning iterations for NG

Tmax: the maximum number of learning iterations for SDM

M: the size of ranges

n: the number of rules


