5. Bregman divergence algorithm formulation

In recent years, approaches that measure distortion over a whole class of functions, rather than relying on a single fixed distance, have become more common. Indeed, distortion analysis is used in many applications in machine learning, computational geometry, and indoor positioning systems (IPS). Measuring similarity/dissimilarity with the Bregman divergence has recently become attractive because, as a meta-algorithm, it encapsulates both the information-theoretic relative entropy and the geometric Euclidean distance [30]. The Bregman divergence $D_{\varphi}$ between two points of a convex space, $p = (p_1, \ldots, p_d)$ and $q = (q_1, \ldots, q_d)$, associated with a strictly convex and differentiable function $\varphi$, is defined as

$$D_{\varphi}(p,q) = \varphi(p) - \varphi(q) - \langle \nabla \varphi(q),\, p - q \rangle \tag{9}$$

where $\langle \cdot, \cdot \rangle$ denotes the dot product:

$$\langle p, q \rangle = \sum\_{i=1}^{d} p^{(i)} q^{(i)} = p^T q \tag{10}$$

and $\nabla \varphi(p)$ denotes the gradient operator:

$$\nabla \varphi(p) = \left[ \frac{\partial \varphi}{\partial p_1}, \ldots, \frac{\partial \varphi}{\partial p_d} \right]^T \tag{11}$$
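To make the definition concrete, the following minimal Python sketch evaluates Eq. 9 for a user-supplied convex function; the helper names (`bregman`, `grad_phi`) are illustrative, not from the chapter, and the squared-Euclidean check anticipates the first bullet below.

```python
import numpy as np

def bregman(phi, grad_phi, p, q):
    """D_phi(p, q) = phi(p) - phi(q) - <grad phi(q), p - q>  (Eq. 9)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return phi(p) - phi(q) - np.dot(grad_phi(q), p - q)

# Sanity check: phi(p) = <p, p> should recover the squared Euclidean distance.
phi = lambda x: np.dot(x, x)
grad_phi = lambda x: 2.0 * x
p, q = np.array([1.0, 2.0]), np.array([3.0, 1.0])
print(bregman(phi, grad_phi, p, q))   # 5.0
print(np.sum((p - q) ** 2))           # 5.0, matches
```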

The Bregman divergence unifies the statistical Kullback-Leibler divergence (KLD) and the squared Euclidean distance as members of a single class of distortion measures:

• The squared Euclidean distance is obtained from the Bregman divergence by choosing the convex function $\varphi(p) = \sum_{i=1}^{d} p_i^2 = \langle p, p \rangle$, which is the parabolic potential function in Figure 2.

• The KLD is also a Bregman divergence: it is obtained with the convex function $\varphi(p) = \sum_{i=1}^{d} p_i \log p_i$, the negative Shannon entropy. The KLD is defined for two discrete distributions as

$$KL(p \| q) = \sum_{s} p(\mathcal{S}=s) \log \left( \frac{p(\mathcal{S}=s)}{q(\mathcal{S}=s)} \right) \tag{12}$$
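Reusing the hypothetical `bregman()` helper from the sketch above (repeated here for self-containment), one can verify numerically that the negative Shannon entropy generates the KLD of Eq. 12 for normalized probability vectors.

```python
import numpy as np

def bregman(phi, grad_phi, p, q):
    return phi(p) - phi(q) - np.dot(grad_phi(q), p - q)

# phi(p) = sum_i p_i log p_i (negative Shannon entropy); grad phi = log p + 1.
neg_entropy = lambda x: np.sum(x * np.log(x))
grad_neg_entropy = lambda x: np.log(x) + 1.0

p = np.array([0.2, 0.5, 0.3])
q = np.array([0.4, 0.4, 0.2])
kld = np.sum(p * np.log(p / q))       # Eq. 12
print(bregman(neg_entropy, grad_neg_entropy, p, q), kld)  # equal
```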

In information theory, the Shannon entropy measures the amount of uncertainty of a random variable:

$$H(p) = \sum_{s} p(\mathcal{S}=s) \log \frac{1}{p(\mathcal{S}=s)} \tag{13}$$


The KLD equals the cross-entropy of the two discrete distributions minus the Shannon entropy of the first [31]:

$$KL(p \| q) = H^{\times}(p(\mathcal{S}=s) \,\|\, q(\mathcal{S}=s)) - H(p(\mathcal{S}=s)) \tag{14}$$

where $H^{\times}$ is the cross-entropy:

$$H^{\times}(p(\mathcal{S}=s) \,\|\, q(\mathcal{S}=s)) = \sum_{s} p(\mathcal{S}=s) \log \frac{1}{q(\mathcal{S}=s)} \tag{15}$$

where $\mathcal{S}$ is the set of RSS vectors.
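A quick numeric check of the identity in Eq. 14, assuming p and q are already normalized probability vectors (e.g., RSS histograms):

```python
import numpy as np

p = np.array([0.2, 0.5, 0.3])
q = np.array([0.4, 0.4, 0.2])

H   = np.sum(p * np.log(1.0 / p))     # Shannon entropy, Eq. 13
Hx  = np.sum(p * np.log(1.0 / q))     # cross-entropy, Eq. 15
kld = np.sum(p * np.log(p / q))       # KLD, Eq. 12
print(Hx - H, kld)                    # identical, per Eq. 14
```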

Figure 2. The Bregman divergence represents the vertical distance between the potential function and the hyperplane at q.

In general, the Bregman divergence is not symmetrical, but it can be symmetrized as follows:

$$SD\_{\varphi}(p,q) = \frac{D\_{\varphi}(p,q) + D\_{\varphi}(q,p)}{2} \tag{16}$$

$$=\frac{1}{2}\langle p-q,\nabla\varphi(p)-\nabla\varphi(q)\rangle\tag{17}$$
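The equality of Eqs. 16 and 17 can be checked directly; the sketch below reuses the illustrative `bregman()` helper and is not code from the chapter.

```python
import numpy as np

def bregman(phi, grad_phi, p, q):
    return phi(p) - phi(q) - np.dot(grad_phi(q), p - q)

def sd_avg(phi, grad_phi, p, q):
    # Eq. 16: average of the two oriented divergences.
    return 0.5 * (bregman(phi, grad_phi, p, q) + bregman(phi, grad_phi, q, p))

def sd_inner(grad_phi, p, q):
    # Eq. 17: equivalent inner-product form.
    return 0.5 * np.dot(p - q, grad_phi(p) - grad_phi(q))

phi = lambda x: np.dot(x, x)
grad_phi = lambda x: 2.0 * x
p, q = np.array([1.0, 2.0]), np.array([3.0, 1.0])
print(sd_avg(phi, grad_phi, p, q), sd_inner(grad_phi, p, q))  # equal
```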

In the same manner, Jeffreys' divergence symmetrizes the oriented KLD as follows:

$$J(p,q) = KL(p \| q) + KL(q \| p) \tag{18}$$

$$= H^{\times}(p \| q) + H^{\times}(q \| p) - \left( H(p) + H(q) \right) \tag{19}$$

$$= \sum_{s} \left( p(\mathcal{S}=s) - q(\mathcal{S}=s) \right) \log \left( \frac{p(\mathcal{S}=s)}{q(\mathcal{S}=s)} \right) \tag{20}$$

Such an information-theoretic divergence has two major drawbacks: first, the output is undefined if q = 0 while p ≠ 0, and second, the J-divergence is not bounded by a metric distance. To avoid these drawbacks, that is, to avoid taking log(0) or dividing by 0, the authors in [32] proposed a new divergence called the K-divergence:

$$K(p \| q) = KL\left(p, \frac{p+q}{2}\right) \tag{21}$$
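The following sketch illustrates both drawbacks on distributions with mismatched supports, and how averaging with (p+q)/2 in Eq. 21 avoids them; `kl()` is an illustrative helper, not chapter code.

```python
import numpy as np

def kl(p, q):
    """KLD of Eq. 12; diverges wherever q = 0 but p > 0."""
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.sum(np.where(p > 0, p * np.log(p / q), 0.0))

p = np.array([0.5, 0.5, 0.0])
q = np.array([0.5, 0.0, 0.5])

print(kl(p, q) + kl(q, p))       # Jeffreys divergence, Eq. 18: inf
print(kl(p, 0.5 * (p + q)))      # K-divergence, Eq. 21: finite
```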

By symmetrizing the K-divergence, [30] obtained the Jensen-Shannon divergence (JSD) as follows:

$$JSD(p \| q) = \frac{1}{2} \left( KL\left(p, \frac{p+q}{2}\right) + KL\left(q, \frac{p+q}{2}\right) \right) \tag{22}$$

$$= \frac{1}{2} \left( H^{\times}\!\left(p \,\middle\|\, \frac{p+q}{2}\right) - H(p) + H^{\times}\!\left(q \,\middle\|\, \frac{p+q}{2}\right) - H(q) \right) \tag{23}$$

$$=\frac{1}{2}\left(\sum\_{i=1}^{L}\left(p\_i\log\frac{p\_i}{\frac{1}{2}q\_i+\frac{1}{2}p\_i}+q\_i\log\frac{q\_i}{\frac{1}{2}q\_i+\frac{1}{2}p\_i}\right)\right)\tag{24}$$
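A brief numeric illustration of the properties discussed just below (symmetry and finiteness, with ln 2 as the upper bound when natural logarithms are used); the helper names are illustrative.

```python
import numpy as np

def kl(p, q):
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.sum(np.where(p > 0, p * np.log(p / q), 0.0))

def jsd(p, q):
    m = 0.5 * (p + q)
    return 0.5 * (kl(p, m) + kl(q, m))    # Eq. 22

p = np.array([0.5, 0.5, 0.0])
q = np.array([0.5, 0.0, 0.5])
print(jsd(p, q), jsd(q, p))   # symmetric and finite even with zero entries
print(np.log(2.0))            # never exceeded by the natural-log JSD
```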

The JSD is thus always defined and finite, and it is bounded in terms of the L1 metric. In the same vein, the Bregman divergence can be symmetrized as

$$SD_{\varphi}(p,q) = \frac{1}{2}\left(D_{\varphi}\left(p, \frac{p+q}{2}\right) + D_{\varphi}\left(q, \frac{p+q}{2}\right)\right) \tag{25}$$

$$= \frac{\varphi(p) + \varphi(q)}{2} - \varphi\left(\frac{p + q}{2}\right) \tag{26}$$

which, when $\varphi$ is applied coordinatewise to L-dimensional multivariate data, gives:


$$SD(p,q) = \sum\_{i=1}^{L} \frac{\varphi(p\_i) + \varphi(q\_i)}{2} - \varphi\left(\frac{p\_i + q\_i}{2}\right) \tag{27}$$

where q represents the fingerprint dataset and p, the dataset of the test points, represents the APs that the mobile device received. Because $\varphi$ is a strictly convex function, $SD(p, q)$ equals zero if and only if p = q; this family of distortions is termed the Jensen-Bregman divergence. The geometric interpretation is represented in Figure 3, where the divergence is the vertical distance between $\left(\frac{p+q}{2}, \varphi\left(\frac{p+q}{2}\right)\right)$ and the midpoint of the segment $\left[(p, \varphi(p)), (q, \varphi(q))\right]$.

Figure 3. Interpreting the Jensen-Bregman divergence.

In general, for a positive definite matrix Q and the quadratic potential $\varphi(x) = \langle Qx, x \rangle$, the Jensen-Bregman divergence yields the generalized quadratic distance known as the Mahalanobis distance:

$$\begin{split} SD(p,q) &= \frac{\varphi(p) + \varphi(q)}{2} - \varphi\left(\frac{p+q}{2}\right) \\ &= \frac{2\langle Qp,p\rangle + 2\langle Qq,q\rangle - \langle Q(p+q),\,p+q\rangle}{4} \\ &= \frac{1}{4}\left(\langle Qp,p\rangle + \langle Qq,q\rangle - 2\langle Qp,q\rangle\right) \\ &= \frac{1}{4}\langle Q(p-q),\,p-q\rangle \\ &= \frac{1}{4}\|p-q\|_{Q}^{2} \end{split} \tag{28}$$
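A minimal sketch of Eqs. 26 and 28 for the quadratic potential, with an arbitrary illustrative positive definite Q (not a matrix from the chapter):

```python
import numpy as np

Q = np.array([[2.0, 0.5],
              [0.5, 1.0]])              # illustrative positive definite matrix

phi = lambda x: x @ Q @ x               # quadratic potential <Qx, x>

def jensen_bregman(p, q):
    # Eq. 26: (phi(p) + phi(q)) / 2 - phi((p + q) / 2)
    return 0.5 * (phi(p) + phi(q)) - phi(0.5 * (p + q))

p, q = np.array([1.0, 2.0]), np.array([3.0, 1.0])
d = p - q
print(jensen_bregman(p, q))             # 1.75, via Eq. 26
print(0.25 * d @ Q @ d)                 # 1.75, i.e. (1/4)||p - q||_Q^2 (Eq. 28)
```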

To improve accuracy, we present Algorithm 2:

Algorithm 2. The Kullback-Leibler multivariate Gaussian positioning method

1. During the offline phase, RSS measurements are taken at different known locations, and 10 scans with 10-second time delays are used to generate the radio map.

• A database for each RP is set using RSS measurements from different locations.

2. During the online phase, RSS measurements are taken from unknown locations of the smartphone.

3. The following steps are then performed:

• The RSS measurements from the APs of smartphones at unknown locations are arranged in the same way as the offline-phase database, with respect to the same media access control (MAC) address.

• The minimum symmetric Bregman divergence is estimated using Eq. 27, as sketched in the code after this listing.

• The previous step is repeated for different APs until the minimum distance is obtained.

4. The maximum outputs are transferred to the output layer.
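A minimal sketch of the online matching step, under stated assumptions: the radio map is a dict mapping RP coordinates to mean RSS vectors aligned per AP MAC address and normalized to probability vectors; names and data are illustrative, not from the chapter.

```python
import numpy as np

def sd(p, q):
    # Componentwise symmetric Bregman divergence of Eq. 27 with
    # phi(x) = x log x (negative Shannon entropy); assumes positive entries.
    phi = lambda x: x * np.log(x)
    return np.sum(0.5 * (phi(p) + phi(q)) - phi(0.5 * (p + q)))

def locate(online_rss, radio_map):
    """Return the reference point whose fingerprint minimizes Eq. 27."""
    return min(radio_map, key=lambda rp: sd(online_rss, radio_map[rp]))

# Toy radio map: two RPs with normalized mean RSS over three APs.
radio_map = {(0.0, 0.0): np.array([0.5, 0.3, 0.2]),
             (5.0, 0.0): np.array([0.2, 0.3, 0.5])}
print(locate(np.array([0.45, 0.35, 0.20]), radio_map))   # -> (0.0, 0.0)
```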

6. Performance analysis

Evaluations of the proposed algorithms are presented in the following subsections. The algorithms were implemented on the first floor of the CEAS at WMU. To collect the data samples, a Samsung S5 smartphone running operating system 4.4.2 was used. The proposed algorithms were implemented on an HP Pavilion using Java within the Eclipse framework. Cisco Linksys E2500 Advanced Simultaneous Dual-Band Wireless-N routers were used in the area of interest. Most of this work discounted the variation of the RSS from the APs.

To evaluate the performance of the different fingerprinting techniques, the localization error was computed as the Euclidean distance between the actual coordinates of the test points and the coordinates reported for the mobile user during the online phase. The number of RSS readings from the APs and the number of nearest neighbors were noted, as they can affect the accuracy of the algorithms. The number of APs can play an important role in the accuracy of the distance error, which can distinguish near RPs from those further away.

In order to measure the impact of the APs on the accuracy, we used a specific number of nearest neighbors with a variety of APs. However, that resulted in a longer RSS scanning interval, which slowed the process down. As a result, the online phase comprised five time samples, which took 1 s for Wi-Fi scanning on the device. To investigate the accuracy of our proposed algorithm, it was compared with different algorithms, such as PNN and KNN. Different numbers of nearest neighbors were used to estimate the location of the object and to evaluate the performance of our system framework.
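The localization-error metric described above reduces to a per-point Euclidean norm; the coordinates below are illustrative placeholders, not measured data.

```python
import numpy as np

# True test-point coordinates vs. coordinates estimated during the online phase.
true_xy = np.array([[1.0, 2.0], [4.0, 6.0]])
est_xy  = np.array([[1.5, 2.0], [4.0, 5.0]])

errors = np.linalg.norm(true_xy - est_xy, axis=1)   # per-point distance error
print(errors)          # [0.5 1. ]
print(errors.mean())   # mean localization error
```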

