1. Introduction

Automatically identifying the location of a user has recently become a hot topic in research. The study in [1] estimated that the global indoor localization market would grow from \$935.05 million in 2014 to approximately \$4.42 billion in 2019, corresponding to a compound annual growth rate (CAGR) of 36.5%. The estimation of mobile locations plays an important role in many computing applications. The global positioning system (GPS) is one of the most common location-based systems, but it cannot be used inside buildings because a direct line of sight (LOS) is required between the GPS receiver and the satellite to identify the user's location. Therefore, a large number of technologies, such as Bluetooth, radiofrequency identification (RFID), wireless local area network (WLAN or Wi-Fi), magnetic field variations, ultrasound, Zigbee, and light-emitting diode (LED) light bulbs, have been used to create high-accuracy indoor positioning systems (IPS), with Wi-Fi being the most commonly used technology. Most smartphones can obtain the received signal strength (RSS) from the access points (APs) of WLANs, which makes Wi-Fi attractive because of its low cost and existing infrastructure [2, 3].

© 2018 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

RSS-based IPS algorithms can be classified into two main types: log-distance propagation model (PM) algorithms, which are based on the signal, and fingerprinting algorithms, which are based on collected data. IPS based on signal propagation is divided into lateration and angulation. The main idea is to estimate the position of the smartphone relative to the APs using geometry and signal measurement information from the incoming AP signals: lateration uses distances derived from measurements such as the time of arrival (TOA) and time difference of arrival (TDOA), whereas angulation uses the angle of arrival (AOA). In general, propagation signals suffer from non-line-of-sight (NLOS) multipath effects due to the presence of walls and furniture and the movement of people. In addition, the positioning accuracy decreases if the coordinates of one or more APs are not accurately known. These drawbacks make it difficult to estimate an object's position using signal propagation [4]. Thus, fingerprinting-based localization systems have been proposed as an alternative [5], as they do not require additional infrastructure. Instead, they use the building's existing WLAN and the user's smartphone, relying on the pattern of RSS values received from the APs at a location to estimate the user's coordinates.
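As an illustration of the propagation-model approach, the following minimal sketch inverts the log-distance model RSS(d) = RSS(d0) - 10 n log10(d/d0) to obtain a distance and then laterates a 2-D position by least squares. The function names, the reference RSS of -40 dBm at 1 m, and the path loss exponent of 3 are assumptions chosen for the example; they are not values taken from this chapter.

```python
import math


def rss_to_distance(rss_dbm, rss_at_d0_dbm=-40.0, path_loss_exponent=3.0, d0=1.0):
    """Invert the log-distance model RSS(d) = RSS(d0) - 10*n*log10(d/d0).

    The default reference RSS and exponent are illustrative assumptions.
    """
    return d0 * 10 ** ((rss_at_d0_dbm - rss_dbm) / (10.0 * path_loss_exponent))


def laterate(ap_positions, distances):
    """Least-squares 2-D position from at least three APs and their distances.

    Subtracting the first circle equation from the others yields a linear
    system in (x, y), which is solved here through its 2x2 normal equations.
    """
    (x0, y0), r0 = ap_positions[0], distances[0]
    rows, rhs = [], []
    for (xi, yi), ri in zip(ap_positions[1:], distances[1:]):
        rows.append((2.0 * (xi - x0), 2.0 * (yi - y0)))
        rhs.append(r0**2 - ri**2 + xi**2 - x0**2 + yi**2 - y0**2)

    a11 = sum(r[0] * r[0] for r in rows)
    a12 = sum(r[0] * r[1] for r in rows)
    a22 = sum(r[1] * r[1] for r in rows)
    b1 = sum(r[0] * v for r, v in zip(rows, rhs))
    b2 = sum(r[1] * v for r, v in zip(rows, rhs))

    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("APs are collinear or too few; position is not unique")
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det
```

In practice the distances fed to `laterate` would come from `rss_to_distance`, so NLOS and multipath errors in the RSS carry over directly into the position estimate, which is the weakness noted above.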

The fingerprint-based technique is divided into offline and online phases. In the offline phase, the entire area of interest is divided into a rectangular set of grid points, and at each point, a site survey is taken by recording the RSS from APs, which is then stored in a database called the radio map [6–10]. In the online phase, the smartphone collects the RSS from the APs and sends it to the server to compare the predefined fingerprint of the offline phase with the RSS in the online phase in order to estimate the location on the grid map, as shown in Figure 1.


Machine Learning Algorithm for Wireless Indoor Localization

http://dx.doi.org/10.5772/intechopen.74754


Figure 1. The offline and online stages of location Wi-Fi-based fingerprinting architecture.
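The two phases can be sketched in a few lines. This is a minimal illustration, assuming that the radio map stores the mean RSS per grid point and that matching uses nearest neighbors in signal space with the Euclidean distance, which is one simple matching rule and not the method proposed in this chapter. All function names and the data layout are hypothetical.

```python
import math


def build_radio_map(survey):
    """Offline phase: average the RSS scans recorded at each grid point.

    `survey` maps (x, y) -> list of scans; each scan holds one RSS value
    (in dBm) per AP, always in the same AP order.
    """
    radio_map = {}
    for point, scans in survey.items():
        n_aps = len(scans[0])
        radio_map[point] = [
            sum(scan[a] for scan in scans) / len(scans) for a in range(n_aps)
        ]
    return radio_map


def knn_locate(radio_map, online_rss, k=3):
    """Online phase: average the coordinates of the k closest fingerprints."""
    ranked = sorted(radio_map, key=lambda p: math.dist(radio_map[p], online_rss))
    nearest = ranked[:k]
    x = sum(p[0] for p in nearest) / len(nearest)
    y = sum(p[1] for p in nearest) / len(nearest)
    return x, y


def mean_distance_error(true_positions, estimated_positions):
    """Average Euclidean distance between true and estimated positions."""
    errors = [math.dist(t, e) for t, e in zip(true_positions, estimated_positions)]
    return sum(errors) / len(errors)
```

Averaging the coordinates of the k nearest fingerprints is a design choice of this sketch; with k = 1 the estimate snaps to a single grid point.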

The k-nearest neighbor (kNN) algorithm is one of the simplest ways to estimate location; it uses the Euclidean distance to measure the similarity between the RSS fingerprints of the offline phase and the RSS measured in the online phase. Even though this algorithm is easy to implement, it has low accuracy. Other methods, such as statistical learning and Bayesian modeling, have also been used to estimate the location of an object. Accuracy is one of the most important requirements of IPS. Mean distance error is typically used as the performance metric and is calculated as the average Euclidean distance between the actual location and the estimated location.

Recently, an important issue was raised about the variation of signal propagation, namely, how signals received at the same place change over time in the presence of multiple factors, such as physical obstructions, radiofrequency (RF) equipment, and human bodies. These factors can lead to attenuation and multipath issues, thereby causing gradual changes in the signal that can reduce the accuracy of the localization system [11]. The values stored in fingerprint maps represent the mean value of the received signal strength indicator (RSSI). Some approaches presume that the RSSI distribution is Gaussian [12], whereas others assume a non-Gaussian distribution, such as those described in [13]. Using Wi-Fi localization systems to estimate the location of an object has many advantages compared with other technologies, such as availability and low cost. However, because RSSI fingerprinting relies on measurements taken in both the offline and online phases, hardware variance between the devices used can significantly degrade the positional accuracy of these systems. Some studies have investigated this variance; for example, it was reported in [11] that when different smartphones were used to collect RSSI data at the same time and location, some phones consistently had higher RSSI values than others. The orientation of the user can also contribute to the variance of the RSSI signal because the human body can be a significant attenuator.

This hardware variance problem in Wi-Fi localization has also been noticed in Cisco location systems [11]; some signals were found to be omitted when a different device was used in the online phase compared with the offline phase.

This chapter presents the following:

• We propose the use of the Jensen-Bregman divergence (JBD) and a Kullback-Leibler multivariate Gaussian (KLMVG) model as WLAN-based positioning methods. The matching stage was performed using probability kernels as a regression scheme.

• We propose a data collection procedure that better characterizes the RSS distribution. The RSS values were taken from four different orientations (45°, 135°, 225°, and 315°) to prevent body-blocking effects, with a scan performed for 100 s in each direction to reduce the effects of signal variation.

• JBD and KLMVG outperformed the probabilistic neural network (PNN) and kNN with respect to the accuracy and the average error distance, indicating that the proposed combination scheme is more effective in the sensitive environments of WLAN-based positioning systems.

Machine Learning - Advanced Techniques and Emerging Applications

In [26], the RSS-based Bluetooth low-energy localization technique was used to establish the fingerprint, after which the KLD was used in probabilistic kernel regression to estimate the object's location. The results showed this method to be accurate to approximately 1 m in an office environment. In general, the KLD kernel regression performs better in a multimodal distribution. In [27], the KLD was used to estimate the probabilistic kernel of both Gaussian and non-Gaussian distributions in order to compare them and to determine their limitations.
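To make the idea of KLD-based kernel regression concrete, the sketch below is a simplified illustration and not the implementation used in [26] or [27]. It assumes that each fingerprint is summarized by a per-AP mean and variance (a Gaussian with diagonal covariance), that the kernel is exp(-alpha * KLD), and that the location is the kernel-weighted average of the fingerprint coordinates. The function names and the `alpha` parameter are hypothetical.

```python
import math


def kl_diag_gaussians(mean_p, var_p, mean_q, var_q):
    """KL(P || Q) for Gaussians with diagonal covariance.

    Each list holds one entry per AP. Treating the APs as independent is a
    simplifying assumption of this sketch.
    """
    total = 0.0
    for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q):
        total += 0.5 * (math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0)
    return total


def kernel_regression_locate(fingerprints, online_mean, online_var, alpha=1.0):
    """Kernel-weighted average of fingerprint coordinates.

    `fingerprints` is a list of ((x, y), mean_vector, variance_vector).
    """
    divergences = [
        kl_diag_gaussians(online_mean, online_var, mean, var)
        for _, mean, var in fingerprints
    ]
    # Subtracting the smallest divergence leaves the normalized weights
    # unchanged and keeps at least one weight equal to 1 (avoids underflow).
    d_min = min(divergences)
    weights = [math.exp(-alpha * (d - d_min)) for d in divergences]
    total = sum(weights)
    x = sum(w * fp[0][0] for w, fp in zip(weights, fingerprints)) / total
    y = sum(w * fp[0][1] for w, fp in zip(weights, fingerprints)) / total
    return x, y
```

A larger `alpha` concentrates the weight on the closest fingerprints, while a smaller value blends more of them into the estimate.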


3. Overall structure of indoor positioning system

We begin with a typical WLAN scenario in which a person carries a smartphone device with WLAN access and takes RSS measurements from different APs within the College of Engineering and Applied Sciences (CEAS) at Western Michigan University (WMU). It is commonly assumed that the RSSI from multiple APs is distributed as a multimodal signal, as noted in [16]. However, in our study, the recorded signal-to-noise ratio for a single device varied significantly at any one location, with the values differing by as much as 10 dB. Specifically, the signal-to-noise values were recorded for 35 min during rush hour for a single AP and in the same location.

There are many parameters that can affect the shape of the signal, such as reflection, diffraction, and pedestrian traffic. In this study, we sought to find a scenario that would lead to a better distribution of the Wi-Fi signal. During the offline phase, a realistic scenario was created that took into account the variation of the signal. However, because the body of the person holding the phone as well as pedestrian traffic can change the variation of the signal, the RSS was recorded in four directions (45°, 135°, 225°, and 315°) to reduce these variations. At each reference point (RP), a raw set of RSS data was collected as a time sample from the APs in the area of interest, denoted as $\{q^{(o)}_{i,j}(\tau);\ \tau = 1, \ldots, t;\ t = 100\}$, where $t$ represents the number of time samples and $(o)$ is the orientation direction. Next, the average and covariance matrix of the RSS were obtained from the four different directions, and ten scans were used to create the fingerprinting database, known as the radio map, represented by $Q^{(o)}$ [28]:

$$
Q^{(o)} =
\begin{pmatrix}
q^{(o)}_{1,1} & q^{(o)}_{1,2} & \cdots & q^{(o)}_{1,N} \\
q^{(o)}_{2,1} & q^{(o)}_{2,2} & \cdots & q^{(o)}_{2,N} \\
\vdots & \vdots & \ddots & \vdots \\
q^{(o)}_{L,1} & q^{(o)}_{L,2} & \cdots & q^{(o)}_{L,N}
\end{pmatrix}
\tag{1}
$$

where $q^{(o)}_{i,j} = \frac{1}{t} \sum_{\tau=1}^{t} q^{(o)}_{i,j}(\tau)$ and $t = 10$, with the ten scans randomly chosen from the 100 time samples. This allowed us to obtain the average of the RSS samples over time for different APs, $i = 1, 2, \ldots, L$, and RPs, $j = 1, 2, \ldots, N$, where $N$ represents the number of RPs and $L$ is the number of APs. The variance vector of each RP can be defined as
