#### *3.1.2. GNSS + INS*

The literature commonly describes two methods for GNSS + INS positioning: the loosely coupled (LC) and the tightly coupled (TC) approaches (**Figure 5**).

In the first case (LC), the software integrates accelerations and angular velocities and updates all the state parameters, which include the positions and attitude angles as well as the instrumental biases, using GNSS positions and IMU measurements.

In the second case (TC), the parameters are the same, but both the raw GNSS observations (pseudoranges, carrier phases, Doppler) and the IMU observations enter the extended Kalman filter, each with its own rate and precision and together with their additional biases [8], in order to provide a unique solution.

Although this process is computationally heavier, it can exploit even one or a few visible GNSS satellites, which is a typical situation in urban canyons [13].

It is important to underline that only the LC method is available today, because this approach does not require raw GNSS measurements, whereas these observations are essential for the TC method.
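As an illustration of the loosely coupled scheme, the following minimal Python sketch reduces the problem to one dimension: IMU accelerations are integrated at a high rate and, whenever a GNSS position is available, a Kalman update corrects position and velocity. It is a simplified linear filter, not the algorithm implemented in any commercial package; the function name `lc_filter`, the two-element state (no bias or attitude states), and all noise values are assumptions chosen only for the example.

```python
import random


def lc_filter(accels, gnss_fixes, dt, accel_var=0.5, gnss_var=9.0):
    """1D loosely coupled GNSS + INS sketch with state [position, velocity].

    accels:     IMU accelerations (m/s^2), one sample per time step.
    gnss_fixes: {step index k: GNSS position (m) at the end of step k}.
    Returns the estimated position at the end of every step.
    """
    p, v = 0.0, 0.0                          # state estimate
    P00, P01, P10, P11 = 1.0, 0.0, 0.0, 1.0  # state covariance (assumed initial value)
    track = []
    for k, a in enumerate(accels):
        # Prediction: integrate the IMU acceleration.
        p += v * dt + 0.5 * a * dt * dt
        v += a * dt
        # Covariance propagation P = F P F^T + Q, with F = [[1, dt], [0, 1]]
        # and Q driven by the assumed accelerometer noise variance.
        n00 = P00 + dt * (P01 + P10) + dt * dt * P11 + 0.25 * dt**4 * accel_var
        n01 = P01 + dt * P11 + 0.5 * dt**3 * accel_var
        n10 = P10 + dt * P11 + 0.5 * dt**3 * accel_var
        n11 = P11 + dt * dt * accel_var
        P00, P01, P10, P11 = n00, n01, n10, n11
        # Update: a GNSS *position* (not a raw observation) corrects the state.
        if k in gnss_fixes:
            s = P00 + gnss_var                # innovation variance, H = [1, 0]
            k0, k1 = P00 / s, P10 / s         # Kalman gain
            innovation = gnss_fixes[k] - p
            p += k0 * innovation
            v += k1 * innovation
            P00, P01, P10, P11 = ((1 - k0) * P00, (1 - k0) * P01,
                                  P10 - k1 * P00, P11 - k1 * P01)
        track.append(p)
    return track


if __name__ == "__main__":
    rng = random.Random(0)
    dt, steps, bias = 0.1, 600, 0.2  # 60 s at 10 Hz; device at rest; assumed bias
    accels = [bias + rng.gauss(0.0, 0.05) for _ in range(steps)]
    gnss = {k: rng.gauss(0.0, 3.0) for k in range(9, steps, 10)}  # 1 Hz, 3 m noise
    print("INS only, final error (m):", lc_filter(accels, {}, dt)[-1])
    print("LC fused, final error (m):", lc_filter(accels, gnss, dt)[-1])
```

With the assumed constant accelerometer bias, the INS-only position drifts by hundreds of meters in one minute, whereas the GNSS position updates keep the fused error bounded, roughly at the level of the GNSS noise. A TC filter would instead take pseudoranges, carrier phases, and Doppler as measurements, which this sketch does not attempt.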

For this reason, this subsection briefly analyzes the results obtainable with the GPS + INS instruments installed on smartphones, following the LC approach.

The tests were carried out on our campus at the same two test sites described in the previous subsection, using the same special support created in the Geomatics Lab at the Politecnico di Torino.

Postprocessing all the data acquired in the field with the Inertial Explorer® software gives a horizontal loop error of 4.21 m and a vertical loop error of 3.73 m for a 3‐min session. The results differ slightly between smartphones, but these values can be considered representative of the technology available today.

**Figure 5.** GNSS + INS processing approaches.

### **3.2. Indoor scenarios**

The spread of smartphones with different embedded sensors, increased computational power, and advanced connectivity features has led to the introduction into the market of numerous application services based on awareness of the user's position. These services provide information and assistance for navigation in the environment, pose estimation, tracking, and any other service related to the spatial context. Many location-based services (LBSs) are implemented as information systems that use the position of a mobile device as prior information [22]. The number of companies deploying LBS solutions for commercial purposes shows that location‐based solutions are finally meeting market needs and will soon be implemented in mass‐market applications. The principal fields of application are medical care [15], ambient assisted living [40], environmental monitoring [33], transportation [38], and marketing [1].


172 Smartphones from an Applied Research Perspective


Most of these services require accurate localization of people, instruments, vehicles, animals, and assets. As is well known, GNSS positioning provides good accuracy only in open-sky environments. Conversely, in an indoor space or an urban canyon, GNSS positioning is not possible, and this issue must be overcome with different techniques and sensors. In recent years, some indoor location‐based services (LBSs) have been developed by integrating different technologies and measurements [22]: cameras [27], infrared (Kinect), ultrasound [20], WLAN/Wi-Fi [6], RFID [23], mobile communication [10], and so forth are examples of the technologies that the scientific community has put at the service of indoor positioning. Despite this ample panorama of solutions, mass-market applications for indoor positioning must rely on the sensors embedded in commercial smartphones, without supplementary physical components. For this reason, major modifications to the devices are ruled out and the range of usable technologies is reduced. Wirola et al. [36] summarized the user requirements for mass-market localization systems, which are reported in **Table 3**.

**Table 3.** Summary of requirements for mass-market localization according to Wirola et al. [36].

All these indoor positioning systems have pros and cons that make them more useful than the alternatives in specific scenarios. One of the most useful but complex localization methods is the inertial navigation system (INS). This system is based on dead reckoning, which computes locations using the inertial measurement units (IMUs) installed inside the smartphone, such as accelerometers and gyroscopes. The main advantages of an INS are that nowadays every kind of mobile device already has an IMU built in and that no external infrastructure is required. Moreover, the only input information that inertial systems need is the starting position. Since no other external information is required, this technology is not affected by adverse weather conditions, security vulnerabilities, or jamming. However, these systems suffer from integration drift: errors accumulate and must therefore be corrected by some other system.

LBSs based on the camera sensor have strong advantages: they do not require any network of chipsets to be installed in the environment, since all the primary sensors are already installed in the user device, so the system can be considered low cost. Moreover, the positioning accuracy of these systems is usually better than that of other systems. Furthermore, most systems based on triangulation cannot determine the orientation of the user, which is an important limitation for many useful applications such as augmented reality.
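The integration drift of inertial systems can be made concrete with a small Python sketch. It double-integrates a constant accelerometer bias for a device that is actually at rest; the function name `dead_reckoning_error` and the bias value of 0.05 m/s² are assumptions made for illustration, not measured smartphone characteristics.

```python
def dead_reckoning_error(bias, duration, dt=0.01):
    """Position error (m) after double-integrating a constant accelerometer bias.

    bias:     uncorrected accelerometer bias in m/s^2 (assumed constant).
    duration: integration time in seconds, with no external correction.
    """
    position, velocity = 0.0, 0.0
    for _ in range(round(duration / dt)):
        # The true acceleration is zero, so everything integrated here is error.
        position += velocity * dt + 0.5 * bias * dt * dt
        velocity += bias * dt
    return position


if __name__ == "__main__":
    for seconds in (10, 60, 180):
        error = dead_reckoning_error(0.05, seconds)
        print(f"{seconds:4d} s without correction -> {error:.1f} m of drift")
```

For a constant bias $b$ the error grows as $\tfrac{1}{2} b t^2$, so the assumed bias already produces 90 m of drift after one minute. This quadratic growth is why an absolute position source (GNSS, images, or Wi-Fi) is needed to reset the inertial solution periodically.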

It is now evident that providing reliable and stable position information in a complex and changing environment is a very challenging task. Sensor fusion is an option for combining the advantages of two or more techniques (e.g., angle of arrival (AoA), time of flight (ToF), received signal strength indication (RSSI)) and technologies (e.g., GPS, Wi‐Fi, Bluetooth, camera sensors, ultrasound) while minimizing their limitations. Some methods and technologies are ideal candidates to support or complete other navigation or localization systems in a multimodal approach, in order to obtain location information with accuracy and reliability superior to those obtainable from each technique, technology, or system alone. Multimodal solutions employing different sensors would not be feasible for low-end handsets that cannot connect to more than one technology or that lack the hardware enhancements required to apply different techniques. For these reasons, positioning solutions based on smartphone technologies exploit the sensors that are already embedded: INS, CMOS image sensor (CIS), and Wi-Fi.

In this chapter, we focus on image recognition-based (IRB) technology, which uses the CIS as the main sensor. In particular, after a general overview of existing methods, we investigate an innovative IRB location method based on matching pictures acquired in real time by the smartphone with the corresponding synthetically generated 3D images or RGB-D images extracted from a database. We then evaluate the integration of INS into this method to obtain a multimodal solution for indoor navigation, and finally offer some considerations on systems that use Wi-Fi as the main positioning technology.

#### *3.2.1. Cameras + INS*

Indoor positioning and navigation by optical sensors is becoming one of the dominant techniques, able to cover a large number of application fields at all levels of accuracy. The success of these techniques is due to the improvement and miniaturization of CMOS sensors. At the same time, data transfer speeds and smartphone computational capabilities have increased, and the field of image processing has seen remarkable development.

As seen before, LBSs based on the camera sensor have strong advantages. First, these systems do not require any network of chipsets to be installed in the environment, as the primary sensor (CIS) is already installed in the user device. This makes it possible to develop a low-cost service without designing and implementing an on-site network. Moreover, the positioning accuracy of these systems is usually better than that of other systems. In industrial processes, for example, computer vision systems based on object detection algorithms are used on production lines to track objects and check quality, with accuracies of a few millimeters. Of course, image-based positioning with smartphones cannot reach these levels of accuracy, but it can fully meet the requirements of navigation.


Many previous research studies on indoor image-based localization pursue different goals and use different methods and technologies, depending on the field of interest of the research group. There are visual odometry approaches [19], simultaneous localization and mapping (SLAM) [24], structure from motion, and approaches that investigate semantic features [29]. Some interesting work exploits computer vision algorithms, in particular neural networks and transfer learning, for visual indoor positioning and classification [35]. Some studies use RGB‐D images to perform object recognition [25]. Interesting research on the use of a smartphone as a navigation device can be found in [27, 39].

As seen in the literature, there are many image-based LBSs, whose accuracy and coverage area depend on the application. Some accuracy ranges may be sufficient for applications in very large indoor spaces such as museums or fairs, while others may require accuracy at subroom level, for example in logistics and optimization. When indoor positioning and navigation are attempted in more complex spaces, for "search and rescue" tasks or on construction sites, the coverage area decreases and higher accuracy is required.

A possible solution, considering all the sensors installed in the smartphone, is the image recognition-based (IRB) approach, in which the localization of the device is based on photogrammetric principles [16]. IRB positioning is a good technology for smartphone indoor localization. The aim of these procedures is to match a query image generated by the user with a mobile device against an existing image database with position information [41].

Some tests have been carried out on our campus, following the methodologies presented in [9, 28]. The use of IRB positioning in mobile applications is characterized by the availability of a single camera; under this constraint, estimating the camera parameters (position and orientation) requires prior knowledge of the 3D environment, in the form of a database of images with associated spatial information. A terrestrial LiDAR (light detection and ranging) survey (TLS) with an associated camera can be executed to acquire the 3D model of the environment, which is used to generate the image database (RGB-D images). Once the reference image has been retrieved, the 3D information of the selected features can be extracted from it to estimate the exterior orientation parameters (position and attitude) of the query image according to the collinearity equations (**Figure 6**).
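For reference, the collinearity equations mentioned above are commonly written in the following standard photogrammetric form. The symbols are not defined in this chapter and are assumptions of this sketch: $(x, y)$ are the image coordinates of a feature in the query image, $(x_0, y_0)$ and $c$ are the principal point and principal distance of the smartphone camera, $(X, Y, Z)$ are the object coordinates of the same feature taken from the RGB-D reference image, $(X_0, Y_0, Z_0)$ is the unknown camera position, and $r_{ij}$ are the elements of the rotation matrix built from the unknown attitude angles $(\omega, \phi, \kappa)$.

$$
x = x_0 - c\,\frac{r_{11}(X - X_0) + r_{21}(Y - Y_0) + r_{31}(Z - Z_0)}{r_{13}(X - X_0) + r_{23}(Y - Y_0) + r_{33}(Z - Z_0)}
$$

$$
y = y_0 - c\,\frac{r_{12}(X - X_0) + r_{22}(Y - Y_0) + r_{32}(Z - Z_0)}{r_{13}(X - X_0) + r_{23}(Y - Y_0) + r_{33}(Z - Z_0)}
$$

Each matched feature contributes two equations, so at least three well-distributed features are needed to solve for the six unknowns; in practice more features are typically used and the solution is obtained by least squares.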

*A priori* information is necessary for these techniques, but nowadays an accurate 3D model, always available for further updating and usable for collateral tasks, can be obtained thanks to the integration of geomatics techniques such as photogrammetry, LiDAR, and mobile mapping systems.

**Figure 6.** The IRBL procedure.

**Table 4** summarizes the accuracy results, expressed as discrepancies between the ground truth and the estimated values, for an indoor trial with a good level of similarity between the query image and the reference image extracted from the database.

It is important to state that the entire procedure can be executed in real time on a commercial smartphone and can provide the device position in a few seconds. This holds for a one-spot positioning task; when the methodology is transposed to indoor navigation, three fundamental problems must be taken into account: energy consumption, the latency of the image processing, and Internet data consumption. Acquiring images at a given frame rate for navigation applications consumes a great deal of energy, which raises a battery optimization problem. Furthermore, as each query image has to be sent to a server for the image retrieval procedure, a certain amount of Internet traffic is needed. Finally, the rate of the positioning information is limited by the latency of the entire IRB methodology.

To allow a reduced frame rate and to compensate for latencies, inertial (INS) platforms built with MEMS (micro-electro-mechanical systems) technology can be integrated into IRB positioning [12]. When IRB position and attitude measurements are fused with INS measurements, the accelerations and angular velocities are integrated to provide real-time relative position and relative attitude information, while the internal INS variables (velocity at the starting point, accelerometer biases, and gyro drift) are estimated using the absolute IRB positioning inputs (position and attitude).


**Table 4.** Accuracy results in the indoor trial for position (Δ*X*, Δ*Y*, Δ*Z*) and attitude (Δ*ω*, Δ*ϕ*, Δ*κ*).

When MEMS technology is used together with IRB positioning, it is important to analyze the precision and accuracy obtainable. The procedure was tested on our campus by walking along a predefined path with two different smartphones (a) mounted on a special support, as described in **Figure 3**.

The procedure starts with the analysis of the raw data of the inertial sensors (acceleration, angular velocity, and magnitude of the magnetic field), recorded directly by the smartphone. These data must be filtered to estimate and remove the noise. After that, the raw INS data can be used in real time for positioning with a Kalman filter approach, in order to reduce the number of frames that must be acquired for geo-localization. This means that the time interval between two images can be extended from 2 s up to 5 s, depending on the requested accuracy.

**Table 5** shows the positioning results in terms of accuracy, considering an IBN approach. With an interval of 1 s between images, the mean planimetric error was 21.3 cm at 67% reliability and 37 cm at 95%.

With an interval of 2 s between images, the mean planimetric error increases to 61 cm at 67% and 1.49 m at 95%.
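The text does not state how the 67% and 95% figures were computed. One common way to obtain statistics of this kind, shown here purely as an assumption, is to take the planimetric error of every epoch and read the value that is not exceeded in the given fraction of epochs. The function name `planimetric_error_at` and the residuals in the example are hypothetical and are not the data behind **Table 5**.

```python
import math


def planimetric_error_at(residuals, level):
    """Planimetric error not exceeded in a fraction `level` (0-1) of the epochs.

    residuals: iterable of (dx, dy) differences between the estimated position
               and the ground truth, in any consistent length unit.
    Uses the nearest-rank definition of the empirical percentile.
    """
    errors = sorted(math.hypot(dx, dy) for dx, dy in residuals)
    if not errors:
        raise ValueError("at least one residual is required")
    rank = max(1, math.ceil(level * len(errors)))
    return errors[rank - 1]


if __name__ == "__main__":
    # Made-up residuals in meters, for illustration only.
    residuals = [(0.10, 0.05), (0.20, -0.10), (-0.15, 0.30), (0.05, 0.02),
                 (0.40, 0.10), (-0.08, -0.12), (0.25, 0.25), (0.02, -0.03)]
    for level in (0.67, 0.95):
        print(f"{level:.0%}: {planimetric_error_at(residuals, level):.3f} m")
```

Under this reading, the 95% value is always at least as large as the 67% value, which is consistent with the pairs of figures reported for both image intervals.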

IBN makes it possible to reduce the final residuals to 50% and to bridge outages of up to 90 s, while also improving the quality of the estimated angles. At the moment, IBN requires a high-performance server to obtain the solution and a well‐defined image database (DB).

#### *3.2.2. Wi‐Fi et al.*


**Table 5.** Results obtained with drift estimation coming from images.

Over the last decade, wireless location estimation has been an active field of research and has become the most widespread approach for indoor localization in GNSS-denied environments. A WLAN (wireless local area network, IEEE 802.11 standard), otherwise known as Wi-Fi (Wi-Fi is a trademark of the Wi-Fi Alliance), is a wireless network of devices that use high-frequency radio signals (2.4 GHz in the ISM band) to transmit and receive data within a limited area. As the connection between the nodes of the network is continuous, communication is preserved even if a device moves around within the limited area (50–100 m) [37]. For this reason, WLAN technology can be used to estimate the location of a mobile device within the network. The positioning accuracy required to offer satisfactory LBSs is on the order of 1 m, and a great R&D effort is still needed. Research in this field, alongside numerous commercial applications, is expected to keep expanding for years, because WLAN is a low-cost solution that provides adequate connectivity and high-speed links. In fact, WLAN infrastructure is nowadays widespread in many indoor environments and is already standardized for commercial smartphone communication.

Indoor environments are often complex and characterized by non-line-of-sight (NLOS) conditions for target objects; in these situations, WLAN positioning technologies can be very helpful because they do not require line of sight. Unfortunately, compared with the IRBL procedure, WLAN positioning is affected by a large estimation error, which depends on the number and position of the nodes in the network. Other challenging issues are power consumption and signal attenuation.

The pros and cons of WLAN positioning depend on the positioning technique used. The most popular WLAN positioning method is based on the received signal strength indicator (RSSI), because it is easy to extract from any device connected to a Wi-Fi network [17]. The RSSI method is based on the received signal power and on the relation between signal attenuation and the distance between nodes. Knowing the strength of the emitted signal and the strength of the received signal, it is possible to calculate the attenuation and consequently the distance between the emitter and the receiver. These measurements can be combined with different positioning strategies, such as propagation modeling, fingerprinting, cell of origin, and multilateration [30]. To obtain a more precise localization, the fingerprinting technique [37] must be used; it consists of an *a priori* analysis that maps the observed signal strength of fixed routers at every place in the indoor environment. With these data it is possible to generate a database (i.e., a radio map). The limitation of this method is the need for *a priori* information, which means an increased workload and requires a well-spread router network. The propagation model differs from the fingerprinting model because it tries to determine the RSSI map analytically instead of empirically. Of course, the major issues concern the correct description and modeling of environmental effects (moving objects, signal attenuation, multipath) [5].
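The two RSSI strategies described above can be sketched in a few lines of Python. The first function inverts the widely used log-distance path-loss model, which is one possible propagation model and is not necessarily the one adopted in the cited works; the second performs a nearest-neighbor lookup in a radio map. The function names, the reference RSSI of -40 dBm at 1 m, the path-loss exponent of 3, and the radio map values are all assumptions made for illustration.

```python
def rssi_to_distance(rssi_dbm, ref_rssi_dbm=-40.0, path_loss_exp=3.0, ref_dist_m=1.0):
    """Distance (m) from RSSI using the log-distance path-loss model.

    Assumed model: RSSI(d) = ref_rssi_dbm - 10 * path_loss_exp * log10(d / ref_dist_m).
    The default parameters are illustrative and must be calibrated on site.
    """
    return ref_dist_m * 10 ** ((ref_rssi_dbm - rssi_dbm) / (10.0 * path_loss_exp))


def fingerprint_locate(radio_map, observed, missing_dbm=-100.0):
    """Return the surveyed position whose fingerprint is closest to `observed`.

    radio_map: {(x, y): {ap_id: rssi_dbm}} collected in the a priori survey.
    observed:  {ap_id: rssi_dbm} measured by the device to be located.
    Access points missing from one side are assumed to be at `missing_dbm`.
    """
    def squared_distance(reference):
        ap_ids = set(reference) | set(observed)
        return sum((reference.get(ap, missing_dbm) - observed.get(ap, missing_dbm)) ** 2
                   for ap in ap_ids)

    return min(radio_map, key=lambda position: squared_distance(radio_map[position]))


if __name__ == "__main__":
    print("Distance for -70 dBm:", rssi_to_distance(-70.0), "m")
    radio_map = {
        (0.0, 0.0): {"ap1": -40.0, "ap2": -70.0},
        (10.0, 0.0): {"ap1": -70.0, "ap2": -40.0},
    }
    print("Fingerprint match:", fingerprint_locate(radio_map, {"ap1": -45.0, "ap2": -68.0}))
```

The propagation model needs only a few calibrated parameters but ignores walls and multipath, while the radio map captures those effects empirically at the cost of the survey workload mentioned above.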

Another way to locate a device in a Wi-Fi positioning system is the cell of origin (CoO) method, in which the receiver position is made to coincide with the coordinates of the access point (AP) that generates the highest RSSI value. Owing to the spatial distribution of the APs in an indoor environment, this type of technique yields location errors of around 10–20 m [14].
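The CoO rule is simple enough to be expressed in a few lines. In this sketch the function name `cell_of_origin`, the AP identifiers, and the coordinates are hypothetical.

```python
def cell_of_origin(ap_positions, scan):
    """Return the coordinates of the access point with the highest RSSI.

    ap_positions: {ap_id: (x, y)} known AP coordinates.
    scan:         {ap_id: rssi_dbm} from one Wi-Fi scan; must not be empty.
    """
    known = {ap: rssi for ap, rssi in scan.items() if ap in ap_positions}
    if not known:
        raise ValueError("no scanned access point has known coordinates")
    strongest = max(known, key=known.get)
    return ap_positions[strongest]


if __name__ == "__main__":
    ap_positions = {"ap1": (0.0, 0.0), "ap2": (25.0, 10.0)}
    print(cell_of_origin(ap_positions, {"ap1": -71.0, "ap2": -52.0}))
```

Because the estimate can only coincide with an AP location, the achievable accuracy is bounded by the AP spacing, which is consistent with the error range quoted above.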

Finally, multilateration methods, such as time of arrival, time difference of arrival, angle of arrival, and so forth, are less common for WLAN positioning because of the computational complexity of these kinds of measurements on mobile devices [26].
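As a sketch of range-based multilateration, the following Python function estimates a 2D position from distances to three or more access points with known coordinates. The ranges could come, for example, from time-of-arrival measurements. The range equations are linearized by subtracting the first one from the others and solved by least squares; the function name `multilaterate` and the test geometry are assumptions, and measurement noise is not modeled.

```python
import math


def multilaterate(anchors, ranges):
    """Least-squares 2D position from ranges to three or more known anchors.

    anchors: [(x, y), ...] coordinates of the access points (not all collinear).
    ranges:  measured distances to each anchor, in the same order and unit.
    """
    (x0, y0), r0 = anchors[0], ranges[0]
    # Subtracting the first range equation from each of the others removes the
    # quadratic terms and leaves linear equations a1 * x + a2 * y = b.
    rows = []
    for (xi, yi), ri in zip(anchors[1:], ranges[1:]):
        a1, a2 = 2.0 * (xi - x0), 2.0 * (yi - y0)
        b = r0**2 - ri**2 + xi**2 + yi**2 - x0**2 - y0**2
        rows.append((a1, a2, b))
    # Normal equations of the 2-unknown least-squares problem.
    s11 = sum(a1 * a1 for a1, _, _ in rows)
    s12 = sum(a1 * a2 for a1, a2, _ in rows)
    s22 = sum(a2 * a2 for _, a2, _ in rows)
    t1 = sum(a1 * b for a1, _, b in rows)
    t2 = sum(a2 * b for _, a2, b in rows)
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("anchors are collinear or too few to fix a 2D position")
    return (s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det


if __name__ == "__main__":
    anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
    truth = (3.0, 4.0)
    ranges = [math.dist(truth, anchor) for anchor in anchors]
    print(multilaterate(anchors, ranges))
```

With error-free ranges the function recovers the true position; with real measurements the result degrades with the ranging noise and with poor anchor geometry.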

A literature review on WLAN systems for indoor positioning was published by He et al. in 2016 [18]. Many previous research studies on indoor Wi-Fi localization pursue different goals and use different methods and technologies, depending on the field of interest of the research group. In particular, besides the numerous interesting works on positioning and navigation with self-made mobile devices and sensor integration, some studies exploit the sensors embedded in COTS (commercial off-the-shelf) smartphones. Some interesting work exploits the integration of inertial sensor-based positioning with the Wi-Fi capability of smartphones [7, 31]. For example, in [7] the authors propose a sensor fusion framework that combines Wi-Fi, pedestrian dead reckoning (PDR), and landmarks. The whole system runs on a smartphone, and an Android app was developed for real-time indoor localization and navigation. The reported accuracy is 1 m. An interesting multimodal approach to Wi-Fi navigation is described in [21], where PDR carried out with only low-cost sensors and Wi-Fi smartphones is used in a cooperative positioning operation performed by a number of participants. The error decreases as the number of participants rises (5 m for 50 devices). Some authors use GPS integration for cloud-based LBSs [3], while other researchers introduce sensor fusion between Wi-Fi and CIS for accurate indoor positioning [32] or for augmented reality navigation [1].

A comprehensive view of the indoor positioning systems implemented today, with their applications and obtainable positioning accuracies, is given in [18].
