*2.1.2. Localization estimation methods*

In addition to the environment representation, i.e., the map, the methods that estimate the MAV's pose also play an important role in localization systems. Usually, they receive as input the map of the environment and the sensor reading, and the goal is then to find the part of the map that best matches the sensor reading. In **Figure 2**, the localization estimation method sits inside the localization system, **Figure 2(b)**, and, combined with the motion model, it makes it possible to properly estimate the MAV's pose, **Figure 2(c)**.
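
This matching step can be summarized as a search over candidate poses: obtain the reading the map predicts at each candidate and keep the best-scoring one. The sketch below is a minimal, generic illustration of that idea; `extract_expected_reading` and `similarity` are hypothetical placeholders for whatever map representation and comparison metric a concrete system uses.

```python
# Minimal sketch of the generic matching loop behind most localization
# estimation methods: score candidate poses against the sensor reading
# and keep the best one. `extract_expected_reading` and `similarity`
# are hypothetical placeholders for a concrete map/sensor pair.

def estimate_pose(map_data, sensor_reading, candidate_poses,
                  extract_expected_reading, similarity):
    best_pose, best_score = None, float("-inf")
    for pose in candidate_poses:
        # What the sensor *should* observe from this pose, per the map
        expected = extract_expected_reading(map_data, pose)
        score = similarity(expected, sensor_reading)
        if score > best_score:
            best_pose, best_score = pose, score
    return best_pose, best_score
```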

Unlike the previous section, which grouped the maps used by MAV localization systems into broad categories, such as 2D satellite images or marker maps, the localization estimation methods do not share enough similarity to be grouped in the same way. They are therefore discussed individually here, and, as in the previous section, their advantages and disadvantages are highlighted.

It is natural that the works relying on a 2D satellite image as a map use an estimation method based on image comparison, since their sensor readings are also images. The general idea is to compare every MAV image with patches extracted from different poses within the map; the most similar patch likely represents the MAV's pose in the map. To extract these patches, some global localization works use the Monte Carlo algorithm to sample patches from the whole map [18, 31], whereas the local localization ones, given that the MAV's initial pose is known, extract a patch around the initial pose and keep extracting new ones as the MAV moves through the environment [21, 26]. When comparing the MAV images against the patches, each Monte Carlo-based work proposed a novel measurement model: one introduced a new image descriptor, called abBRIEF, to robustly compute an image signature for the comparison [31], while the other used SURF descriptors [32] and machine learning to compare the MAV images and the patches [18]. Both approaches compute the similarity between every pair of MAV image and patch and select the most similar pair. The local localization approaches, on the other hand, have a reduced search space, since the initial patch of the satellite image is known. In this case, the image comparison is mainly performed by one of two methods: template matching [22], in which a pixel-by-pixel comparison is performed between the two images, or feature matching [26], which involves detecting, extracting, and matching features from the two images (see the sketch below). In general, since all of these works rely on images, they use either features or image descriptors to represent them before performing the comparison. Their proposals aim to overcome the illumination and color changes that arise when dealing with images of outdoor environments, as is the case for MAV localization systems.
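
As an illustration of the two comparison styles used by the local approaches, the sketch below contrasts template matching and feature matching with OpenCV. It is a minimal example, not the exact pipeline of any cited work, and it assumes the MAV image is smaller than the satellite patch and roughly at the same scale and orientation; ORB is used in place of the patented SURF descriptor.

```python
import cv2

# Minimal sketch (not the exact pipeline of any cited work) contrasting
# the two comparison styles used by local localization approaches.
# Assumes the MAV image is smaller than the satellite patch and that
# both share roughly the same scale and orientation.

def template_match(satellite_patch, mav_image):
    """Pixel-by-pixel comparison: slide the MAV image over the patch."""
    result = cv2.matchTemplate(satellite_patch, mav_image,
                               cv2.TM_CCOEFF_NORMED)
    _, best_score, _, best_loc = cv2.minMaxLoc(result)
    return best_loc, best_score  # top-left corner of the best match

def feature_match(satellite_patch, mav_image, ratio=0.75):
    """Feature-based comparison: detect, describe, and match keypoints."""
    orb = cv2.ORB_create()  # ORB stands in for SURF here
    kp1, des1 = orb.detectAndCompute(mav_image, None)
    kp2, des2 = orb.detectAndCompute(satellite_patch, None)
    if des1 is None or des2 is None:
        return []
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    matches = matcher.knnMatch(des1, des2, k=2)
    # Lowe's ratio test keeps only distinctive matches
    return [m[0] for m in matches
            if len(m) == 2 and m[0].distance < ratio * m[1].distance]
```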

Alignment techniques are also used by other approaches to estimate the MAV's pose, even those that rely on 3D maps. In [23], the alignment is performed between the 2D keypoints of the MAV image and the 3D landmarks of the map. To do so, the authors cluster the landmarks into visual words, which speeds up the matching and the subsequent alignment via a nearest neighbour search. This 2D-to-3D alignment, or transformation, is also applied in another work [19]. Given that the map is a 3D representation of the environment while the MAV image is a plain 2D RGB image, the authors first lift the MAV image into 3D data and then align the lines and edges detected in both. Since both works perform local localization, they start with a reduced search space, which helps them obtain a good initial alignment.
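
Once 2D keypoints are matched to 3D landmarks, recovering the camera pose is a classic Perspective-n-Point (PnP) problem. The sketch below shows how such 2D-3D correspondences could be turned into a pose with OpenCV; it is a generic illustration under a pinhole camera model, not the specific solver used in [23] or [19].

```python
import cv2
import numpy as np

# Generic 2D-to-3D alignment sketch: given N matched pairs of image
# keypoints (2D) and map landmarks (3D), recover the camera pose by
# solving the Perspective-n-Point problem with RANSAC. This is an
# illustration, not the specific solver of the cited works.

def align_2d_3d(landmarks_3d, keypoints_2d, camera_matrix):
    """landmarks_3d: (N, 3) map points; keypoints_2d: (N, 2) pixels."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        landmarks_3d.astype(np.float64),
        keypoints_2d.astype(np.float64),
        camera_matrix,
        distCoeffs=None,          # assume an undistorted image
    )
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)    # rotation vector -> 3x3 matrix
    # Camera position in the map frame: p = -R^T t
    position = -R.T @ tvec
    return R, position, inliers
```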

Besides the estimation methods presented so far, others are even more specific. In [25], for instance, a robust and quick-response landing pattern is designed to be visually detected through images and then assist the MAV during landing. In this case, the pattern is the map, and the computer vision method proposed by the authors can detect the scale of the map and then estimate the MAV's localization. In [24], a marker detection-based approach is also proposed to estimate the MAV's pose. However, in contrast to [25], in [24] the markers are ultraviolet LEDs embedded in the MAVs themselves. Hence, the estimation in this case is a mutual one, i.e., one MAV estimates its pose in relation to another and vice versa. Their algorithm first detects the size of the markers in the image and then estimates the distance between a pair of markers; from these, the distance between two MAVs, and thus their relative pose, can be calculated.

In addition, in [27] tether-based feedback and inertial sensing are used to estimate the MAV's pose. In more detail, the length, azimuth, and elevation angle of the tether are the input to a mechanics model that treats the tether as perfectly straight between the origin and the MAV (see the sketch below). The work in [28] also relies on an uncommon sensing modality: the goal is to detect access points (APs) and measure the received signal strength. The MAV's pose is then estimated relative to the APs, whose positions are well defined in the map. A similar approach is proposed in [30], in which the MAV's pose is estimated in an urban environment through the transmission of beacons. The beacons are located in different buildings and provide a local frame of reference, supporting the MAVs' location estimation with details about the area and height of the buildings. Sonar is another type of sensor not easily found embedded on MAVs, and it is the one used in [29]. To estimate the MAV's pose, the authors propose a multi-ray model based on the four sonar sensors embedded in the MAV. This model approximates the beam pattern accurately and does not require high computational power.
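
Under the straight-tether assumption of [27], the MAV's position follows from simple spherical-to-Cartesian geometry. The sketch below illustrates that conversion; the frame conventions (azimuth measured from the x-axis in the horizontal plane, elevation measured from the horizontal plane) are assumptions for illustration, not necessarily those of the original work.

```python
import math

# Minimal sketch of a straight-tether position model for a tethered MAV:
# given the tether length and its azimuth/elevation angles at the anchor,
# the MAV position follows from spherical-to-Cartesian conversion.
# Frame conventions here (azimuth from the x-axis in the horizontal
# plane, elevation from the horizontal plane) are assumptions.

def tether_position(length, azimuth_rad, elevation_rad):
    """Return the MAV's (x, y, z) relative to the tether anchor."""
    horizontal = length * math.cos(elevation_rad)  # projection on ground
    x = horizontal * math.cos(azimuth_rad)
    y = horizontal * math.sin(azimuth_rad)
    z = length * math.sin(elevation_rad)           # height above anchor
    return x, y, z

# Example: a 10 m tether at 45 deg azimuth and 30 deg elevation
print(tether_position(10.0, math.radians(45), math.radians(30)))
# -> roughly (6.12, 6.12, 5.0)
```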

In general, the localization estimation methods are responsible for comparing the sensor reading with a sample of the map. Put another way, this is a transformation from the local coordinate system, i.e., that of the robot's sensor reading, to a global one, i.e., that of the map. From the works presented in this section, we can see that the sensor reading and the map are sometimes different kinds of data, such as 2D images from a regular RGB camera and a 3D map. In some cases, these methods also have to estimate the MAV's pose from an outdated map. They must therefore be robust to differences between the real and the mapped environment: even when the two do not depict exactly the same scene, the pose should still be estimated.
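
This local-to-global view can be made concrete with a homogeneous transform: the estimated pose is the rigid transformation that maps points from the sensor's frame into the map's frame. The sketch below, a minimal illustration rather than any cited method, applies such a transform to a point observed in the robot frame.

```python
import numpy as np

# Minimal illustration (not a cited method): the estimated pose as a
# 4x4 homogeneous transform T_map_robot that maps points expressed in
# the robot's local frame into the global map frame.

def pose_to_transform(x, y, z, yaw):
    """Build T_map_robot from a planar pose (position + yaw)."""
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0], [s, c, 0], [0, 0, 1]]  # rotation about z
    T[:3, 3] = [x, y, z]                            # translation
    return T

# A landmark seen 2 m ahead of the robot, in the robot frame...
p_robot = np.array([2.0, 0.0, 0.0, 1.0])            # homogeneous point
# ...with the MAV estimated at (10, 5, 3) in the map, heading 90 deg:
T = pose_to_transform(10.0, 5.0, 3.0, np.deg2rad(90))
print(T @ p_robot)  # -> approximately [10. 7. 3. 1.] in map coordinates
```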
