#### **2.2 Single-site system model**

In wireless communication, the Base Station (BS) and User Equipment (UE) engage in point-to-point communication as shown in **Figure 1**. The BS may function as an Access Point (AP) or as another device in device-to-device communication. Typically, the BS has multiple antenna array elements while the UE may have one or more antenna elements. A general assumption is that the BS and UE are located in the far-field zones of each other, and multiple propagation paths exist between them. Multipaths arise from either reflection off objects or scattering [43]. Typically, there is a Line-of-Sight (LOS) path and several Non-LOS (NLOS) paths. The LOS path can be blocked, in which case only NLOS paths may exist.

*Localization Techniques in Multiple-Input Multiple-Output Communication: Fundamental… DOI: http://dx.doi.org/10.5772/intechopen.112037*

Regardless of which side transmits the signal, the propagation path geometry between the BS and UE remains the same. Each path is characterized by an Angle-of-Departure (AoD), an Angle-of-Arrival (AoA), a Time-of-Arrival (ToA), and a complex gain. Since the signal geometry is invariant, it is possible to use AoA and AoD interchangeably. The AoD and AoA are vectors that define the azimuth and elevation angles in 3D space, while ToA represents the time it takes for the propagating signal to travel from the transmitter to the receiver. The ToA is sometimes referred to as the propagation path *delay*. ToA is equal to the length of the path traveled (*d*) divided by the speed of light (*c*):

$$
\tau = d/c.\tag{1}
$$

The 2D multipath propagation geometry is illustrated in **Figure 2**. In the LOS case, the shortest distance between the BS and UE represents the path traveled by the LOS signal. Furthermore, **Figure 2(a)** shows the AoD from the BS, $\theta^{t}_{\mathrm{LOS}}$, and the AoA at the UE, $\theta^{r}_{\mathrm{LOS}}$. On the other hand, the NLOS propagation path can be modeled using a *virtual BS* (BS') [43] as depicted in **Figure 2(b)**. An NLOS path can be thought of as a direct path from a virtual node behind the reflecting surface. The virtual BS is on the opposite side of the reflecting surface, maintaining the same distance from it as the original BS, resulting in $d_{\mathrm{NLOS}1} = d'_{\mathrm{NLOS}1}$. The total path traveled by the NLOS signal is $d_{\mathrm{NLOS}} = d_{\mathrm{NLOS}1} + d_{\mathrm{NLOS}2}$. Furthermore, the AoD from the virtual BS can be calculated as $\pi - \theta^{t}_{\mathrm{NLOS}}$, where $\theta^{t}_{\mathrm{NLOS}}$ is the AoD at the original BS.
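The virtual BS construction can be checked numerically. The following is a minimal sketch, assuming a hypothetical 2D layout with a single flat reflecting wall; the coordinates and the helper name `mirror_across_wall` are illustrative and not taken from the chapter. It mirrors the BS across the wall and uses (1) to convert path lengths into ToAs.

```python
import numpy as np

C_LIGHT = 299_792_458.0  # speed of light in m/s

def mirror_across_wall(point, wall_point, wall_normal):
    """Reflect `point` across the line through `wall_point` with normal `wall_normal`."""
    n = np.asarray(wall_normal, dtype=float)
    n = n / np.linalg.norm(n)
    p = np.asarray(point, dtype=float)
    signed_dist = np.dot(p - np.asarray(wall_point, dtype=float), n)
    return p - 2.0 * signed_dist * n

# Hypothetical 2D layout (metres): reflecting wall along the line y = 10.
bs = np.array([0.0, 0.0])
ue = np.array([8.0, 4.0])
virtual_bs = mirror_across_wall(bs, wall_point=[0.0, 10.0], wall_normal=[0.0, 1.0])

d_los = np.linalg.norm(ue - bs)
d_nlos = np.linalg.norm(ue - virtual_bs)  # straight line from BS' equals d_NLOS1 + d_NLOS2
toa_los = d_los / C_LIGHT                 # equation (1)
toa_nlos = d_nlos / C_LIGHT
```

Because the mirrored node sits at the same distance behind the wall, the straight-line distance from BS' to the UE has the same length as the two-segment reflected path, which is why the NLOS delay can be treated like a LOS delay from the virtual BS.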

#### **2.3 Channel model**

The wireless communication community has widely adopted the COST 2100 MIMO channel model [44] as the predominant geometric channel model. This model describes a propagation environment as a set of scatterers that create clusters of multipath components. The model is applicable to both sub-6 GHz and mmWave band frequencies.

**Figure 2.**

*(a) LOS propagation path geometry for estimating relative location of the UE with respect to the BS. (b) NLOS propagation path and virtual BS (BS') geometry for estimating relative location of the UE with respect to the BS.*

Consider a MIMO Orthogonal Frequency-Division Multiplexing (OFDM) wireless system, in which the BS and the UE are equipped with antenna arrays with $N_B$ and $N_U$ elements, respectively. The system uses OFDM signaling with $N_C$ subcarriers and the wideband channel has $L$ taps. The received signal at the $l$-th subcarrier of the UE antenna array can be expressed as

$$\mathbf{y}\left[l\right] = \mathbf{H}\left[l\right]\mathbf{s}\left[l\right] + \mathbf{n}\left[l\right].\tag{2}$$

Here, $\mathbf{y}[l] \in \mathbb{C}^{N_U \times 1}$ denotes the received signal, $\mathbf{H}[l] \in \mathbb{C}^{N_U \times N_B}$ represents the channel matrix, $\mathbf{s}[l] \in \mathbb{C}^{N_B \times 1}$ represents the transmitted signal, and $\mathbf{n}[l] \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I})$ denotes the noise at the receiver.

The propagation paths between the BS and the UE can be split into $C$ distinguishable path clusters, with each cluster containing $R_C$ distinguishable paths. Each path cluster is characterized by a mean time delay $\tau_c$, with $c \in \{1, \dots, C\}$ and $m \in \{1, \dots, R_C\}$ indexing the clusters and the paths within a cluster, a mean AoD $\theta^{tx}_c, \phi^{tx}_c \in [0, \pi)$, and a mean AoA $\theta^{rx}_c, \phi^{rx}_c \in [0, 2\pi)$. Each cluster contributes $R_C$ paths between the transmitter and the receiver, where each path has a relative time delay $\tau_{cm}$ (relative with respect to the mean), a relative AoD $\theta^{tx}_{cm}, \phi^{tx}_{cm}$, a relative AoA $\theta^{rx}_{cm}, \phi^{rx}_{cm}$, and a complex path gain $\alpha_{cm}$. The mean and relative paths are illustrated in **Figure 3**.

#### **2.4 Channel state information (CSI)**

Assuming the channel model defined above, the complex baseband delay-$\ell$ MIMO channel matrix $\mathbf{H}[\ell] \in \mathbb{C}^{N_U \times N_B}$ can be written as [45, 46]

$$\mathbf{H}[\ell] = \sqrt{\frac{N_B N_U}{P_{pl}}} \sum_{c=1}^{C} \sum_{m=1}^{R_C} \alpha_{cm}\, \mathbf{e}_{rx}\left(\theta_c^{rx} + \theta_{cm}^{rx}, \phi_c^{rx} + \phi_{cm}^{rx}\right) \mathbf{e}_{tx}^{H}\left(\theta_c^{tx} + \theta_{cm}^{tx}, \phi_c^{tx} + \phi_{cm}^{tx}\right) \delta(\ell T_s - n_{cm} T_s), \tag{3}$$

where $\ell = 0, 1, \dots, L - 1$. Furthermore, $P_{pl}$ indicates the pathloss between the transmitter and the receiver, while $\mathbf{e}_{tx}(\theta, \phi) \in \mathbb{C}^{N_B \times 1}$ and $\mathbf{e}_{rx}(\theta, \phi) \in \mathbb{C}^{N_U \times 1}$ denote the antenna array response vectors of the transmitter and the receiver, respectively. $\delta(t)$ is the Dirac function, $T_s$ is the signaling time, and $n_{cm} = \left\lfloor \frac{\tau_c + \tau_{cm}}{T_s} \right\rfloor$.

**Figure 3.** *The mean and relative paths of an NLOS path.*

The channel matrix at subcarrier $k$, denoted as $\mathcal{H}[k]$, can be written as $\mathcal{H}[k] = \sum_{\ell=0}^{L-1} \mathbf{H}[\ell]\, e^{-j\frac{2\pi k}{N_C}\ell}$. The overall Channel Frequency Response (CFR) matrix, denoted as $\mathcal{H}$, can be expressed as $\mathcal{H} = \left[\mathcal{H}[0], \mathcal{H}[1], \dots, \mathcal{H}[N_C - 1]\right]$, where $N_C$ is the number of subcarriers. This matrix is also known as the Channel State Information (CSI) and its estimation is referred to as the channel estimation problem.
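The mapping from delay-domain taps to per-subcarrier channel matrices is a DFT along the tap index. The sketch below is a minimal illustration under assumed sizes and randomly generated placeholder taps (none of the numbers come from the chapter); it evaluates the sum above for every subcarrier and then forms one received vector following (2).

```python
import numpy as np

rng = np.random.default_rng(0)
N_B, N_U, N_C, L = 4, 2, 16, 3   # illustrative sizes, not taken from the text

# Placeholder delay-domain taps H[l], l = 0..L-1, each of shape (N_U, N_B).
H_taps = (rng.standard_normal((L, N_U, N_B))
          + 1j * rng.standard_normal((L, N_U, N_B))) / np.sqrt(2)

# H[k] = sum_l H[l] * exp(-j 2 pi k l / N_C) for every subcarrier k.
k = np.arange(N_C)[:, None]
ell = np.arange(L)[None, :]
phase = np.exp(-2j * np.pi * k * ell / N_C)        # shape (N_C, L)
H_freq = np.einsum("kl,lub->kub", phase, H_taps)   # shape (N_C, N_U, N_B)

# Received signal on one subcarrier, following y = H s + n from equation (2).
sigma = 0.1
s = np.ones(N_B, dtype=complex)
n = sigma * (rng.standard_normal(N_U) + 1j * rng.standard_normal(N_U)) / np.sqrt(2)
y = H_freq[5] @ s + n
```

The same result can be obtained with a zero-padded FFT along the tap axis, which is the more efficient choice when $N_C$ is large.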

Direct measurement of CSI is possible in MIMO-OFDM systems with fully digital beamforming, which is available at sub-6 GHz bands. However, in the mmWave band, only analog beamforming is available, making direct CSI measurement infeasible. Instead, estimation techniques are used to obtain the CSI indirectly [47]. Channel estimation for mmWave massive MIMO channels is under extensive research, and several CSI estimation methods have been proposed to this end [48–50]. Accurate CSI estimation is crucial for effective localization.

#### **2.5 Angle-delay-profile (ADP)**

Assuming a single antenna at the UE and a uniform linear array antenna at the BS, the ADP is a linear transformation of the CSI computed by multiplying it with two Discrete Fourier Transform (DFT) matrices $\mathbf{V} \in \mathbb{C}^{N_B \times N_B}$ and $\mathbf{F} \in \mathbb{C}^{N_C \times N_C}$. The ADP matrix $\mathbf{G} \in \mathbb{C}^{N_B \times N_C}$ is defined as follows [51]

$$\mathbf{G} = \mathbf{V}^{H} \mathcal{H} \mathbf{F},\tag{4}$$

where $\mathbf{V} \in \mathbb{C}^{N_B \times N_B}$ is defined as

$$[\mathbf{V}]_{i,k} \triangleq \frac{1}{\sqrt{N_B}} e^{-j2\pi \frac{i\left(k - \frac{N_B}{2}\right)}{N_B}},\tag{5}$$

and $\mathbf{F} \in \mathbb{C}^{N_C \times N_C}$ as

$$[\mathbf{F}]_{i,k} \triangleq \frac{1}{\sqrt{N_C}} e^{-j2\pi \frac{ik}{N_C}},\tag{6}$$

where $i = 0, \dots, N_C - 1$ and $k = 0, \dots, N_B - 1$.

**Figure 4.**

*(a) Raw CSI data of an OFDM-MIMO system with 30 sub-carriers. The BS is equipped with a uniform linear array antenna with 30 antenna elements, and the UE with a single antenna. (b) The ADP transformation of the CSI in (a) with LOS and NLOS path clusters labeled.*

This transformation has proven to be quite useful for various localization applications. **Figure 4** illustrates an example of the magnitude of the raw CSI $|\mathcal{H}|$ and its ADP transformation $|\mathbf{G}|$. The transformation converts the data into a sparse representation, which has been shown to improve the performance and generalizability of data-driven models [52]. Furthermore, in the visual representation of the raw CSI data, the scattering characteristics of the multipaths are ambiguous [53]. In contrast, the ADP provides a semantic visual interpretation of the channel multipath, where $[\mathbf{G}]_{i,k}$ denotes the power of the path associated with the angle

$$\theta_k = \arccos\left(\frac{2k - N_B}{N_B}\right),\tag{7}$$

and delay

$$
\tau_i = iT_s.\tag{8}
$$

The semantic visual interpretation means that the path clusters can easily be identified visually in the ADP. Referring to **Figure 4(b)**, the strongest peak in the ADP is the LOS path cluster and the remaining peaks are NLOS path clusters. This information is not visually observable in the raw CSI in **Figure 4(a)**.
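To make the ADP peak reading concrete, the sketch below builds a synthetic single-path CSI matrix and transforms it as in (4). Several choices are assumptions made for illustration: a half-wavelength uniform linear array, a path that falls exactly on an angle bin and a delay tap, the sizes, and the sampling period. Since $\mathbf{G}$ has $N_B$ rows and $N_C$ columns, rows index the angle and columns the delay. The delay transform here uses the inverse-DFT sign (the conjugate of the matrix written in (6)); this is an assumption chosen so that, with the subcarrier convention $e^{-j2\pi k\ell/N_C}$ used for the CFR, a path at tap $i$ peaks at column $i$ as in (8). With the opposite sign, the same peak would appear at column $N_C - i$.

```python
import numpy as np

N_B, N_C = 32, 32          # illustrative array size and subcarrier count
k0, i0 = 20, 5             # hypothetical angle bin and delay tap of a single path

# Synthetic CSI (N_B x N_C) for one path: half-wavelength ULA response times
# the frequency response of a delay of i0 samples.
cos_theta = (2 * k0 - N_B) / N_B
ant = np.arange(N_B)[:, None]
sub = np.arange(N_C)[None, :]
csi = np.exp(-1j * np.pi * ant * cos_theta) * np.exp(-2j * np.pi * sub * i0 / N_C)

# DFT matrices: V follows (5); F uses the inverse-DFT sign (assumption, see lead-in).
idx_b = np.arange(N_B)
V = np.exp(-2j * np.pi * np.outer(idx_b, idx_b - N_B / 2) / N_B) / np.sqrt(N_B)
idx_c = np.arange(N_C)
F = np.exp(2j * np.pi * np.outer(idx_c, idx_c) / N_C) / np.sqrt(N_C)

G = V.conj().T @ csi @ F                        # ADP, shape (N_B, N_C)
k_hat, i_hat = np.unravel_index(np.argmax(np.abs(G)), G.shape)

theta_hat = np.arccos((2 * k_hat - N_B) / N_B)  # equation (7), in radians
T_s = 1.0 / 20e6                                # hypothetical sampling period
tau_hat = i_hat * T_s                           # equation (8)
```

In this idealized on-grid case all the energy collapses into a single ADP entry. Off-grid angles and delays, as in measured data, spread the energy over neighbouring bins, which is what produces the clusters visible in **Figure 4(b)**.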

#### **2.6 Received signal strength indicator (RSSI)**

The RSSI is a metric used in wireless communication systems that measures the strength of a received signal. RSSI parameters are typically used in distributed (or cell-free) MIMO localization systems. Cell-free MIMO uses a large number of distributed antennas and MIMO techniques to improve coverage, capacity, and reliability compared to the single-site MIMO system shown in **Figure 1**. Specifically, it aims to improve the performance of single-site MIMO systems by dynamically assigning antennas to users based on their location and available resources.

An example of a distributed MIMO system is illustrated in **Figure 5**. In this example, there are multiple BSs distributed in the environment. The RSSI is measured for each BS to create an RSSI vector $\mathbf{p} = [p_1, p_2, \dots, p_M]$, where $M$ represents the number of BSs and $p_i$ is the RSSI from the $i$-th BS. The RSSI vector should be unique for every location in the environment. To ensure the uniqueness of the RSSI vector, multiple BSs are necessary.

#### **2.7 Channel parameters summary**

The common channel parameters discussed above are summarized in **Table 1**. These parameters can be used individually or in combination to estimate the location of a UE. For instance, to define a propagation path, AoD or AoA is often used in conjunction with ToA.

**Figure 5.**

*Illustration of a distributed MIMO system with four BSs and one UE.*

**Table 1.**

*List of MIMO channel parameters utilized in localization techniques.*

### **3. Localization techniques**

Localization is an extensive area of research in wireless MIMO communication and several different approaches have been proposed to solve this problem. This section provides an overview of the common localization techniques in sub-6 GHz and mmWave MIMO systems.

#### **3.1 Map-assisted localization**

Map-assisted localization techniques leverage 2D or 3D environment maps along with channel parameters to determine the location of UEs. The map provides information about the scattering surfaces and other obstacles in the environment. Then, by utilizing the AoD and delay of the signal path, multiple beam paths can be traced from the BS to the UE. This is illustrated in **Figure 6**. The paths are traced using the geometry defined in **Figure 2**. The point where these paths intersect is the UE's location. The minimum requirement to localize the UE is the AoD of two different paths. Alternatively, the UE can be localized if the angle and delay of a single path are known. The delay is used to estimate the length of the path by solving for *d* in (1). However, more precise localization is achieved by utilizing multiple paths and incorporating both angle and delay information. Furthermore, since the communication is bi-directional, either AoD or AoA can be used to estimate the UE's location.

**Figure 6.**

*Map-assisted localization using propagation path tracing. The figure illustrates the LOS path and three NLOS paths between the BS and UE. The intersection of these four paths represents the UE's location.*
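Both minimal cases, one path with angle and delay or two paths with angles only, reduce to simple 2D geometry. The sketch below is an illustration under stated assumptions: angles are measured counter-clockwise from the x-axis, the second ray starts at a virtual BS whose position is assumed to be known from the map, and the function names and coordinates are hypothetical.

```python
import numpy as np

C_LIGHT = 299_792_458.0  # speed of light in m/s

def locate_from_los(bs_xy, aod_rad, toa_s):
    """UE position from the AoD and delay of a single LOS path in 2D."""
    d = C_LIGHT * toa_s                               # solve (1) for d
    direction = np.array([np.cos(aod_rad), np.sin(aod_rad)])
    return np.asarray(bs_xy, dtype=float) + d * direction

def locate_from_two_aods(origin_a, aod_a, origin_b, aod_b):
    """Intersect two rays (e.g. from the BS and a virtual BS) given only their angles."""
    oa = np.asarray(origin_a, dtype=float)
    ob = np.asarray(origin_b, dtype=float)
    da = np.array([np.cos(aod_a), np.sin(aod_a)])
    db = np.array([np.cos(aod_b), np.sin(aod_b)])
    # Solve oa + t*da = ob + u*db for (t, u).
    t, _ = np.linalg.solve(np.column_stack([da, -db]), ob - oa)
    return oa + t * da

# Hypothetical scene: BS at the origin, virtual BS at (0, 20), true UE at (8, 4).
bs, virtual_bs, ue_true = np.array([0.0, 0.0]), np.array([0.0, 20.0]), np.array([8.0, 4.0])
est_single = locate_from_los(bs, np.arctan2(4.0, 8.0), np.sqrt(80.0) / C_LIGHT)
est_two = locate_from_two_aods(bs, np.arctan2(4.0, 8.0),
                               virtual_bs, np.arctan2(-16.0, 8.0))
```

With noisy angle and delay estimates the rays no longer meet in a single point, so a practical system would typically combine more than two paths, for example through a least-squares fit.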

When fully digital beamforming is available, which is typically at lower frequency bands (i.e., sub-6 GHz), the angle and delay can be directly measured. However, in the mmWave bands analog beamforming is still prevalent, which does not enable measuring angle and delay directly. Therefore, angle and delay parameters have to be estimated. One approach to this problem is to estimate the CSI and convert it to the ADP. Then, the angle and delay can be estimated using (7) and (8), respectively.

#### **3.2 Localization using compressive sensing techniques**

*Compressive Sensing* (CS), also known as compressed sensing or sparse sampling, is a signal processing technique that allows for the reconstruction of a sparse signal from a small number of measurements or samples. CS has found its way into many applications [54, 55]. *Sparsity* is the property of a signal or data representation whereby a small number of coefficients or elements carry most of the signal's energy or information content, while the majority of coefficients or elements are zero or close to zero [56]. In fact, many real-world signals are sparse or compressible in either their original domain or some transform domain, such as Fourier or wavelet transforms [57]. An example is shown in **Figure 4**, where the raw CSI data is transformed into the ADP to create a sparse representation. As may be observed in the ADP, the multipath components are concentrated into only a few clusters.

CS techniques have found many applications in wireless MIMO communication by exploiting the sparsity of channel model parameters [57, 58]. These applications include channel estimation, spectrum sensing, and localization. Channel estimation provides information on the AoA/AoD and ToA of the paths and thus the relative location of the UE with respect to the BS can be estimated.

In mmWave MIMO communication, channel estimation and localization are typically combined. The idea behind sparse channel estimation is that the system can make only a few random measurements which are then used to reconstruct channel model parameters using CS techniques. A commonly used CS technique in mmWave MIMO channel estimation is Distributed Compressive Sensing - Simultaneous Orthogonal Matching Pursuit (DCS-SOMP). DCS-SOMP is typically used to estimate AoA/AoD and ToA [59–61]. Once the angle and delay channel parameters are recovered, the relative UE location can be estimated from the LOS path directly as shown in **Figure 2(a)**. When LOS is not available, the location can be estimated from the NLOS path by applying the virtual BS concept as shown in **Figure 2(b)**.
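The greedy recovery idea behind this family of algorithms can be illustrated with plain Orthogonal Matching Pursuit (OMP). The sketch below is not DCS-SOMP, which additionally processes several measurement vectors jointly; it is the simpler single-vector building block, shown under the assumption of a random complex Gaussian sensing matrix and a known sparsity level. In a channel estimation setting, the columns of `A` would correspond to candidate angle or delay grid points, which is also an assumption of this example.

```python
import numpy as np

def omp(A, y, sparsity):
    """Plain Orthogonal Matching Pursuit: find a `sparsity`-sparse x with y ~ A @ x."""
    residual = y.copy()
    support = []
    coeffs = np.zeros(0, dtype=complex)
    for _ in range(sparsity):
        corr = np.abs(A.conj().T @ residual)   # correlation of each atom with the residual
        if support:
            corr[support] = -1.0               # never pick the same atom twice
        support.append(int(np.argmax(corr)))
        # Least-squares fit on the atoms selected so far.
        coeffs, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coeffs
    x_hat = np.zeros(A.shape[1], dtype=complex)
    x_hat[support] = coeffs
    return x_hat, support

# Illustrative problem: 24 random measurements of a 2-sparse vector of length 64.
rng = np.random.default_rng(1)
M, N, K = 24, 64, 2
A = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2 * M)
x_true = np.zeros(N, dtype=complex)
x_true[[7, 40]] = [1.0, 0.6j]
y = A @ x_true
x_hat, support = omp(A, y, sparsity=K)
```

Each iteration selects the dictionary column most correlated with the current residual and then refits all selected coefficients, so the residual norm cannot grow from one iteration to the next. Whether the true support is recovered depends on the sensing matrix, the sparsity level, and the noise.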

#### **3.3 Fingerprinting-based localization**

*Fingerprinting* is a data-driven localization technique that typically consists of two phases: an offline phase and an online phase. During the offline phase, the locations in the environment are mapped to unique wireless measurements to create a geo-tagged fingerprint database [62]. The unique wireless measurements are referred to as fingerprints. The measurements can be any wireless parameter such as RSSI, CSI/ADP, AoA/AoD or ToA. Then, during the online phase, the new measurement (fingerprint) is compared to the geo-tagged database to estimate the UE's location. The underlying principle behind fingerprinting is that the wireless channel between the UE and BS is uniquely determined by the scattering environment surrounding the UE's location [63]. Therefore, each location has a unique fingerprint. Matching a new wireless measurement to the measurements in the geo-tagged dataset typically involves a machine learning model, which is trained during the offline phase. The most common fingerprinting models are based on Deep Learning (DL), Gaussian Process Regression (GPR), or clustering and classification models.

RSSI-based fingerprinting is commonly used in wireless systems that have rich AP distributions such as Wireless Sensor Networks (WSNs) [64–66], Wi-Fi networks [67, 68], or Distributed Massive MIMO (DM-MIMO) systems [69, 70]. Since the RSSI provides a single measurement from the BS or AP, multiple APs are required to generate a unique fingerprint. On the other hand, single-site localization takes advantage of the multipath characteristics of the MIMO channel which are captured in CSI data or the angle and delay parameters that define the multipath. Furthermore, the CSI fingerprint can be used in its original form or it can be transformed into ADP.

#### *3.3.1 Application of deep learning techniques*

Deep Learning Neural Networks (DL NNs) require a large training dataset that covers the entire environment. The input to the NN is the wireless measurement and the output is the UE location. Several different NN architectures have been proposed in fingerprinting-based localization, including Multi-Layer Perceptron (MLP) networks [71, 72], Convolutional Neural Networks (CNNs) [51, 63, 73–75], and Recurrent Neural Networks (RNNs) [53].

Thus far, CNN models have demonstrated the highest localization accuracy. The CNN model treats the input fingerprint as a 2D image and performs a series of convolutions over multiple layers to establish the spatial correlation in the 2D input. Typically, raw CSI or transformed ADP fingerprints are used for this application. The sparsity of the ADP benefits the CNN model from both a computational complexity and a learning point of view [76]. RNN models are time series models that track the changes of the input over time. They can account for changes in the environment in the location estimate and can also be used to predict the future location of the UE.

**Figure 7.** *Fingerprinting-based multi-level classification grid of the environment map.*

These networks can be formulated either as classification or as regression models. In the classification models, the environment is usually divided into grids where each grid represents a class. If the area is large, it is not uncommon to have multiple levels of classification, where each grid may be subdivided into smaller grids as shown in **Figure 7**. In general, the first level employs a CNN classification model (coarse search), whereas the second level utilizes a different machine learning algorithm to perform a fine search. In addition to increasing the complexity of the model, the multi-level approach is more susceptible to errors: if the grid is classified incorrectly at the first stage, the error propagates into the second stage. Furthermore, the accuracy of the classification model is limited by the size of the grid. On the other hand, the goal of regression is to find a function or equation that best describes the relationship between the input and output variables. Therefore, regression models predict a continuous output variable and the accuracy is not limited by the grid size as in classification.

#### *3.3.2 Application of Gaussian process regression models*

A *Gaussian Process* (GP) is a collection of random variables indexed by time or space. The key property of a GP is that any finite subset of the random variables is jointly Gaussian distributed. That is, for any finite set of vector elements $\mathbf{x}_1, \dots, \mathbf{x}_n \in \mathcal{X}$, the associated set of random variables $f(\mathbf{x}_1), \dots, f(\mathbf{x}_n)$ follows a joint Gaussian distribution. The following notation is commonly used in the literature to represent the GP

$$f(\mathbf{x}) \sim \mathcal{GP}(m(\mathbf{x}), k(\mathbf{x}, \mathbf{x}')),\tag{9}$$

where the mean and covariance functions are defined as

$$m(\mathbf{x}) = \mathbb{E}[f(\mathbf{x})],\tag{10}$$

$$k(\mathbf{x}, \mathbf{x}') = \mathbb{E}\left[ (f(\mathbf{x}) - m(\mathbf{x}))(f(\mathbf{x}') - m(\mathbf{x}')) \right] \tag{11}$$

for any $\mathbf{x}, \mathbf{x}' \in \mathcal{X}$ [77]. Therefore, the GP is entirely defined by its mean and covariance functions [78].

A Gaussian Process Regression (GPR) model is a non-parametric statistical model that uses a GP to model a continuous function and provides a probabilistic prediction with uncertainty estimates [79]. To define the GPR model, assume $\mathcal{D}_{train} \triangleq (\mathbf{X}, \mathbf{y}) \triangleq \{(\mathbf{x}_i, y_i)\}_{i=1}^{n}$, $\mathbf{x}_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$, to be an input–output pair training dataset. Furthermore, assume that a latent function $f(\cdot)$ is responsible for generating the observed output $y_i$ given the input vector $\mathbf{x}_i$. Then, the GPR model can be defined as

$$y_i = f(\mathbf{x}_i) + \varepsilon_i,\tag{12}$$

where $f(\mathbf{x}) \sim \mathcal{GP}(m(\mathbf{x}), k(\mathbf{x}, \mathbf{x}'))$, $\varepsilon \sim \mathcal{N}(0, \sigma^2 \mathbf{I})$ is the noise of the system that has an independent, identically distributed (i.i.d.) Gaussian distribution with zero mean and variance $\sigma^2$, and $i$ refers to the $i$-th observation.

GPR models often assume a zero mean by default. The correlation between input points is defined by the covariance function (also known as the kernel). There is a variety of kernels, including exponential, Matérn, quadratic, and more, each with hyperparameters that can be fine-tuned during training [80]. Given a new testing sample $\mathbf{x}_*$, the mean and variance (uncertainty) of the unknown output $\mathbf{y}_*$ are predicted as

$$\overline{\mathbf{y}}_{*} = \mathbf{K}_{*}^{T} \left(\mathbf{K} + \sigma_{n}^{2}\mathbf{I}\right)^{-1} \mathbf{y},\tag{13}$$

$$\mathbb{V}\left[\overline{\mathbf{y}}_{*}\right] = \mathbf{K}_{**} - \mathbf{K}_{*}^{T}\left(\mathbf{K} + \sigma_{n}^{2}\mathbf{I}\right)^{-1}\mathbf{K}_{*},\tag{14}$$

where $\mathbf{K} = K(\mathbf{X}, \mathbf{X})$, $\mathbf{K}_{*} = K(\mathbf{X}, \mathbf{X}_{*})$, and $\mathbf{K}_{**} = K(\mathbf{X}_{*}, \mathbf{X}_{*})$ [81]. $\mathbf{K}$ is the covariance matrix (also known as the Gram matrix) whose entries are the kernel functions $k(\mathbf{x}_i, \mathbf{x}_j)$ [79].

In localization, the objective of the GPR model is to learn the latent function $f(\cdot)$ in (12), where $\mathbf{x}_i$ is the channel parameter (fingerprint) and $y_i$ is the UE location. Given a new fingerprint $\mathbf{x}_*$, the UE location is predicted using (13), while the level of uncertainty in the prediction is estimated by (14). The fingerprint in distributed MIMO systems is usually the RSSI vector as proposed in [69, 70, 82]. On the other hand, the input in single-site MIMO systems can be an AoA/AoD vector, CSI, or ADP data as proposed in [83, 84].
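The predictive equations (13) and (14) translate almost directly into code. The following is a minimal sketch, assuming a zero-mean GP with a squared-exponential kernel and fixed hyperparameters (no hyperparameter training); the kernel choice, the noise level, and the one-dimensional toy data are assumptions for illustration. In a localization setting, each row of `X` would be a fingerprint and `y` one coordinate of the UE location.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel matrix between the rows of A and the rows of B."""
    sq_dist = (np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :]
               - 2.0 * A @ B.T)
    return variance * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)

def gpr_predict(X, y, X_star, noise_var=1e-2, **kernel_args):
    """Zero-mean GPR posterior mean and covariance, following (13) and (14)."""
    K = rbf_kernel(X, X, **kernel_args)
    K_star = rbf_kernel(X, X_star, **kernel_args)      # shape (n, n_star)
    K_ss = rbf_kernel(X_star, X_star, **kernel_args)
    # Cholesky solve instead of an explicit inverse, for numerical stability.
    Lc = np.linalg.cholesky(K + noise_var * np.eye(len(X)))
    alpha = np.linalg.solve(Lc.T, np.linalg.solve(Lc, y))
    v = np.linalg.solve(Lc, K_star)
    mean = K_star.T @ alpha                            # equation (13)
    cov = K_ss - v.T @ v                               # equation (14)
    return mean, cov

# Toy data: 20 scalar "fingerprints" and a smooth target standing in for a coordinate.
X = np.linspace(0.0, 5.0, 20)[:, None]
y = np.sin(X[:, 0])
X_star = np.array([[2.5]])
mean, cov = gpr_predict(X, y, X_star)
```

The Cholesky factorization of the $n \times n$ matrix is the step that dominates the cost of this sketch as the number of training points grows.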

The main advantage of GPR models over DL CNN models is that they can be trained on substantially smaller datasets, owing to the small number of hyper-parameters that define the model [84]. However, the GPR model does have its drawbacks. Its main weakness lies in its training complexity, which is characterized by high computational and memory demands. Specifically, GPR training has a computational complexity of $O(n^3)$ and a memory complexity of $O(n^2)$, where $n$ represents the number of training points in the dataset [85].

#### *3.3.3 Clustering and classification*

*Clustering* is a class of machine learning algorithms used to group similar objects or data points together into clusters. The groupings are based on some similarity or distance measures. The goal of clustering is to identify patterns or structures in the data that may not be immediately apparent, and to group similar data points into clusters that can be easily analyzed or visualized. In localization, clustering techniques can be used to compare the test fingerprints to the fingerprints in the training database. K-means clustering and K-Nearest Neighbor (KNN) classification have been widely used in fingerprinting-based wireless localization and have been shown to provide excellent accuracy given enough data points [86, 87]. The KNN location estimation is given by

$$
\hat{\mathbf{p}} = \frac{1}{K} \sum_{i=1}^{K} \mathbf{p}_i,\tag{15}
$$

where $K$ is the number of surrounding neighbors considered and $\mathbf{p}_i$ is the coordinate of the $i$-th nearest reference point. Weighted KNN (WKNN) is an extension of KNN where the contribution of each neighbor is weighted. The WKNN is defined as

$$
\hat{\mathbf{p}} = \sum_{i=1}^{K} w_i \mathbf{p}_i,\tag{16}
$$

where $w_i$ is the weight of the $i$-th reference point. Typically, the weight is inversely related to the distance between the reference point and the input point: the closer the neighbor is to the input point, the more weight it carries in the final prediction. The weights can also be defined by some similarity criteria calculated between the input and the reference fingerprint. Various similarity criteria have been established in wireless MIMO communication, such as normalized correlation [53, 86, 88], Joint Angle Delay Similarity Coefficient (JADSC) [63], Angular Similarity Coefficient Weight (ASCW) [89], and Weighted Mean Square Error (WMSE) [90].

**Table 2.**

*Methods in MIMO localization, categorized according to the localization technique employed and the parameters utilized.*
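Equations (15) and (16) can be sketched in a few lines. The example below is illustrative only: the fingerprint database holds made-up RSSI vectors (in dBm) from three BSs, the distance is Euclidean in fingerprint space, and the WKNN weights are normalized inverse distances, which is one common choice rather than the specific similarity criteria cited above.

```python
import numpy as np

def knn_locate(fingerprints, positions, query, k=3, weighted=False, eps=1e-9):
    """Estimate a position with (15) (plain KNN) or (16) (inverse-distance WKNN)."""
    dists = np.linalg.norm(fingerprints - query, axis=1)
    nearest = np.argsort(dists)[:k]
    if not weighted:
        return positions[nearest].mean(axis=0)
    w = 1.0 / (dists[nearest] + eps)   # closer neighbours get larger weights
    w /= w.sum()                        # normalise so the weights sum to one
    return w @ positions[nearest]

# Hypothetical geo-tagged database: RSSI vectors from three BSs and 2D positions.
fingerprints = np.array([[-40.0, -70.0, -80.0],
                         [-70.0, -40.0, -80.0],
                         [-80.0, -70.0, -40.0],
                         [-60.0, -60.0, -60.0]])
positions = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 10.0], [5.0, 4.0]])
query = np.array([-45.0, -68.0, -78.0])

p_knn = knn_locate(fingerprints, positions, query, k=2)
p_wknn = knn_locate(fingerprints, positions, query, k=2, weighted=True)
```

In this toy case the query fingerprint is much closer to the first reference point than to the second neighbor, so the weighted estimate is pulled toward that reference point while the plain average sits midway between the two neighbors.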

#### **3.4 Summary of methods**

**Table 2** provides a summary of the methods proposed in recent years that apply the localization techniques discussed in the previous subsections. The techniques are also grouped by the type of channel parameter used with the associated technique.
