**1. Introduction**

In multiple-input multiple-output (MIMO) communication systems, both the transmitters and receivers are equipped with several antennas which will help in achieving high gains in spectral, power, and energy efficiency compared to conventional single-input single-output (SISO) systems where both the transmitters and receivers have only one antenna each. As a matter of fact, the MIMO systems have the ability to turn multipath propagation and multipath delay spread into a benefit for the receiver. The key advantage of MIMO systems is the many orders of magnitude of the signal-to-noise ratio (SNR) at no extra bandwidth. However, a non-negligible software and hardware processing complexity is added at both sides (transmitter and receiver). Present wireless communication standards including Wi-Fi standards like IEEE 802.11n/ac, long-term evolution (LTE), and WiMAX are considering MIMO technology as a key element. Moreover, in the next generation of wireless technology systems (i.e., 5G), massive MIMO is emerging as a new research field in which base stations are equipped with 100 or more antennas. At

the receiver side, designing reliable and energy-efficient MIMO detectors is a very challenging task, because of the complexity of the implementation of the signal detection due to the interfering sub-streams. The signal detection problem refers to finding the most probable transmitted symbols based on the perfect channel state information (CSI) available at the receiver and the received signal.

(MMSE), and successive interference cancelation (SIC) are used to deal with massive MIMO detection. In [1, 2], the authors presented surveys on various MIMO and massive MIMO detection techniques from algorithmic viewpoints. Although many classical massive MIMO detectors have been proposed in the literature, herein, new recent algorithms based on the application of machine learning, geometrical tech-

In this chapter, we propose an overview of the SDM detection algorithms. We specifically stress out the different paradigms that are used to solve the detection problem and compare all of them. Thus, we describe the most well-known and promising MIMO detectors, as well as some unusual-yet-interesting ones. Section 2 presents the framework and the assumptions that are used in the remainder section. Section 3 introduces the maximum likelihood (ML) optimal detector, and then Section 4 describes the linear ones. Section 5 details algorithms based on the interference cancelation, and Section 6 discusses the one based on tree-search. Finally, Section 7 highlights unusual-yet-interesting detectors before Section 8 concludes the chapter. **Figure 1** provides an overview of all the detectors described in this

In the SDM framework, data streams are transmitted at the same time and at the same frequency, and the receiver relies on spatial consideration to distinguish the streams. Herein, we assume that the MIMO transmitter does not use any spatial coding and that all data streams are independent. To give the reader a unified mathematical description through this chapter, we adopt the following notation: scalars, vectors, and matrices are denoted by lower-case, bold-face lower-case, and bold-face higher-case letters, respectively. We call **v**ð Þ*i* the *i*th coefficient of the vector **v**, and **H**ð Þ *i*, *j* is the element of the *i*th row and *j*th column in the **H** matrix. In the linear input–output MIMO model where data are transmitted as the symbols of a constellation Φ, the received vector **y** ∈ *<sup>M</sup>* is the result of the emitted symbols **x**∈ Φ*<sup>N</sup>* propagated through the channel **H** and added to an additive noise **w**.

and the MIMO detection problem then refers to the combinatorial optimization

Assuming a circularly symmetric Gaussian noise, solving Eq. (2) is equivalent to

searching the most probable emitted symbol vector based on the signal on each receive antennas and the channel state. Even if <sup>∥</sup>**<sup>y</sup>** � **Hx**∥<sup>2</sup> is a convex function with respect to *x*, the detection problem is not a convex optimization problem due to the discrete feasible solution set Φ*<sup>N</sup>*. As a result, a special algorithm has to be used, and

Let us start by outlining the traditional assumptions that we will use in the present chapter. Although many constellation types could be used in MIMO systems, we limit the discussion to the square quadrature amplitude modulations (QAMs) that are most commonly investigated. Besides, the channel is considered

<sup>∥</sup>**<sup>y</sup>** � **Hx**∥<sup>2</sup>

arg min **x**∈ Φ*<sup>N</sup>*

**y** ¼ **Hx** þ **w** (1)

*:* (2)

niques, and bioinspired methods are presented and discussed.

*A Review to Massive MIMO Detection Algorithms: Theory and Implementation*

*DOI: http://dx.doi.org/10.5772/intechopen.93089*

**2. Introduction to MIMO detection algorithms**

This model leads to the following equation:

this chapter will describe the most common ones.

problem:

**205**

chapter as a tree mind map.

The hardware implementation of massive MIMO detector is of particular interest to deal with 5G wireless technology. Optimal massive detectors such as the maximum likelihood detector (MLD) or the sphere decoding (SD) are considered infeasible given their high computational complexity. Hence, low computational complexity algorithm achieving near-optimal performance is required; many existing detection algorithms like zero forcing (ZF), minimum mean-square error

#### *A Review to Massive MIMO Detection Algorithms: Theory and Implementation DOI: http://dx.doi.org/10.5772/intechopen.93089*

(MMSE), and successive interference cancelation (SIC) are used to deal with massive MIMO detection. In [1, 2], the authors presented surveys on various MIMO and massive MIMO detection techniques from algorithmic viewpoints. Although many classical massive MIMO detectors have been proposed in the literature, herein, new recent algorithms based on the application of machine learning, geometrical techniques, and bioinspired methods are presented and discussed.

In this chapter, we propose an overview of the SDM detection algorithms. We specifically stress out the different paradigms that are used to solve the detection problem and compare all of them. Thus, we describe the most well-known and promising MIMO detectors, as well as some unusual-yet-interesting ones. Section 2 presents the framework and the assumptions that are used in the remainder section. Section 3 introduces the maximum likelihood (ML) optimal detector, and then Section 4 describes the linear ones. Section 5 details algorithms based on the interference cancelation, and Section 6 discusses the one based on tree-search. Finally, Section 7 highlights unusual-yet-interesting detectors before Section 8 concludes the chapter. **Figure 1** provides an overview of all the detectors described in this chapter as a tree mind map.

## **2. Introduction to MIMO detection algorithms**

In the SDM framework, data streams are transmitted at the same time and at the same frequency, and the receiver relies on spatial consideration to distinguish the streams. Herein, we assume that the MIMO transmitter does not use any spatial coding and that all data streams are independent. To give the reader a unified mathematical description through this chapter, we adopt the following notation: scalars, vectors, and matrices are denoted by lower-case, bold-face lower-case, and bold-face higher-case letters, respectively. We call **v**ð Þ*i* the *i*th coefficient of the vector **v**, and **H**ð Þ *i*, *j* is the element of the *i*th row and *j*th column in the **H** matrix.

In the linear input–output MIMO model where data are transmitted as the symbols of a constellation Φ, the received vector **y** ∈ *<sup>M</sup>* is the result of the emitted symbols **x**∈ Φ*<sup>N</sup>* propagated through the channel **H** and added to an additive noise **w**. This model leads to the following equation:

$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{w} \tag{1}$$

and the MIMO detection problem then refers to the combinatorial optimization problem:

$$\underset{\mathbf{x}\in\Phi^{N}}{\text{arg min}}\,\|\mathbf{y}-\mathbf{H}\mathbf{x}\|^{2}.\tag{2}$$

Assuming a circularly symmetric Gaussian noise, solving Eq. (2) is equivalent to searching the most probable emitted symbol vector based on the signal on each receive antennas and the channel state. Even if <sup>∥</sup>**<sup>y</sup>** � **Hx**∥<sup>2</sup> is a convex function with respect to *x*, the detection problem is not a convex optimization problem due to the discrete feasible solution set Φ*<sup>N</sup>*. As a result, a special algorithm has to be used, and this chapter will describe the most common ones.

Let us start by outlining the traditional assumptions that we will use in the present chapter. Although many constellation types could be used in MIMO systems, we limit the discussion to the square quadrature amplitude modulations (QAMs) that are most commonly investigated. Besides, the channel is considered

the receiver side, designing reliable and energy-efficient MIMO detectors is a very challenging task, because of the complexity of the implementation of the signal detection due to the interfering sub-streams. The signal detection problem refers to finding the most probable transmitted symbols based on the perfect channel state

*Advanced Radio Frequency Antennas for Modern Communication and Medical Systems*

The hardware implementation of massive MIMO detector is of particular inter-

est to deal with 5G wireless technology. Optimal massive detectors such as the maximum likelihood detector (MLD) or the sphere decoding (SD) are considered infeasible given their high computational complexity. Hence, low computational complexity algorithm achieving near-optimal performance is required; many existing detection algorithms like zero forcing (ZF), minimum mean-square error

*Tree mind map of the detectors described in this chapter and the number or the corresponding section.*

information (CSI) available at the receiver and the received signal.

**Figure 1.**

**204**

memoryless, linear, and flat and with a block fading<sup>1</sup> . In this chapter, we assume that channel state information (CSI) is correctly estimated at the receiver side but not at the transmitter side. The impact of imperfect CSI at the receiver on the performance of detection algorithms is not addressed in the present chapter. Some known training symbols are sent from the transmitter, based on which the receiver estimates the channel before proceeding to the detection of the transmitted data symbols.

detector solves Eq. (2) transforming the constraint from **x**∈ Φ*<sup>N</sup>* to **x**∈ *<sup>N</sup>* such that the problem become an easy-to-solve convex optimization with a known mathe-

*A Review to Massive MIMO Detection Algorithms: Theory and Implementation*

constraint on *x* is reintroduced by quantizing the vector accordingly to the constellation in use. This quantization should lead to a good estimation as after the appli-

highlighting that all the interference are canceled. The previous equation is also the proof that the ZF detector is the optimal linear one regarding the signal-tointerference ratio (SIR) criteria. Indeed, one can see that the vector **T**ZF*:***y** contains

By only focusing on the interference, the ZF detector performance suffers from not taking the noise into account. Indeed, if the noise level is known to the receiver, a Bayesian estimator including this information can provide a better detection. A linear Bayesian estimator minimizing the mean-square error can be derived using

**I** �<sup>1</sup>

with *σ*<sup>2</sup> being the noise variance per real direction. The detector based on this detection matrix, followed by the quantization, is called the minimum mean-square error (MMSE), and it is known to maximize the signal-to-noise-plus-interference ratio (SINR). When the signal-to-noise ratio (SNR) is low (i.e., *σ*<sup>2</sup> is high), the MMSE detector provides better results, jointly minimizing the interference and the noise. Otherwise, when *σ*<sup>2</sup> is very low, the corrective term becomes negligible, and

To improve further the performance, it is necessary to drop the linear detector approach and look for more elaborate decoding algorithms. Historically, the first nonlinear detector type is still based on the principle of canceling signal interference. This concept leads to two approaches: an iterative one named successive interference cancelation (SIC) and a simultaneous version named parallel

The SIC detectors opt for a two-step iterative process: first, a decision is taken on the first position **x**1, and then assuming that the decision was right, the detector corrects **y** by removing the interference that would have been generated by **x**1.

each stream independently plus some noise but without any interference.

**<sup>T</sup>**MMSE <sup>¼</sup> **<sup>H</sup>***<sup>H</sup>***<sup>H</sup>** <sup>þ</sup> <sup>2</sup>*σ*<sup>2</sup>

cation of the detection matrix **T**ZF ¼ **H**þ, Eq. (1) becomes

**4.2 Minimum mean-square error (MMSE) detector**

the orthogonality principle [6] leading to

the ZF and MMSE detectors overlap.

interference cancelation (PIC).

**207**

**5. Interference cancellation detectors**

**5.1 Successive interference cancellation (SIC) detector**

**x**<sup>0</sup> ¼ **H**þ**y** (3)

**T**ZF*:***y** ¼ **x** þ **H**þ**w** (4)

**H***<sup>H</sup>* (5)

**H***<sup>H</sup>* being the left Moore-Penrose pseudoinverse. Then, the

matical solution:

with **<sup>H</sup>**<sup>þ</sup> <sup>¼</sup> **<sup>H</sup>***<sup>H</sup>***<sup>H</sup>** �<sup>1</sup>

*DOI: http://dx.doi.org/10.5772/intechopen.93089*

The channel matrix is modeled as a complex matrix **H** ∈ ð Þ *<sup>M</sup>*,*<sup>N</sup>* . In that case, the element **H**ð Þ *i*, *j* refers to the complex channel gain between the *j*th transmit antenna to the *i*th receive antenna. Many channel models can fit in this framework, and we stick to the most popular one: the uncorrelated Rayleigh fading channel [3, 4]. The uncorrelated channel model provides a good approximation of propagating environments with rich scattering where the signals between the transmitter and the receiver experience many different paths and no strong line of sight between the transmitter and the receiver. This situation occurs, for instance, in urban and indoor conditions. In these conditions, each receiver antenna receives a sum of a large number of signal paths, and the channel transfer functions can be modeled as the realization of a circularly symmetric normal distribution.

#### **3. Maximum likelihood detector**

Obtaining the optimal result requires, in the most straightforward approach, the use of the ML detector that solves Eq. (2) using an exhaustive search. Even if this method gives the best result since all *x*∈ Φ*<sup>N</sup>* are evaluated, it is not suitable for real implementation. Indeed, the number of vectors to be tested grows exponentially with the number of transmit antenna and the constellation size. Thus, the computational cost of evaluating Eq. (2) requires an unrealistic quantity of resources to detect the transmitted vector **x**. That is why a variety of detection algorithm has been developed throughout the past year to achieve the same detection performance of ML detectors while having a tractable complexity.

From a computational theory perspective, the detection problem is an instance of the closest lattice-point search (CLPS) problem with a specified lattice [5]. It has been proved that regardless of the preprocessing on the lattice (i.e., the channel matrix), the problem is always NP-hard [5]. The NP-hardness implies that it is not possible, at the moment, to find any detector that is sure to have both an optimal performance and a polynomial complexity2 . For that reason, all the following detectors have suboptimal performance (which can be very close to optimal) or a non-polynomial worst-case complexity (which can be polynomial is the average case).

#### **4. Linear detectors**

#### **4.1 Zero forcing (ZF) detector**

Linear detectors are the most simple algorithms to solve the detection problem. The most basic one is the ZF algorithm that follows a two-step process. First, the ZF

<sup>1</sup> The nomenclature section provides definitions.

<sup>2</sup> It is widely believed that it does not exist any solution to solve NP-hard problems in polynomial time. Yet, this assumption is not proven so that there is still a possibility that such an algorithm exists and is just not discovered nowadays.

*A Review to Massive MIMO Detection Algorithms: Theory and Implementation DOI: http://dx.doi.org/10.5772/intechopen.93089*

detector solves Eq. (2) transforming the constraint from **x**∈ Φ*<sup>N</sup>* to **x**∈ *<sup>N</sup>* such that the problem become an easy-to-solve convex optimization with a known mathematical solution:

$$\mathbf{x}\_0 = \mathbf{H}^+ \mathbf{y} \tag{3}$$

with **<sup>H</sup>**<sup>þ</sup> <sup>¼</sup> **<sup>H</sup>***<sup>H</sup>***<sup>H</sup>** �<sup>1</sup> **H***<sup>H</sup>* being the left Moore-Penrose pseudoinverse. Then, the constraint on *x* is reintroduced by quantizing the vector accordingly to the constellation in use. This quantization should lead to a good estimation as after the application of the detection matrix **T**ZF ¼ **H**þ, Eq. (1) becomes

$$\mathbf{T}\_{\mathbf{Z}\mathbf{F}\cdot\mathbf{y}} = \mathbf{x} + \mathbf{H}^+\mathbf{w} \tag{4}$$

highlighting that all the interference are canceled. The previous equation is also the proof that the ZF detector is the optimal linear one regarding the signal-tointerference ratio (SIR) criteria. Indeed, one can see that the vector **T**ZF*:***y** contains each stream independently plus some noise but without any interference.

#### **4.2 Minimum mean-square error (MMSE) detector**

By only focusing on the interference, the ZF detector performance suffers from not taking the noise into account. Indeed, if the noise level is known to the receiver, a Bayesian estimator including this information can provide a better detection. A linear Bayesian estimator minimizing the mean-square error can be derived using the orthogonality principle [6] leading to

$$\mathbf{T}\_{\text{MMSE}} = \left(\mathbf{H}^H \mathbf{H} + 2\sigma^2 \mathbf{I}\right)^{-1} \mathbf{H}^H \tag{5}$$

with *σ*<sup>2</sup> being the noise variance per real direction. The detector based on this detection matrix, followed by the quantization, is called the minimum mean-square error (MMSE), and it is known to maximize the signal-to-noise-plus-interference ratio (SINR). When the signal-to-noise ratio (SNR) is low (i.e., *σ*<sup>2</sup> is high), the MMSE detector provides better results, jointly minimizing the interference and the noise. Otherwise, when *σ*<sup>2</sup> is very low, the corrective term becomes negligible, and the ZF and MMSE detectors overlap.

#### **5. Interference cancellation detectors**

To improve further the performance, it is necessary to drop the linear detector approach and look for more elaborate decoding algorithms. Historically, the first nonlinear detector type is still based on the principle of canceling signal interference. This concept leads to two approaches: an iterative one named successive interference cancelation (SIC) and a simultaneous version named parallel interference cancelation (PIC).

#### **5.1 Successive interference cancellation (SIC) detector**

The SIC detectors opt for a two-step iterative process: first, a decision is taken on the first position **x**1, and then assuming that the decision was right, the detector corrects **y** by removing the interference that would have been generated by **x**1.

memoryless, linear, and flat and with a block fading<sup>1</sup>

realization of a circularly symmetric normal distribution.

mance of ML detectors while having a tractable complexity.

**3. Maximum likelihood detector**

and a polynomial complexity2

**4.1 Zero forcing (ZF) detector**

<sup>1</sup> The nomenclature section provides definitions.

**4. Linear detectors**

just not discovered nowadays.

**206**

symbols.

that channel state information (CSI) is correctly estimated at the receiver side but not at the transmitter side. The impact of imperfect CSI at the receiver on the performance of detection algorithms is not addressed in the present chapter. Some known training symbols are sent from the transmitter, based on which the receiver estimates the channel before proceeding to the detection of the transmitted data

*Advanced Radio Frequency Antennas for Modern Communication and Medical Systems*

The channel matrix is modeled as a complex matrix **H** ∈ ð Þ *<sup>M</sup>*,*<sup>N</sup>* . In that case, the element **H**ð Þ *i*, *j* refers to the complex channel gain between the *j*th transmit antenna to the *i*th receive antenna. Many channel models can fit in this framework, and we stick to the most popular one: the uncorrelated Rayleigh fading channel [3, 4]. The uncorrelated channel model provides a good approximation of propagating environments with rich scattering where the signals between the transmitter and the receiver experience many different paths and no strong line of sight between the transmitter and the receiver. This situation occurs, for instance, in urban and indoor conditions. In these conditions, each receiver antenna receives a sum of a large number of signal paths, and the channel transfer functions can be modeled as the

Obtaining the optimal result requires, in the most straightforward approach, the use of the ML detector that solves Eq. (2) using an exhaustive search. Even if this method gives the best result since all *x*∈ Φ*<sup>N</sup>* are evaluated, it is not suitable for real implementation. Indeed, the number of vectors to be tested grows exponentially with the number of transmit antenna and the constellation size. Thus, the computational cost of evaluating Eq. (2) requires an unrealistic quantity of resources to detect the transmitted vector **x**. That is why a variety of detection algorithm has been developed throughout the past year to achieve the same detection perfor-

From a computational theory perspective, the detection problem is an instance of the closest lattice-point search (CLPS) problem with a specified lattice [5]. It has been proved that regardless of the preprocessing on the lattice (i.e., the channel matrix), the problem is always NP-hard [5]. The NP-hardness implies that it is not possible, at the moment, to find any detector that is sure to have both an optimal performance

suboptimal performance (which can be very close to optimal) or a non-polynomial

Linear detectors are the most simple algorithms to solve the detection problem. The most basic one is the ZF algorithm that follows a two-step process. First, the ZF

<sup>2</sup> It is widely believed that it does not exist any solution to solve NP-hard problems in polynomial time. Yet, this assumption is not proven so that there is still a possibility that such an algorithm exists and is

worst-case complexity (which can be polynomial is the average case).

. For that reason, all the following detectors have

. In this chapter, we assume

Then, SIC detectors repeat this process on the next **x**'s entry until the whole vector is received.

Even if the performance is better than with the linear detectors, the SIC process is very prone to error, given that the assumption at an iteration has an impact on all the following ones. For this reason, the simple SIC detector has quickly been replaced by a variant seeking for an optimal iteration order [7]. This variant called ordered successive interference cancelation (OSIC) aims to make the first assumption on the position that leads to the better SNR or SINR.

To select the best symbol to detect at each iteration, the OSIC detector computes the post-SNR or post-SINR for each symbol, assuming that the *k*th element is canceled using the detection matrix **T**. Most of the time, the detection matrix is chosen to be **T**ZF or **T**MMSE optionally updated after each iteration. When using the SNR criterion, the value of the *k*th post-SNR is computed as in [7, 8].

$$\text{SNR}\_k = \frac{<\mathbf{x}(k)^2>\_{\Phi} |\mathbf{T}\_k \mathbf{h}\_k|^2}{\sigma^2 \left\|\mathbf{T}\_k\right\|^2} \tag{6}$$

with **T***<sup>k</sup>* being the *k*th row of **T**, **h***<sup>k</sup>* the *k*th column of **H**, and < **x**<sup>2</sup> *<sup>k</sup>* > <sup>Φ</sup> the expected value over the constellation set. The latter term is the average signal power of the *k*th data stream that can be computed, assuming that each symbol is equiprobable, as

$$<\mathbf{x}\_k^2 >\_{\Phi} = \frac{1}{\varphi} \sum\_{\mathbf{x} \in \Phi} \mathbf{x}(k)^2 \tag{7}$$

Simultaneously, the iterative reception techniques developed for turbo codes and single-input single-output channels are adapted to MIMO systems. The goal is to receive a coded message by alternating between soft-input soft-output detector and decoder. Each algorithm uses a priori information from the other to improve its performance [10]. This method leads to one of the current most accomplished version of the PIC family: a soft-input soft-output detector to be used in iterative

*A Review to Massive MIMO Detection Algorithms: Theory and Implementation*

*DOI: http://dx.doi.org/10.5772/intechopen.93089*

This version adds several improvements to the basic algorithm described in **Figure 3**. First, it uses the soft symbols from [12] that are defined as the expected value of the symbols knowing the a priori. The reliability of a soft symbol is

computed as its variance. Then, the parallel cancelation (see the third instruction in **Figure 3**) is performed using the soft symbols in place of the rough estimation. Finally, a last MMSE filtering is performed before the computation of the loglikelihood ratios (LLRs). Further reductions in complexity are also used, such as the max-log approximation [10, 13, 14] or the channel Gram matrix [15]. Thanks to all of these improvements, an application-specific integrated circuit (ASIC) is reported to achieve a throughput greater than 750 Mb/s with good BER performance [11].

The key idea to select between SIC and PIC detectors is to compare the relative quality of data streams. As stated earlier, SIC algorithms guess the best data stream and then process the other one based on this assumption. This process makes the

decoding with any message coding [11].

**Figure 2.**

**209**

*OSIC algorithm outline.*

**5.3 Selecting between SIC and PIC detectors**

with *φ* the number of symbols in constellation Φ.

When using the SINR criterion, the post-SINR expression becomes slightly more complex as the post-processed power of each other channel appears in the expression

$$\text{SINR}\_k = \frac{<\mathbf{x}(k)^2>\_{\Phi} \left| \mathbf{T}\_k \mathbf{h}\_k \right|^2}{\sum\_{l \neq k} < \mathbf{x}(l)^2>\_{\Phi} \left| \mathbf{T}\_l \mathbf{h}\_l \right|^2 + \sigma^2 \left\| \mathbf{T}\_k \right\|^2}. \tag{8}$$

For clarity sake, **Figure 2** sums up the OSIC detection algorithm introducing a process *t* : **H**↦*T* to build a detection matrix from a channel one. This process is most of the time the Moore-Penrose pseudoinverse or the process described in Eq. (5). We also denote by D the set of the symbol index to be decoded and by the affectation. One must note that an instruction is optional and may be skipped. If this instruction is applied, performance is increased by canceling the interference in the post-criterion computation and so is the complexity.

#### **5.2 Parallel interference cancellation (PIC) detector**

The main drawback of the OSIC algorithm is that the number of iterations grows linearly with the number of antennas. The number of stage becomes an issue for large MIMO system since each stage adds a reception delay. For that reason, a detector capable of canceling the interference for all antennas at once was developed.

The first application of such an algorithm to SDM systems dates from the early 2000s, and it is based on a few basic steps summed up in **Figure 3** as published in [9]. The main point of the PIC algorithm is to start by using a simple detector with poor performance, most of the time a linear one, and cancel the interference on all antennas at once based on the assumption. If better performance is required, it is possible to iterate the last three instructions as many times as needed by using the new detected symbol as the new assumption.

*A Review to Massive MIMO Detection Algorithms: Theory and Implementation DOI: http://dx.doi.org/10.5772/intechopen.93089*

**Figure 2.** *OSIC algorithm outline.*

Then, SIC detectors repeat this process on the next **x**'s entry until the whole vector

*Advanced Radio Frequency Antennas for Modern Communication and Medical Systems*

tion on the position that leads to the better SNR or SINR.

Even if the performance is better than with the linear detectors, the SIC process is very prone to error, given that the assumption at an iteration has an impact on all the following ones. For this reason, the simple SIC detector has quickly been replaced by a variant seeking for an optimal iteration order [7]. This variant called ordered successive interference cancelation (OSIC) aims to make the first assump-

To select the best symbol to detect at each iteration, the OSIC detector computes

the post-SNR or post-SINR for each symbol, assuming that the *k*th element is canceled using the detection matrix **T**. Most of the time, the detection matrix is chosen to be **T**ZF or **T**MMSE optionally updated after each iteration. When using the

SNR*<sup>k</sup>* <sup>¼</sup> <sup>&</sup>lt;**x**ð Þ*<sup>k</sup>* <sup>2</sup> <sup>&</sup>gt; <sup>Φ</sup>j j **<sup>T</sup>***k***h***<sup>k</sup>*

value over the constellation set. The latter term is the average signal power of the *k*th data stream that can be computed, assuming that each symbol is equiprobable, as

When using the SINR criterion, the post-SINR expression becomes slightly more complex as the post-processed power of each other channel appears in the expression

For clarity sake, **Figure 2** sums up the OSIC detection algorithm introducing a process *t* : **H**↦*T* to build a detection matrix from a channel one. This process is most of the time the Moore-Penrose pseudoinverse or the process described in Eq. (5). We also denote by D the set of the symbol index to be decoded and by the affectation. One must note that an instruction is optional and may be skipped. If this instruction is applied, performance is increased by canceling the interference in the

The main drawback of the OSIC algorithm is that the number of iterations grows linearly with the number of antennas. The number of stage becomes an issue for large MIMO system since each stage adds a reception delay. For that reason, a detector capable of canceling the interference for all antennas at once was developed.

The first application of such an algorithm to SDM systems dates from the early 2000s, and it is based on a few basic steps summed up in **Figure 3** as published in [9]. The main point of the PIC algorithm is to start by using a simple detector with poor performance, most of the time a linear one, and cancel the interference on all antennas at once based on the assumption. If better performance is required, it is possible to iterate the last three instructions as many times as needed by using the

*<sup>k</sup>* <sup>&</sup>gt; <sup>Φ</sup> <sup>¼</sup> <sup>1</sup> *φ* X **x**∈ Φ

SINR*<sup>k</sup>* <sup>¼</sup> <sup>&</sup>lt;**x**ð Þ*<sup>k</sup>* <sup>2</sup> <sup>&</sup>gt; <sup>Φ</sup>j j **<sup>T</sup>***k***h***<sup>k</sup>*

2

2

*<sup>l</sup>*6¼*<sup>k</sup>* <sup>&</sup>lt;**x**ð Þ*<sup>l</sup>* <sup>2</sup> <sup>&</sup>gt; <sup>Φ</sup> **<sup>T</sup>***l***h***<sup>l</sup>* j j<sup>2</sup> <sup>þ</sup> *<sup>σ</sup>*<sup>2</sup>∥**T***k*∥<sup>2</sup> *:* (8)

*<sup>σ</sup>*<sup>2</sup>∥**T***k*∥<sup>2</sup> (6)

**<sup>x</sup>**ð Þ*<sup>k</sup>* <sup>2</sup> (7)

*<sup>k</sup>* > <sup>Φ</sup> the expected

SNR criterion, the value of the *k*th post-SNR is computed as in [7, 8].

with **T***<sup>k</sup>* being the *k*th row of **T**, **h***<sup>k</sup>* the *k*th column of **H**, and < **x**<sup>2</sup>

<**x**<sup>2</sup>

with *φ* the number of symbols in constellation Φ.

P

post-criterion computation and so is the complexity.

**5.2 Parallel interference cancellation (PIC) detector**

new detected symbol as the new assumption.

**208**

is received.

Simultaneously, the iterative reception techniques developed for turbo codes and single-input single-output channels are adapted to MIMO systems. The goal is to receive a coded message by alternating between soft-input soft-output detector and decoder. Each algorithm uses a priori information from the other to improve its performance [10]. This method leads to one of the current most accomplished version of the PIC family: a soft-input soft-output detector to be used in iterative decoding with any message coding [11].

This version adds several improvements to the basic algorithm described in **Figure 3**. First, it uses the soft symbols from [12] that are defined as the expected value of the symbols knowing the a priori. The reliability of a soft symbol is computed as its variance. Then, the parallel cancelation (see the third instruction in **Figure 3**) is performed using the soft symbols in place of the rough estimation. Finally, a last MMSE filtering is performed before the computation of the loglikelihood ratios (LLRs). Further reductions in complexity are also used, such as the max-log approximation [10, 13, 14] or the channel Gram matrix [15]. Thanks to all of these improvements, an application-specific integrated circuit (ASIC) is reported to achieve a throughput greater than 750 Mb/s with good BER performance [11].

#### **5.3 Selecting between SIC and PIC detectors**

The key idea to select between SIC and PIC detectors is to compare the relative quality of data streams. As stated earlier, SIC algorithms guess the best data stream and then process the other one based on this assumption. This process makes the

paradigms will be detailed after the description of the preprocessing used by all the

All tree-search-based detectors use the same preprocessing. Let be **H** ¼ **QR** the QR decomposition of the channel matrix with **Q** a unitary matrix and **R** an upper triangular one. The decomposition is computed only once per coherence block leading to a negligible overhead of the complexity per received symbol. Using the

as unitary matrices act as isometries. Thus, by exploiting the property of unitary

with **y**~≜**Q***<sup>H</sup>***y**. Computing **y**~ is the only overhead in complexity that is required

on a symbol basis as it cannot be preprocessed for the whole coherence block. The point of this QR preprocessing is that the triangularity of **R** allows to compute the objective function iteratively. Indeed, we can introduce for all *k*∈f g 1, … , *N* , **d**ð Þ*k* ≜**y**~ð Þ� *k* ð Þ **Rx** ð Þ*k* with ð Þ **Rx** ð Þ*k* being the *k*th coefficient of the

<sup>∥</sup>**y**<sup>~</sup> � **Rx**∥<sup>2</sup> <sup>¼</sup> <sup>X</sup>*<sup>n</sup>*

*j*¼1

<sup>∀</sup>*k*∈f g 1, … , *<sup>N</sup>* , **<sup>d</sup>**ð Þ¼ *<sup>k</sup>* **<sup>y</sup>**~ð Þ� *<sup>k</sup>* <sup>X</sup>*<sup>n</sup>*

With this expression, it is clear that we can compute partial estimations of the objective function and **d**ð Þ*k* coefficients while the symbol vector is built. Indeed,

*k*¼1

**<sup>d</sup>**ð Þ*<sup>k</sup>* <sup>2</sup>

**<sup>R</sup>**ð Þ *<sup>k</sup>*, *<sup>j</sup>* **<sup>x</sup>**ð Þ¼*<sup>j</sup>* <sup>X</sup>*<sup>n</sup>*

*j*¼*k*

*j*¼*k*

, this norm can be rewritten as

product *Rx*. Given this definition, the objective function is written as

<sup>∥</sup>**<sup>y</sup>** � **Hx**<sup>∥</sup> <sup>¼</sup> <sup>∥</sup>**<sup>y</sup>** � **QRx**<sup>∥</sup> <sup>¼</sup> <sup>∥</sup>**Q***<sup>H</sup>***<sup>y</sup>** � **<sup>Q</sup>***<sup>H</sup>***QRx**<sup>∥</sup> (9)

� � *and two data streams.*

∥**y** � **Hx**∥ ¼ ∥**y**~ � **Rx**∥ (10)

*:* (11)

**R**ð Þ *k*, *j* **x**ð Þ*j* (12)

**R**ð Þ *k*, *j* **x**ð Þ*j :* (13)

variants.

**Figure 4.**

**6.1 Preprocessing using QR decomposition**

*Tree view of the detection problem: Example for* Φ ¼ *s*1, *s*2, *s*3, *s*<sup>4</sup>

*DOI: http://dx.doi.org/10.5772/intechopen.93089*

*A Review to Massive MIMO Detection Algorithms: Theory and Implementation*

Moreover, the triangularity of **R** gives

<sup>∀</sup>*k*∈f g 1, … , *<sup>N</sup>* ,ð Þ **Rx** ð Þ¼ *<sup>k</sup>* <sup>X</sup>*<sup>n</sup>*

QR decomposition, we have

matrices **<sup>Q</sup>***<sup>H</sup>* <sup>¼</sup> **<sup>Q</sup>**�<sup>1</sup>

that leads to

**211**

**Figure 3.** *PIC algorithm outline as published in [9].*

SIC algorithms very prone to error propagation. Indeed, if an assumption is wrong, the error has consequences on all the data streams to be detected. Hence, SIC detectors should be used when there is a net ranking in the quality of each data stream. A basic scenario for this would be a MIMO system receiving data from several users with different channel qualities.

On the contrary, PIC detectors process all the data streams at once so that they are more resilient to interstream error propagation. However, the parallel computations assume that every data stream is as reliable as the other. Due to this assumption, a poor-quality data stream propagates its error to the whole system. For that reason, PIC detectors are well suited when all data streams have the same quality level.

#### **6. Tree-search-based detectors**

Tree-search-based detectors are the current most investigated algorithms. They use a different framework than the linear and the interference cancelation detector. As the name suggests, the tree-search detector interprets the detection problem from Eq. (2) as the search for the best path in a tree. Tree-search-based detectors can either be optimal with a non-polynomial yet small complexity or quasi-optimal yet not optimal with a polynomial convexity.

**Figure 4** gives an example of the tree interpretation for a constellation with four symbols and two data streams. In this configuration, solving the detection problem is equivalent to find two symbols in the set Φ ¼ f g *s*1, *s*2, *s*3, *s*<sup>4</sup> that minimize the objective function. This process can be seen as finding the path in the tree that leads to the best objective function. The first tree level corresponds to the first symbol and so on for each level.

In this paradigm, the exhaustive search detector described in Section 3 computes the objective function for each leaf node and then selects the best path. Tree-searchbased detectors search for the best leaf without trying every path. This leads to three enumeration paradigms: depth-first, breadth-first, and best-first. These

*A Review to Massive MIMO Detection Algorithms: Theory and Implementation DOI: http://dx.doi.org/10.5772/intechopen.93089*

#### **Figure 4.**

SIC algorithms very prone to error propagation. Indeed, if an assumption is wrong, the error has consequences on all the data streams to be detected. Hence, SIC detectors should be used when there is a net ranking in the quality of each data stream. A basic scenario for this would be a MIMO system receiving data from

*Advanced Radio Frequency Antennas for Modern Communication and Medical Systems*

On the contrary, PIC detectors process all the data streams at once so that they are more resilient to interstream error propagation. However, the parallel computations assume that every data stream is as reliable as the other. Due to this assumption, a poor-quality data stream propagates its error to the whole system. For that reason, PIC detectors are well suited when all data streams have the same

Tree-search-based detectors are the current most investigated algorithms. They use a different framework than the linear and the interference cancelation detector. As the name suggests, the tree-search detector interprets the detection problem from Eq. (2) as the search for the best path in a tree. Tree-search-based detectors can either be optimal with a non-polynomial yet small complexity or quasi-optimal

**Figure 4** gives an example of the tree interpretation for a constellation with four symbols and two data streams. In this configuration, solving the detection problem is equivalent to find two symbols in the set Φ ¼ f g *s*1, *s*2, *s*3, *s*<sup>4</sup> that minimize the objective function. This process can be seen as finding the path in the tree that leads to the best objective function. The first tree level corresponds to the first symbol

In this paradigm, the exhaustive search detector described in Section 3 computes the objective function for each leaf node and then selects the best path. Tree-searchbased detectors search for the best leaf without trying every path. This leads to three enumeration paradigms: depth-first, breadth-first, and best-first. These

several users with different channel qualities.

*PIC algorithm outline as published in [9].*

**6. Tree-search-based detectors**

and so on for each level.

**210**

yet not optimal with a polynomial convexity.

quality level.

**Figure 3.**

*Tree view of the detection problem: Example for* Φ ¼ *s*1, *s*2, *s*3, *s*<sup>4</sup> � � *and two data streams.*

paradigms will be detailed after the description of the preprocessing used by all the variants.

#### **6.1 Preprocessing using QR decomposition**

All tree-search-based detectors use the same preprocessing. Let be **H** ¼ **QR** the QR decomposition of the channel matrix with **Q** a unitary matrix and **R** an upper triangular one. The decomposition is computed only once per coherence block leading to a negligible overhead of the complexity per received symbol. Using the QR decomposition, we have

$$\|\mathbf{y} - \mathbf{H}\mathbf{x}\| = \|\mathbf{y} - \mathbf{Q}\mathbf{R}\mathbf{x}\| = \|\mathbf{Q}^H \mathbf{y} - \mathbf{Q}^H \mathbf{Q} \mathbf{R} \mathbf{x}\|\tag{9}$$

as unitary matrices act as isometries. Thus, by exploiting the property of unitary matrices **<sup>Q</sup>***<sup>H</sup>* <sup>¼</sup> **<sup>Q</sup>**�<sup>1</sup> , this norm can be rewritten as

$$\|\mathbf{y} - \mathbf{H}\mathbf{x}\| = \|\|\mathbf{\bar{y}} - \mathbf{R}\mathbf{x}\|\tag{10}$$

with **y**~≜**Q***<sup>H</sup>***y**. Computing **y**~ is the only overhead in complexity that is required on a symbol basis as it cannot be preprocessed for the whole coherence block.

The point of this QR preprocessing is that the triangularity of **R** allows to compute the objective function iteratively. Indeed, we can introduce for all *k*∈f g 1, … , *N* , **d**ð Þ*k* ≜**y**~ð Þ� *k* ð Þ **Rx** ð Þ*k* with ð Þ **Rx** ð Þ*k* being the *k*th coefficient of the product *Rx*. Given this definition, the objective function is written as

$$\left\|\tilde{\mathbf{y}} - \mathbf{R}\mathbf{x}\right\|^2 = \sum\_{k=1}^{n} \mathbf{d}(k)^2. \tag{11}$$

Moreover, the triangularity of **R** gives

$$\forall k \in \{1, \ldots, N\}, (\mathbf{R}\mathbf{x})(k) = \sum\_{j=1}^{n} \mathbf{R}(k, j)\mathbf{x}(j) = \sum\_{j=k}^{n} \mathbf{R}(k, j)\mathbf{x}(j) \tag{12}$$

that leads to

$$\forall k \in \{1, \ldots, N\}, \mathbf{d}(k) = \tilde{\mathbf{y}}(k) - \sum\_{j=k}^{n} \mathbf{R}(k, j)\mathbf{x}(j). \tag{13}$$

With this expression, it is clear that we can compute partial estimations of the objective function and **d**ð Þ*k* coefficients while the symbol vector is built. Indeed,

starting from the last component, Eq. (13) is fully evaluated for the *j*th position as soon as a hypothesis is made on **x**ð Þ*j* . Thus, it is possible to add a new operand in Eq. (11) and to have an idea of how promising the partial symbol vector is. The partial objective function from Eq. (11) is traditionally called the partial Euclidean distance (PED).

#### **6.2 Depth-first tree-search detection: sphere decoding**

The depth-first paradigm is the oldest one, and it is commonly known in the communication field as the sphere decoding (SD). SD is the transposition of the mathematical Fincke-Pohst algorithm in the telecommunication field [16]. The basic principle of this algorithm is to define an upper-bound for the objective function named the radius *r*<sup>2</sup> and then to use it to prune paths as early as possible. Reintroducing *x*<sup>0</sup> from Eq. (3), the upper-bound constraint gives

$$\|\mathbf{y} - \mathbf{H}\mathbf{x}\|^2 = \|\mathbf{H}(\mathbf{x}\_0 - \mathbf{x})\| \le r^2. \tag{14}$$

criterion. The max-log approximation avoids the computation of the exact LLRs by

<sup>∥</sup>**<sup>y</sup>** � **Hx**∥<sup>2</sup> � min

Thus, to compute the max-log approximation, one must know the objective function of the best leaf (i.e., one of the minimum in Eq. (15)) but also the objective function of each best counter-hypotheses (the other minimum). A path should then be pruned only if its PED is greater than the current radius *r*<sup>2</sup> and if this path cannot lead to a better counter-hypothesis. This can be implemented by adding another

One of the most advanced ASIC-implemented depth-first reported so far uses a two-dimensional Schnorr-Euchner enumeration. This implementation reaches a throughput higher than 600 Mb/s for the soft-output version and exceeds 1.2 Gb/s for the hard-output one while keeping an excellent energy efficiency [21]. The high throughput is achieved by using several SD cores in parallel to decode several

**6.3 Breadth-first tree-search detectors: K-best and M-algorithm**

overgrow, it can reduce the number of surviving paths.

voltage over the scaling method in memories [26].

Breadth-first detectors drop out of optimality for better implementability. Indeed, they address the two main issues of the depth-first paradigm: the

unpredictable complexity that depends on the SNR through *r*<sup>2</sup> and the depth-andbacktrack enumeration that prevents the use of hardware pipelines. Breadth-first detectors achieve this goal by removing the pruning criterion and always keep the same number of paths instead. At each level, the detector compares all the current paths' PED and keeps only the best ones. This number is traditionally called *K* for the K-best algorithm [22] or *M* for the M-algorithm [23]. A recent work mixed this approach with the upper-bound radius from death-first to prune even more path per level and reduce the complexity further [24]. This method converges to the breadth-first if all PED are always under the upper-bound, but if some PEDs

The restricted number of surviving paths induces that the right one can be pruned early if its PED has grown too quickly in the early levels. This is the reason for the optimality loss. Then, some detectors implement a post-detection SNR criterion to reorder the tree levels such that the most certain one is at the top. Thus, the right path is less likely to be pruned by mistake in the early stages. Thanks to this reordering, the K-best algorithm performs very well yet, not optimally in a

From a hardware implementation perspective, breadth-first tree-search detectors are very efficient. They do not require any backtrack so that an expanded node can be safely deleted as it will never be revisited. Moreover, the number of visited nodes per level is fixed. Thus, ASIC can embed the exact amount of resources required. These two characteristics allow the construction of efficient hardware pipelines that substantially increase the throughput. K-best can achieve at least the same throughput as depth-first without the need for parallel cores. Hard-output implementations exceeding 2.5 Gb/s are reported using the breadth-first paradigm [25]. Another study focused on energy efficiency designed a breadth-first variant that can handle both the channel noise and the hardware noise generated by the

!

*<sup>i</sup>* <sup>¼</sup> *<sup>x</sup>*<sup>∈</sup> <sup>Φ</sup>*<sup>N</sup>* : *bi* <sup>¼</sup> *<sup>k</sup>* � � is the set of all symbols with the *<sup>i</sup>*th bit set to *<sup>k</sup>*.

*x*∈*χ*<sup>1</sup> *i*

<sup>∥</sup>**<sup>y</sup>** � **Hx**∥<sup>2</sup>

(15)

claiming that

where *χ<sup>k</sup>*

vectors simultaneously.

mathematical sense.

**213**

*Li* <sup>≈</sup> <sup>1</sup>

*DOI: http://dx.doi.org/10.5772/intechopen.93089*

radius called the hypothetical radius constraint.

<sup>2</sup>*σ*<sup>2</sup> min *x*∈*χ*<sup>0</sup> *i*

*A Review to Massive MIMO Detection Algorithms: Theory and Implementation*

This inequality highlights that constraining the objective function may be interpreted as looking for solution no so far from **x**0. As stated in Eq. (4), the only deviation from **x**<sup>0</sup> is due to the noise so that the choice of *r* must be adapted to the SNR. Thus, if the SNR is high, the radius can be small, while in the contrary scenario, the radius should be increased so that there is at least one vector **x** satisfying the constraint from Eq. (14).

In the remainder of this section, we assume that *r*<sup>2</sup> is adequately chosen and that there is at least a solution. As stated in Section 6.1, the QR decomposition allows us to compute a PED at each level. As the PED is a sum of squares, it can only increase during the decoding process. Thus, if at some point, a PED violates the constraint from Eq. (14), then all the vectors build upon this partial solution are bound to be infeasible. From a tree-search perspective, this means that if a node already breaks the constraint, all its children will do the same. Thus, all paths starting from this node can be pruned without performance loss.

The SD is referred to as a depth-first detector as starting from the root node, it goes as depth as possible until it reaches a leaf or violates Eq. (14). If a leaf is reached, it is compared to the best leaf so far and saved if it is the new best leaf. If Eq. (14) is violated, the SD algorithm backtracks and explores a new path. When all paths are either completed or pruned, the result is the best leaf reached.

The Schnorr-Euchner (SE) enumeration is another depth-first enumeration known to perform better by using a lattice reduction method [17, 18]. The basic idea is to explore the node's children by the increasing order of their PED. This is particularly useful when using the radius reduction technique that sets an infinite *r*<sup>2</sup> at the beginning and then updates it to the best objective function for a leaf encounter so far.

The SD algorithm and its SE version are optimal as they ensure to find the exact solution of the detection problem. Indeed, the best leaf is obviously the best leaf among all the completed paths, and the pruned paths cannot lead to a better point due to their already worst PED. The NP-hardness argument detailed in Section 3 implies that SD has a non-polynomial worst-case complexity. Moreover, SD expected complexity is also non-polynomial even if the exponential growth is slow enough to compete with polynomial detectors under certain circumstances [19].

A very efficient soft-input soft-output depth-first algorithm is the single tree-search sphere decoding (STS-SD) [20]. To produce its soft-output, it uses the max-log approximation [10, 13, 14] and makes some changes on the pruning

*A Review to Massive MIMO Detection Algorithms: Theory and Implementation DOI: http://dx.doi.org/10.5772/intechopen.93089*

criterion. The max-log approximation avoids the computation of the exact LLRs by claiming that

$$L\_i \approx \frac{1}{2\sigma^2} \left( \min\_{\mathbf{x} \in \chi\_i^0} \left\| \mathbf{y} - \mathbf{H}\mathbf{x} \right\|^2 - \min\_{\mathbf{x} \in \chi\_i^1} \left\| \mathbf{y} - \mathbf{H}\mathbf{x} \right\|^2 \right) \tag{15}$$

where *χ<sup>k</sup> <sup>i</sup>* <sup>¼</sup> *<sup>x</sup>*<sup>∈</sup> <sup>Φ</sup>*<sup>N</sup>* : *bi* <sup>¼</sup> *<sup>k</sup>* � � is the set of all symbols with the *<sup>i</sup>*th bit set to *<sup>k</sup>*. Thus, to compute the max-log approximation, one must know the objective function of the best leaf (i.e., one of the minimum in Eq. (15)) but also the objective function of each best counter-hypotheses (the other minimum). A path should then be pruned only if its PED is greater than the current radius *r*<sup>2</sup> and if this path cannot lead to a better counter-hypothesis. This can be implemented by adding another radius called the hypothetical radius constraint.

One of the most advanced ASIC-implemented depth-first reported so far uses a two-dimensional Schnorr-Euchner enumeration. This implementation reaches a throughput higher than 600 Mb/s for the soft-output version and exceeds 1.2 Gb/s for the hard-output one while keeping an excellent energy efficiency [21]. The high throughput is achieved by using several SD cores in parallel to decode several vectors simultaneously.

#### **6.3 Breadth-first tree-search detectors: K-best and M-algorithm**

Breadth-first detectors drop out of optimality for better implementability. Indeed, they address the two main issues of the depth-first paradigm: the unpredictable complexity that depends on the SNR through *r*<sup>2</sup> and the depth-andbacktrack enumeration that prevents the use of hardware pipelines. Breadth-first detectors achieve this goal by removing the pruning criterion and always keep the same number of paths instead. At each level, the detector compares all the current paths' PED and keeps only the best ones. This number is traditionally called *K* for the K-best algorithm [22] or *M* for the M-algorithm [23]. A recent work mixed this approach with the upper-bound radius from death-first to prune even more path per level and reduce the complexity further [24]. This method converges to the breadth-first if all PED are always under the upper-bound, but if some PEDs overgrow, it can reduce the number of surviving paths.

The restricted number of surviving paths induces that the right one can be pruned early if its PED has grown too quickly in the early levels. This is the reason for the optimality loss. Then, some detectors implement a post-detection SNR criterion to reorder the tree levels such that the most certain one is at the top. Thus, the right path is less likely to be pruned by mistake in the early stages. Thanks to this reordering, the K-best algorithm performs very well yet, not optimally in a mathematical sense.

From a hardware implementation perspective, breadth-first tree-search detectors are very efficient. They do not require any backtrack so that an expanded node can be safely deleted as it will never be revisited. Moreover, the number of visited nodes per level is fixed. Thus, ASIC can embed the exact amount of resources required. These two characteristics allow the construction of efficient hardware pipelines that substantially increase the throughput. K-best can achieve at least the same throughput as depth-first without the need for parallel cores. Hard-output implementations exceeding 2.5 Gb/s are reported using the breadth-first paradigm [25]. Another study focused on energy efficiency designed a breadth-first variant that can handle both the channel noise and the hardware noise generated by the voltage over the scaling method in memories [26].

starting from the last component, Eq. (13) is fully evaluated for the *j*th position as soon as a hypothesis is made on **x**ð Þ*j* . Thus, it is possible to add a new operand in Eq. (11) and to have an idea of how promising the partial symbol vector is. The partial objective function from Eq. (11) is traditionally called the partial Euclidean

*Advanced Radio Frequency Antennas for Modern Communication and Medical Systems*

The depth-first paradigm is the oldest one, and it is commonly known in the communication field as the sphere decoding (SD). SD is the transposition of the mathematical Fincke-Pohst algorithm in the telecommunication field [16]. The basic principle of this algorithm is to define an upper-bound for the objective function named the radius *r*<sup>2</sup> and then to use it to prune paths as early as possible.

<sup>∥</sup>**<sup>y</sup>** � **Hx**∥<sup>2</sup> <sup>¼</sup> <sup>∥</sup>**H x**ð Þ <sup>0</sup> � **<sup>x</sup>** ∥ ≤ *<sup>r</sup>*

This inequality highlights that constraining the objective function may be interpreted as looking for solution no so far from **x**0. As stated in Eq. (4), the only deviation from **x**<sup>0</sup> is due to the noise so that the choice of *r* must be adapted to the SNR. Thus, if the SNR is high, the radius can be small, while in the contrary scenario, the radius should be increased so that there is at least one vector **x**

In the remainder of this section, we assume that *r*<sup>2</sup> is adequately chosen and that there is at least a solution. As stated in Section 6.1, the QR decomposition allows us to compute a PED at each level. As the PED is a sum of squares, it can only increase during the decoding process. Thus, if at some point, a PED violates the constraint from Eq. (14), then all the vectors build upon this partial solution are bound to be infeasible. From a tree-search perspective, this means that if a node already breaks the constraint, all its children will do the same. Thus, all paths starting from this

The SD is referred to as a depth-first detector as starting from the root node, it

The Schnorr-Euchner (SE) enumeration is another depth-first enumeration known to perform better by using a lattice reduction method [17, 18]. The basic idea is to explore the node's children by the increasing order of their PED. This is particularly useful when using the radius reduction technique that sets an infinite *r*<sup>2</sup>

The SD algorithm and its SE version are optimal as they ensure to find the exact solution of the detection problem. Indeed, the best leaf is obviously the best leaf among all the completed paths, and the pruned paths cannot lead to a better point due to their already worst PED. The NP-hardness argument detailed in Section 3 implies that SD has a non-polynomial worst-case complexity. Moreover, SD expected complexity is also non-polynomial even if the exponential growth is slow enough to compete with polynomial detectors under certain circumstances [19]. A very efficient soft-input soft-output depth-first algorithm is the single tree-search sphere decoding (STS-SD) [20]. To produce its soft-output, it uses the max-log approximation [10, 13, 14] and makes some changes on the pruning

goes as depth as possible until it reaches a leaf or violates Eq. (14). If a leaf is reached, it is compared to the best leaf so far and saved if it is the new best leaf. If Eq. (14) is violated, the SD algorithm backtracks and explores a new path. When all

paths are either completed or pruned, the result is the best leaf reached.

at the beginning and then updates it to the best objective function for a leaf

2

*:* (14)

**6.2 Depth-first tree-search detection: sphere decoding**

satisfying the constraint from Eq. (14).

node can be pruned without performance loss.

encounter so far.

**212**

Reintroducing *x*<sup>0</sup> from Eq. (3), the upper-bound constraint gives

distance (PED).

Breadth-first detectors can provide soft-output using the max-log approximation and a list. This list approach is used by several detectors, including some other tree-search algorithm and detectors from other families. The point of this approach is to produce a list Γ of point associated with their objective function and to approximate Eq. (15) as

$$L\_i \approx \frac{1}{2\sigma^2} \left( \min\_{\mathbf{x} \in \Gamma \cap \mathbb{X}\_i^0} \|\mathbf{y} - \mathbf{H}\mathbf{x}\|^2 - \min\_{\mathbf{x} \in \Gamma \cap \mathbb{X}\_i^1} \|\mathbf{y} - \mathbf{H}\mathbf{x}\|^2 \right). \tag{16}$$

optimization. It is trained for both static channel (**H** is fixed) and on a time-varying channel (the same condition as previously). As the errors are sometimes unavoidable due to a bad channel realization, the loss function should not be the objective

log ð Þ*<sup>k</sup>* <sup>∥</sup>**<sup>x</sup>** � **<sup>x</sup>***k*<sup>∥</sup>

with **x**ZF the result of ZF detection and **x***<sup>k</sup>* the detected symbol of the *k*th layer. Then, the normalization with the ZF result avoids over-penalizing the situation with bad **H** realization. Moreover, the logarithm weights the result from each layer

The most studied bioinspired decoders fall into two categories: ant colony optimizations (ACO) and particle swarm optimizations (PSO) that include the firefly algorithm (FA). These techniques are often very complex compared to the previously described algorithms, but they claim to be resilient to challenging conditions. Bioinspired algorithms should be able to decode messages with imperfect CSI, or

ACO-based detectors simulate several ants that choose a path randomly to follow with a nonuniform probability function [31]. Each antenna is processed independently. At each iteration, an ant selects the symbol *s*∈ Φ with the probability

> *s ηβ* <sup>P</sup> *<sup>s</sup> <sup>s</sup>*<sup>∈</sup> <sup>Φ</sup>*τα s η β s*

with *τ<sup>s</sup>* the pheromone level on the path, *η<sup>s</sup>* an image of the objective function, most of the time through a log-sigmoid function, and ð Þ *α*, *β* the two parameters that balance the relative importance of each term. After each iteration, the pheromone level is updated according to the following principle: the better the objective function the ant achieves, the more pheromone it dropped off. Thus, the ant selects more often the path that seems more promising regarding the objective function and the previous runs while preserving some chance of exploring a new path. FA-based detectors simulate several fireflies that try to find the best mating partner. The objective function determines the attractiveness of a firefly. Then a firefly goes toward more attractive congeners biased with a random influence to promote exploration [32]. Some FA variant implements a memory effect that makes it even closer to a PSO-based algorithm [33]. This framework is applied to MIMO detection using the QR decomposition described in Section 6.1. The FA represents each symbol to decode as a nest containing as many fireflies as the constellation size. Thus, the fireflies have to select a partner in each nest from the last symbol to the first based on the biased attractiveness. When the firefly population searched all the nests, the best path represents the decoded symbol vector. FA-based detectors can be related to tree-search-based algorithm with a randomness exploration and a

Geometrical detectors are based on a two-step process: an exploration to find a small set of promising solutions and an exploitation to improve this set at a small cost. It follows the traditional approach in nonconvex optimization to perform

*ps* <sup>¼</sup> *<sup>τ</sup><sup>α</sup>*

<sup>∥</sup>**<sup>x</sup>** � **<sup>x</sup>**ZF<sup>∥</sup> (17)

(18)

function. Thus, the DetNet designers opt for a

*DOI: http://dx.doi.org/10.5772/intechopen.93089*

to give more credit to the final ones.

**7.2 Bioinspired detectors**

the data streams are correlated.

fixed number of path allowed.

**7.3 Geometrical detectors**

**215**

X *L*

*A Review to Massive MIMO Detection Algorithms: Theory and Implementation*

*k*¼1

For most breadth-first algorithms, Γ is the list of all completed paths [22, 25].

#### **6.4 Best-first tree-search detector: fast descent tree-search and parallel tree-search**

Best-first detectors are also sometimes called metric-first. The basic idea besides this enumeration is to not favor depth or breadth over each other. Instead, the node with the best PED is expanded, regardless of its level. The best-first algorithm keeps a node pool with nodes to be expanded. First, the pool is initialized with the root node. Then, at each iteration, the node with the lower PED is popped out from the pool, and all its children are computed and pushed in the pool unless they are leaves. If they are, a comparison with the best leaf so far allows us to keep track of the best result. When a leaf is reached, its objective function may be used as an upper-bound to prune the pool for each node with a PED higher than the new reference. The detection ends when the pool is empty.

This simple method quickly overfills the pool as several nodes are added when only one is popped out. Rather than providing a huge pool to contain all the nodes, improvement is to convert the *φ*-ary tree (with *φ* the constellation size) to a first-child next-sibling binary tree [27]. This method is called the modified best-first algorithm (MBF). With this variant, the only nodes added in the pool after an expansion are the best child and the best yet-to-visit siblings. Then, the growth rate of the pool size is controlled. However, this method struggles to provide a full path solution quickly as the popped out node is the one with the lower PED that is often close to the root. To solve this issue, a variant implementation called MBF fast descent (MBF-FD) changes the expansion rule. When a node is expanded, the process goes through the best child until reaching a leaf, pushing in all best siblings along the way [28].

Recently, a best-first algorithm ASIC is reported to reach 800 Mb/s in a softinput soft-output framework [29]. The modified algorithm, called cross-level parallel tree-search, splits the pool node into several pools, one per level. At each iteration, a node from each pool in popped out expanded using the best-child/bestsibling framework, and the new nodes are pushed in the according pool. Moreover, the presented detector prune nodes in each pool using the upper-bound to keep only the one that can improve the result or the counter-hypothesis (see Section 6.2 for details). The slit pool helps the parallelization process so that this algorithm variant is very suitable for hardware implementation.

#### **7. Other unusual detectors: bioinspired and geometrical detectors**

#### **7.1 Deep neural MIMO detection: learning to detect**

The rise of deep learning leads to the search for an efficient neural network to solve the detection problem such as DetNet [30]. This network is inspired by the projected gradient descent algorithm that is a major option to solve convex

optimization. It is trained for both static channel (**H** is fixed) and on a time-varying channel (the same condition as previously). As the errors are sometimes unavoidable due to a bad channel realization, the loss function should not be the objective function. Thus, the DetNet designers opt for a

$$\sum\_{k=1}^{L} \log\left(k\right) \frac{\|\mathbf{x} - \mathbf{x}\_k\|}{\|\mathbf{x} - \mathbf{x}\_{\text{ZF}}\|} \tag{17}$$

with **x**ZF the result of ZF detection and **x***<sup>k</sup>* the detected symbol of the *k*th layer. Then, the normalization with the ZF result avoids over-penalizing the situation with bad **H** realization. Moreover, the logarithm weights the result from each layer to give more credit to the final ones.

#### **7.2 Bioinspired detectors**

Breadth-first detectors can provide soft-output using the max-log approximation and a list. This list approach is used by several detectors, including some other tree-search algorithm and detectors from other families. The point of this approach

<sup>∥</sup>**<sup>y</sup> Hx**∥<sup>2</sup> min

For most breadth-first algorithms, Γ is the list of all completed paths [22, 25].

Best-first detectors are also sometimes called metric-first. The basic idea besides this enumeration is to not favor depth or breadth over each other. Instead, the node with the best PED is expanded, regardless of its level. The best-first algorithm keeps a node pool with nodes to be expanded. First, the pool is initialized with the root node. Then, at each iteration, the node with the lower PED is popped out from the pool, and all its children are computed and pushed in the pool unless they are leaves. If they are, a comparison with the best leaf so far allows us to keep track of the best result. When a leaf is reached, its objective function may be used as an upper-bound to prune the pool for each node with a PED higher than the new reference. The

This simple method quickly overfills the pool as several nodes are added when only one is popped out. Rather than providing a huge pool to contain all the nodes, improvement is to convert the *φ*-ary tree (with *φ* the constellation size) to a first-child next-sibling binary tree [27]. This method is called the modified best-first algorithm (MBF). With this variant, the only nodes added in the pool after an expansion are the best child and the best yet-to-visit siblings. Then, the growth rate of the pool size is controlled. However, this method struggles to provide a full path solution quickly as the popped out node is the one with the lower PED that is often close to the root. To solve this issue, a variant implementation called MBF fast descent (MBF-FD) changes the expansion rule. When a node is expanded, the process goes through the best child

Recently, a best-first algorithm ASIC is reported to reach 800 Mb/s in a softinput soft-output framework [29]. The modified algorithm, called cross-level parallel tree-search, splits the pool node into several pools, one per level. At each iteration, a node from each pool in popped out expanded using the best-child/bestsibling framework, and the new nodes are pushed in the according pool. Moreover, the presented detector prune nodes in each pool using the upper-bound to keep only the one that can improve the result or the counter-hypothesis (see Section 6.2 for details). The slit pool helps the parallelization process so that this algorithm

**7. Other unusual detectors: bioinspired and geometrical detectors**

The rise of deep learning leads to the search for an efficient neural network to solve the detection problem such as DetNet [30]. This network is inspired by the projected gradient descent algorithm that is a major option to solve convex

until reaching a leaf, pushing in all best siblings along the way [28].

variant is very suitable for hardware implementation.

**7.1 Deep neural MIMO detection: learning to detect**

**214**

!

*x*∈Γ∩*χ*<sup>1</sup> *i*

<sup>∥</sup>**<sup>y</sup> Hx**∥<sup>2</sup>

*:* (16)

is to produce a list Γ of point associated with their objective function and to

*Advanced Radio Frequency Antennas for Modern Communication and Medical Systems*

**6.4 Best-first tree-search detector: fast descent tree-search and parallel**

approximate Eq. (15) as

**tree-search**

*Li* <sup>≈</sup> <sup>1</sup>

detection ends when the pool is empty.

<sup>2</sup>*σ*<sup>2</sup> min *x*∈Γ∩*χ*<sup>0</sup> *i*

> The most studied bioinspired decoders fall into two categories: ant colony optimizations (ACO) and particle swarm optimizations (PSO) that include the firefly algorithm (FA). These techniques are often very complex compared to the previously described algorithms, but they claim to be resilient to challenging conditions. Bioinspired algorithms should be able to decode messages with imperfect CSI, or the data streams are correlated.

> ACO-based detectors simulate several ants that choose a path randomly to follow with a nonuniform probability function [31]. Each antenna is processed independently. At each iteration, an ant selects the symbol *s*∈ Φ with the probability

$$p\_s = \frac{\tau\_s^a \eta\_s^\beta}{\sum\_{s \in \Phi} \tau\_s^a \eta\_s^\beta} \tag{18}$$

with *τ<sup>s</sup>* the pheromone level on the path, *η<sup>s</sup>* an image of the objective function, most of the time through a log-sigmoid function, and ð Þ *α*, *β* the two parameters that balance the relative importance of each term. After each iteration, the pheromone level is updated according to the following principle: the better the objective function the ant achieves, the more pheromone it dropped off. Thus, the ant selects more often the path that seems more promising regarding the objective function and the previous runs while preserving some chance of exploring a new path.

FA-based detectors simulate several fireflies that try to find the best mating partner. The objective function determines the attractiveness of a firefly. Then a firefly goes toward more attractive congeners biased with a random influence to promote exploration [32]. Some FA variant implements a memory effect that makes it even closer to a PSO-based algorithm [33]. This framework is applied to MIMO detection using the QR decomposition described in Section 6.1. The FA represents each symbol to decode as a nest containing as many fireflies as the constellation size. Thus, the fireflies have to select a partner in each nest from the last symbol to the first based on the biased attractiveness. When the firefly population searched all the nests, the best path represents the decoded symbol vector. FA-based detectors can be related to tree-search-based algorithm with a randomness exploration and a fixed number of path allowed.

#### **7.3 Geometrical detectors**

Geometrical detectors are based on a two-step process: an exploration to find a small set of promising solutions and an exploitation to improve this set at a small cost. It follows the traditional approach in nonconvex optimization to perform

simple descents that can be stuck in local optimums from several starting points. Geometrical detectors use a real-valued model and the singular value decomposition (SVD) rather than the QR one [34]. Let us rewrite the objective function by introducing the SVD of **<sup>H</sup>** <sup>¼</sup> **UDV***<sup>T</sup>* with **<sup>U</sup>** and **<sup>V</sup>** two orthogonal matrices and **<sup>D</sup>** the diagonal matrix containing the singular values *λ<sup>i</sup>* f g : 1≤*i* ≤*n* in ascending order.

Consequently, the objective function can be rewritten as

$$\|\mathbf{y} - \mathbf{H}\mathbf{x}\|^2 = \left(\mathbf{V}^T(\mathbf{x} - \mathbf{x}\_0)\right)^T \mathbf{D}\mathbf{U}^T \mathbf{U} \mathbf{D} \left(\mathbf{V}^T(\mathbf{x} - \mathbf{x}\_0)\right) \tag{19}$$

using **x**<sup>0</sup> from Eq. (3). As the vectors of **V**, named **v***<sup>i</sup>* f g : 1≤ *i*≤ *n* , constitute a basis, we can define *α<sup>i</sup>* f g : 1≤*i* ≤*n* the coordinates of **x** � **x**<sup>0</sup> on this basis. Using the orthogonality of **U** and **V** and the diagonality of **D**, Eq. (19) leads to

$$\left\|\mathbf{y} - \mathbf{H}\mathbf{x}\right\|^2 = \sum\_{i=1}^{n} a\_i^2 \lambda\_i^2. \tag{20}$$

**8. Conclusions and summary**

*DOI: http://dx.doi.org/10.5772/intechopen.93089*

according to the BER-complexity trade-off.

decoders to exploit channel codings better.

area, in particular on geometrical detectors.

ACO ant colony optimization

FA firefly algorithm

LLR log-likelihood ratio MBF modified best-first

ML maximum likelihood

CDMA code-division multiple access CLPS closest lattice-point search CSI channel state information

FPGA field-programmable gate array

MBF-FD modified best-first fast descent

MMSE minimum mean-square error

PED partial Euclidean distance PIC parallel interference cancelation

OSIC ordered successive interference cancelation

ASIC application-specific integrated circuit

**Conflict of interest**

**Nomenclature**

**Abbreviations**

**217**

MIMO detection is a well-studied problem that has been tackled from several perspectives. The mathematical interpretation, as a combinatorial optimization problem, leads to the optimal and linear detectors. From the signal processing perspective, detecting a signal means improving the SNR or SINR so that the direct answer is to cancel the interference and to remove the noise. From an algorithmic perspective, the detection problem is the search for the best path in a weighted tree that relies on some well-known algorithms. Other sources of inspiration, such as nature or geometry, provide some interesting perspectives. These paradigms and the associated detectors are summed up in **Table 1**, and we compare all of them

*A Review to Massive MIMO Detection Algorithms: Theory and Implementation*

All these perspectives shed a different light on the problem, leading to fruitful experimentation. Indeed, some methods take inspiration from others to keep on improving. Therefore, some improvement axes remain open, for instance, the permanent decrease of complexity with equal performance, the development for efficient hardware implementations, or the optimization of the interaction with

The authors declare no conflict of interest but to research and publish in this

Memoryless the channel outputs only depend on its state and its inputs. Linear the channel outputs are linear combinations of the inputs.

Flat the channel coherence bandwidth is large enough to consider that all frequencies experience the same transfer function.

Block fading the channel states vary slow enough to be considered constant over many symbols named coherence block.

Let Δ*<sup>i</sup>* be the straight line passing through *x*<sup>0</sup> and directed by **v***i*. One can note that Eq. (20) highlights that the objective function grows more slowly along the first Δ*<sup>i</sup>* rather than along the last ones so that promising points must be around these first straight lines. Then the solution is most likely to be found along this line. The geometrical exploration step is then performed, selecting some points near the first Δ*i*. Then a straightforward descent algorithm is performed by looking for the best point in the close neighborhood until convergence.

A soft-output version of this algorithm is possible using the max-log approximation and the list approach detailed in [35], Section 5.2. A field-programmable gate array (FPGA) implementation has recently been proposed. This groundwork points out that geometrical detectors may achieve good performance in the future yet being far from mature at that point [36].


#### **Table 1.** *Summary of all detectors described in this chapter.*
