14 Will-be-set-by-IN-TECH

Certain parameters are of special interest in the domain of wireless communication platforms. The limited power budget of mobile devices puts hard constraints on power efficiency. This requires power optimization across all layers of algorithm development, design, implementation and semiconductor technology.

Deriving comprehensive cost models using Monte-Carlo methods requires visiting a significantly larger number of points in the design space than the heuristically driven parameter optimization approaches supported by existing FPGA-based simulation acceleration systems. The achievable simulation speedup is therefore a key factor: it enables the characterization and optimization of complex communication systems using Monte-Carlo approaches, which are infeasible for pure software simulation because of the large stimuli sets required.

**8.2. Development framework**

The FPGA-based hybrid hardware-in-the-loop research and design space exploration (DSE) framework created in this work combines high-level tools (e.g. MATLAB/Simulink) and optimized hardware blocks [17]. Its application domain ranges from the design, optimization and verification of efficient signal processing blocks for computationally demanding next-generation wireless communication systems to system characterization and DSE.

The framework consists of a host PC, FPGA-based emulation systems, a generic, fully synthesizable VHDL SoC infrastructure, dedicated processors, processor softcores and a software library providing a transparent communication application programming interface (API). This allows signal processing blocks to be split and run in a distributed fashion on a highly heterogeneous signal processing system. Software API libraries provide unified, transparent communication between MATLAB, C/C++, embedded software and the hardware on-chip multilayer bus system. The same resources are accessible from all components, enabling flexible partitioning and migration of processing tasks between high-level software, embedded software and dedicated hardware modules. The framework block diagram is shown in Figure 10. The properties of the optimized on-chip infrastructure template make it suitable for use in final ASIC targets and thus enable the test, debugging and characterization of signal processing blocks in their target environment. Using standard FPGA design flows, new computationally intensive processing cores are implemented directly as optimized hardware target modules. Instrumentation enables dynamic, software-controlled parameter adjustment. The remaining blocks may continue to run as high-level models,

[Figure 10: block diagram. Labeled blocks include SW PC (MATLAB, Task … Taskn, communication layer, MATLAB IF, CC IF), Gigabit Ethernet PHY, Ethernet IF, FPGA, emulation, control processor, LEON softcore, VLIW softcore, OCP switch, accelerator (IA, MIMO, FEC), debug, SRAM and DDR SDRAM.]

**Figure 10.** Emulation framework block diagram

**8.3. Case study: 3-user antenna selection interference alignment**

An implementation of the antenna selection interference alignment (IA) algorithm presented in Section 3 has been chosen as a case study for the development framework presented in Section 8.2. The proposed 3-user 2x2 MIMO zero-forcing IA antenna selection algorithm computes precoding matrices **V**, decoding matrices **U** and a metric *η* based on [4]. In contrast to the experimental testbed for fixed antenna patterns presented in [13], our implementation additionally selects a subset of the available channels (i.e., antennas or radiation patterns). This increases the orthogonality of the chosen channels while requiring fewer RF front ends.

The problem of finding the optimum antenna combination *î* from a set of *I* combinations can be formulated as

$$\hat{i} = \underset{i=1\ldots I}{\text{arg}\max} \sum\_{k=1}^{K} \eta(\mathbf{V}\_{k,i}, \mathbf{U}\_{k,i})\tag{29}$$

where *η*(**V**, **U**) is a function of the resulting SNR according to Section 3.1, and **V**<sub>*k*,*i*</sub> and **U**<sub>*k*,*i*</sub> are the precoding and decoding matrices of user *k* for antenna combination *i*. Equation (29) is solved by exhaustively visiting all *I* antenna combinations.

### *8.3.1. Computational complexity*

The resource requirements of an optimized, efficient integer implementation of the proposed antenna-selection IA algorithm are presented in this section, based on FPGA implementation results. Target systems include SDR platforms, FPGAs and ASICs.

This section focuses on the costs of the 3-user 2x2 MIMO processing, consisting of matrix inversions, matrix multiplications, eigenvector computation and normalization (see Equations 6 to 9). The metric *η* is computed for both eigenvectors. All intermediate matrices can be independently scaled by arbitrary scalars without affecting the antenna decision or **V** and **U**. Exploiting this makes the cost of all involved 2x2 matrix inversions negligible and allows intermediate matrices to be block-normalized by shifting, i.e., by extracting a common power of 2 from all matrix elements. This reduces the integer word lengths and thus the hardware costs. Table 1 summarizes the number of real-valued base operations required for antenna selection and for the computation of **V** and **U** per antenna combination and subcarrier, excluding the final normalization of **V** and **U**. Complex multiplications are composed of three real multiplications, three additions and two subtractions; INVSQRT denotes the reciprocal square root [26].
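The scaling argument above can be illustrated with a small sketch. Assuming every later step is invariant to a scalar factor, a 2x2 inverse can be replaced by the adjugate, which equals det(**A**)·**A**<sup>−1</sup> and needs only swaps and negations, and block normalization reduces to one common arithmetic right shift. The function names, the 16-bit word length and the restriction to real-valued integer matrices are illustrative assumptions, not the authors' datapath; the complex product uses one standard decomposition that matches the stated count of three multiplications, three additions and two subtractions.

```python
# Sketch (assumptions: real-valued integer matrices, 16-bit signed words).

def adjugate_2x2(m):
    """Return adj(M) = det(M) * inv(M) for M = [[p, q], [r, s]].

    Needs no division: if the downstream result is invariant to a scalar
    factor, adj(M) can stand in for inv(M).
    """
    (p, q), (r, s) = m
    return [[s, -q], [-r, p]]


def block_normalize(m, word_bits=16):
    """Right-shift all entries by one common power of two so that the largest
    magnitude fits into a signed word_bits-bit integer.

    Returns (shifted_matrix, shift). Python's >> on negative ints is an
    arithmetic shift (rounds toward minus infinity), like a hardware shifter.
    """
    peak = max(abs(x) for row in m for x in row)
    limit = (1 << (word_bits - 1)) - 1
    shift = 0
    while (peak >> shift) > limit:
        shift += 1
    return [[x >> shift for x in row] for row in m], shift


def complex_mul_3m(a, b, c, d):
    """(a + jb)(c + jd) using 3 real MUL, 3 ADD and 2 SUB."""
    k1 = c * (a + b)          # 1 ADD, 1 MUL
    k2 = a * (d - c)          # 1 SUB, 1 MUL
    k3 = b * (c + d)          # 1 ADD, 1 MUL
    return k1 - k3, k1 + k2   # real part: 1 SUB, imaginary part: 1 ADD


if __name__ == "__main__":
    m = [[4, 7], [2, 6]]
    adj = adjugate_2x2(m)
    # adj(M) @ M equals det(M) * I; here det(M) = 10
    prod = [[sum(adj[i][k] * m[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
    print(prod)                                           # [[10, 0], [0, 10]]
    print(block_normalize([[40000, -70000], [1200, 5]]))  # ([[10000, -17500], [300, 1]], 2)
    print(complex_mul_3m(1, 2, 3, 4))                     # (-5, 10)
```

Because the shift is common to all entries, the direction of any eigenvector and the ranking of antenna combinations are unchanged; only the magnitude, which is discarded later, is affected.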


Interference Alignment for UWB-MIMO Communication Systems 149



To keep the total transmit power constant, the precoding matrices **V** of the chosen antenna combination need to be normalized, which requires 3 ADD, 8 MUL and 1 INVSQRT additional operations (#OPN) per transmitter and subcarrier. The analysis above implies that, in general, the implementation cost in terms of silicon area and power consumption is dominated by the multiplications.
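A minimal sketch of this normalization step, assuming one data stream per user so that each precoder **V**<sub>*k*</sub> is a 2x1 complex vector (four real values). Under that assumption, counting the operations reproduces the 3 ADD, 8 MUL and 1 INVSQRT stated above. The function name is hypothetical, and floating point stands in for the integer datapath.

```python
import math


def normalize_precoder(v):
    """Scale a complex precoding vector to unit Euclidean norm.

    Also counts the real-valued operations performed, for comparison with
    #OPN. Assumes v is nonzero (energy > 0).
    """
    ops = {"ADD": 0, "MUL": 0, "INVSQRT": 0}
    parts = [x for z in v for x in (z.real, z.imag)]  # real and imaginary parts
    energy = 0.0
    for idx, x in enumerate(parts):
        sq = x * x                       # one MUL per real value
        ops["MUL"] += 1
        if idx == 0:
            energy = sq
        else:
            energy += sq                 # one ADD per additional term
            ops["ADD"] += 1
    inv_norm = 1.0 / math.sqrt(energy)   # one reciprocal square root
    ops["INVSQRT"] += 1
    scaled = [complex(z.real * inv_norm, z.imag * inv_norm) for z in v]
    ops["MUL"] += 2 * len(v)             # scale real and imaginary parts
    return scaled, ops


v, ops = normalize_precoder([3 + 4j, 0j])
print(ops)  # {'ADD': 3, 'MUL': 8, 'INVSQRT': 1}
```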

For the case of *K* = 3 users with *M* = 2 active transmit antennas selected out of *L* = 3 physical antennas per transmitter and *N* = 2 antennas per receiver, each transmitter has 3 possible antenna subsets, so there are 3³ = 27 antenna combinations to be visited per subcarrier.
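The exhaustive search of Eq. (29) over these combinations can be sketched as follows. The per-user metric is passed in as a hypothetical callback `eta(k, combo)`, standing in for the computation of **V**<sub>*k*,*i*</sub>, **U**<sub>*k*,*i*</sub> and *η* described in Section 3; only the enumeration and the arg max are shown.

```python
from itertools import combinations, product


def antenna_combinations(num_users=3, num_physical=3, num_active=2):
    """All joint choices of num_active out of num_physical antennas per transmitter."""
    per_tx = list(combinations(range(num_physical), num_active))  # C(3, 2) = 3 subsets
    return list(product(per_tx, repeat=num_users))              # 3**3 = 27 combinations


def select_antennas(eta, num_users=3, num_physical=3, num_active=2):
    """Exhaustive search of Eq. (29): arg max over i of sum_k eta(k, combo_i).

    eta(k, combo) is a hypothetical callback returning the metric of user k
    for one antenna combination.
    """
    best_combo, best_score = None, float("-inf")
    for combo in antenna_combinations(num_users, num_physical, num_active):
        score = sum(eta(k, combo) for k in range(num_users))
        if score > best_score:
            best_combo, best_score = combo, score
    return best_combo, best_score


print(len(antenna_combinations()))  # 27
```

With a toy metric such as `lambda k, combo: sum(combo[k])`, the search returns `((1, 2), (1, 2), (1, 2))` with score 9. In the real algorithm, every call of the callback carries the #OPC operations of Table 1, which is why the cost scales linearly with the number of combinations.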

For realtime operation, the maximum allowable latency is defined to be *T*<sub>0</sub>. Assigning relative operation costs *α*<sub>*i*</sub> to each operation type OP<sub>*i*</sub>, the total computational cost *C* for *S* subcarriers becomes

$$C = \frac{S}{T\_0} \cdot \left( n \cdot \sum\_{i \in \text{OP}} \alpha\_i \cdot \#\text{OPC}\_i + K \cdot \sum\_{i \in \text{OP}} \alpha\_i \cdot \#\text{OPN}\_i \right) \tag{30}$$

where *n* is the number of antenna combinations visited per subcarrier (*n* = *I* = 27 here).

### *8.3.2. Hardware cost estimation*

Using *α* as relative silicon area costs, the total silicon area of an architecture without resource sharing can be estimated from Eq. (30). The relative areas *α* of 16-bit arithmetic operations for an ASIC implementation, based on [23], are given in Table 2; the cost *α*<sub>MUL</sub> of a multiplier is normalized to 1.

| OP | ADD | MUL | SQRT | INVSQRT |
|---|---|---|---|---|
| *α* | 0.108 | 1 | 1.73 | 3 |

**Table 2.** Relative silicon area costs of 16-bit arithmetic operations

For a system using antenna selection at the transmitter only with *L* = 3 antennas, *S* = 128 subcarriers and *T*<sub>0</sub> = 1 ms, the total IA cost is estimated to be *C* = 1.875 GOPS, counted in multiplier-equivalent operations since *α*<sub>MUL</sub> = 1.
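The estimate can be reproduced from Eq. (30) using the #OPC row of Table 1, the normalization counts #OPN given above and the weights of Table 2. The sketch assumes *n* = 27 antenna combinations and *K* = 3 transmitters; the dictionary and function names are illustrative.

```python
# Relative silicon-area weights alpha from Table 2 (alpha_MUL = 1).
ALPHA = {"ADD": 0.108, "MUL": 1.0, "SQRT": 1.73, "INVSQRT": 3.0}
# #OPC: operations per antenna combination and subcarrier (Table 1, last row).
OPC = {"ADD": 757, "MUL": 438, "SQRT": 9, "INVSQRT": 2}
# #OPN: normalization of V per transmitter and subcarrier.
OPN = {"ADD": 3, "MUL": 8, "SQRT": 0, "INVSQRT": 1}


def weighted_ops(counts):
    """Sum of alpha_i * count_i over all operation types."""
    return sum(ALPHA[op] * n for op, n in counts.items())


def total_cost(S, T0, n, K):
    """Eq. (30): weighted operations per second for S subcarriers within T0 seconds."""
    return S / T0 * (n * weighted_ops(OPC) + K * weighted_ops(OPN))


C = total_cost(S=128, T0=1e-3, n=27, K=3)
print(f"C = {C / 1e9:.3f} GOPS")  # C = 1.875 GOPS
```

The multiplication terms alone (27 × 438 + 3 × 8 per subcarrier) contribute about 1.52 of the 1.875 GOPS, roughly 80 %, which is consistent with the observation that the cost is dominated by multiplications.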

For the configuration above, the original MATLAB algorithm takes 3.63 s on an Intel Xeon 2.4 GHz CPU running MATLAB R2012a to compute the optimal antenna combination *î* and the corresponding precoding and decoding matrices **V**<sub>*k*</sub> and **U**<sub>*k*</sub> from a set of channel information **H**. The FPGA implementation created in this case study achieves realtime operation, requiring 380 *μ*s at a 100 MHz clock frequency on a Xilinx Virtex-6 LX550T FPGA in a BEE4 emulation system. The achieved speedup is thus 3.63 s / 380 *μ*s ≈ 9553.
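A quick consistency check of the reported figures, assuming the 380 *μ*s measurement and the *T*<sub>0</sub> = 1 ms bound of Section 8.3.2 refer to the same workload (that pairing is an assumption of this sketch):

```python
matlab_time_s = 3.63   # MATLAB R2012a on the 2.4 GHz Xeon
fpga_time_s = 380e-6   # Virtex-6 LX550T implementation
clock_hz = 100e6       # FPGA clock frequency
T0_s = 1e-3            # latency bound T0 assumed in Section 8.3.2

speedup = matlab_time_s / fpga_time_s   # about 9552.6
cycles = round(fpga_time_s * clock_hz)  # clock cycles per run
print(round(speedup), cycles, fpga_time_s <= T0_s)  # 9553 38000 True
```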

| OP | ADD | MUL | SQRT | INVSQRT |
|---|---|---|---|---|
| Matrix mult. | 696 | 348 | 0 | 0 |
| Eigenvectors | 15 | 8 | 3 | 0 |
| Metric score | 46 | 82 | 6 | 2 |
| #OPC | 757 | 438 | 9 | 2 |

**Table 1.** Operation counts for the computation of *η* per antenna combination *i* and subcarrier

**8.4. Cost functions for *K*-user IA**

The implementation presented in the previous section is based on a closed-form IA algorithm for *K* = 3 users. No closed-form solution is known for *K* > 3 users, but iterative algorithms exist. In this section, we present implementation complexity estimates for the minimum mean square error (MMSE) IA algorithm presented in [27].

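[Figure 12: number of operations (logarithmic axis, 1E+04 to 1E+09) versus number of users (0 to 35); curves labeled 2x2 to 16x16]

**Figure 12.** Number of operations for iterative MMSE interference alignment (4 iterations)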

The MMSE-IA algorithm starts with arbitrary precoding matrices **V**<sub>*k*</sub> and then iteratively updates the decoding and precoding matrices **U**<sub>*k*</sub> and **V**<sub>*k*</sub> according to Eqs. (31) and (32) until convergence. The Lagrange multiplier *λ*<sub>*k*</sub> ≥ 0 is computed by Newton iteration to satisfy ‖**V**<sub>*k*</sub>‖<sub>2</sub><sup>2</sup> ≤ 1.

$$\mathbf{U}\_{k} = \left(\sum\_{j=1}^{K} \mathbf{H}\_{kj} \mathbf{V}\_{j} \mathbf{V}\_{j}^{H} \mathbf{H}\_{kj}^{H} + \sigma^{2} \mathbf{I}\right)^{-1} \mathbf{H}\_{kk} \mathbf{V}\_{k} \tag{31}$$

$$\mathbf{V}\_{k} = \left(\sum\_{j=1}^{K} \mathbf{H}\_{jk}^{H} \mathbf{U}\_{j} \mathbf{U}\_{j}^{H} \mathbf{H}\_{jk} + \lambda\_{k} \mathbf{I}\right)^{-1} \mathbf{H}\_{kk}^{H} \mathbf{U}\_{k} \tag{32}$$
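A compact NumPy sketch of the alternating updates in Eqs. (31) and (32) follows. It is an illustrative floating-point model under stated assumptions, not the hardware implementation whose cost Figure 12 estimates: a per-transmitter power budget of 1, a fixed number of iterations instead of a convergence test, channels stored as a hypothetical nested list `H[k][j]` (transmitter *j* to receiver *k*), and a positive definite matrix in Eq. (32). Under these assumptions, *λ*<sub>*k*</sub> is found by Newton iteration on the precoder energy written via an eigendecomposition; the function names are illustrative.

```python
import numpy as np


def newton_lambda(A, B, power=1.0, iters=100, tol=1e-12):
    """Smallest lambda >= 0 with ||(A + lambda*I)^-1 B||_F^2 <= power.

    Assumes A Hermitian positive definite. With A = Q diag(d) Q^H the energy is
    g(lambda) = sum_i c_i / (d_i + lambda)^2, which is convex and decreasing,
    so Newton's method started at lambda = 0 approaches the root from below.
    """
    d, Q = np.linalg.eigh(A)
    c = np.sum(np.abs(Q.conj().T @ B) ** 2, axis=1)  # energy per eigendirection
    if np.sum(c / d**2) <= power:
        return 0.0                                    # constraint inactive
    lam = 0.0
    for _ in range(iters):
        f = np.sum(c / (d + lam) ** 2) - power
        df = -2.0 * np.sum(c / (d + lam) ** 3)
        step = f / df
        lam -= step
        if abs(step) < tol:
            break
    return lam


def mmse_ia(H, streams=1, sigma2=0.1, iters=4, seed=0):
    """Alternate Eq. (31) and Eq. (32) for a fixed number of iterations.

    H[k][j] is the (N_r x N_t) channel from transmitter j to receiver k.
    """
    K = len(H)
    n_r, n_t = H[0][0].shape
    rng = np.random.default_rng(seed)
    V = []
    for _ in range(K):  # arbitrary initial precoders with unit norm
        v = rng.standard_normal((n_t, streams)) + 1j * rng.standard_normal((n_t, streams))
        V.append(v / np.linalg.norm(v))
    U = []
    for _ in range(iters):
        U = []  # Eq. (31): MMSE decoders
        for k in range(K):
            R = sigma2 * np.eye(n_r, dtype=complex)
            for j in range(K):
                HV = H[k][j] @ V[j]
                R += HV @ HV.conj().T
            U.append(np.linalg.solve(R, H[k][k] @ V[k]))
        V = []  # Eq. (32): power-constrained precoders
        for k in range(K):
            A = np.zeros((n_t, n_t), dtype=complex)
            for j in range(K):
                HU = H[j][k].conj().T @ U[j]
                A += HU @ HU.conj().T
            B = H[k][k].conj().T @ U[k]
            lam = newton_lambda(A, B)
            V.append(np.linalg.solve(A + lam * np.eye(n_t), B))
    return V, U
```

With 2x2 channels, `streams=1` and three users this corresponds to the configuration of Section 8.3; the per-iteration work (matrix products, two linear solves per user and the Newton loop) mirrors the operation classes listed below.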

The number of required iterations is data-dependent. Each iteration step requires matrix multiplications, a pseudo-inverse and Newton iterations. Figure 12 summarizes the estimated number of operations for computing a set of **V** and **U** matrices, based on well-known optimized hardware implementations. Compared to the closed-form 2x2 IA implementation presented in Section 8.3, the iterative approach increases the number of operations by a factor of approximately 60.8.

Ultra-Wideband Radio Technologies for Communications, Localization and Sensor Applications

**10. References**

[1] Adamiuk, G. [2010]. *Methoden zur Realisierung von Dual-orthogonal, Linear Polarisierten Antennen für die UWB-Technik*, PhD thesis, Karlsruhe. URL: *http://digbib.ubka.uni-karlsruhe.de/volltexte/1000019874*

[2] Banz, C., Hesselbarth, S., Flatt, H., Blume, H. & Pirsch, P. [2012]. Real-Time Stereo Vision System using Semi-Global Matching Disparity Estimation: Architecture and FPGA-Implementation, *Transactions on High-Performance Embedded Architectures and Compilers, Springer*.

[3] Björck, Å. & Golub, G. H. [1973]. Numerical methods for computing angles between linear subspaces, *Math. Comp.* 27: 579–594.

[4] Cadambe, V. & Jafar, S. [2008]. Interference alignment and degrees of freedom of the K-user interference channel, *Information Theory, IEEE Transactions on* 54(8): 3425–3441.

[5] Cadambe, V. & Jafar, S. [2009]. Reflections on interference alignment and the degrees of freedom of the K-user MIMO interference channel, *IEEE Information Theory Society Newsletter* 54(4): 5–8.

[6] El-Absi, M., El-Hadidy, M. & Kaiser, T. [2012]. Antenna selection for interference alignment based on subspace canonical correlation, *2012 International Symposium on Communications and Information Technologies (ISCIT)*.

[7] El Ayach, O., Peters, S. & Heath, R. [2010]. The feasibility of interference alignment over measured MIMO-OFDM channels, *Vehicular Technology, IEEE Transactions on* 59(9): 4309–4321.

[8] El-Hadidy, M., El-Absi, M. & Kaiser, T. [2012]. Artificial diversity for UWB MB-OFDM interference alignment based on real-world channel models and antenna selection techniques, *2012 IEEE International Conference on Ultra-Wideband (ICUWB)*.

[9] El-Hadidy, M., Mohamed, T., Zheng, F. & Kaiser, T. [2008]. 3D hybrid EM ray-tracing deterministic UWB channel model, simulations and measurements, 2: 1–4.

[10] Fügen, T., Maurer, J., Kayser, T. & Wiesbeck, W. [2006]. Capability of 3-D Ray Tracing for Defining Parameter Sets for the Specification of Future Mobile Communications Systems, *Antennas and Propagation, IEEE Transactions on* 54(11): 3125–3137.

[11] Fügen, T., Waldschmidt, C., Maurer, J. & Wiesbeck, W. [2003]. MIMO capacity of bridge access points based on measurements and simulations for arbitrary arrays, *5th European Personal Mobile Communications Conference*, pp. 467–471.

[12] Gesbert, G., Shafi, M., Shiu, D., Smith, P. J. & Naguib, A. [2003]. From Theory to Practice: An Overview of MIMO Space-Time Coded Wireless Systems, *IEEE Journal on Selected Areas in Communications* 21: 281–302.

[13] González, O., Ramírez, D., Santamaría, I., García-Naya, J. & Castedo, L. [2011]. Experimental validation of interference alignment techniques using a multiuser MIMO testbed, *Smart Antennas (WSA), 2011 International ITG Workshop on*, pp. 1–8.

[14] Heath, R., Sandhu, S. & Paulraj, A. [2001]. Antenna selection for spatial multiplexing systems with linear receivers, *IEEE Commun. Letters* 5(4): 142–144.

[15] Jereczek, G. [2010]. *Design of Capacity Maximizing MIMO Antenna Systems for Car-2-Car Communication*, Master's thesis, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany.

[16] Khalighi, M. I., Brossier, J., Jurdain, G. & Raoof, K. [2001]. Water Filling Capacity of Rayleigh MIMO Channels, *IEEE Transactions on Antennas and Propagation* 1: A155–A158.

[17] Kock, M., Hesselbarth, S. & Blume, H. [2013]. Hardware-accelerated design space exploration framework for communication systems, *Wireless Innovation Forum Conference*
