WOLA samples, *W* 4 6

| Parameter | Mode 1 | Mode 2 |
|---|---|---|
| # subcarriers, *Nc* | 512 | 1024 |
| Overlapping factor, *K* | 4 | 4 |
| IFFT size, *K* × *Nc* | 2048 | 4096 |

| Parameter | Mode 1 | Mode 2 |
|---|---|---|
| # subcarriers, *N* | 512 | 1024 |
| # subcarriers per PRB | 12 | 12 |
| # active PRBs | 3 | 3 |
| IFFT size, *N′* | 64 | 64 |
| Upsampling factor, *N*/*N′* | 8 | 16 |
| Filter length, *L* | 37 | 73 |
| Filter type | Dolph-Chebyshev (60-dB side lobe attenuation) | Dolph-Chebyshev (60-dB side lobe attenuation) |

36 (other symbols) 72 (other symbols)

With the exception of the inverse FFT (IFFT), the tasks required for a modulator involve only simple arithmetic, data selection and reordering. The first module is the QAM mapper. For a general *M*-ary QAM case, the module is simply


*Flexible Baseband Modulator Architecture for Multi-Waveform 5G Communications*


*DOI: http://dx.doi.org/10.5772/intechopen.91297*


**Figure 2.**

*Frequency spreading FBMC-OQAM baseband modulation.*

and on successive transmitted symbols [19]. For instance, if a symbol includes the in-phase (I) and quadrature (Q) components with the pattern *I,Q,I,Q,...*, the next symbol will use the pattern *Q,I,Q,I,...*. The QAM mapper is implemented in the same way as in the OFDM modulator. The I/Q decoupling is efficiently performed with an FSM that alternately stores or outputs the I/Q components of a QAM symbol.

The following datapath modules are mainly characterized by parameters *K* and *Nc*. The *guard band insertion* module places the OQAM symbols in the central bins of an *Nc*-element array. The remaining subcarriers are zero and represent frequency guard bands. The operation of this module is similar to *subcarrier mapping* in OFDM modulation, except that no DC null component is inserted.

The frequency spreading operation comprises *upsampling* by *K* and *FIR* filtering. The upsampler outputs *K* − 1 zero values between two incoming I/Q samples. It uses registers to store the input data and a counter to control the number of zero values at the output. For pulse shaping, a FIR filter architecture with a transpose structure was adopted because, unlike the direct FIR model, it requires neither an extra input shift register nor a tree of pipelined adders to achieve high throughput. The number of filter coefficients is odd (2 × *K* − 1), and their values are symmetric with a single centre coefficient equal to one (**Table 4**). The multiplications by the centre coefficient can be ignored, as they do not affect the input value. However, the remaining coefficients imply non-trivial multiplications. The number of non-trivial multiplications per FIR filter can be halved by exploiting the symmetry of the coefficients. As the sub-band signal is complex-valued, two FIR filters are required to separately filter the real and imaginary parts. The IFFT modules are the same as those used in the OFDM modulator.
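The upsampling and symmetric transposed-form filtering described above can be sketched as a behavioural Python model. This is only an illustration under stated assumptions: the function names are hypothetical, the model filters complex samples in one pass (the hardware uses separate real and imaginary filters), and any sign or wrap-around handling in the actual design is not modelled. The coefficients are the *K* = 4 values from **Table 4**.

```python
import math

# Table 4 coefficients for K = 4: H0, H1, H2, H3 (with H_k = H_-k).
H_K4 = [1.0, 0.971960, math.sqrt(2) / 2, 0.235147]


def upsample(samples, k):
    """Emit each input sample followed by k - 1 zeros."""
    out = []
    for s in samples:
        out.append(s)
        out.extend([0.0] * (k - 1))
    return out


def transposed_symmetric_fir(samples, h):
    """Transposed-form FIR with taps H_{K-1}..H_1, H_0, H_1..H_{K-1}.

    Each input sample is multiplied once per distinct coefficient and the
    product is reused for both symmetric taps; the centre tap (H_0 = 1)
    needs no multiplication. Assumes len(h) >= 2 and h[0] == 1.
    """
    k = len(h)
    n_taps = 2 * k - 1
    regs = [0.0] * (n_taps - 1)  # registers of the output adder chain
    out = []
    for x in samples:
        prod = [x] + [c * x for c in h[1:]]  # K - 1 non-trivial products
        tap = [prod[abs(i - (k - 1))] for i in range(n_taps)]
        out.append(tap[0] + regs[0])
        for i in range(n_taps - 2):
            regs[i] = tap[i + 1] + regs[i + 1]
        regs[-1] = tap[-1]
    return out


# Impulse response: reproduces the 2K - 1 symmetric taps.
response = transposed_symmetric_fir([1.0] + [0.0] * 6, H_K4)

# Frequency spreading of two placeholder samples.
spread = transposed_symmetric_fir(upsample([1.0, -1.0], 4), H_K4)
```

Because the products are formed once per input sample and shared between mirrored taps, the model needs *K* − 1 multiplications per sample instead of 2 × *K* − 2, which mirrors the halving argument in the text.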

The final operation is to *overlap-and-add* consecutive IFFT output stream blocks delayed by *Nc*/2 samples [19]. This operation uses an array of 2 × *K* × *Nc* elements as temporary storage; the first half stores the current FBMC symbol, and the second half accumulates IFFT output blocks. For each IFFT output block, the whole array is shifted by *Nc*/2 positions and then the IFFT output block is added to the second half of the array. A direct mapping of this approach to a hardware implementation would require the use of replicated memory structures to perform two read operations per clock cycle on the temporary array [21]. Instead, the overlap-and-add (OAA) module used was inspired by the architecture used in [22]. The main difference is that OQAM is not employed in [22], so in its overlap-and-add operation the consecutive IFFT output block streams are delayed by *Nc*. To continuously accumulate consecutive IFFT output blocks delayed by *Nc*/2, a feedback shift register of (2 × *K* − 1) × *Nc*/2 samples is used to align the previous IFFT block with the incoming IFFT block.

| *K* | *H*<sub>0</sub> | *H*<sub>1</sub> = *H*<sub>−1</sub> | *H*<sub>2</sub> = *H*<sub>−2</sub> | *H*<sub>3</sub> = *H*<sub>−3</sub> |
|---|---|---|---|---|
| 2 | 1 | √2/2 | – | – |
| 3 | 1 | 0.911438 | 0.411438 | – |
| 4 | 1 | 0.971960 | √2/2 | 0.235147 |

**Table 4.** *Frequency domain prototype filter coefficients [20].*

#### **3.3 Baseband datapath for UFMC**

*Field Programmable Gate Arrays (FPGAs) II*


UFMC, sometimes called Universal Filtered OFDM (UF-OFDM), is an OFDM-based waveform that attempts to reduce OOB emissions by time-domain filtering. The *N* subcarriers of each symbol are divided into *B physical resource blocks* (PRBs) of *N*/*B* subcarriers each. Usually, only some of the PRBs are used for transmission (*active PRBs*). For each active PRB, IFFT and bandpass *L*-order FIR filtering are performed. Instead of the CP, a zero-valued guard interval with length *L* is inserted after the IFFT. Frequency-shifted versions of the FIR filter are applied to all active PRBs, and, finally, the filtered sub-bands are superimposed to form a UFMC multicarrier symbol. Chebyshev filters are normally used for bandpass filtering in UFMC [23–25].

The classic UFMC modulation scheme [26] uses an *N*-point IFFT and FIR filters with complex coefficients for each active sub-band. To reduce this complexity, Knopp et al. [27] combine a smaller *N′*-point IFFT with upsampling by *N*/*N′*. Moreover, the same real-coefficient FIR filter is used in all sub-bands, followed by frequency shifters implemented as multiplications by a complex exponential. **Figure 3** illustrates the datapath structure for the UFMC modulator considered in this work.

The UFMC modulator of this work has three processing branches, one to process each active PRB (B = 3). These branches share the same architecture and start with QAM mapping of the incoming data. The QAM mapper is identical to the one used in the OFDM and FBMC datapaths. The *subcarrier mapping* module maps the 12 PRB subcarriers to the central bins of an array with *N′* (64) elements and zeroes the remaining *N′* − 12 elements. It follows the same approach as subcarrier mapping in the FBMC modulator: a double buffer of 2 × *N′* elements and read/write control engines.
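The subcarrier mapping step can be illustrated with a short Python sketch. The function name, the placeholder QAM symbols and the interpretation of "central bins" as a contiguous run in the middle of the array are assumptions made for illustration; the bin ordering expected by the IFFT core is not stated in the text.

```python
N_IFFT = 64    # IFFT size per branch (64 in both modes)
PRB_SIZE = 12  # subcarriers per PRB


def map_prb(symbols, n_bins=N_IFFT):
    """Place the PRB symbols in the central bins of an n_bins array.

    Assumption: "central" means indices starting at (n_bins - len) // 2;
    all remaining bins are zeroed.
    """
    if len(symbols) > n_bins:
        raise ValueError("more symbols than IFFT bins")
    start = (n_bins - len(symbols)) // 2
    bins = [0j] * n_bins
    bins[start:start + len(symbols)] = symbols
    return bins


# Twelve placeholder QAM symbols for one PRB.
qam = [complex(1, 1), complex(-1, 1)] * (PRB_SIZE // 2)
bins = map_prb(qam)
```

With 12 symbols and 64 bins, the symbols occupy indices 26 to 37 and the other 52 bins stay at zero.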

**Figure 3.** *Conceptual structure of the UFMC baseband modulator.*

UFMC performs well for short-packet lengths and sporadic burst transmission [28, 29]. Moreover, the parallel sub-band processing in UFMC requires an IFFT core per branch. Therefore, instead of the high-performance pipelined IFFT architectures adopted for the OFDM and FBMC datapaths, low-resource memory-based FFT architectures are adopted in the UFMC modulator. The memory-based architecture adopted here is detailed in [30]. The architecture and operation of the *upsampler* are similar to those of the upsampler used for frequency spreading in FBMC modulation. Here, the number of zeros between consecutive IFFT output samples is (*N*/*N′*) − 1.


Dolph-Chebyshev FIR filters with a transpose structure are used for bandpass sub-band filtering. Again, the FIR coefficients are symmetric: there is an odd number of coefficients, and the centre coefficient is equal to one. However, the higher FIR order used in UFMC modulation requires further discussion. Considering an *L*-order FIR filter, *L* − 1 coefficients imply non-trivial multiplications, a number that can be halved to (*L* − 1)/2 by exploiting coefficient symmetry. As each processing branch requires two FIR filters (one for the real part and one for the imaginary part), there are *L* − 1 non-trivial multiplications per branch.
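The multiplication count above can be checked with a few lines of Python. The helper names are illustrative; the filter lengths (37 and 73) and the three active branches are taken from the UFMC parameter table.

```python
ACTIVE_BRANCHES = 3  # one branch per active PRB


def mults_per_filter(length):
    """Non-trivial multiplications of an odd-length symmetric filter
    whose centre coefficient equals one: (L - 1) / 2."""
    if length % 2 == 0:
        raise ValueError("expected an odd number of coefficients")
    return (length - 1) // 2


def mults_per_branch(length):
    """Two filters per branch (real and imaginary parts): L - 1."""
    return 2 * mults_per_filter(length)


for mode, length in (("mode 1", 37), ("mode 2", 73)):
    per_branch = mults_per_branch(length)
    print(mode, per_branch, ACTIVE_BRANCHES * per_branch)
```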

In Xilinx FPGAs, non-trivial multiplications can be efficiently performed by DSP blocks. These blocks are embedded into the logic fabric in a column arrangement. Cost-optimized devices have fewer DSP blocks, and their utilization should be carefully considered. For instance, the xc7z020 device has 220 DSP blocks. Considering the modes of operation from **Table 4**, the overall number of non-trivial multiplications for FIR filtering (3 × (*L* − 1)) is 108 for mode 1 and 216 for mode 2. This represents a high DSP utilization, and the sparse distribution of these types of blocks throughout the logic fabric degrades the scalability of the UFMC modulator. In addition, placement and routing of the design is more difficult and likely to affect the overall timing closure. To reduce DSP utilization, a multiplier-less architecture for FIR filters was adopted. The FIR coefficients are represented in Q1.5 format, using the Canonic Signed Digit (CSD) system with minimum non-zero bits. Then, non-trivial multiplications are substituted by shifters and adders. For example, the multiplication by 0.90625 can be implemented as:
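In Q1.5, 0.90625 equals 29/32, whose CSD form has three non-zero digits: 0.90625 = 1 − 2<sup>−3</sup> + 2<sup>−5</sup>, so that 0.90625·*x* = *x* − (*x* ≫ 3) + (*x* ≫ 5). The following Python sketch illustrates this decomposition. The function names are hypothetical, and the integer model with truncating right shifts is an assumption, not a description of the actual hardware word lengths.

```python
def to_csd(n):
    """CSD digits of a non-negative integer, least significant first.

    Each digit is -1, 0 or +1, and no two adjacent digits are non-zero.
    """
    digits = []
    while n != 0:
        if n & 1:
            d = 2 - (n & 3)  # n mod 4 == 1 -> +1, n mod 4 == 3 -> -1
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


def times_0p90625(x):
    """Multiply an integer sample by 0.90625 with shifts and adds only.

    Right shifts floor the result, so the product is exact only when x
    is a multiple of 32.
    """
    return x - (x >> 3) + (x >> 5)


# 0.90625 in Q1.5 is the integer 29 (29 / 32).
csd_29 = to_csd(29)  # digit i carries weight 2**i, i.e. 2**(i - 5) in Q1.5
```

Read from the most significant digit, the result corresponds to +2<sup>0</sup> − 2<sup>−3</sup> + 2<sup>−5</sup> in Q1.5, so two adders and two fixed shifts replace one multiplier.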

This strategy eliminates the use of DSP blocks in FIR filters, but increases slice utilization. However, slices are the most numerous type of resource (13,300 slices in the xc7z020 device), making this approach well-suited for the present application.

After FIR filtering, each sub-band signal is shifted to the corresponding frequency band. The *frequency shift* module for each branch has a ROM memory to store the complex exponential values and a complex multiplier. Finally, the filtered sub-band responses are summed to create the UFMC symbol.

#### **4. A dynamically reconfigurable baseband modulator for 5G communication**

After the preceding overview of the architecture of high-performance baseband engines for three different waveforms, this section presents the architecture of a baseband processing engine that is *flexible*, *scalable*, resource and power *efficient* and *forward compatible*. Here, DPR and DFS are combined to produce a dynamically reconfigurable baseband processing architecture for multimode, multi-waveform coexistence and dynamic spectrum aggregation. To enable the full potential of 5G, carrier aggregation should also be possible across separated frequency bands [31] (*noncontiguous CA*). For noncontiguous CA, a *multidimensional* PHY layer (and, therefore, baseband architecture) is needed, even when data aggregation is not performed in the PHY layer, but in the media access control (MAC) communication layer instead [32]. In this context, multidimensional means that the PHY layer is an array of independent processing blocks, rather than a monolithic structure.

The baseband architecture presented in this chapter features three independent modulators, whose functionality and clock frequency can be dynamically reconfigured through DPR and DFS, respectively. This setup enables the processing of multiple component carriers with different waveforms and/or baseband parameters in noncontiguous CA schemes.

A prototype of the multidimensional baseband modulator was implemented on an Avnet Zedboard equipped with a Zynq xc7z020 device. The system top level combines features from the designs of the previous section and can be divided into three parts. The Zedboard's 512 MB DDR memory is used as a repository for the partial bitstreams used for DPR. The Zynq's ARM CPU acts as the system management unit: it is responsible for triggering the reconfiguration of the multidimensional baseband modulator and setting up data transfers between the DDR memory and the modulators implemented in the programmable logic together with the infrastructure for DPR and DFS. **Figure 4** shows the top-level architecture.

The proposed architecture targets the communication scenario described in [33], which combines *multi-waveform coexistence* with *dynamic spectrum access*. In this scenario, 5G communications build on the pre-existing 4G infrastructure (*non-standalone 5G*): the primary 4G-LTE communications are OFDM-based, and the secondary 5G communications opportunistically exploit vacant spectrum resources through DSA, transmitting with different waveforms (OFDM, FBMC or UFMC). The basic unit for DPR is a complete baseband datapath, and each one is implemented in a reconfigurable partition (RP). Of the three RPs, *RP*<sub>1</sub> is exclusively used for primary OFDM-based transmission. The two remaining RPs can be used for primary or secondary transmission: *RP*<sub>2</sub> implements FBMC or OFDM transmission modes; *RP*<sub>3</sub> implements UFMC or OFDM transmission modes. For instance, if the primary transmission requires more capacity, the three RPs can be used to independently modulate different component carriers in a noncontiguous CA scheme. If the primary transmission is less demanding, *RP*<sub>2</sub> and *RP*<sub>3</sub> can be used for secondary *multi-waveform* 5G transmission. **Figure 5** illustrates a potential multi-waveform coexistence scenario by showing the combined periodograms of the OFDM, FBMC and UFMC baseband signals obtained from the implemented modulator datapaths.

During system initialization, the ARM CPU manages the downloading of partial bitstreams and input data files from an SD card to the DDR memory. For the purpose of validating the baseband engines, the input data is retrieved from the DDR and sent to the baseband modulator(s), and the results are stored back to the DDR (and used for validating the implementation). Each RP has an associated DMA controller to accelerate the access to the DDR.

**Figure 4.** *Top-level architecture for the multidimensional and reconfigurable baseband modulator. HPx, high-performance ports; GPIO, general purpose I/O.*

**Figure 5.** *Periodograms for OFDM, FBMC and UFMC baseband signals.*

To achieve the specialization of computation at runtime, the configuration interface adopted is the ICAP. This high-bandwidth internal interface permits the FPGA to reconfigure itself. Xilinx sets the maximum ICAP bandwidth at 400 MB/s, for a 100 MHz clock frequency and 32-bit data width [34]. Nevertheless, the ICAP can be overclocked to further enhance the reconfiguration throughput [35]. In the present work, the ICAP is overclocked at 200 MHz. To take advantage of ICAP overclocking, a dedicated DMA controller is used to accelerate the transfer of partial bitstreams to the ICAP.

| Resource | Available | Static part (total) | Reconfig. overhead: DFS | Reconfig. overhead: DPR | RP1 | RP2 | RP3 | All RPs |
|---|---|---|---|---|---|---|---|---|
| Slice | 13,300 | 4210 | 24 | 424 | 1400 | 2400 | 3200 | 7000 |
| LUT | 53,200 | 10,700 | 75 | 938 | 5600 | 9600 | 12,800 | 28,000 |
| FF | 106,400 | 13,110 | 79 | 1292 | 11,200 | 19,200 | 25,600 | 56,000 |
| BRAM | 140 | 7.5 | 0 | 1.5 | 20 | 40 | 30 | 90 |
| DSP | 220 | 0 | 0 | 0 | 40 | 80 | 40 | 160 |

**Table 5.** *Post place-and-route resource utilization for the static and reconfigurable system parts.*

| Resource | OFDM (Mode 1) | FBMC (Mode 1) | UFMC (Mode 1) | OFDM (Mode 2) | FBMC (Mode 2) | UFMC (Mode 2) |
|---|---|---|---|---|---|---|
| Slice | 1015 | 1575 | 2315 | 1126 | 2210 | 3100 |
| LUT | 2829 | 5103 | 8090 | 3400 | 7876 | 11,782 |
| FF | 2107 | 2307 | 6279 | 2170 | 2284 | 9912 |
| BRAM | 7 | 19 | 11.5 | 10.5 | 40 | 11.5 |
| DSP | 14 | 21 | 18 | 14 | 21 | 18 |

**Table 6.** *Post place-and-route resource utilization for each baseband modulator datapath.*

*Device, xc7z020; f<sub>clk</sub> = 100 MHz.*

The implementation of DFS follows the reference design from [36]. This design considers an FSM that reads configuration parameters from a ROM and writes them to the clock management module available in the FPGA. To change the frequency of the output clocks, the input signal *en* must be asserted, and the desired mode of operation should be given through the *mode* port. The DFS controller is fed by a 100 MHz reference input clock that is used to synthesize the clock signal used for baseband processing. Its frequency (*f<sub>clkBB</sub>*) can be configured to one of four values: 16.7, 33.3, 66.7 and 100 MHz. All modulator datapaths can work at 100 MHz. The other values are based on the scaling of subcarrier spacing by 2<sup>*μ*</sup> as in 5G New Radio systems [2], where *μ* is an integer that specifies the mode of operation. In this system, primary communications are based on the LTE OFDM numerologies (**Table 1**), where the subcarrier spacing (Δ*f*) is 15 kHz. For OFDM mode 2 (cf. **Table 1**), the sampling frequency required is *N* × Δ*f* = 15.36 MHz. Scaling the subcarrier spacing by 2<sup>*μ*</sup> with *μ* ∈ {1, 2} results in sampling frequencies of 2<sup>1</sup> × 15.36 = 30.72 MHz and 2<sup>2</sup> × 15.36 = 61.44 MHz.
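The relation between numerology and baseband clock can be sketched in Python. The function names are illustrative, *N* = 1024 is inferred from *N* × Δ*f* = 15.36 MHz, and the selection policy (the slowest available clock that is at least the sampling frequency) is an assumption that happens to reproduce the pairing of 15.36, 30.72 and 61.44 MHz with 16.7, 33.3 and 66.7 MHz; the text does not state how the controller chooses a mode.

```python
CLOCKS_MHZ = (16.7, 33.3, 66.7, 100.0)  # selectable baseband clock values


def sampling_frequency_hz(n_subcarriers, delta_f_hz, mu):
    """Sampling frequency N * delta_f * 2**mu, in Hz."""
    return n_subcarriers * delta_f_hz * 2 ** mu


def select_clock_mhz(fs_hz, clocks=CLOCKS_MHZ):
    """Assumed policy: slowest clock that is not below the sampling rate."""
    for clock in sorted(clocks):
        if clock * 1e6 >= fs_hz:
            return clock
    raise ValueError("no available clock is fast enough")


for mu in (0, 1, 2):
    fs = sampling_frequency_hz(1024, 15_000, mu)
    print(mu, fs / 1e6, select_clock_mhz(fs))
```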

A general overview of the resource utilization of the prototype is presented in **Table 5**. The static part occupies around 32 and 5% of the slices and BRAMs, respectively. Apart from PS-PL interconnect cores and DMA controllers to accelerate the baseband modulators, the static part also implements the infrastructure for reconfiguration (DPR and DFS). The hardware required to implement DPR and DFS is below 2% of the available LUTs, FFs and BRAMs. The three RPs form the system's reconfigurable part and occupy 52.6, 64.3 and 72.7% of the available slices, BRAMs and DSPs, respectively. Overall, the resource utilization for the complete system implementation represents a considerable share of the resources available in the xc7z020 device: 84.3% of slices, 69.7% of BRAMs and 72.7% of DSPs.
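Several of the percentages quoted above follow directly from the **Table 5** entries. The Python sketch below recomputes the shares of the reconfigurable part and the overall slice utilization; the dictionary layout and helper name are illustrative choices.

```python
AVAILABLE = {"slice": 13300, "bram": 140, "dsp": 220}
STATIC = {"slice": 4210, "bram": 7.5, "dsp": 0}
ALL_RPS = {"slice": 7000, "bram": 90, "dsp": 160}


def share(used, available):
    """Utilization in percent, rounded to one decimal place."""
    return round(100.0 * used / available, 1)


# Share of each resource reserved by the three reconfigurable partitions.
rp_share = {name: share(ALL_RPS[name], AVAILABLE[name]) for name in AVAILABLE}

# Overall slice utilization: static part plus reconfigurable part.
total_slice_share = share(STATIC["slice"] + ALL_RPS["slice"], AVAILABLE["slice"])
```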

The resource utilization of each modulator is presented in **Table 6**. The results lead to a key observation: the hardware virtualization achieved with the 7000 slices,


**Table 5.**

*Post place-and-route resource utilization for the static and reconfigurable system parts.*


#### **Table 6.**

*Post place-and-route resource utilization for each baseband modulator datapath.*

work, the ICAP is overclocked at 200 MHz. To take advantage of ICAP

*Periodograms for OFDM, FBMC and UFMC baseband signals.*

*Top-level architecture for the multidimensional and reconfigurable baseband modulator. HPx,*

*high-performance ports; GPIO, general purpose I/O.*

*Field Programmable Gate Arrays (FPGAs) II*

bitstreams to the ICAP.

**Figure 5.**

**48**

**Figure 4.**

overclocking, a dedicated DMA controller is used to accelerate the transfer of partial

The implementation of DFS follows the reference design from [36]. This design considers an FSM that reads configuration parameters from a ROM and writes them 90 BRAMs and 160 DSPs reserved by the three RPs allows the implementation of six baseband modulators, which would need 11,322 slices, 99.5 BRAMs and 106 DSPs in total. Adding these *virtualized* resources to the static resources exceeds the available xc7z020 slices by 17%. This is an unequivocal demonstration of the resource efficiency benefits that DPR brings to multimode baseband processors. An equivalent static multimode design could benefit from the reuse of common hardware blocks between different modulator datapaths (especially between OFDM and FBMC datapaths). However, implementing the multidimensional baseband modulator as a static multimode design would be challenging given the resource budget available on cost-optimized devices like the xc7z020. There are FPGA/SoC devices with larger area and logic density. However, using them would decrease the system's cost-effectiveness: an FPGA with a larger chip area is more expensive and likely to consume more power [37].
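The virtualization argument above can be reproduced with a few lines of arithmetic. The sketch below is only illustrative: the device totals for the xc7z020 (13,300 slices, 140 BRAMs, 220 DSPs) are an assumption taken from the device family documentation rather than from this chapter, and the function names are hypothetical. All other numbers are the ones quoted in the text.

```python
# Sanity check of the hardware-virtualization figures quoted in the text.
# DEVICE_TOTALS is an assumption (xc7z020 totals from the device family
# documentation); every other number is copied from the chapter.

DEVICE_TOTALS = {"slices": 13300, "brams": 140, "dsps": 220}
RP_RESERVED = {"slices": 7000, "brams": 90, "dsps": 160}        # three RPs
SIX_MODULATORS = {"slices": 11322, "brams": 99.5, "dsps": 106}  # all six datapaths
SYSTEM_UTILIZATION = {"slices": 0.843, "brams": 0.697, "dsps": 0.727}  # static + RPs


def reserved_share_percent(resource: str) -> float:
    """Share of the device reserved by the three RPs, in percent."""
    return 100.0 * RP_RESERVED[resource] / DEVICE_TOTALS[resource]


def static_usage(resource: str) -> float:
    """Static-part usage, derived as total system usage minus RP reservation."""
    used = SYSTEM_UTILIZATION[resource] * DEVICE_TOTALS[resource]
    return max(0.0, used - RP_RESERVED[resource])  # clamp rounding noise


def overshoot_percent(resource: str) -> float:
    """How far a fully static design (static part + six modulators)
    would exceed the device capacity, in percent (negative = it fits)."""
    demand = static_usage(resource) + SIX_MODULATORS[resource]
    return 100.0 * (demand / DEVICE_TOTALS[resource] - 1.0)


if __name__ == "__main__":
    for name in DEVICE_TOTALS:
        print(f"{name}: RPs reserve {reserved_share_percent(name):.1f}% of the device; "
              f"static part plus six modulators: {overshoot_percent(name):+.1f}% vs. capacity")
```

Under these assumptions, only the slice budget is exceeded, which is consistent with the text singling out slices as the limiting resource.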

Considering the modes of operation shown in **Tables 1**–**3**, and that all 3 RPs are in use, the proposed design supports 32 combinations of baseband modulators: 2 $RP_1$ modes × 4 $RP_2$ modes × 4 $RP_3$ modes. The use of DPR simplifies system upgrade with new modes of operation in order to extend the system's useful lifetime. The addition of modes of operation is not limited by the available resources on the FPGA device, but instead by the resources reserved by the RPs and the capacity to store partial bitstreams (512 MB DDR memory, in this case).

During the DPR design with the Xilinx Vivado EDA tool, the different system configurations are created from a design checkpoint that saves the floorplanning and routing of the system's static part, leaving the RPs as empty *black boxes*. New configurations can be created by designing new circuit configurations for these black boxes and generating the corresponding partial bitstreams. This design reusability makes the system adaptable and reduces the upgrade design time.

The dynamic power consumption for each modulator datapath and baseband clock frequency was estimated with the power analysis tool from Vivado 2015.2. The high-confidence estimates were performed using placed and routed netlists and accurate node activity files. The results are presented in **Table 7**. The UFMC modulator modes have a higher dynamic power consumption compared to FBMC and OFDM. This is mainly due to the higher resource usage and node activity of UFMC datapaths. The clock frequency adaptation allowed by DFS results in power savings that tend to be more evident for the most resource-demanding modes of operation (UFMC and FBMC). Compared to a design with baseband clock frequency fixed at 100 MHz, the clock frequency adaptation to:

• 66.7 MHz results in dynamic power savings between 39 mW (35% reduction in OFDM mode 1) and 82 mW (51% reduction in FBMC mode 2)

• 33.3 MHz results in dynamic power savings between 79 mW (70% reduction in OFDM mode 1) and 156 mW (67% reduction in UFMC mode 2)

• 16.7 MHz results in dynamic power savings between 99 mW (88% reduction in OFDM mode 1) and 194 mW (83% reduction in UFMC mode 2)

| $f_{clk}$ | Mode 1 OFDM | Mode 1 FBMC | Mode 1 UFMC | Mode 2 OFDM | Mode 2 FBMC | Mode 2 UFMC |
|---|---|---|---|---|---|---|
| 100 MHz | 113 | 148 | 180 | 123 | 161 | 233 |
| 66.7 MHz | 74 | 84 | 119 | 78 | 79 | 155 |
| 33.3 MHz | 34 | 25 | 60 | 33 | 28 | 77 |
| 16.7 MHz | 14 | 8 | 30 | 10 | 10 | 39 |

*Device, xc7z020; analysis tool, Vivado 2015.2; post place-and-route power analysis with high confidence level; node activity derived from post place-and-route simulation.*

**Table 7.**

*Dynamic power consumption estimates for the six implemented baseband modulator cores (in mW).*

For the set of baseband clock frequencies defined, the DFS procedure took on average 47 μs to modify the clock frequency, a latency which is acceptable in 5G NR communications.

In the multidimensional baseband modulator, the area and amount of RP resources are higher than in the individual designs, resulting in larger bitstream sizes. However, the reconfiguration speed was increased through ICAP overclocking. **Table 8** quantifies the DPR latency and compressed bitstream size for the worst-case scenario in each RP. The largest RP ($RP_3$) takes up to 767 μs to be reconfigured, corresponding to the transfer of a 596 kB bitstream to the ICAP. In all DPR latency measurements, the reconfiguration throughput was at least 790 MB/s. This value is about 99% of the theoretical ICAP throughput, considering 32-bit transfers and overclocking at 200 MHz. In general, the DPR latency for each individual RP is below 1 ms, while the overall reconfiguration of the three RPs takes less than 2 ms. These latency values are within an acceptable range considering the control plane requirements from [38].

| Characteristic | RP1 | RP2 | RP3 |
|---|---|---|---|
| DPR latency | 400 μs | 677 μs | 767 μs |
| Partial bitstream size | 309 kB | 526 kB | 596 kB |

**Table 8.**

*Measured DPR latency and size of compressed partial bitstreams for the worst-case scenarios.*

The ITU report [38] states that in critical, ultralow-latency scenarios, a *make-before-break* approach must be adopted to completely mitigate the control plane latency. In other words, the control plane latency must be *hidden* by setting up a new communication channel before breaking the current one. Under these circumstances, a high-priority communication can reserve a spare RP to seamlessly adapt the transmission mode. This scenario is exemplified in **Figure 6**. Let us assume that $RP_1$ is currently performing baseband modulation for an ultralow-latency communication. This transmission needs to be adapted from OFDM mode 1 to OFDM mode 2, without breaking the current communication link. $RP_2$ is currently unused and is reconfigured to OFDM mode 2 before baseband processing at $RP_1$ terminates. In this way, the baseband processing datapath can be modified without incurring any latency penalty due to DPR.

**Figure 6.**

*Example of make-before-break approach to mitigate DPR latency.*
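The DPR throughput claims can be cross-checked against the Table 8 values. The following sketch is a rough consistency check, not part of the original measurement setup. It assumes that 1 kB means 1024 bytes and that 1 MB/s means 10^6 bytes per second; neither convention is stated in the text, but together they make the reported numbers agree. The function names are hypothetical.

```python
# Cross-check of the DPR figures in Table 8.
# Assumptions (not stated in the text): 1 kB = 1024 bytes, 1 MB/s = 10**6 bytes/s.

ICAP_WIDTH_BYTES = 4      # 32-bit transfers
ICAP_CLOCK_HZ = 200e6     # overclocked ICAP

# Worst-case values from Table 8: (latency in microseconds, bitstream size in kB).
TABLE_8 = {"RP1": (400, 309), "RP2": (677, 526), "RP3": (767, 596)}


def theoretical_throughput_mb_s() -> float:
    """Peak ICAP throughput: one 32-bit word per clock cycle."""
    return ICAP_WIDTH_BYTES * ICAP_CLOCK_HZ / 1e6


def measured_throughput_mb_s(latency_us: float, size_kb: float) -> float:
    """Effective throughput implied by a bitstream size and a DPR latency."""
    size_bytes = size_kb * 1024
    latency_s = latency_us * 1e-6
    return size_bytes / latency_s / 1e6


if __name__ == "__main__":
    peak = theoretical_throughput_mb_s()
    for rp, (latency_us, size_kb) in TABLE_8.items():
        rate = measured_throughput_mb_s(latency_us, size_kb)
        print(f"{rp}: {rate:.1f} MB/s ({100 * rate / peak:.1f}% of {peak:.0f} MB/s)")
    total_us = sum(latency for latency, _ in TABLE_8.values())
    print(f"Reconfiguring all three RPs back to back: {total_us} us")
```

With these conventions, every RP reaches at least 790 MB/s and the three worst-case latencies sum to under 2 ms, matching the statements in the text.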

### **5. Conclusion**

This chapter presents a reconfigurable, multidimensional baseband modulator architecture suitable for multimode, multiple-waveform coexistence and dynamic spectrum aggregation scenarios. The design combines the runtime specialization of computation and performance. By featuring three independent and reconfigurable baseband modulators, the architecture allows the processing of up to three component carriers using different waveforms (OFDM, FBMC and UFMC) and/or numerologies. The total reconfigurable area of the system covers more than half of the available xc7z020 resources; ICAP overclocking helps keep the DPR latency low enough for the analyzed scenarios. In this design, performance specialization through DFS resulted in dynamic power savings of up to 194 mW. Besides flexibility, scalability and forward compatibility, *cost-effectiveness* is perhaps the most relevant feature of this architecture. The results demonstrate how hardware virtualization through DPR enables implementations that exceed the hardware resources available on an FPGA device. This allows system implementations on small-form, cost-optimized devices, with immediate cost and power consumption benefits and without compromising system functionality.

**References**

htm

Ltd; 2016

2017

[1] ITU-R. IMT Vision—Framework and

*DOI: http://dx.doi.org/10.5772/intechopen.91297*

*Flexible Baseband Modulator Architecture for Multi-Waveform 5G Communications*

[10] Delorme J, Martin J, Nafkha A, Moy C, Clermidy F, Leray P, et al. A FPGA partial reconfiguration design approach for cognitive radio based on NoC architecture. In: 2008 Joint 6th International IEEE Northeast Workshop on Circuits and Systems and TAISA Conference. Montreal, QC: IEEE; 2008. pp. 355-358. DOI: 10.1109/NEWCAS.

[11] He K, Crockett L, Stewart R. Dynamic reconfiguration technologies based on FPGA in software defined radio system. Journal of Signal Processing Systems. 2011;**69**(1):75-85

[12] Vipin K, Fahmy SA. Mapping adaptive hardware systems with partial reconfiguration using CoPR for Zynq. In: 2015 NASA/ESA Conference on Adaptive Hardware and Systems (AHS). Montreal, QC: IEEE; 2015. pp. 1-8. DOI:

[13] Rihani MAF, Mroue M, Prévotet JC, Nouvel F, Mohanna Y. ARM-FPGAbased platform for reconfigurable wireless communication systems using partial reconfiguration. EURASIP Journal on Embedded Systems. 2017;**2017**(1):35

[14] Shreejith S, Banarjee B, Vipin K, Fahmy SA. Dynamic cognitive radios on the Xilinx Zynq Hybrid FPGA. In: Weichold M, Hamdi M, Shakir M, Abdallah M, Karagiannidis G, Ismail M, editors. Cognitive Radio Oriented Wireless Networks. CrownCom 2015. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering. Vol. 156. Cham: Springer; 2015. DOI: 10.1007/978-3-319-24540-9\_35

[15] Pham TH, Fahmy SA, McLoughlin IV. An end-to-end multi-standard OFDM transceiver architecture using FPGA partial reconfiguration. IEEE Access. 2017;**5**:

21002-21015

10.1109/AHS.2015.7231169

2008.4606394

[2] TS G. NR; NR and NG-RAN Overall Description; Stage 2 (Release 15); 2018. 38.300 V15.3.1. Available from: http:// www.3gpp.org/DynaReport/38-series.

[3] Andrews JG, Buzzi S, Choi W, Hanly SV, Lozano A, Soong ACK, et al. What will 5G be? IEEE Journal on Selected Areas in Communications.

[4] Luo FL, Zhang C. Signal Processing for 5G: Algorithms and Implementations. United Kingdom: John Wiley & Sons

[5] Jue G. Exploring 5G Coexistence Scenarios Using a Flexible Hardware/ Software Testbed—Application Note;

[6] Zhao Q, Sadler BM. A survey of dynamic Spectrum access. IEEE Signal Processing Magazine. 2007;**24**(3):79-89

Chandrasekaran M. 5G roadmap: 10 key enabling technologies. Computer Networks. 2016;**106**:17-48

Enderwitz MA, Stewart RW. The Zynq Book: Embedded Processing with the ARM Cortex-A9 on the Xilinx Zynq-7000 all Programmable SoC. Glasgow, United Kingdom: Strathclyde Academic

[9] Delahaye JP, Palicot J, Moy C, Leray P. Partial reconfiguration of FPGAs for dynamical reconfiguration of a software radio platform. In: 16th IST Mobile and Wireless Communications Summit. Budapest: IEEE; 2007. pp. 1-5. DOI: 10.1109/ISTMWC.2007.4299250

[7] Akyildiz IF, Nie S, Lin SC,

[8] Crockett LH, Elliot RA,

Media; 2014

**53**

2014;**32**(6):1065-1082

Overall Objectives of the Future Development of IMT for 2020 and beyond. ITU-R; 2015. ITU-R M.2083-0

#### **Acknowledgements**

This work was financed by the ERDF (European Regional Development Fund) through the Operational Programme for Competitiveness and Internationalization (COMPETE) 2020 Programme within Project POCI-01-0145-FEDER-006961 and by the National Fund through a Ph.D. Grant (PD/BD/105860/2014) from the FCT (Fundação para a Ciência e a Tecnologia) (Portuguese Foundation for Science and Technology).

#### **Author details**

Mário Lopes Ferreira and João Canas Ferreira\* INESC TEC and University of Porto, Porto, Portugal

\*Address all correspondence to: jcf@fe.up.pt

© 2020 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

*Flexible Baseband Modulator Architecture for Multi-Waveform 5G Communications DOI: http://dx.doi.org/10.5772/intechopen.91297*

#### **References**


[1] ITU-R. IMT Vision—Framework and Overall Objectives of the Future Development of IMT for 2020 and beyond. ITU-R; 2015. ITU-R M.2083-0

[2] TS G. NR; NR and NG-RAN Overall Description; Stage 2 (Release 15); 2018. 38.300 V15.3.1. Available from: http://www.3gpp.org/DynaReport/38-series.htm

[3] Andrews JG, Buzzi S, Choi W, Hanly SV, Lozano A, Soong ACK, et al. What will 5G be? IEEE Journal on Selected Areas in Communications. 2014;**32**(6):1065-1082

[4] Luo FL, Zhang C. Signal Processing for 5G: Algorithms and Implementations. United Kingdom: John Wiley & Sons Ltd; 2016

[5] Jue G. Exploring 5G Coexistence Scenarios Using a Flexible Hardware/ Software Testbed—Application Note; 2017

[6] Zhao Q, Sadler BM. A survey of dynamic Spectrum access. IEEE Signal Processing Magazine. 2007;**24**(3):79-89

[7] Akyildiz IF, Nie S, Lin SC, Chandrasekaran M. 5G roadmap: 10 key enabling technologies. Computer Networks. 2016;**106**:17-48

[8] Crockett LH, Elliot RA, Enderwitz MA, Stewart RW. The Zynq Book: Embedded Processing with the ARM Cortex-A9 on the Xilinx Zynq-7000 all Programmable SoC. Glasgow, United Kingdom: Strathclyde Academic Media; 2014

[9] Delahaye JP, Palicot J, Moy C, Leray P. Partial reconfiguration of FPGAs for dynamical reconfiguration of a software radio platform. In: 16th IST Mobile and Wireless Communications Summit. Budapest: IEEE; 2007. pp. 1-5. DOI: 10.1109/ISTMWC.2007.4299250

[10] Delorme J, Martin J, Nafkha A, Moy C, Clermidy F, Leray P, et al. A FPGA partial reconfiguration design approach for cognitive radio based on NoC architecture. In: 2008 Joint 6th International IEEE Northeast Workshop on Circuits and Systems and TAISA Conference. Montreal, QC: IEEE; 2008. pp. 355-358. DOI: 10.1109/NEWCAS.2008.4606394

[11] He K, Crockett L, Stewart R. Dynamic reconfiguration technologies based on FPGA in software defined radio system. Journal of Signal Processing Systems. 2011;**69**(1):75-85

[12] Vipin K, Fahmy SA. Mapping adaptive hardware systems with partial reconfiguration using CoPR for Zynq. In: 2015 NASA/ESA Conference on Adaptive Hardware and Systems (AHS). Montreal, QC: IEEE; 2015. pp. 1-8. DOI: 10.1109/AHS.2015.7231169

[13] Rihani MAF, Mroue M, Prévotet JC, Nouvel F, Mohanna Y. ARM-FPGA-based platform for reconfigurable wireless communication systems using partial reconfiguration. EURASIP Journal on Embedded Systems. 2017;**2017**(1):35

[14] Shreejith S, Banarjee B, Vipin K, Fahmy SA. Dynamic cognitive radios on the Xilinx Zynq Hybrid FPGA. In: Weichold M, Hamdi M, Shakir M, Abdallah M, Karagiannidis G, Ismail M, editors. Cognitive Radio Oriented Wireless Networks. CrownCom 2015. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering. Vol. 156. Cham: Springer; 2015. DOI: 10.1007/978-3-319-24540-9\_35

[15] Pham TH, Fahmy SA, McLoughlin IV. An end-to-end multi-standard OFDM transceiver architecture using FPGA partial reconfiguration. IEEE Access. 2017;**5**:21002-21015

[16] Ferreira ML, Barahimi A, Ferreira JC. Reconfigurable FPGA-based FFT processor for cognitive radio applications. In: Proceedings of the Applied Reconfigurable Computing: 12th International Symposium, ARC 2016; 22–24 March 2016; Mangaratiba, RJ, Brazil: Springer International Publishing; 2016. pp. 223-232

[17] He S, Torkelson M. A new approach to pipeline FFT processor. In: Proceedings of International Conference on Parallel Processing. Honolulu, HI, USA: IEEE; 1996. pp. 766-770. DOI: 10.1109/IPPS.1996.508145

[18] Löfgren J, Nilsson P. On hardware implementation of radix 3 and radix 5 FFT kernels for LTE systems. In: 2011 NORCHIP. Lund: IEEE; 2011. pp. 1-4. DOI: 10.1109/NORCHP.2011.6126703

[19] Doré JB, Gerzaguet R, Cassiau N, Ktenas D. Waveform contenders for 5G: Description, analysis and comparison. Physical Communication. Elsevier; 2017;**24**:46-61. DOI: 10.1016/j.phycom.2017.05.004. ISSN: 1874-4907

[20] Bellanger M, Ruyet DL, Roviras D, Terré M, Nossek J, Baltar L, et al. FBMC physical layer: A primer. PHYDYAS Project; 2010

[21] Carvalho M. FPGA implementation of a baseband processor for FBMC transmission [MSc thesis]. Faculty of Engineering of the University of Porto; 2017

[22] Bellanger M. FS-FBMC: An alternative scheme for filter bank based multicarrier transmission. In: 2012 5th International Symposium on Communications, Control and Signal Processing. Rome: IEEE; 2012. pp. 1-4

[23] Wang X, Wild T, Schaich F, dos Santos AF. Universal filtered multicarrier with leakage-based filter optimization. In: European Wireless 2014; 20th European Wireless Conference. Barcelona, Spain: VDE; 2014. pp. 1-5

latency towards 5G: RAN, core network

*DOI: http://dx.doi.org/10.5772/intechopen.91297*

applications. ACM Computing Surveys.

[38] ITU-R. Minimum Requirements Related to Technical Performance for IMT-2020 Radio Interface(s). ITU-R; 2017. M.2410–0. Available from: https:// www.itu.int/pub/R-REP-M.2410-2017

2018;**51**(4):1-39

*Flexible Baseband Modulator Architecture for Multi-Waveform 5G Communications*

Communications Surveys Tutorials.

[30] Lopes Ferreira M, Canas FJ. An FPGA-oriented baseband modulator architecture for 4G/5G communication scenarios. Electronics. 2019;**8**(1):1-19

[31] Bhushan N, Ji T, Koymen O, Smee J, Soriaga J, Subramanian S, et al. Industry perspective—5G air Interface system design principles. IEEE Wireless Communications. 2017;**24**(5):6-8

[32] Yuan G, Zhang X, Wang W, Yang Y. Carrier aggregation for LTE-advanced mobile communication systems. IEEE Communications Magazine. 2010;**48**(2):

[33] Kaltenberger F, Knopp R, Vitiello C, Danneberg M, Festag A. Experimental analysis of 5G candidate waveforms and their coexistence with 4G systems. In: XAPP888 - MMCM and PLL Dynamic Reconfiguration. 2015. Available from: http://www.eurecom.fr/fr/publication/ 4725/download/cm-publi-4725.pdf. Unpublished material provided by

[34] UG909 - Vivado Design Suite User Guide: Partial Reconfiguration; 2015

[35] Claus C, Ahmed R, Altenried F, Stechele W. Towards rapid dynamic partial reconfiguration in video-based driver assistance systems. In: Sirisuk P, Morgan F, El-Ghazawi T, Amano H, editors. Applied Reconfigurable Computing: Architectures, Tools and

Applications. Springer: Berlin Heidelberg; 2010. pp. 55-67

V1.7. Xilinx Inc.; April 2017

architectures, methods, and

**55**

[36] Tatsukawa J. XAPP888 - MMCM and PLL Dynamic Reconfiguration;

[37] Vipin K, Fahmy SA. FPGA dynamic and partial reconfiguration: A survey of

and caching solutions. IEEE

2018;**20**(4):3098-3130

88-93

EURECOM

[24] Jafri AR, Majid J, Zhang L, Imran MA, Najam-ul-Islam M. FPGA implementation of UFMC based baseband transmitter: Case study for LTE 10MHz channelization. Wireless Communications and Mobile Computing. Hindawi; 2018;**2018**:1-12. Article ID: 2139794. DOI: 10.1155/2018/2139794

[25] Nadal J, Nour CA, Baghdadi A. Flexible hardware platform for demonstrating new 5G waveform candidates. In: 2017 29th International Conference on Microelectronics (ICM). Beirut: IEEE; 2017. pp. 1-4. DOI: 10.1109/ICM.2017.8268851

[26] Vakilian V, Wild T, Schaich F, ten Brink S, Frigon J. Universal-filtered multi-carrier technique for wireless systems beyond LTE. In: 2013 IEEE Globecom Workshops (GC Wkshps). Atlanta, GA: IEEE; 2013. pp. 223-228. DOI: 10.1109/GLOCOMW.2013.6824990

[27] Knopp R, Kaltenberger F, Vitiello C, Luise M. Universal filtered multicarrier for machine type communications in 5G. In: Proceedings of EUCNC 2016, European Conference on Networks and Communications; 2016. Available from: http://www.eurecom.fr/publication/4910. Unpublished material provided by EURECOM

[28] Schaich F, Wild T, Chen Y. Waveform contenders for 5G—Suitability for short packet and low latency transmissions. In: 2014 IEEE 79th Vehicular Technology Conference (VTC Spring). Seoul: IEEE; 2014. pp. 1-5. DOI: 10.1109/VTCSpring.2014.7023145

[29] Parvez I, Rahmati A, Guvenc I, Sarwat AI, Dai H. A survey on low latency towards 5G: RAN, core network and caching solutions. IEEE Communications Surveys Tutorials. 2018;**20**(4):3098-3130

[16] Ferreira ML, Barahimi A,

pp. 223-232

6126703

Project; 2010

2017

**54**

Ferreira JC. Reconfigurable FPGA-based FFT processor for cognitive radio applications. In: Proceedings of the Applied Reconfigurable Computing: 12th International Symposium, ARC 2016; March 22–24 March 2016; Mangaratiba, RJ, Brazil: Springer International Publishing; 2016.

*Field Programmable Gate Arrays (FPGAs) II*

optimization. In: European Wireless 2014; 20th European Wireless Conference. Barcelona, Spain: VDE;

[24] Jafri AR, Majid J, Zhang L, Imran MA, Najam-ul-Islam M. FPGA implementation of UFMC based baseband transmitter: Case study for LTE 10MHz channelization. Wireless

Communications and Mobile

Computing. Hindawi; 2018;**2018**:1-12. Article ID: 2139794. DOI: 10.1155/2018/

[25] Nadal J, Nour CA, Baghdadi A. Flexible hardware platform for demonstrating new 5G waveform candidates. In: 2017 29th International Conference on Microelectronics (ICM). Beirut: IEEE; 2017. pp. 1-4. DOI: 10.1109/ICM.2017.8268851

[26] Vakilian V, Wild T, Schaich F, ten Brink S, Frigon J. Universal-filtered multi-carrier technique for wireless systems beyond LTE. In: 2013 IEEE Globecom Workshops (GC Wkshps). Atlanta, GA: IEEE; 2013. pp. 223-228. DOI: 10.1109/GLOCOMW.2013.

[27] Knopp R, Kaltenberger F, Vitiello C, Luise M. Universal filtered multicarrier for machine type communications in 5G. In: Proceedings of EUCNC 2016, European Conference on Networks and Communications; 2016. Available from: http://www.eurecom.fr/publication/ 4910. Unpublished material provided by

[28] Schaich F, Wild T, Chen Y. Waveform contenders for 5G— Suitability for short packet and low latency transmissions. In: 2014 IEEE 79th Vehicular Technology Conference (VTC Spring). Seoul: IEEE; 2014. pp. 1-5. DOI: 10.1109/VTCSpring.2014.

[29] Parvez I, Rahmati A, Guvenc I, Sarwat AI, Dai H. A survey on low

2014. pp. 1-5

2139794

6824990

EURECOM

7023145

[17] He S, Torkelson M. A new approach

Proceedings of International Conference on Parallel Processing. Honolulu, HI, USA: IEEE; 1996. pp. 766-770. DOI:

[18] Löfgren J, Nilsson P. On hardware implementation of radix 3 and radix 5 FFT kernels for LTE systems. In: 2011,

pp. 1-4. DOI: 10.1109/NORCHP.2011.

[19] Doré JB, Gerzaguet R, Cassiau N, Ktenas D. Waveform contenders for 5G: Description, analysis and comparison. Physical Communication. Elsevier; 2017;**24**:46-61. DOI: 10.1016/j.

phycom.2017.05.004. ISSN: 1874-4907

[20] Bellanger M, Ruyet DL, Roviras D, Terr'e M, Nossek J, Baltar L, et al. FBMC physical layer: A primer. PHYDYAS

[21] Carvalho M. FPGA implementation of a baseband processor for FBMC transmission [MSc thesis]. Faculty of Engineering of the University of Porto;

alternative scheme for filter bank based multicarrier transmission. In: 2012 5th

Communications, Control and Signal Processing. Rome: IEEE; 2012. pp. 1-4

[23] Wang X, Wild T, Schaich F, dos Santos AF. Universal filtered multicarrier with leakage-based filter

[22] Bellanger M. FS-FBMC: An

International Symposium on

to pipeline FFT processor. In:

10.1109/IPPS.1996.508145

NORCHIP. Lund: IEEE; 2011.

[30] Lopes Ferreira M, Canas FJ. An FPGA-oriented baseband modulator architecture for 4G/5G communication scenarios. Electronics. 2019;**8**(1):1-19

[31] Bhushan N, Ji T, Koymen O, Smee J, Soriaga J, Subramanian S, et al. Industry perspective—5G air Interface system design principles. IEEE Wireless Communications. 2017;**24**(5):6-8

[32] Yuan G, Zhang X, Wang W, Yang Y. Carrier aggregation for LTE-advanced mobile communication systems. IEEE Communications Magazine. 2010;**48**(2): 88-93

[33] Kaltenberger F, Knopp R, Vitiello C, Danneberg M, Festag A. Experimental analysis of 5G candidate waveforms and their coexistence with 4G systems. In: XAPP888 - MMCM and PLL Dynamic Reconfiguration. 2015. Available from: http://www.eurecom.fr/fr/publication/ 4725/download/cm-publi-4725.pdf. Unpublished material provided by EURECOM

[34] UG909 - Vivado Design Suite User Guide: Partial Reconfiguration; 2015

[35] Claus C, Ahmed R, Altenried F, Stechele W. Towards rapid dynamic partial reconfiguration in video-based driver assistance systems. In: Sirisuk P, Morgan F, El-Ghazawi T, Amano H, editors. Applied Reconfigurable Computing: Architectures, Tools and Applications. Springer: Berlin Heidelberg; 2010. pp. 55-67

[36] Tatsukawa J. XAPP888 - MMCM and PLL Dynamic Reconfiguration; V1.7. Xilinx Inc.; April 2017

[37] Vipin K, Fahmy SA. FPGA dynamic and partial reconfiguration: A survey of architectures, methods, and

applications. ACM Computing Surveys. 2018;**51**(4):1-39

[38] ITU-R. Minimum Requirements Related to Technical Performance for IMT-2020 Radio Interface(s). ITU-R; 2017. M.2410–0. Available from: https:// www.itu.int/pub/R-REP-M.2410-2017


#### **Chapter 4**

## An Efficient FPGA-Based Frequency Shifter for LTE/LTE-A Systems

*Felipe A.P. de Figueiredo and Fabbryccio A.C.M. Cardoso*

#### **Abstract**

The Physical Random Access Channel plays an important role in LTE and LTE-A systems. Through this channel, the user equipment aligns its uplink transmissions to the eNodeB's uplink and gains access to the network. One of the initial operations executed by the receiver at eNodeB side is the translation of the channel's signal back to base-band. This operation is a necessary step for preamble detection and can be executed through a time-domain frequency-shift operation. Therefore, in this paper, we present the hardware architecture and design details of an optimised and configurable FPGA-based time-domain frequency shifter. The proposed architecture is based on a customised Numerically Controlled Oscillator that is employed for creating complex exponential samples using only plain logical resources. The main advantage of the proposed architecture is that it completely removes the necessity of saving in memory a huge number of long complex exponentials by making use of a Look-Up Table and exploiting the quarter-wave symmetry of the basis waveform. The results demonstrate that the proposed architecture provides high Spurious Free Dynamic Range signals employing only a minimal number of FPGA resources. Additionally, the proposed architecture presents spur-suppression ranging from 62.13 to 153.58 dB without employing any correction.

**Keywords:** LTE, LTE-A, 4G, PRACH, NCO, time-domain frequency shift, FPGA

#### **1. Introduction**

Long Term Evolution (LTE) technology is the next big step forward in cellular services. It is a 3GPP-defined standard that is able to provide uplink speeds of up to 50 megabits per second (Mbps) and downlink speeds of up to 100 Mbps. This new technology delivers several technical benefits to cellular networks. Its bandwidth can be scaled from 1.25 MHz up to 20 MHz [1–4].

In order to make LTE a true fourth generation (4G) technology, it was enhanced to meet the IMT-Advanced requirements issued by the International Telecommunication Union (ITU). The necessary improvements are specified in 3GPP Release 10 and are also known as LTE Advanced (LTE-A). LTE-A increases the peak data rates to 1 Gbit/s in the downlink and to 500 Mbit/s in the uplink. It introduces several new features, such as MIMO extensions (up to 4 × 4 for UL and up to 8 × 8 for DL), carrier aggregation, improved cell-edge performance through enhanced intercell interference coordination (eICIC) and relay nodes (RN), and uplink access enhancements such as simultaneous transmission of data and control information (physical uplink shared channel (PUSCH) and physical uplink control channel (PUCCH)) and clustered single-carrier frequency-division multiple access (SC-FDMA) [3].


*DOI: http://dx.doi.org/10.5772/intechopen.91339*

*An Efficient FPGA-Based Frequency Shifter for LTE/LTE-A Systems*


In LTE and LTE-A, uplink physical random access channel (PRACH) is used for initial access requests from the user equipment (UE) to the evolved base station (eNodeB) and to obtain time synchronisation [3, 4]. In case of a need to access the network, a UE requests access by transmitting a random access (RA) preamble through PRACH [5]. The RA preamble is then detected by the PRACH receiver at eNodeB side, which estimates both the ID of the transmitted preamble and the propagation delay between UE and eNodeB. Then, the UE is time-synchronised according to a time alignment (TA) value (derived from the propagation delay estimate) transmitted from the eNodeB before the uplink transmission [6].

PRACH transmission opportunity is set by higher layers [7] and determines the frequency-domain location of the random access preamble within the physical resource blocks (RB). In this way, at eNodeB side, a fundamental operation before any attempt to detect random access preambles takes place is the extraction of relevant preamble signals through a time-domain frequency shift operation. This operation translates the PRACH signal from the frequency-domain location set by higher layers back to baseband so that preamble detection can be totally carried out in baseband [4].

This paper is an extension of a previous conference paper [8]. Unlike [8], which covered only a few superficial aspects of the proposed algorithm and architecture, the current paper presents a detailed analysis of its design and implementation. The main contributions of the current paper are (i) the design of a low computational complexity time-domain frequency shifter algorithm and hardware architecture to be employed in the PRACH receiver at the eNodeB side; (ii) a thorough analysis of design and implementation details; (iii) a discussion of the computational complexity of the proposed architecture in terms of FPGA resource utilisation and speed; and (iv) a careful analysis of the implementation results considering spur suppression, i.e. spurious-free dynamic range (SFDR), signal-to-noise ratio (SNR), probabilities of correct and erroneous detection and the average error between time-domain frequency shift operations carried out by a floating-point model, referred to here as the Golden Model (GM), and by the fixed-point FPGA implementation of the proposed architecture.

This paper contributes a method and architecture optimised and tested for reduced complexity on a Xilinx Virtex-6 LX240T FPGA device. Results show that the architecture presents spur suppression better than 62 dB and that, when it is employed in the PRACH receiver, the probability of correct detection achieved by the receiver is greater than 99% at an SNR of 21 dB.

The remainder of the paper is organised as follows. In Section 2 we offer some background on the physical random access channel and its features. Section 3 presents an efficient algorithm for a time-domain frequency shifter. Section 4 gives important practical considerations on the implementation of the proposed algorithm as well as detailed description of the units composing the main architecture. Test methodology, simulation and implementation results are then presented in Section 5. Finally, Section 6 provides some concluding remarks.

#### **2. Physical random access channel**


The PRACH is the physical channel that initiates the communication exchange with the eNodeB. Based on the sequences sent through this channel, the eNodeB is able to compute the time it takes for the signal to travel from the user equipment (UE) to it, identifying and correcting this time delay before establishing a data packet connection. In order to establish a connection with the eNodeB, the UE starts the random access procedure by transmitting the random access sequence (also known as preamble) through the PRACH. The PRACH preamble is made up of a cyclic prefix and a preamble part as presented in Table 5.7.1-1 of [7]. This preamble is orthogonal to other uplink user data to allow the eNodeB to differentiate each UE. The subcarrier spacing for the PRACH is 1.25 kHz for formats 0 to 3 and 7.5 kHz for format 4. Formats 0 to 3 are used for frame structure type 1, i.e. frequency division duplexing (FDD), and format 4 is used for frame structure type 2, i.e. time division duplexing (TDD), only [3]. As will be discussed later, the PRACH can be positioned at different frequency locations, i.e. RBs, depending on a parameter configured by higher layers. **Figure 1** shows an example of a possible frequency-domain location of the PRACH.

Prime-length Zadoff-Chu (ZC) sequences are employed as random access preambles in LTE and LTE-A systems due to their constant amplitude zero autocorrelation waveform (CAZAC) properties, i.e. all samples of a ZC sequence are located on the unit circle (unitary magnitude), and their autocorrelation values are equal to zero for all time-lags different from zero [9, 10]. These properties turn ZC sequences into very useful preambles for channel estimation, time synchronisation and improved performance of the detection of PRACH preambles [4]. ZC sequences transmitted through the PRACH channel present the form defined by Eq. (1) [7]:

$$x_u(n) = \exp\left(\frac{-j\pi u n(n+1)}{N_{ZC}}\right), \quad 0 \le n \le N_{ZC} - 1, \tag{1}$$

where *u* is a positive integer known as ZC sequence index, *n* is the time index and *NZC* is the length of the ZC sequence, which for FDD systems is equal to 839 [7]. Random access preambles with zero correlation zones are defined from the *u*th root ZC sequence.

This sequence length, *NZC*, corresponds to approximately 69.92 physical uplink shared channel (PUSCH) subcarriers in each SC-FDMA symbol and offers a band protection of 72 − 69.92 = 2.08 PUSCH subcarriers, which corresponds to approximately one PUSCH subcarrier of protection on each side of the preamble [7]. PUSCH subcarriers are spaced 15 kHz apart from each other.

**Figure 1.** *Example of physical random access channel (PRACH) format 0.*

The PRACH occupies a bandwidth of 1.08 MHz, which is equivalent to six resource blocks (RB). Unlike other uplink channels, the PRACH uses a subcarrier spacing of 1250 Hz for preamble formats 0 to 3 [7]. The ZC sequence is positioned at the centre of the 1.08 MHz bandwidth, i.e. at the centre of the block of 864 available PRACH subcarriers, so that there is a guard band of 15.625 kHz on each side of the preamble, which corresponds to 12.5 null PRACH subcarriers. These guard bands are added to the PRACH preamble edges in order to minimise interference from the PUSCH. **Figure 1** depicts the PRACH preamble mapping just described.
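The figures quoted above follow from a few lines of arithmetic. The sketch below simply recomputes them from the stated subcarrier spacings and sequence length; the function and constant names are illustrative and are not part of the proposed architecture.

```python
# Values quoted in the text for PRACH preamble formats 0 to 3.
N_ZC = 839             # ZC sequence length (FDD)
PRACH_SCS_HZ = 1250    # PRACH subcarrier spacing
PUSCH_SCS_HZ = 15000   # PUSCH subcarrier spacing
PRACH_RBS = 6          # resource blocks occupied by the PRACH
SC_PER_RB = 12         # PUSCH subcarriers per resource block


def prach_numerology():
    """Recompute the PRACH bandwidth and guard-band figures (illustrative helper)."""
    pusch_sc = PRACH_RBS * SC_PER_RB                          # PUSCH subcarriers spanned
    bandwidth_hz = pusch_sc * PUSCH_SCS_HZ                    # total PRACH bandwidth
    prach_sc = bandwidth_hz // PRACH_SCS_HZ                   # available PRACH subcarriers
    occupied_pusch_sc = N_ZC * PRACH_SCS_HZ / PUSCH_SCS_HZ    # preamble width in PUSCH subcarriers
    guard_sc_per_side = (prach_sc - N_ZC) / 2                 # null PRACH subcarriers per side
    guard_hz_per_side = guard_sc_per_side * PRACH_SCS_HZ      # guard band per side
    return {
        "pusch_sc": pusch_sc,
        "bandwidth_hz": bandwidth_hz,
        "prach_sc": prach_sc,
        "occupied_pusch_sc": occupied_pusch_sc,
        "guard_sc_per_side": guard_sc_per_side,
        "guard_hz_per_side": guard_hz_per_side,
    }


if __name__ == "__main__":
    for name, value in prach_numerology().items():
        print(f"{name}: {value}")
```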

The PRACH sequence, which for formats 0 and 1 is 800 µs long, is created by cyclically shifting a root ZC sequence of prime length *NZC*, defined as in Eq. (1). Random access preambles with zero correlation zones of length *NCS* − 1 are generated by applying cyclic shifts to the *u*th root ZC sequence, according to Eq. (2):

$$
x_{u,v}(n) = x_u\left((n + C_v) \bmod N_{ZC}\right), \tag{2}
$$


where *v* is the sequence index and *Cv* is the cyclic shift applied to the root ZC sequence and calculated as *Cv* = *vNCS* for unrestricted sets [7]. The parameter *NCS* gives the fixed length of the cyclic shift. All the possible values for these parameters are defined in [7].
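As an illustration of Eqs. (1) and (2), the following sketch generates a root ZC sequence and a cyclically shifted preamble, and provides a helper to check the CAZAC properties numerically. It is a floating-point model only; the function names and the example values *u* = 129 and *NCS* = 13 are assumptions chosen for the example, not parameters taken from this chapter.

```python
import cmath


def zc_root(u, n_zc=839):
    """Root ZC sequence of Eq. (1): x_u(n) = exp(-j*pi*u*n*(n+1)/N_ZC)."""
    # The phase index is reduced modulo 2*N_ZC, which leaves exp() unchanged
    # but keeps the floating-point argument small.
    return [cmath.exp(-1j * cmath.pi * ((u * n * (n + 1)) % (2 * n_zc)) / n_zc)
            for n in range(n_zc)]


def zc_preamble(u, v, n_cs, n_zc=839):
    """Cyclically shifted sequence of Eq. (2), with C_v = v * N_CS (unrestricted set)."""
    root = zc_root(u, n_zc)
    c_v = v * n_cs
    return [root[(n + c_v) % n_zc] for n in range(n_zc)]


def circular_autocorr(x, lag):
    """Circular autocorrelation of x at the given lag."""
    n = len(x)
    return sum(x[(i + lag) % n] * x[i].conjugate() for i in range(n))


if __name__ == "__main__":
    x = zc_root(129)                   # hypothetical root index
    print(max(abs(abs(s) - 1.0) for s in x))   # deviation from unit magnitude
    print(abs(circular_autocorr(x, 0)))        # energy at lag 0
    print(abs(circular_autocorr(x, 5)))        # value at a non-zero lag
```

For a prime *NZC*, the magnitude at every non-zero lag should vanish up to rounding error, which is the zero-autocorrelation property the preamble detector relies on.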

#### **2.1 PRACH receiver**

In the literature there are two approaches for PRACH receivers: the full frequency-domain approach and the hybrid time-/frequency-domain approach [4, 11]. Although the full frequency-domain approach provides the optimal detection performance, it requires a considerably large discrete Fourier transform (DFT), more precisely a 24,576-point DFT. The hybrid time-/frequency-domain approach, on the other hand, uses FFT and IFFT blocks of the same, much smaller size, i.e. 2048 points when the adopted decimation factor is 12. Thus, the hybrid approach substantially reduces the complexity of the hardware implementation. Therefore, in order to reduce the implementation complexity of the PRACH receiver, we adopt the hybrid time-/frequency-domain approach, which results in more practical implementations [4]. **Figure 2** depicts the PRACH receiver architecture implemented and used in our L1 solution.

**Figure 2.** *Architecture of a hybrid time-/frequency-domain PRACH receiver.*

The received signal, i.e. a possible random access preamble signal, is first preprocessed in the time domain and then transformed into the frequency domain by an FFT block and multiplied by a Fourier-transformed root Zadoff-Chu (ZC) sequence. The resulting sequence is then searched for peaks above a predefined threshold, which is calculated to produce a given probability of false alarm.

**Figure 2** depicts the main components of the PRACH receiver at the eNodeB side (for further details refer to [12]). The first block in **Figure 2** is the cyclic prefix remover, which discards all samples from the CP part of the preamble. Next, the PRACH pass-band signal is shifted to baseband by multiplying it by a complex exponential. The baseband signal is then fed into a decimator block, which decimates the signal by a factor of 12; instead of 24,576 samples in the case of format 0, only 2048 samples remain. The FFT block transforms the SC-FDMA symbols from the time domain into the frequency domain. Next, the subcarrier demapping module extracts the RACH preamble sequence from the correct FFT bins. The output of the subcarrier demapping module is multiplied by the locally stored root ZC preamble, and the result is fed into the zero-padding module. The IFFT block then produces the cross-correlation between the root ZC sequence and the received preamble signal. All samples coming out of the IFFT block have their squared modulus calculated, producing what is known as power delay profile (PDP) samples. Finally, the preamble detection block employs the PDP samples to estimate the noise power, set a detection threshold and decide whether a preamble is present or not. As an output of the detection process, this block reports to the MAC layer all detected preambles and their respective time advance (TA) estimates. For further information on this receiver architecture and detection algorithm, refer to [4, 12].
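The correlation core of this chain (transform, multiply by the stored root sequence, inverse transform, squared modulus, peak search) can be sketched in a few lines. The model below is a simplified, noiseless illustration and not the receiver of **Figure 2**: it omits CP removal, the frequency shift, decimation, subcarrier demapping, zero padding and the detection threshold. It also makes several assumptions of its own: a naive DFT stands in for the FFT/IFFT blocks, the stored root spectrum is conjugated before the multiplication, and a short prime length of 139 with *u* = 25 and *Cv* = 26 is used only to keep the example fast.

```python
import cmath


def zc_root(u, n_zc):
    """Root Zadoff-Chu sequence of Eq. (1)."""
    return [cmath.exp(-1j * cmath.pi * ((u * n * (n + 1)) % (2 * n_zc)) / n_zc)
            for n in range(n_zc)]


def dft(x, inverse=False):
    """Naive O(N^2) DFT/IDFT; stands in for the FFT/IFFT blocks of the receiver."""
    n = len(x)
    sign = 1 if inverse else -1
    twiddle = [cmath.exp(sign * 2j * cmath.pi * k / n) for k in range(n)]
    out = [sum(x[i] * twiddle[(i * k) % n] for i in range(n)) for k in range(n)]
    return [v / n for v in out] if inverse else out


def power_delay_profile(rx, root):
    """|IDFT(DFT(rx) * conj(DFT(root)))|^2, i.e. the squared circular cross-correlation."""
    prod = [a * b.conjugate() for a, b in zip(dft(rx), dft(root))]
    return [abs(c) ** 2 for c in dft(prod, inverse=True)]


def detect_cyclic_shift(rx, root):
    """Return (estimated cyclic shift, peak power) from the PDP maximum."""
    pdp = power_delay_profile(rx, root)
    peak = max(range(len(pdp)), key=pdp.__getitem__)
    # For rx(n) = root((n + C_v) mod N), the correlation peaks at lag N - C_v.
    return (-peak) % len(pdp), pdp[peak]


if __name__ == "__main__":
    n_zc, u, c_v = 139, 25, 26           # hypothetical example values
    root = zc_root(u, n_zc)
    rx = [root[(n + c_v) % n_zc] for n in range(n_zc)]
    print(detect_cyclic_shift(rx, root))
```

In a real receiver the peak position additionally carries the propagation delay, which is what allows the time advance estimate to be derived from the same PDP.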

#### **2.2 Preamble format**


The PRACH preamble, illustrated in **Figure 3**, consists of three parts: a cyclic prefix (CP) with length *TCP*, which is added to the preamble in order to effectively eliminate intersymbol interference (ISI) and a signature or sequence part of length

**Figure 3.** *Random access preamble format 0.*


*s t*ðÞ¼ *βPRACH*

*An Efficient FPGA-Based Frequency Shifter for LTE/LTE-A Systems*

in **Figure 3** and Δ*fRA*Δ*t* = 1*/NPRE* (where for formats 0 ad 1, *NPRE* = 24,576).

*N* X*ZC*�<sup>1</sup> *k*¼0

*s t*ðÞ¼ *<sup>β</sup>PRACH:x*,

Eq. (4) can be reorganised in the following way:

� *<sup>x</sup>*,

*u,v* (*n* � *NCP*), i.e.

*x*,

tial term given in Eq. (9).

**63**

mitted and it is given by the following equation

¼ *βPRACH:x*'*u*,*v*ð Þ *n* � *NCP :*

Then rewriting Eq. (5), we have

*DOI: http://dx.doi.org/10.5772/intechopen.91339*

Eq. (4) can be rewritten as

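#### **Table 1.**

*Random access preamble formats.*

| Preamble format | *TCP* | *TPRE* | *TGP* |
|---|---|---|---|
| 0 | 3168·*Ts* | 24576·*Ts* | 2976·*Ts* |
| 1 | 21024·*Ts* | 24576·*Ts* | 15840·*Ts* |
| 2 | 6240·*Ts* | 2·24576·*Ts* | 6048·*Ts* |
| 3 | 21024·*Ts* | 2·24576·*Ts* | 21984·*Ts* |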

*TPRE* and of a guard period *TGP*, an unused portion of time at the end of the preamble that absorbs the propagation delay. The standard defines four different preamble formats for FDD operation [7]. The parameters *TPRE*, *TCP* and *TGP* are set according to the chosen preamble format.

**Figure 3** shows the parameter values for format 0, and the values for all formats are listed in **Table 1**, where *Ts* is the standard time unit used throughout the LTE specification documents. It is defined as *Ts* = 1*/*(15,000 × 2048) seconds, which corresponds to a sampling rate of 30.72 MHz.
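The relation between *Ts*, the sampling rate and the preamble durations can be checked with a few lines of arithmetic. The sketch below is only illustrative: the names `TS`, `FORMATS`, `sampling_rate_hz` and `total_duration_ms` are not taken from [7], and the per-format values are the ones listed in Table 1, expressed in units of *Ts*.

```python
from fractions import Fraction

# Standard time unit of the LTE specification, in seconds.
TS = Fraction(1, 15_000 * 2048)

# (T_CP, T_PRE, T_GP) in units of Ts for preamble formats 0 to 3 (Table 1).
FORMATS = {
    0: (3168, 24576, 2976),
    1: (21024, 24576, 15840),
    2: (6240, 2 * 24576, 6048),
    3: (21024, 2 * 24576, 21984),
}


def sampling_rate_hz() -> Fraction:
    """Sampling rate implied by Ts."""
    return 1 / TS


def total_duration_ms(fmt: int) -> float:
    """Cyclic prefix + preamble + guard period of a format, in milliseconds."""
    return float(sum(FORMATS[fmt]) * TS * 1000)


if __name__ == "__main__":
    print(f"sampling rate: {float(sampling_rate_hz()) / 1e6} MHz")
    for fmt in FORMATS:
        print(f"format {fmt}: {total_duration_ms(fmt)} ms")
```

Under these values each format occupies a whole number of 1 ms subframes, which is a useful sanity check on the table entries.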

#### **2.3 PRACH preamble signal**

The PRACH preamble signal *s*(*t*) can be defined as follows [7]:

$$s(t) = \beta\_{PRACH} \sum\_{k=0}^{N\_{ZC}-1} \sum\_{n=0}^{N\_{ZC}-1} x\_{u,v}(n) \cdot \exp\left[-\frac{j2\pi nk}{N\_{ZC}}\right] \cdot \exp\left[j2\pi \left(k + \varphi + K(k\_0 + 1/2)\right) \Delta f\_{RA}(t - T\_{CP})\right], \tag{3}$$

where 0 ≤ *t* < *TPRE* + *TCP*, *βPRACH* is an amplitude scaling factor and *k*<sub>0</sub> = *n*<sup>RA</sup><sub>PRB</sub> *N*<sup>RB</sup><sub>SC</sub> − *N*<sup>UL</sup><sub>RB</sub> *N*<sup>RB</sup><sub>SC</sub>/2. The location in the frequency domain is controlled by the parameter *n*<sup>RA</sup><sub>PRB</sub>, also known as *n*<sup>RA</sup><sub>PRBoffset</sub> (it is the input *frequency\_offset\_i* of the time-domain frequency shifter module), which is expressed as a resource block number configured by higher layers and fulfils 0 ≤ *n*<sup>RA</sup><sub>PRB</sub> ≤ *N*<sup>UL</sup><sub>RB</sub> − 6; this inequality is only valid for formats 0, 1, 2 and 3, i.e. FDD. The factor *K* = Δ*f/*Δ*fRA* accounts for the ratio of subcarrier spacing between the PUSCH and PRACH, and it is equal to 12 as Δ*f* = 15 kHz and Δ*fRA* = 1250 Hz. The variable *φ* (equal to 7 for LTE FDD) defines a fixed offset determining the frequency-domain location of the random access preamble within the resource blocks. *N*<sup>UL</sup><sub>RB</sub> is the uplink system bandwidth (in RBs), and *N*<sup>RB</sup><sub>SC</sub> is the number of subcarriers per RB, i.e. 12.

By noticing that the inner summation is the DFT of *xu,v*(*n*) of length *NZC*, we can rewrite Eq. (3) in the following way:

$$s(t) = \beta\_{PRACH} \sum\_{k=0}^{N\_{ZC}-1} X\_{u,v}(k) \cdot \exp\left[j2\pi k \Delta f\_{RA}(t - T\_{CP})\right] \cdot \exp\left[j2\pi \left(\varphi + K(k\_0 + 1/2)\right)\Delta f\_{RA}(t - T\_{CP})\right]. \tag{4}$$
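The step from Eq. (3) to Eq. (4) can be checked numerically. The following sketch evaluates both forms for a short toy sequence and compares the results. The function names, the length-11 Zadoff-Chu-like sequence and the chosen parameter values are assumptions made only for this check, and a plain O(N²) DFT is used to keep the example dependency-free.

```python
import cmath
import math


def s_eq3(x, t, phi, big_k, k0, df_ra, t_cp, beta=1.0):
    """Direct evaluation of the double sum in Eq. (3)."""
    n_zc = len(x)
    total = 0j
    for k in range(n_zc):
        # Inner summation over n: the DFT bin X(k) of the sequence.
        inner = sum(x[n] * cmath.exp(-2j * math.pi * n * k / n_zc)
                    for n in range(n_zc))
        shift = k + phi + big_k * (k0 + 0.5)
        total += inner * cmath.exp(2j * math.pi * shift * df_ra * (t - t_cp))
    return beta * total


def s_eq4(x, t, phi, big_k, k0, df_ra, t_cp, beta=1.0):
    """Eq. (4): DFT first, then the k-independent exponential factored out."""
    n_zc = len(x)
    big_x = [sum(x[n] * cmath.exp(-2j * math.pi * n * k / n_zc)
                 for n in range(n_zc)) for k in range(n_zc)]
    common = cmath.exp(2j * math.pi * (phi + big_k * (k0 + 0.5))
                       * df_ra * (t - t_cp))
    return beta * common * sum(
        big_x[k] * cmath.exp(2j * math.pi * k * df_ra * (t - t_cp))
        for k in range(n_zc))


def toy_sequence(n_zc=11, u=3):
    """Short Zadoff-Chu-like sequence, used here only as test data."""
    return [cmath.exp(-1j * math.pi * u * n * (n + 1) / n_zc)
            for n in range(n_zc)]
```

Both functions should agree to within floating-point rounding for any *t*, since Eq. (4) only renames the inner sum and factors out the term that does not depend on *k*.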

Again, by noticing that the first exponential inside the summation in Eq. (4) is a time shift applied to the DFT of *xu,v*(*n*), we can rewrite that first part of the equation in discrete time by replacing *t* by *n*Δ*t* and *TCP* by *NCP*Δ*t*, where Δ*t* is the sampling period, referred to in [7] as the standard time unit *Ts*:

*An Efficient FPGA-Based Frequency Shifter for LTE/LTE-A Systems DOI: http://dx.doi.org/10.5772/intechopen.91339*

$$s(t) = \beta\_{PRACH} \sum\_{k=0}^{N\_{ZC}-1} X\_{u,v}(k) \cdot \exp\left[j2\pi k \Delta f\_{RA} \Delta t (n - N\_{CP})\right], \tag{5}$$

where *NCP* is the number of samples corresponding to the CP interval as shown in **Figure 3** and Δ*fRA*Δ*t* = 1*/NPRE* (where for formats 0 and 1, *NPRE* = 24,576).

Then rewriting Eq. (5), we have

*Field Programmable Gate Arrays (FPGAs) II*


$$\begin{split} s(t) &= \beta\_{PRACH} \sum\_{k=0}^{N\_{ZC}-1} X\_{u,v}(k) \cdot \exp\left[ \frac{j2\pi k (n - N\_{CP})}{N\_{PRE}} \right] \\ &= \beta\_{PRACH} \cdot x'\_{u,v}(n - N\_{CP}). \end{split} \tag{6}$$

As can be seen, the result of the above equation is simply an application of the DFT's time-shift theorem: the sum is the IDFT of *Xu,v*(*k*) with length *NPRE*, written *x*′<sub>*u,v*</sub>, evaluated at the shifted index *n* − *NCP*. With that in mind, Eq. (4) can be rewritten as

$$\begin{split} s(t) &= \beta\_{PRACH} \cdot x'\_{u,v}(n - N\_{CP}) \\ &\cdot \exp\left[\frac{j2\pi \left(\varphi + K(k\_0 + 1/2)\right)(n - N\_{CP})}{N\_{PRE}}\right]. \end{split} \tag{7}$$

Splitting the exponent into its *n*-dependent and *NCP*-dependent parts, Eq. (7) can be reorganised in the following way:

$$\begin{split} s(t) &= \beta\_{PRACH} \cdot \exp\left[ \frac{-j2\pi N\_{CP}\left(\varphi + K(k\_0 + 1/2)\right)}{N\_{PRE}} \right] \\ &\cdot \left\{ x'\_{u,v}(n - N\_{CP}) \cdot \exp\left[ \frac{j2\pi n \left(\varphi + K(k\_0 + 1/2)\right)}{N\_{PRE}} \right] \right\}. \end{split} \tag{8}$$

The part of Eq. (8) between curly braces represents a circular frequency shift applied to *x*′<sub>*u,v*</sub>(*n* − *NCP*), i.e.

$$
x'\_{u,v}(n - N\_{CP}) \cdot \exp\left[\frac{j2\pi nm}{N\_{PRE}}\right] \stackrel{\rm DFT}{\leftrightarrow} X\_{u,v}(k - m), \tag{9}
$$

where *m* is the frequency shift applied to the PRACH signal before it is transmitted, given by

$$m = \varphi + K(k\_0 + 1/2). \tag{10}$$

Since we are only dealing with FDD, for which *φ* = 7, *K* = 12 and *N*<sup>RB</sup><sub>SC</sub> = 12, Eq. (10) can be further simplified as

$$m = 13 + 144 n\_{PRB}^{RA} - 72 N\_{RB}^{UL}. \tag{11}$$
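The simplification from Eq. (10) to Eq. (11) is plain integer arithmetic, so it can be verified exhaustively. In the sketch below the function names and the list of uplink bandwidths (in resource blocks) are illustrative choices; the constants are the FDD values quoted in the text.

```python
PHI = 7        # fixed frequency-domain offset for LTE FDD
K = 12         # ratio of subcarrier spacings, 15 kHz / 1250 Hz
N_SC_RB = 12   # subcarriers per resource block


def k0(n_ra_prb: int, n_ul_rb: int) -> int:
    """k0 = n_PRB^RA * N_SC^RB - N_RB^UL * N_SC^RB / 2 (always an integer)."""
    return n_ra_prb * N_SC_RB - (n_ul_rb * N_SC_RB) // 2


def m_eq10(n_ra_prb: int, n_ul_rb: int) -> int:
    """Eq. (10): phi + K * (k0 + 1/2); the K/2 term equals 6."""
    return PHI + K * k0(n_ra_prb, n_ul_rb) + K // 2


def m_eq11(n_ra_prb: int, n_ul_rb: int) -> int:
    """Eq. (11): closed form for FDD."""
    return 13 + 144 * n_ra_prb - 72 * n_ul_rb


if __name__ == "__main__":
    for n_ul_rb in (6, 15, 25, 50, 75, 100):
        for n_ra_prb in range(0, n_ul_rb - 6 + 1):
            assert m_eq10(n_ra_prb, n_ul_rb) == m_eq11(n_ra_prb, n_ul_rb)
    print(m_eq11(0, 100))
```

For a 100-RB uplink with the PRACH at resource block 0, both forms give a negative shift, which is the case handled later by the modulo step of Algorithm 1.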

Therefore, at the PRACH receiver side, after removing CP and GP, the preamble is still shifted in frequency domain by an offset factor given by *m*. For further processing, it is necessary to convert the shifted preamble into baseband. This conversion is performed by the time-domain frequency shift module (see **Figure 2**), which multiplies the received preamble by the conjugate of the complex exponential term given in Eq. (9).
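The frequency-shift property behind Eq. (9), and the fact that multiplying by the conjugate exponential undoes the shift, can be illustrated on a toy signal. This is a minimal sketch with assumed names (`dft`, `freq_shift`) and a length of 8 instead of *NPRE*; it ignores the additional time shift by *NCP*.

```python
import cmath
import math


def dft(x):
    """Plain O(N^2) DFT, sufficient for a short illustrative signal."""
    n_len = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * n * k / n_len)
                for n in range(n_len)) for k in range(n_len)]


def freq_shift(x, m):
    """Multiply x(n) by exp(j*2*pi*n*m/N), i.e. shift its spectrum by m bins."""
    n_len = len(x)
    return [x[n] * cmath.exp(2j * math.pi * n * m / n_len)
            for n in range(n_len)]


if __name__ == "__main__":
    x = [complex(n + 1, 2 - n) for n in range(8)]
    spectrum = dft(x)
    shifted = dft(freq_shift(x, 3))
    # Bin k of the shifted signal holds what was in bin (k - 3) mod 8.
    print(all(abs(shifted[k] - spectrum[(k - 3) % 8]) < 1e-9 for k in range(8)))
    # Multiplying by the conjugate exponential (shift by -3) restores x.
    restored = freq_shift(freq_shift(x, 3), -3)
    print(all(abs(a - b) < 1e-9 for a, b in zip(restored, x)))
```

The second check mirrors what the time-domain frequency shift module does at the receiver: it applies a shift of −*m* bins without ever computing a DFT.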

#### **3. Efficient algorithm of a time-domain frequency shifter**

In this section, we present an efficient algorithm used to apply frequency-domain shifts to random access preamble signals in the time domain (i.e. without the need to convert them to the frequency domain) through the use of a customised numerically controlled oscillator (NCO) and a complex multiplier. We also discuss the advantages presented by the proposed algorithm.

#### **3.1 Numerically controlled oscillator**

Numerically controlled oscillators (NCO) are important components in many digital communication systems. They are generally employed in quadrature synthesisers, which are used for constructing digital down- and upconverters and demodulators and here for time-domain frequency shifters. A very common method for creating digital complex or real valued sinusoid signals uses a lookup table (LUT) approach [13]. In this approach, a LUT saves into memory digital samples of a sinusoid signal.

A digital integrator is then employed to compute the correct phase arguments, which are mapped by the LUT to the desired output sinusoid samples. The integrator computes a phase slope that is mapped to a sinusoid (possibly complex) by the LUT. This value is presented to the address port of the LUT that performs the mapping from phase space to time [14].

A LUT usually saves into memory uniformly spaced samples of sine and cosine waveforms. This set of samples comprises a single cycle of a prototype complex sinusoid waveform with length *N* = 2<sup>*B*Θ(*n*)</sup> and consists of specific values of the argument Θ(*n*) of the sinusoid waveform, as defined by Eq. (12).

$$
\Theta(n) = n \frac{2\pi}{N},
\tag{12}
$$

where *n* is the index of the time sample and *B*Θ(*n*) is the number of bits employed in the phase accumulator which is calculated as shown in Eq. (13):

$$B\_{\Theta(n)} = \left\lceil \log\_2 \left( \frac{f\_{clk}}{\Delta f} \right) \right\rceil, \tag{13}$$

where ⌈·⌉ denotes the ceiling operator, *fclk* is the system clock frequency and Δ*f* is the frequency resolution of the NCO. The frequency resolution, Δ*f*, of the NCO is a function of *fclk* and *B*Θ(*n*). Then Δ*f* can be determined using the following equation:

$$
\Delta f = \frac{f\_{clk}}{2^{B\_{\Theta(n)}}} = \frac{f\_{clk}}{N}. \tag{14}
$$

The output frequency, *fout*, of the NCO waveform is a function of *fclk*, *B*Θ(*n*) and the phase increment value Δ*θ*. That is, *fout* = *f*(*fclk, B*Θ(*n*)*,* Δ*θ*) which is given in Hertz and is defined in Eq. (15). The phase increment, Δ*θ*, is an unsigned value which defines the NCO output frequency:

$$f\_{out} = \frac{f\_{clk}\Delta\theta}{2^{B\_{\Theta(n)}}} = \frac{f\_{clk}\Delta\theta}{N}. \tag{15}$$
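Eqs. (13) to (15) translate directly into a few helper functions. The sketch below uses hypothetical function names and, as an example only, a 30.72 MHz clock with a requested resolution of 1250 Hz; for these inputs Eq. (13) rounds up to 15 bits, so Eq. (14) returns a resolution slightly finer than requested.

```python
import math


def phase_bits(f_clk: float, delta_f: float) -> int:
    """Eq. (13): accumulator width needed for a requested resolution."""
    return math.ceil(math.log2(f_clk / delta_f))


def frequency_resolution(f_clk: float, bits: int) -> float:
    """Eq. (14): resolution obtained with a 2**bits-entry phase space."""
    return f_clk / 2 ** bits


def output_frequency(f_clk: float, bits: int, delta_theta: int) -> float:
    """Eq. (15): output frequency for a given unsigned phase increment."""
    return f_clk * delta_theta / 2 ** bits


if __name__ == "__main__":
    f_clk = 30.72e6
    bits = phase_bits(f_clk, 1250.0)
    print(bits, frequency_resolution(f_clk, bits),
          output_frequency(f_clk, bits, 4))
```

The output frequency is always an integer multiple of the resolution, which is why the phase increment acts as a frequency control word.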

The accuracy of a signal sequence formed by reading samples of a sinusoid signal from a LUT is influenced by both the amplitude and the phase quantization of the process. The width and length of the LUT memory directly determine the resolution of the signal's amplitude and phase angle, respectively. These resolution limits correspond to amplitude quantization and time-base jitter of the signal, respectively. Additionally, these resolution limits add a white broadband noise floor and spectral modulation lines to the spectrum of the generated signal sequence [15].
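The effect of the LUT width on amplitude accuracy can be made concrete with a small experiment. This sketch assumes a signed fixed-point format with `width_bits` bits and simple rounding, which is an illustrative choice rather than the quantiser used in the chapter; the table length of 1024 is likewise arbitrary.

```python
import math


def quantised_cos_table(length: int, width_bits: int):
    """One cosine cycle, rounded to a signed width_bits-bit grid."""
    scale = 2 ** (width_bits - 1) - 1  # largest positive code
    return [round(math.cos(2 * math.pi * i / length) * scale) / scale
            for i in range(length)]


def max_amplitude_error(length: int, width_bits: int) -> float:
    """Worst-case difference between the stored and the ideal samples."""
    table = quantised_cos_table(length, width_bits)
    return max(abs(table[i] - math.cos(2 * math.pi * i / length))
               for i in range(length))


if __name__ == "__main__":
    for width in (8, 12, 16):
        half_lsb = 0.5 / (2 ** (width - 1) - 1)
        print(width, max_amplitude_error(1024, width), half_lsb)
```

With rounding, the amplitude error never exceeds half a least significant bit, so each extra bit of width roughly halves it, while the table length independently sets the phase step 2π/*N*.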

Quarter-wave symmetry in the basis waveform can be exploited to construct an NCO that uses shortened tables. We will discuss this approach next.

#### **3.2 Iterative time-domain frequency shift algorithm**

The optimised algorithm representing the time-domain frequency shift operation is presented in Algorithm 1. It depicts the data processing executed by each one of the units in **Figure 4**.

At first, during the eNodeB's initialisation, the parameters *offset* and *bandwidth* (*bw*) are sent by higher layers to the PHY, which in turn feeds them into the time-domain frequency shifter module so that the discrete frequency shift calculator unit is able to calculate the actual frequency shift to be applied to the received PRACH signal. Whenever a subframe in which random access preamble transmissions are allowed occurs (it is set according to Table 5.7.1-2 in [7]) and after the CP is removed, the customised NCO unit generates a complex exponential signal with the frequency set earlier by the discrete frequency shift calculator unit and multiplies it sample by sample with the incoming PRACH complex signal samples; note that it is a complex multiplication since both are complex signals.

The procedure inputs are *re\_ad*, *im\_ad*, *offset*, *bw* and *cos\_table*, where *re\_ad* and *im\_ad* are the quadrature samples coming from the analog-to-digital converter (ADC) with the CP already removed, *offset* and *bw* are the configuration parameters coming from higher layers and used to calculate the frequency shift necessary to translate the pass-band preamble signal back to baseband, and *cos\_table* is the LUT containing the samples of a sinusoid used to generate the complex exponential signal. The angle mapper is the main part of the customised NCO algorithm shown in Algorithm 1, since it maps *theta* into an index of a 1/4-length cosine table.

**Figure 4.** *Blocks composing the time-domain frequency shifter module.*

In the light of what was presented in the previous section, we now discuss Algorithm 1. The first part of the algorithm is responsible for calculating the discrete frequency of the complex exponential signal that the NCO must generate in order to shift the received pass-band preamble signal to baseband. The discrete frequency shift, *m*, is calculated as shown in Eq. (11). By remembering that Δ*fRA*Δ*t* = 1*/NPRE*, we can then rewrite the exponential part of Eq. (9) as

$$\exp\left[j2\pi(m\Delta f\_{RA})t\right].\tag{16}$$

By analysing the equation above, it is noticeable that all frequency shifts are integer multiples of Δ*fRA*, so the output frequency of the NCO must be *fout* = *m*Δ*fRA*. Before proceeding we must define the values for some parameters presented in the previous section: the frequency resolution is Δ*f* = Δ*fRA* = 1250 Hz, the system clock frequency is *fclk* = 1*/Ts* = 30*.*72 MHz, and the length, *N* = 2<sup>*B*Θ(*n*)</sup>, of the single cycle of the basis complex waveform is made equal to 24,576 samples. The cycle length can be expressed as *N* = *fclk/*Δ*fRA*.

**Algorithm 1**. Time-domain frequency shifter algorithm

```
procedure TDFREQSHIFTER(re_ad, im_ad, offset, bw, cos_table)

    ▷ ********* Discrete Frequency Shift Calculator *********
    m = 13 + 144 * offset - 72 * bw;
    ▷ ---- Frequency Control Word (FCW) ----
    delta_theta = m;
    if m < 0 then
        delta_theta = N + m;
    end if

    ▷ ************* Customised NCO Algorithm *************
    theta = 0;
    for i = 0 to N - 1 do
        ▷ ---- Angle Mapper ----
        cos_signal = 1;
        sin_signal = 1;
        if theta > 3 * N/4 then
            cos_idx = (N - theta);
        else
            if theta > N/2 then
                cos_idx = (theta - (N/2));
                cos_signal = -1;
            else
                if theta > N/4 then
                    cos_idx = ((N/2) - theta);
                    cos_signal = -1;
                    sin_signal = -1;
                else
                    cos_idx = theta;
                    sin_signal = -1;
                end if
            end if
        end if

        sin_idx = (N/4) - cos_idx;

        if cos_idx == N/4 then
            cos_idx = ((N/4) - 1);
        end if
        if sin_idx == N/4 then
            sin_idx = ((N/4) - 1);
        end if

        ▷ ---- Phase to Value Mapping (LUT) ----
        re_nco(i) = cos_signal * cos_table(cos_idx);
        im_nco(i) = sin_signal * cos_table(sin_idx);

        ▷ ---- Phase Increment (Phase Accumulator) ----
        theta = (theta + delta_theta);
        if theta >= N then
            theta = (theta - N);
        end if

        ▷ *************** Complex Multiplier ***************
        re_bb(i) = re_ad(i) * re_nco(i) - im_ad(i) * im_nco(i);
        im_bb(i) = re_ad(i) * im_nco(i) + im_ad(i) * re_nco(i);
    end for
    return re_bb, im_bb
end procedure
```
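To make the data flow of Algorithm 1 easier to follow, the listing can be transcribed into floating-point Python. This is a behavioural sketch only, not the hardware implementation: the helper names `make_cos_table`, `angle_mapper` and `td_freq_shifter` are assumptions, samples are passed as Python `complex` values instead of separate real and imaginary streams, and no fixed-point quantisation is modelled.

```python
import cmath
import math

N_PRE = 24576  # cycle length N of Algorithm 1 (formats 0 and 1)


def make_cos_table(n: int = N_PRE):
    """First quadrant of a cosine: n/4 samples cos(2*pi*i/n), i = 0..n/4-1."""
    return [math.cos(2.0 * math.pi * i / n) for i in range(n // 4)]


def angle_mapper(theta: int, n: int = N_PRE):
    """Map a phase index in [0, n) to table indices and output signs."""
    q = n // 4
    cos_sign, sin_sign = 1, 1
    if theta > 3 * q:            # fourth quadrant
        cos_idx = n - theta
    elif theta > 2 * q:          # third quadrant
        cos_idx = theta - 2 * q
        cos_sign = -1
    elif theta > q:              # second quadrant
        cos_idx = 2 * q - theta
        cos_sign, sin_sign = -1, -1
    else:                        # first quadrant
        cos_idx = theta
        sin_sign = -1
    sin_idx = q - cos_idx
    # Index n/4 (value zero) is not stored; reuse the last table entry.
    cos_idx = min(cos_idx, q - 1)
    sin_idx = min(sin_idx, q - 1)
    return cos_idx, sin_idx, cos_sign, sin_sign


def td_freq_shifter(samples, offset: int, bw: int, cos_table, n: int = N_PRE):
    """Shift CP-free PRACH samples by -m bins, following Algorithm 1."""
    m = 13 + 144 * offset - 72 * bw            # Eq. (11)
    delta_theta = m + n if m < 0 else m        # frequency control word
    theta = 0
    out = []
    for x in samples:
        cos_idx, sin_idx, cos_sign, sin_sign = angle_mapper(theta, n)
        re_nco = cos_sign * cos_table[cos_idx]   # cos(2*pi*theta/n)
        im_nco = sin_sign * cos_table[sin_idx]   # -sin(2*pi*theta/n)
        # Complex multiplier: four real multiplications.
        re_bb = x.real * re_nco - x.imag * im_nco
        im_bb = x.real * im_nco + x.imag * re_nco
        out.append(complex(re_bb, im_bb))
        # Phase accumulator with wrap-around.
        theta += delta_theta
        if theta >= n:
            theta -= n
    return out


if __name__ == "__main__":
    table = make_cos_table()
    m = 13 + 144 * 0 - 72 * 100
    rx = [cmath.exp(2j * math.pi * m * i / N_PRE) for i in range(N_PRE)]
    bb = td_freq_shifter(rx, offset=0, bw=100, cos_table=table)
    print(max(abs(s - 1) for s in bb))
```

The usage example feeds a pure tone at bin *m* and expects a constant output close to 1, because the sign choices in the angle mapper produce the conjugate exponential cos(·) − *j* sin(·). The residual error comes from replacing the zero sample by the last table entry.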
Then, using the aforementioned definitions and solving Eq. (15) for Δ*θ*, we have

$$
\Delta\theta = \frac{f\_{out}N}{f\_{clk}} = \frac{m\Delta f\_{RA}N}{f\_{clk}} = m\frac{\Delta f\_{RA}}{f\_{clk}}\frac{f\_{clk}}{\Delta f\_{RA}} = m.\tag{17}
$$

In case *m* is negative, it is necessary to calculate its value modulo *NPRE* before feeding it into the customised NCO. Note in Algorithm 1 that the modulo operation is simply done by adding *NPRE* to the negative value of *m*.
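The wrap of a negative shift into an unsigned frequency control word can be sketched as follows. The function name `frequency_control_word` and the range check are additions for illustration; the check simply makes explicit that a single addition of *NPRE* is only sufficient when |*m*| < *NPRE*.

```python
import cmath
import math

N_PRE = 24576


def frequency_control_word(m: int, n: int = N_PRE) -> int:
    """Return m modulo n using one conditional addition, as in Algorithm 1."""
    if not -n < m < n:
        raise ValueError("shift must satisfy -n < m < n")
    return m + n if m < 0 else m


if __name__ == "__main__":
    fcw = frequency_control_word(-7187)
    print(fcw)
    # Both increments generate the same phasor at every sample index.
    for i in (1, 5, 1000):
        a = cmath.exp(2j * math.pi * i * -7187 / N_PRE)
        b = cmath.exp(2j * math.pi * i * fcw / N_PRE)
        print(abs(a - b) < 1e-9)
```

Because the complex exponential is periodic in *NPRE*, the wrapped increment yields exactly the same sequence as the negative one while keeping the phase accumulator unsigned.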

In a traditional NCO algorithm, i.e. one that adopts full-period waveforms, there would be two main parts, namely, the phase accumulator and the LUT. In its simplest form, there would be two LUTs storing samples of a cosine and a sine wave. However, this approach generally results in very large tables, which are sometimes impractical. Therefore, for a practical implementation with reduced tables, the proposed algorithm employs only one LUT, exploiting quarter-wave symmetry in the basis waveform and the constant phase offset (π/2) between sine and cosine signals. In this approach we use one LUT with *N*/4 samples. However, when exploiting quarter-wave symmetry, the mapping from phase space to time is not as direct as in the traditional NCO algorithm.

In order to exploit quarter-wave symmetry, an algorithm is needed to map the angle values (phase space), *θ*, output by the phase accumulator into valid positions of a shortened LUT containing the samples of a cosine signal. This task is performed by the angle mapper part of Algorithm 1. The angle mapper maps angle values in the second, third and fourth quadrants into the first one and tracks the signs (*cos\_signal* and *sin\_signal* in Algorithm 1) that must be applied to the cosine and sine values. As can be seen in Algorithm 1, the indices for generating the cosine signal are calculated first, and then an *N*/4 phase offset, which is equivalent to a π/2 offset, is applied to them in order to generate the sine indices. In order to store values ranging from 1 to 0, i.e. the first quadrant of a cosine signal, (*N*/4) + 1 samples would be necessary, where the last one is zero.

The zero value can be mapped to the value stored at the *N*/4-th position, i.e. index (*N*/4) − 1, with minimal degradation on SFDR performance. Therefore, as can be seen in the algorithm, when either the sine or the cosine index is equal to *N*/4, it is changed to (*N*/4) − 1, whose stored sample is the closest one to zero.
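The size of the approximation introduced by this substitution is easy to quantify: the sample used in place of zero is cos(2π((*N*/4) − 1)/*N*) = sin(2π/*N*). The sketch below (function name assumed) evaluates it for *N* = 24,576 in floating point, before any fixed-point rounding.

```python
import math


def clamp_substitute(n: int) -> float:
    """Value read from the last table entry when the ideal sample is zero."""
    return math.cos(2 * math.pi * (n // 4 - 1) / n)


if __name__ == "__main__":
    value = clamp_substitute(24576)
    print(value, 20 * math.log10(value))
```

For this table length the substituted sample is on the order of 2.6 × 10⁻⁴ of full scale, roughly −72 dB, which is consistent with the statement that the impact is small.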

As the LUT only stores samples from the first quadrant of a cosine signal, i.e. only positive values, the phase to value mapper part of the algorithm must apply the correct signs (provided by the angle mapper) to the LUT's output.

The phase increment (also referred to as phase accumulator) part of the algorithm acts as an integrator. At each iteration of the algorithm, it calculates a new phase value, *θ*, by using the phase increment value Δ*θ* provided by the discrete frequency shift calculator part. Since the angle mapper algorithm can only map values ranging from 0 to *N* − 1 into the range 0 to (*N*/4) − 1, it is necessary to apply a modulo operation in case the resulting phase value is equal to or greater than *N*.

The module operation is easily performed by subtracting *N* from the phase value. The last part of the algorithm multiplies sample by sample the generated complex exponential signal by the ADC signal. That complex multiplication operation then translates the pass-band PRACH signal into baseband.

#### **3.3 Advantage of the proposed algorithm**

The main advantage of the proposed algorithm is the memory savings attained by the use of a 1/4-length cosine table instead of storing in RAM each one of the 2*N* possible samples of a complex exponential signal. **Table 2** shows the memory utilisation for some data widths if we were to store all the 2N samples (sine and cosine waves) corresponding to one complete period of the basis complex exponential waveform necessary to translate the received PRACH signal into baseband. In Xilinx FPGAs, a block RAM (BRAM) is a dedicated, i.e. they cannot be used for anything, but RAM, a two-port memory containing several kilobits of RAM. A FPGA contains a limited number of these blocks. The configuration logical blocks (CLB) in most of Xilinx FPGA contain a small RAM. They are called distributed (LUT) RAM because they are distributed throughout the FPGA once they are part of a CLB. This kind of RAM can normally store only a dozen bits. A reasonable rule of thumb when designing with FPGAs is that if you need a lot of RAM, as is the case here, you should use BRAMs; otherwise, the FPGA resources will be eaten up implementing the RAM in distributed RAM. It is important to say that a Virtex-6 FPGA device has only 416 36 kb BRAMs which makes it a very precious resource when implementing a large project as an L1 PHY, and in this way its usage must be taken into account during planning and development. One alternative to decrease the number of occupied BRAMs is the exploitation of quarter-wave symmetry in the basis waveform. This alternative results in a customised NCO that employs a shortened LUT, as can be seen in **Table 3**.

The fourth column in **Table 3** shows the reduction of used BRAMs when exploiting quarter-wave symmetry. As can be noticed, this approach results in a


**Table 2.** *Memory utilisation when storing the full period of the complex exponential.*


*An Efficient FPGA-Based Frequency Shifter for LTE/LTE-A Systems DOI: http://dx.doi.org/10.5772/intechopen.91339*

**Table 3.**

The zero value can be mapped to the value stored at the *N*/4-th position with minimal degradation on SFDR performance. Therefore, as can be seen in the algorithm, when either sine or cosine indexes are equal to *N*/4, their values are changed

correct signals (provided by the angle mapper) to the LUT's output.

operation in case the resulting phase value is equal or greater than *N*.

tion then translates the pass-band PRACH signal into baseband.

As the LUT only stores samples from the first quadrant of a cosine signal, i.e. only positive values, the phase to value mapper part of the algorithm must apply the

The phase increment (also referred as phase accumulator) part of the algorithm acts as an integrator. It calculates at each iteration of the algorithm a new phase value, *θ*, by using the phase increment Δ*θ* value provided by the discrete frequency shift calculator part. Once the angle mapper algorithm can only map values ranging from 0 to *N* 1 into the range 0 to (*N*/4) – 1, it is necessary to apply a module

The module operation is easily performed by subtracting *N* from the phase value. The last part of the algorithm multiplies sample by sample the generated complex exponential signal by the ADC signal. That complex multiplication opera-

The main advantage of the proposed algorithm is the memory savings attained by the use of a 1/4-length cosine table instead of storing in RAM each one of the 2*N* possible samples of a complex exponential signal. **Table 2** shows the memory utilisation for some data widths if we were to store all the 2N samples (sine and cosine waves) corresponding to one complete period of the basis complex exponential waveform necessary to translate the received PRACH signal into baseband. In Xilinx FPGAs, a block RAM (BRAM) is a dedicated, i.e. they cannot be used for anything, but RAM, a two-port memory containing several kilobits of RAM. A FPGA contains a limited number of these blocks. The configuration logical blocks (CLB) in most of Xilinx FPGA contain a small RAM. They are called distributed (LUT) RAM because they are distributed throughout the FPGA once they are part of a CLB. This kind of RAM can normally store only a dozen bits. A reasonable rule of thumb when designing with FPGAs is that if you need a lot of RAM, as is the case here, you should use BRAMs; otherwise, the FPGA resources will be eaten up implementing the RAM in distributed RAM. It is important to say that a Virtex-6 FPGA device has only 416 36 kb BRAMs which makes it a very precious resource when implementing a large project as an L1 PHY, and in this way its usage must be taken into account during planning and development. One alternative to decrease the number of occupied BRAMs is the exploitation of quarter-wave symmetry in the basis waveform. This alternative results in a customised NCO that employs a

The fourth column in **Table 3** shows the reduction of used BRAMs when exploiting quarter-wave symmetry. As can be noticed, this approach results in a

**Data width Size in kb No. of 36 kb BRAMs**

8 384 11 12 576 16 16 768 22 24 1152 32

*Memory utilisation when storing the full period of the complex exponential.*

to (*N*/4) 1, which is the closest value to zero.

*Field Programmable Gate Arrays (FPGAs) II*

**3.3 Advantage of the proposed algorithm**

shortened LUT, as can be seen in **Table 3**.

**Table 2.**

**68**

*Memory utilisation when employing quarter-wave symmetry.*

more area efficient implementation because the memory requirements are minimised, i.e. fewer FPGA BRAMs are required. Therefore this approach saves on valuable chip area and also reduces power consumption.
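The figures in Tables 2 and 3 can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the names `size_kb`, `brams` and `memory_rows` are hypothetical, 1 kb is assumed to mean 1024 bits, and the period length *N* = 24,576 is inferred from Table 2 (2*N* samples of 8 bits occupying 384 kb) rather than taken from the text.

```python
import math

BRAM_KB = 36   # capacity of one block RAM in kb, as stated in the text
N = 24576      # ASSUMPTION: period length inferred from Table 2 (2N x 8 bit = 384 kb)


def size_kb(samples, width_bits):
    """Memory footprint in kb, assuming 1 kb = 1024 bits."""
    return samples * width_bits / 1024


def brams(kb):
    """Number of 36 kb BRAMs needed to hold the given amount of memory."""
    return math.ceil(kb / BRAM_KB)


def memory_rows(widths=(8, 12, 16, 24)):
    """Return (width, full_kb, full_brams, quarter_kb, quarter_brams, reduction_%)."""
    rows = []
    for w in widths:
        full_kb = size_kb(2 * N, w)        # sine and cosine, full period
        quarter_kb = size_kb(N // 4, w)    # first quadrant of the cosine only
        full_b, quarter_b = brams(full_kb), brams(quarter_kb)
        reduction = round(100.0 * (full_b - quarter_b) / full_b, 1)
        rows.append((w, full_kb, full_b, quarter_kb, quarter_b, reduction))
    return rows


if __name__ == "__main__":
    for row in memory_rows():
        print(row)
```

Under these assumptions the computed sizes, BRAM counts and reduction percentages coincide with the values listed in the two tables, which also suggests that the reduction is expressed in terms of BRAM count rather than raw bits.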

#### **4. Implementation details**

This section discusses implementation details of the proposed architecture. The architecture is suitable for devices whose design flow is based on a hardware description language (HDL), such as field-programmable gate arrays (FPGAs) and application-specific integrated circuits (ASICs). **Figure 4** shows the hardware architecture of the time-domain frequency shifter module. The architecture employs only one LUT and exploits quarter-wave symmetry to shorten the table.

The proposed architecture works with two different system clocks. The first clock, at 30.72 MHz, is used by the ADC unit and therefore dictates the rate at which the complex multiplication is performed. The complex multiplier and phase to value mapper modules are the only two modules that use this clock. The second clock runs at 61.44 MHz and is used so that two samples of the complex exponential waveform can be read from the same LUT memory during the period of one sample arriving from the ADC module. This dual-clock scheme allows a single LUT to serve both quadrature components, which drastically reduces the amount of RAM required. All modules of the proposed architecture run at the 61.44 MHz clock rate, the complex multiplier module being the only exception.

The two input parameters, *bandwidth* and *frequency\_offset*, are fed into the module only when the eNodeB is being initialised and can be regarded as static parameters of the eNodeB. The input signal *config\_present* is asserted by higher layers during the initialisation process to indicate when the input parameter values are valid.

The proposed architecture has to be informed, through the *ce\_61MHz44* signal, when the sequence section of the RACH preamble starts so that it can be multiplied by the local complex exponential generated by the customised NCO module. The *ce\_61MHz44* signal is set by the PRACH receiver module after it removes (i.e. discards) the CP portion of the RACH preamble, and it has to stay high for the whole duration of the RACH sequence portion. For example, in the case of format 0, *ce\_61MHz44* must remain high for 24,576 cycles of the 30.72 MHz clock, i.e. 2 × 24,576 = 49,152 cycles of the 61.44 MHz system clock. Note that some latency is expected since all modules have registered outputs; this latency has to be taken into account by the PRACH receiver module when setting the chip-enable signal.
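As a quick check of the enable-window arithmetic for format 0, the following sketch converts the sequence length into cycles of the faster clock and into seconds. The function names are hypothetical, and the pipeline latency mentioned above is deliberately not modelled.

```python
ADC_CLK_HZ = 30.72e6          # ADC sample clock
FAST_CLK_HZ = 61.44e6         # system clock used for the LUT reads
SEQ_SAMPLES_FORMAT0 = 24576   # sequence length of PRACH format 0 in ADC samples


def enable_cycles(seq_samples, fast_clk_hz=FAST_CLK_HZ, adc_clk_hz=ADC_CLK_HZ):
    """Number of fast-clock cycles during which the enable must stay high."""
    ratio = round(fast_clk_hz / adc_clk_hz)   # 2 for the clocks used here
    return seq_samples * ratio


def enable_duration_s(seq_samples, adc_clk_hz=ADC_CLK_HZ):
    """Duration of the enable window in seconds (latency not included)."""
    return seq_samples / adc_clk_hz


if __name__ == "__main__":
    print(enable_cycles(SEQ_SAMPLES_FORMAT0))
    print(enable_duration_s(SEQ_SAMPLES_FORMAT0))
```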

Another important characteristic of the proposed architecture is that it employs only four multipliers in total, all of them in the complex multiplier module. Moreover, apart from the LUT accesses, the architecture uses only plain add and bit-shift operations to generate trigonometric sequences such as the complex exponential, which makes it highly efficient in terms of logic resource consumption.


Additionally, the input and output widths of the proposed frequency shifter architecture are fully configurable: the width of the input and output signals can be set to 8, 12, 16 or 24 bits. In our implementation, we employ an ADC with a 12-bit output for each of the quadrature components, i.e. the in-phase (I) and quadrature (Q) components, with a fixed-point representation (Q-format) of Q12.11, i.e. 1 bit for the integer part and 11 bits for the fractional part. The I and Q components computed by the customised NCO module have the same fixed-point representation as the input of the frequency shifter module. After the complex multiplication between the NCO and the ADC quadrature samples, which requires multiplications followed by additions, the fixed-point representation of the module's output is Q25.22. Since the maximum possible magnitude generated by the complex multiplication is 2, the integer part needs only 2 bits instead of 3. Therefore, depending on the selected width configuration, the fixed-point representation of the complex signal output by the module can be configured as Q8.6, Q12.10, Q16.14 or Q24.22.
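The bit growth described above can be illustrated with integer arithmetic. This is a minimal sketch, not the authors' implementation: the helper names are hypothetical, and the output resizing is assumed to be a plain truncation of the least significant bits, which the text does not specify.

```python
def to_fixed(value, frac_bits):
    """Quantise a real value to an integer with frac_bits fractional bits."""
    return int(round(value * (1 << frac_bits)))


def to_real(raw, frac_bits):
    """Interpret a fixed-point integer as a real value."""
    return raw / (1 << frac_bits)


def complex_mul_fixed(a, b, c, d):
    """(a + jb)(c + jd) on Q12.11 integers.

    Each product has 22 fractional bits (Q24.22); adding two products
    grows the word by one bit (Q25.22).
    """
    return a * c - b * d, a * d + b * c


def resize_output(raw, out_width, in_frac=22):
    """Keep 2 integer bits and out_width - 2 fractional bits.

    ASSUMPTION: surplus fractional bits are dropped by truncation.
    Returns the resized integer and its number of fractional bits.
    """
    out_frac = out_width - 2
    return raw >> (in_frac - out_frac), out_frac


if __name__ == "__main__":
    a, b = to_fixed(0.5, 11), to_fixed(0.25, 11)    # ADC sample 0.5 + j0.25
    c, d = to_fixed(0.5, 11), to_fixed(-0.5, 11)    # NCO sample 0.5 - j0.5
    re, im = complex_mul_fixed(a, b, c, d)
    for width in (8, 12, 16, 24):
        raw, frac = resize_output(re, width)
        print(f"Q{width}.{frac}: real part = {to_real(raw, frac)}")
```

With all four operands bounded by 1 in magnitude, each output component is bounded by 2, which is why two integer bits are enough in every output configuration.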

#### **4.1 Discrete frequency shift calculator unit**

**Figure 5** depicts the proposed architecture for the discrete frequency shift calculator module. This module computes the frequency shift, *m*, that must be applied to the received PRACH signal sequence in order to translate it into a baseband signal, i.e. a signal centred around 0 Hz. To compute this frequency shift, the module implements Eq. (11). All multiplications involved are multiplications by constants: the input values *N*<sup>UL</sup><sub>RB</sub> and *n*<sup>RA</sup><sub>PRB</sub> are multiplied by 72 and 144, respectively, using only bit shifts and additions, and the constant value 13 is then added to the result. Before the value of *m* is sent to the customised NCO module, it is necessary to check whether it is negative; if so, the constant value *N* is added to it, which turns *m* into a positive value. This is done because the phase increment module, which is part of the customised NCO module, only expects positive input values. As defined by Eq. (17), the discrete frequency shift, *m*, is equal to Δ*θ*, which is the input value required by the customised NCO module.
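The shift-and-add multiplications and the wrap of negative results can be sketched as follows. Eq. (11) is not reproduced in this section, so the way the three terms are combined in `discrete_shift` (in particular the signs) is an assumption based on the PRACH baseband signal definition in 3GPP TS 36.211 with φ = 7 and *K* = 12; all function names are hypothetical and *N* = 24,576 is inferred from Table 2.

```python
N = 24576  # ASSUMPTION: samples per period of the basis exponential (inferred from Table 2)


def times_72(x):
    """Multiply by 72 using shifts and one addition: 72x = 64x + 8x."""
    return (x << 6) + (x << 3)


def times_144(x):
    """Multiply by 144 using shifts and one addition: 144x = 128x + 16x."""
    return (x << 7) + (x << 4)


def wrap_to_positive(m, n=N):
    """Add N to a negative shift so the phase increment is non-negative."""
    return m + n if m < 0 else m


def discrete_shift(n_ul_rb, n_ra_prb, n=N):
    """Discrete frequency shift m.

    ASSUMPTION: m = 144 * n_ra_prb - 72 * n_ul_rb + 13; the sign convention
    of Eq. (11) may differ from the one used here.
    """
    m = times_144(n_ra_prb) - times_72(n_ul_rb) + 13
    return wrap_to_positive(m, n)


if __name__ == "__main__":
    print(discrete_shift(100, 0))    # negative intermediate value, wrapped by adding N
    print(discrete_shift(100, 94))   # positive intermediate value, left unchanged
```

Decomposing 72 and 144 into two powers of two each means that every constant multiplication costs two shifts and a single adder, so no hardware multiplier is needed in this module.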

**Figure 5.** *Architecture of the discrete frequency shift calculator unit.*

#### **4.2 Customised numerically controlled oscillator**

The customised NCO module is composed of three blocks, namely, phase increment, angle mapper and phase to value mapper, as can be seen in **Figure 4**. The first two blocks run at a system clock of 61.44 MHz, whereas the last one uses both the 30.72 and 61.44 MHz clocks, since it is in charge of reading both quadrature components, I and Q, from the LUT memory within one period of the 30.72 MHz system clock.

**Figure 6** depicts the proposed architecture of the phase increment module. This module implements a digital integrator, which computes the phase argument, *θ*, sent to the angle mapper module. At each iteration, Δ*θ* is added to *θ*, which starts from 0. If the result is greater than *N* − 1, the constant value *N* is subtracted from it so that *θ* remains less than *N* and can therefore be correctly mapped into a valid phase argument value. This procedure is simply a direct implementation of the modulo operation, mod(*θ*, *N*). The module only produces a valid *θ* value when the selection signal, *sel*, is high. The *sel* signal is produced by a clock divider that divides the 61.44 MHz system clock by 2, i.e. the module produces a valid output value at a rate of 30.72 MHz. The *sel* signal is also output by the module in order to feed the phase to value mapper module. Since Eq. (13) evaluates to *N*, which is not a power of 2, the data width of the phase increment module, *B*<sub>Θ(*n*)</sub>, is rounded up, resulting in a width of 15 bits.
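The behaviour of the phase increment module can be sketched in software as an accumulator with a conditional subtraction. The function name is hypothetical, *N* = 24,576 is inferred from Table 2, and the increment 7187 is borrowed from the example shift quoted in the results section; the *sel* gating and clocking are not modelled.

```python
N = 24576  # ASSUMPTION: samples per period of the basis exponential (inferred from Table 2)


def phase_accumulator(delta_theta, n, count):
    """Return the first `count` phase values, starting from theta = 0.

    Requires 0 <= delta_theta < n so that one conditional subtraction
    is enough to keep theta inside [0, n).
    """
    if not 0 <= delta_theta < n:
        raise ValueError("delta_theta must lie in [0, n)")
    theta = 0
    phases = []
    for _ in range(count):
        phases.append(theta)
        theta += delta_theta
        if theta >= n:        # equivalent to theta > n - 1
            theta -= n        # single subtraction implements mod(theta, n)
    return phases


if __name__ == "__main__":
    print(phase_accumulator(7187, N, 5))
    print((N - 1).bit_length())        # bits needed to represent theta
    print((N // 4 - 1).bit_length())   # bits needed to address a quarter-length LUT
```

The two bit lengths printed at the end agree with the 15-bit phase argument and the 13-bit LUT address mentioned in the text.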

**Figure 7** shows the proposed architecture for the angle mapper module. This module is in charge of translating the phase argument value, *θ*, which varies between 0 and 2*π* (i.e. from 0 to *N*), into a phase argument inside the first quadrant of the circle, i.e. a value between 0 and *π*/2 (i.e. from 0 to *N*/4). The output of this module is the index of the cosine waveform, which is used as an address to access one of the *N*/4 values saved in the LUT memory. To compute the index of the sine waveform, a constant phase offset equal to *π*/2, i.e. *N*/4, has to be applied to the cosine index value. Since the value corresponding to cos(*π*/2), i.e. 0, is not saved in the LUT memory, whenever either the cosine or the sine index is equal to *N*/4, it is changed to (*N*/4) − 1, which is the index of the stored value closest to 0.

The module is also responsible for keeping track of the signs (i.e. +/−) that must be applied to the I and Q values at the output of the phase to value mapper module. These signs translate the phase argument, represented by the sine and cosine index values, back to its original quadrant of the circle. At the input of this module, the phase argument, *θ*, has a databus width of 15 bits so that it can address all *N* samples of a full period, i.e. all four quadrants of the circle. Since the angle mapper module converts *θ* to the first quadrant, the databus width at its output can be decreased to 13 bits, which is the number of bits needed to address the *N*/4 samples of the first quadrant (i.e. quarter-wave symmetry) that are saved in the LUT memory. This module runs at a clock rate of 61.44 MHz and outputs two new phase argument indexes, for the sine and cosine waveforms, at a rate of 30.72 MHz, since the phase argument *θ* is delivered to the module at that rate.

**Figure 6.** *Architecture of the phase increment unit.*

**Figure 7.** *Architecture of the angle mapper unit.*

**Figure 8.** *Architecture of the phase to value mapper unit.*
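The quadrant folding performed by the angle mapper can be sketched as below. This is one possible realisation written for illustration, not a transcription of Figure 7: the function names are hypothetical, the LUT holds unquantised floating-point values, and the sine index is obtained as *N*/4 minus the cosine index, which is how the quarter-period offset is interpreted here.

```python
import math


def quarter_cosine_lut(n):
    """First quadrant of one cosine period: n/4 strictly positive samples."""
    return [math.cos(2.0 * math.pi * k / n) for k in range(n // 4)]


def angle_map(theta, n):
    """Fold theta in [0, n) into first-quadrant LUT addresses.

    Returns (cos_idx, sin_idx, cos_negative, sin_negative).
    """
    quarter = n // 4
    quadrant, offset = divmod(theta, quarter)
    # In odd quadrants the cosine table is traversed backwards.
    cos_idx = offset if quadrant % 2 == 0 else quarter - offset
    # The sine is the cosine displaced by a quarter period.
    sin_idx = quarter - cos_idx
    # Index n/4 (value 0) is not stored: use the closest stored entry instead.
    cos_idx = min(cos_idx, quarter - 1)
    sin_idx = min(sin_idx, quarter - 1)
    cos_negative = quadrant in (1, 2)
    sin_negative = quadrant in (2, 3)
    return cos_idx, sin_idx, cos_negative, sin_negative


def nco_value(theta, n, lut):
    """Rebuild (cos, sin) of 2*pi*theta/n from the quarter-length LUT."""
    cos_idx, sin_idx, cos_neg, sin_neg = angle_map(theta, n)
    i = -lut[cos_idx] if cos_neg else lut[cos_idx]
    q = -lut[sin_idx] if sin_neg else lut[sin_idx]
    return i, q


if __name__ == "__main__":
    n = 24576  # ASSUMPTION: inferred from Table 2
    lut = quarter_cosine_lut(n)
    worst = max(
        max(abs(i - math.cos(2 * math.pi * t / n)), abs(q - math.sin(2 * math.pi * t / n)))
        for t in range(n)
        for i, q in [nco_value(t, n, lut)]
    )
    print(worst)  # largest deviation, caused by substituting the missing zero entry
```

In this sketch the only deviation from the ideal waveform occurs at the four phases where a zero should be produced, where the substituted entry has magnitude sin(2*π*/*N*).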

**Figure 8** depicts the proposed architecture for the phase to value mapper module, which is in charge of translating values from phase space to the time domain. The sine and cosine index values are used as addresses to access the correct positions of the LUT memory, and it is the LUT memory that performs the translation from phase space to the time domain. The LUT memory stores only one quarter of a period, i.e. *N*/4 samples, of the cosine waveform employed as the basis waveform. The *sel* signal selects whether the sine or the cosine index is used to access the LUT memory. As shown in **Figure 8**, the cosine index is used as the address when the *sel* signal is low, and the sine index is used when it is high.
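The time-multiplexed access to the single LUT can be modelled as two reads per index pair, one for each level of *sel*. The sketch below uses hypothetical function names and an arbitrary four-entry table; it only illustrates the addressing scheme, not the registers or clock domains of the actual design.

```python
def lut_reads(lut, index_pairs):
    """Yield (fast_tick, sel, address, value): two LUT reads per index pair.

    sel = 0 selects the cosine index, sel = 1 selects the sine index.
    """
    tick = 0
    for cos_idx, sin_idx in index_pairs:
        for sel in (0, 1):
            address = sin_idx if sel else cos_idx
            yield tick, sel, address, lut[address]
            tick += 1


def iq_stream(lut, index_pairs):
    """Pair consecutive reads into (I, Q) magnitudes, one pair per slow-clock period."""
    reads = list(lut_reads(lut, index_pairs))
    return [(reads[k][3], reads[k + 1][3]) for k in range(0, len(reads), 2)]


if __name__ == "__main__":
    toy_lut = [1.0, 0.9, 0.7, 0.4]   # arbitrary illustrative contents
    pairs = [(0, 3), (1, 2)]
    for read in lut_reads(toy_lut, pairs):
        print(read)
    print(iq_stream(toy_lut, pairs))
```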


Both the sine and cosine index values remain constant for one cycle of the 30.72 MHz clock. Since the LUT memory works at a clock rate of 61.44 MHz and both index values are present at its input at the same time, two samples can be read within one period of the 30.72 MHz clock. By creating a delayed data path with a register at the output of the LUT memory, the I and Q sample components can be redirected to two distinct data paths, thereby generating the complex exponential sequence needed to translate the received PRACH preamble sequence into baseband. At this stage, the resulting quadrature sequence is fully synchronised to the 61.44 MHz clock; however, each quadrature pair of values lasts for one period of the 30.72 MHz clock. This is due to the two registers, with chip-enable inputs driven by the *sel* signal, located at the output of the multiplexers responsible for changing the sign of the quadrature components.

In order to return the quadrature sample values to their original quadrants, the sine and cosine signs produced by the angle mapper module are applied to their respective data paths. When required, the sign change is easily performed by applying the two's complement operation to the sample value. Finally, it is necessary to change the clock domain of the complex exponential sequence, since the ADC module works at a data rate of 30.72 × 10<sup>6</sup> samples per second. Even though the samples last for the correct period, they are not synchronised to the 30.72 MHz clock. The simplest way to perform the clock domain crossing is to use two registers clocked at the desired clock rate. As we work with complex signals, we employ such a pair of registers for each of the quadrature components. At this stage, the resulting complex exponential sequence is ready to be multiplied by the received PRACH preamble signal.
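The sign change by two's complement can be illustrated on fixed-width words. The helper names are hypothetical, and the 12-bit default width simply mirrors the Q12.11 configuration mentioned earlier; the clock domain crossing is not modelled.

```python
def twos_complement_negate(word, width):
    """Negate a `width`-bit word: invert all bits, add one, keep `width` bits."""
    mask = (1 << width) - 1
    return (~word + 1) & mask


def to_signed(word, width):
    """Interpret a `width`-bit pattern as a signed two's complement integer."""
    return word - (1 << width) if word & (1 << (width - 1)) else word


def apply_sign(word, negative, width=12):
    """Restore the quadrant sign of a positive LUT sample."""
    return twos_complement_negate(word, width) if negative else word


if __name__ == "__main__":
    sample = 1024  # 0.5 in Q12.11
    flipped = apply_sign(sample, negative=True)
    print(bin(flipped), to_signed(flipped, 12))
```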

#### **4.3 Complex multiplier unit**

**Figure 9.** *Architecture of the complex multiplier unit.*

**Figure 9** shows the proposed architecture for the complex multiplier module. A complex multiplier is necessary because the samples coming from both the NCO and ADC modules are complex and must be multiplied to perform the required frequency shift in the time domain. The complex multiplier module, also known as a mixer, multiplies the ADC samples, i.e. the received PRACH sequence signal, by the complex exponential signal created by the customised NCO module. The multiplication of these two complex values, i.e. *a* + *jb* and *c* + *jd*, results in the complex product defined by Eq. (18):

$$\begin{split} \mathrm{real} + j\,\mathrm{imag} &= (a + jb)(c + jd) \\ &= (ac - bd) + j(ad + bc). \end{split} \tag{18}$$
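Eq. (18) maps directly onto four real multiplications and two additions. The sketch below implements it and uses it to mix a test tone down to 0 Hz; the function names are hypothetical, floating-point arithmetic replaces the fixed-point datapath, and the negative sign of the exponent is an assumed convention for moving a tone located at +*m* to baseband.

```python
import cmath
import math


def complex_multiply(a, b, c, d):
    """(a + jb)(c + jd) with four multiplications and two additions, as in Eq. (18)."""
    real = a * c - b * d
    imag = a * d + b * c
    return real, imag


def shift_to_baseband(samples, m, n):
    """Multiply each sample by exp(-j*2*pi*m*k/n).

    ASSUMPTION: a negative exponent is used so that a tone at +m moves to 0 Hz.
    """
    shifted = []
    for k, s in enumerate(samples):
        phase = -2.0 * math.pi * ((m * k) % n) / n
        re, im = complex_multiply(s.real, s.imag, math.cos(phase), math.sin(phase))
        shifted.append(complex(re, im))
    return shifted


if __name__ == "__main__":
    n, m = 16, 3
    tone = [cmath.exp(2j * math.pi * m * k / n) for k in range(n)]
    print(shift_to_baseband(tone, m, n)[:4])  # each value is close to 1 + 0j
```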


As Eq. (18) shows, the complex multiplication requires four multiplications and two additions, since a subtraction is treated as an addition in two's complement. The complex multiplier module operates at a clock rate of 30.72 MHz, since it must follow the data rate set by the ADC module.
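The operation count in Eq. (18) can be made explicit with a short sketch. It uses plain Python integers as a stand-in for the fixed-point operands; the function name is hypothetical and no word-length growth or rounding is modelled.

```python
def complex_multiply(a, b, c, d):
    """Return (real, imag) of (a + jb)(c + jd) as in Eq. (18)."""
    # Four multiplications, one per DSP48 in the fully parallel design.
    ac = a * c
    bd = b * d
    ad = a * d
    bc = b * c
    # Two additions (the subtraction is an addition in two's complement).
    return ac - bd, ad + bc
```

For example, `complex_multiply(3, 4, 1, -2)` evaluates (3 + 4j)(1 - 2j).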

#### **5. Implementation and simulation results**

To assess the efficiency of the customised NCO and time-domain frequency shifter units proposed in this paper, a set of simulations was carried out. The proposed time-domain frequency shifter architecture was developed in VHSIC hardware description language (VHDL), and a corresponding bit-accurate Matlab model, referred to here as the Golden Model (GM), was developed for verification. The full design was targeted to a Xilinx Virtex-6 xc6vlx240t FPGA. The results are split into two parts: the first provides the simulation results for the customised NCO implementation, and the second presents the results for the implementation of the time-domain frequency shifter architecture.

#### **5.1 Customised numerically controlled oscillator**

This section presents results for the customised NCO implementation. The first simulation result, shown in **Figure 10**, compares floating-point Matlab-generated complex exponential sequences with the fixed-point sequences generated by the device under test (DUT) over all possible discrete frequency shifts, *m*, and for several Q-formats. PRACH format 0 preambles were considered for this and all other simulation results.

**Figure 10.** *Average error between GM and DUT implementations of the customised NCO.*

**Figure 11** presents the SFDR of the implemented NCO unit, i.e. the DUT, over all possible discrete frequency shifts, *m*, and for several Q-formats. SFDR is the power ratio between the fundamental signal and the strongest spurious signal, i.e. the most prominent harmonic, present at the output of the customised NCO. The SFDR attained by the DUT is almost the same for formats Q24.23 and Q32.31: 153.58 and 154.2 dB, respectively. These high SFDR values arise because the phase increment bits are not truncated and all output frequencies, *fout*, are integer multiples of the frequency step derived from the system clock frequency, *fclk*, which is the frequency used to sample the basis waveform, thereby eliminating the spectral artefacts that result from phase jitter [15].
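The SFDR definition above can be sketched as follows. This is an illustrative measurement on a short quantised complex tone, not the authors' test bench: the tone length, the rounding rule and the function names are assumptions, and saturation at full scale is not modelled.

```python
import cmath
import math

def quantised_tone(n, m, frac_bits):
    """Complex exponential at bin m, rounded to frac_bits fractional bits."""
    scale = 1 << frac_bits
    tone = []
    for k in range(n):
        phase = 2 * math.pi * m * k / n
        re = round(math.cos(phase) * scale) / scale
        im = round(math.sin(phase) * scale) / scale
        tone.append(complex(re, im))
    return tone

def sfdr_db(samples, m):
    """Power ratio (dB) between bin m and the strongest other DFT bin."""
    n = len(samples)
    powers = []
    for f in range(n):
        acc = sum(x * cmath.exp(-2j * math.pi * f * k / n)
                  for k, x in enumerate(samples))
        powers.append(abs(acc) ** 2)
    fundamental = powers[m % n]
    # Assumes at least one spur is non-zero, which holds for a quantised tone.
    strongest_spur = max(p for f, p in enumerate(powers) if f != m % n)
    return 10 * math.log10(fundamental / strongest_spur)
```

Because the tone completes an integer number of cycles in the analysis window, the quantisation error appears as discrete spurs, and the SFDR is set by the word length alone.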

As an illustrative example of the performance of the customised NCO, **Figure 12** shows its power spectrum for several fixed-point formats, with the SFDR indicated, for a frequency shift, *m*, equal to 7187, i.e. 8,983,750 Hz. These results show the spectral purity achieved by the proposed customised NCO even for format Q8.7. The noise floor for format Q24.23 is so low that it is imperceptible.
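The quoted frequency is consistent with *m* counting steps of 1.25 kHz, which is the PRACH subcarrier spacing of preamble format 0 in [7]. The step size $\Delta f$ is an assumption inferred from the two numbers above rather than a value stated in this section:

$$f_{out} = m\,\Delta f = 7187 \times 1250\ \text{Hz} = 8\,983\,750\ \text{Hz}.$$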

**Figure 13** depicts the SNR of the customised NCO over all possible discrete frequency shifts, *m*, and for several Q-formats. The SNR is computed as the ratio between the average signal power and the average noise power.
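A minimal sketch of this SNR measure is given below, assuming that the noise is the difference between a floating-point reference tone and its fixed-point counterpart. The helper names and the rounding rule are hypothetical.

```python
import math

def tone(n, m):
    """Floating-point reference: complex exponential at bin m."""
    return [complex(math.cos(2 * math.pi * m * k / n),
                    math.sin(2 * math.pi * m * k / n)) for k in range(n)]

def quantise(samples, frac_bits):
    """Round real and imaginary parts to frac_bits fractional bits."""
    scale = 1 << frac_bits
    return [complex(round(s.real * scale) / scale,
                    round(s.imag * scale) / scale) for s in samples]

def snr_db(reference, measured):
    """Average signal power over average noise power, in dB."""
    n = len(reference)
    signal_power = sum(abs(r) ** 2 for r in reference) / n
    noise_power = sum(abs(r - x) ** 2 for r, x in zip(reference, measured)) / n
    return 10 * math.log10(signal_power / noise_power)
```

Each additional fractional bit halves the worst-case rounding error, so the SNR grows by roughly 6 dB per bit.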

#### **5.2 Time-domain frequency shifter**

This section presents results regarding the implementation of the time-domain frequency shifter architecture. From this point on, we refer to the architecture implementation as the DUT. This time a Matlab floating-point model (GM) of the whole time-domain frequency shifter architecture is used to assess the performance of the circuit.

**Figure 11.** *SFDR variation of the customised NCO.*

**Table 4** presents information regarding the resource usage of the proposed architecture. It sums up the key results obtained after the implementation of the proposed frequency shifter architecture on a given FPGA chip. The number of occupied slices, registers, memory resources, LUTs and digital signal processor (DSP) resource blocks is shown in the table. The maximum achievable working frequency that can be reached by the module is equal to 239.981 MHz.

| Parameter | Value |
| --- | --- |
| FPGA model number | XC6VLX240T-1ff1156 (Virtex-6) |
| Amount of slice registers | 170 out of 301,440 (0%) |
| Amount of slice LUTs | 215 out of 150,720 (0%) |
| Amount of occupied slices | 84 out of 37,680 (0%) |
| Amount of RAMB36E1/FIFO36E1s | 3 out of 416 (0%) |
| Amount of DSP48E1s | 4 out of 768 (0%) |
| Maximum achievable frequency | 239.981 MHz |

**Table 4.** *Resource usage.*

**Table 4** shows that three block RAM (BRAM) resources are used instead of the two mentioned before. The reason is that the synthesis tool sizes the LUT memory by its address width: the LUT memory is addressed with 13 bits, and therefore $\lceil (2^{13} \times 12)/36\mathrm{K} \rceil = 3$ BRAMs are allocated. Note that each address position of the BRAM resources stores a 12 bit value sampled from the basis cosine waveform. The four Xilinx DSP48 resources that are instantiated are used in the complex multiplier module to implement the multiplication defined in Eq. (18).
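The BRAM count can be checked with a few lines of arithmetic. The sketch assumes that 36K means 36 × 1024 bits and that the memory is sized purely by total bit count, as in the expression above; the function name is hypothetical.

```python
import math

BRAM_BITS = 36 * 1024  # capacity of one 36 Kbit block RAM

def brams_needed(address_bits, word_bits):
    """Number of 36 Kbit BRAMs for a 2**address_bits x word_bits memory."""
    total_bits = (1 << address_bits) * word_bits
    return math.ceil(total_bits / BRAM_BITS)

print(brams_needed(13, 12))  # prints 3
```

With 13 address bits the memory holds 98,304 bits, which is about 2.67 BRAMs and therefore rounds up to three.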

It is important to mention that, in the Virtex-6 family of FPGAs, one slice consists of eight flip-flops and four LUTs. Block RAM and FIFO resources are embedded 36 Kbit memory resources. A DSP48 resource is an embedded processing unit that corresponds to one multiplier with two 18 bit inputs and one accumulator of 48 bits.

The reuse of DSP48 units is an important and feasible approach that is able to optimise the FPGA area utilisation at the expense of a higher clock rate operation and additional usage of control logical resources. For instance, the four fully parallel multiplications in the complex multiplier module could be serialised, which would save three out of the four DSP48 already being used.
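The serialisation mentioned above can be sketched as a four-step schedule on one shared multiplier. This is a behavioural illustration only: the schedule order, the accumulator structure and the function name are assumptions, not the authors' design.

```python
def serial_complex_multiply(a, b, c, d):
    """Compute (a + jb)(c + jd) with one multiplier used on four cycles."""
    # Each entry: (operand, operand, accumulator to update, sign of the product).
    schedule = [
        (a, c, "real", +1),
        (b, d, "real", -1),
        (a, d, "imag", +1),
        (b, c, "imag", +1),
    ]
    acc = {"real": 0, "imag": 0}
    cycles = 0
    for x, y, target, sign in schedule:
        product = x * y  # the single shared multiplier
        acc[target] += sign * product
        cycles += 1
    return acc["real"], acc["imag"], cycles
```

Under this schedule the multiplier would have to run at four times the 30.72 MHz sample rate, i.e. 122.88 MHz, which is below the 239.981 MHz maximum frequency reported in **Table 4**.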

**Table 4** shows that the proposed frequency shifter architecture uses less than 1% of the available Virtex-6 logic resources. Given this very low occupancy, low-cost FPGA models can be used. However, two points must be taken into consideration when selecting a low-cost FPGA model: (i) the maximum achievable operating frequency, since low-cost FPGA models tend to have worse timing characteristics, and (ii) the number of used slices, which might increase for families earlier than Virtex-6, since those families may employ 4-input LUTs instead of 6-input LUTs.

In **Figure 14**, the average error between the frequency-shifted preambles generated by the GM and the DUT is presented. To generate a representative result, the error for a given RACH preamble is averaged over all possible offsets that can be applied to that preamble, i.e. the PRACH offset parameter, $n_{PRB}^{RA}$, is set to all possible values. The figure presents the average error for several Q-formats when the PRACH bandwidth parameter, $N_{RB}^{UL}$, is set to 25 and 50 RBs, which correspond to bandwidths of 5 and 10 MHz, respectively.

**Figure 15** shows an exploded view of the results presented in **Figure 14**. Each subplot, representing the error for a specific Q-format, depicts the average error over all possible offsets applied to a given preamble for the 25 and 50 RB bandwidths. The error is almost constant across the preambles.

**Figure 12.** *Customised NCO power spectrum.*

**Figure 13.** *SNR variation of the customised NCO.*

**Figure 14.** *Average error between GM and DUT.*


Next we present results on the use of the proposed time-domain frequency shifter architecture in the PRACH receiver on the eNodeB PHY side. The PRACH receiver architecture adopted in this work is shown in **Figure 2**, and a bit-accurate Matlab model was developed for its verification.

At the receiver side, the eNodeB attempts to detect a transmitted preamble by first extracting the PRACH signal from the received OFDM signal. The extraction involves applying downconversion, analog-to-digital conversion, CP removal, frequency shift, demapping and decimation to the received PRACH signal. Next, the receiver performs matched filtering across the pool of preambles allocated to the eNodeB. Matched filtering is performed as a circular cross-correlation between the extracted PRACH signal and each of the known preambles dedicated to the eNodeB.

**Figure 2** depicts the preamble detection module, which is the last block in the PRACH receiver processing flow. This module is responsible for detecting the transmission of RACH preambles at the PHY layer, and it employs the detection algorithm proposed by the authors in a previous work [12]. The squared modulus of every sample received from the IFFT module is computed, producing what is called the power delay profile (PDP) samples. The module uses the PDP samples to (i) estimate the noise power, by identifying PDP samples that can be regarded as containing only noise, and (ii) compute a RACH detection threshold. The RACH detection threshold is calculated from the noise power estimate so as to minimise the probability of false alarms. Based on this threshold, the PRACH receiver decides whether a RACH preamble is present or not. The RACH detection module reports back to the MAC layer the timing offset estimates and IDs of all detected RACH preambles in a given reception interval. Interested readers are referred to [4, 16] for further details on the PRACH receiver architecture and the RACH detection algorithm, respectively.

**Figure 15.** *Exploded view of the average error between GM and DUT.*
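The PDP-based decision described above can be sketched in a few lines. This is a simplified stand-in, not the algorithm of [12]: it assumes that the weakest half of the PDP samples contains only noise and that the threshold is a fixed multiple of the estimated noise power. Both the rule and the names are hypothetical.

```python
def detect_preamble(correlation, threshold_factor):
    """Return (noise_power, threshold, peak_indices) from correlator output."""
    # Power delay profile: squared modulus of every correlator sample.
    pdp = [abs(x) ** 2 for x in correlation]
    # Assumption: the weakest half of the PDP samples carries only noise.
    ranked = sorted(pdp)
    noise_only = ranked[: len(ranked) // 2]
    noise_power = sum(noise_only) / len(noise_only)
    # Assumption: the threshold is a fixed multiple of the noise estimate.
    threshold = threshold_factor * noise_power
    peaks = [i for i, p in enumerate(pdp) if p > threshold]
    return noise_power, threshold, peaks
```

The index of each detected peak gives the timing offset estimate that the module reports to the MAC layer; in practice the threshold factor would be chosen to meet a target false alarm probability.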

The bit-accurate PRACH receiver model adopted in this work includes the proposed time-domain frequency shift algorithm. In order to assess the performance of the proposed architecture, format 0 RACH preambles with *NCS* = 13 and corrupted with additive white Gaussian noise (AWGN) were sent to the PRACH receiver model. When *NCS* is equal to 13, all the 64 RACH preambles that are allocated to a given cell can be created out of a single root ZC sequence.
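The count of 64 preambles per root follows from the number of cyclic shifts of length $N_{CS}$ that fit in one ZC sequence. The sequence length $N_{ZC} = 839$ used below is the value defined for preamble format 0 in [7]; it is not stated in this section.

$$\left\lfloor \frac{N_{ZC}}{N_{CS}} \right\rfloor = \left\lfloor \frac{839}{13} \right\rfloor = 64.$$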

Through the execution of only one circular cross-correlation operation in the frequency domain between the noisy RACH preambles and the corresponding local root ZC sequence, the PRACH receiver is able to detect random access attempts by several UE devices [4]. The RACH detection process follows the algorithm presented in [12]. For the next results, the bit width of the output data path of the proposed architecture was set to 8, 12, 16 and 24 bits, resulting in the fixed-point representations Q8.6, Q12.10, Q16.14 and Q24.22, respectively.
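The single frequency-domain correlation can be illustrated with a short, noise-free sketch. It uses a direct DFT and a length-31 ZC sequence to stay self-contained; the sequence length, root index, delay and function names are illustrative assumptions rather than the receiver's parameters.

```python
import cmath
import math

def dft(x, inverse=False):
    """Direct O(n^2) DFT, adequate for a short illustrative sequence."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(v * cmath.exp(sign * 2j * math.pi * f * k / n)
               for k, v in enumerate(x)) for f in range(n)]
    return [v / n for v in out] if inverse else out

def zadoff_chu(u, n_zc):
    """Root ZC sequence of odd length n_zc with root index u."""
    return [cmath.exp(-1j * math.pi * u * k * (k + 1) / n_zc)
            for k in range(n_zc)]

def circular_xcorr(received, root):
    """Circular cross-correlation as IDFT(DFT(received) * conj(DFT(root)))."""
    spectrum_rx = dft(received)
    spectrum_root = dft(root)
    product = [r * x.conjugate() for r, x in zip(spectrum_rx, spectrum_root)]
    return dft(product, inverse=True)

root = zadoff_chu(u=5, n_zc=31)
delay = 7
received = root[-delay:] + root[:-delay]  # root sequence delayed circularly
corr = circular_xcorr(received, root)
peak = max(range(len(corr)), key=lambda k: abs(corr[k]))
```

Because every preamble of the cell is a cyclic shift of the same root, the position of each correlation peak identifies both the preamble and its timing offset in one pass.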

**Figures 16** and **17** present the complementary results of preamble detection when the time-domain frequency shifter is employed along with the preamble detection algorithm proposed in [12]. They compare floating-point and fixed-point detection results when the bandwidth parameter, $N_{RB}^{UL}$, is set to 50 RBs (10 MHz).

**Figure 16.** *Comparison of the correct detection rate between the bit-accurate and the floating-point model.*


Each subplot in **Figure 16** compares the achieved correct detection rates versus signal-to-noise ratio (SNR) in dB for a given offset, $n_{PRB}^{RA}$. **Figure 17** compares the achieved error detection rates versus SNR for a given offset. For both plots the probability of false alarm (*Pfa*) is set to 0.1%. The plots demonstrate the high accuracy of the proposed algorithm and corresponding architecture in translating the received PRACH preambles to baseband.

An important requirement for the PRACH receiver is that it must be capable of serving a large number of UE devices per cell while maintaining a reasonable detection probability, providing them with quasi-instantaneous access to the radio resources and keeping the false alarm rate low. The probability of correct detection of the RACH preambles at the receiver side must be greater than or equal to 99% at an SNR of −8.0 dB, as defined in Section 8.3.4.1 of [17].

**Figure 16** shows that the proposed algorithm achieves a probability of correct detection greater than 99% at an SNR of −21 dB, exceeding the requirement in [17] by 13 dB.

**Figure 17.** *Comparison of the error detection rate between the bit-accurate and the floating-point model.*

#### **6. Conclusions**


A hardware-efficient algorithm and architecture for translating PRACH preambles into baseband, featuring high accuracy and low complexity, has been presented. This paper extends a previous work in which only some superficial aspects of the proposed hardware architecture were introduced. Here we present theoretical derivations showing how to arrive at the equations used to design and implement the proposed architecture. We provide the pseudocode for the proposed algorithm and discuss the advantage of the proposed architecture in terms of memory utilisation compared with an approach in which the full period of the complex exponential is stored in FPGA memory. The proposed architecture is optimised to reduce the use of BRAMs, multipliers and logic resources. Its low resource utilisation demonstrates that it can be employed as part of large physical layer designs or in FPGAs with a small number of logic elements.

The corresponding hardware architecture has been developed and employed in the PRACH receiver. Implementation and simulation results have demonstrated the efficiency, accuracy and low complexity of the proposed algorithm and architecture. Finally, this paper provides detailed information on the architectural design that was tested on an FPGA device for real-time LTE applications.


#### **Author details**

Felipe A.P. de Figueiredo1,2\* and Fabbryccio A.C.M. Cardoso3

1 Instituto Nacional de Telecomunicações—INATEL, Santa Rita do Sapucaí, MG, Brazil

2 Department of Information Technology, IDLab, Ghent University—imec, Ghent, Belgium

3 CPqD—Research and Development Center on Telecommunication, Campinas, SP, Brazil

\*Address all correspondence to: felipe.figueiredo@inatel.br

© 2020 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/ by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

### **References**

[1] Zarrinkoub H. Overview of the LTE Physical Layer. Hoboken, New Jersey, USA: John Wiley & Sons; 2014

[2] Kanchi S, Sandilya S, Bhosale D, Pitkar A, Gondhalekar M. Overview of LTE-A technology. In: IEEE Global High Tech Congress on Electronics (GHTCE). 2014

[3] Rumney M. LTE and the Evolution to 4G Wireless: Design and Measurement Challenges. Hoboken, New Jersey, USA: Wiley; 2013

[4] Sesia S, Toufik I, Baker M. LTE—The UMTS Long Term Evolution: From Theory to Practice. Hoboken, New Jersey, USA: John Wiley & Sons; 2011

[5] de Figueiredo FAP, Mathilde FS, Cardoso FACM, Vilela RM, Miranda JP. Efficient FPGA-based implementation of a CAZAC sequence generator for 3GPP LTE. In: IEEE International Conference on Re-ConFigurable Computing and FPGAs (ReConFig 14). 2014

[6] de Andrade TPC, Astudillo CA, Sekijima LR, da Fonseca NLS. The random access procedure in long term evolution networks for the internet of things. IEEE Communications Magazine. 2017;**55**(3):124-131

[7] 3GPP TS 36.211. Physical Channels and Modulation (Release 10). 2009

[8] Figueiredo FAP, Mathilde FS, Figueiredo FL, Cardoso FACM. An FPGA-based time-domain frequency shifter with application to LTE and LTE-A systems. In: IEEE Latin American Symposium on Circuits & Systems (LASCAS). 2015

[9] Frank RL, Zadoff SA, Heimiller R. Phase shift pulse codes with good periodic correlation properties. IRE Transactions on Information Theory. 1961;**7**:254-257

[10] Chu DC. Polyphase codes with good periodic correlation properties. IEEE Transactions on Information Theory. 1972;**18**(4):531-532

[11] Yang X, Fapojuwo AO. Enhanced preamble detection for PRACH in LTE. In: IEEE Wireless Communications and Networking Conference (WCNC). 2013

[12] de Figueiredo FAP et al. Multi-stage based cross-correlation peak detection for LTE random access preambles. Revista Telecomunicações. 2013;**15**:1-7

[13] Ranabhatt NA, Agarwal S, Bhattar RK, Gandhi PP. Design and implementation of numerical controlled oscillator on FPGA. In: International Conference on Wireless and Optical Communications Networks (WOCN). 2013

[14] Kadam S, Sasidaran D, Awawdeh A, Johnson L, Soderstrand M. Comparison of various numerically controlled oscillators. In: Midwest Symposium on Circuits and Systems (MWSCAS). 2002

[15] Xilinx. DS246—DDS Logic Core Product Specification—v5. 2005

[16] de Figueiredo FAP, Cardoso FACM, Lenzi KG, Bianco Filho JA, Figueiredo FL. A modified CA-CFAR method for LTE random access detection. In: 7th International Conference on Signal Processing and Communication Systems (ICSPCS). 2013. pp. 1-6

[17] 3GPP TS 36.104. Base Station (BS) radio transmission and reception (Release 10). 2007

#### **Chapter 5**

## VLSI Implementation of Medical Image Fusion Using DWT-PCA Algorithms

*Surya Prasada Rao Borra, Rajesh K. Panakala and Pullakura Rajesh Kumar*

#### **Abstract**

The use of digital image processing (DIP) is increasingly important in the medical field for identifying patient activity related to various diseases. Magnetic Resonance Imaging (MRI) and Computed Tomography (CT) scan images are used to perform the fusion process. In brain imaging, the MRI scan shows the structural information of the brain without functional data, whereas the CT scan image includes functional data on brain activity. To improve the low-dose CT scan, a hybrid algorithm is introduced in this paper and implemented on an FPGA. The main objective of this work is to optimize the performance of the hardware. The combination of Discrete Wavelet Transform (DWT) and Principal Component Analysis (PCA) is referred to as the hybrid algorithm. The Maximum Selection Rule (MSR) is used to select the high frequency components from the DWT. Each of these three algorithms has an RTL architecture implemented in Verilog. Application Specific Integrated Circuit (ASIC) and Field Programmable Gate Array (FPGA) performance is analyzed for the different methods. In 180 nm technology, the DWT-PCA-IF architecture achieved 5.145 mm<sup>2</sup> area, 298.25 mW power, and 124 ms delay. From the fused medical image, the mean, Standard Deviation (SD), entropy, and Mutual Information (MI) are evaluated for the DWT-PCA method.

**Keywords:** application specific integrated circuits, discrete wavelet transform, field programmable gate array, principal component analysis, maximum selection rule

#### **1. Introduction**

In recent years, the importance of Image Fusion (IF) has increased rapidly. IF is the process of combining two or more images into one image, so that information from the different source images becomes available in a single result [1]. Based on the image stage, fusion is classified into two types: transform domain and spatial domain fusion [2]. IF is used in many applications, including medical imaging, automated industry, engineering, and the military [3]. Among these fields, the medical application is the most important, because it helps to identify human health problems [4]. In medicine, two major modalities, MRI and CT, help to analyze normal and abnormal tissue and the internal structure of the human body, because MRI and CT contain different information about the human brain [5]. The MRI scan is used for soft tissue and detects skull problems, whereas the CT scan is used for hard tissue to identify the bone structure [6]. Earlier IF techniques were pixel level, decision level, and feature level based [7]. Many existing algorithms have been used for the IF process, such as the Electrical Capacitance Tomography (ECT) algorithm [8], the Non-Subsampled Contourlet Transform (NSCT) [9], sparse representation and decision [10], the Curvelet transform [11], a hybrid entropy concept [12], a hybrid dual-tree complex wavelet transform [13], and hybrid IF with image registration [14]. The main problem with these methods is information loss. To check the hardware utilization and improve efficiency, IF has been implemented on FPGAs, where the implementation approach also differs. On FPGAs, DWT [15], a multi-model method [16], and a configurable pixel level method [17] have been implemented for the IF process. The hardware utilization of these methods is high. To overcome these problems, a hybrid algorithm with the maximum selection rule is implemented in this paper. Only the high frequency components from the DWT are processed by the MSR, and its output is given to the inverse DWT. The combination of DWT and PCA is named the hybrid algorithm, and the PCA output gives the IF output. These methods are implemented in an FPGA architecture to improve the efficiency of the IF. FPGA and ASIC performance improved in the proposed method compared to conventional methods. The mean, Standard Deviation (SD), entropy, and Mutual Information (MI) are also calculated for all the algorithms.

The rest of the paper is organized as follows: Section 2 presents the literature survey, Section 3 describes the proposed method, Section 4 discusses the experimental results, and Section 5 summarizes the conclusion.

*DOI: http://dx.doi.org/10.5772/intechopen.91298*

*VLSI Implementation of Medical Image Fusion Using DWT-PCA Algorithms*

#### **2. Literature review**

Mishra et al. [18] presented a modified Frei-Chen based image fusion method. The method, based on two-scale decomposition, was assessed on Structural Similarity (SS) and contrast in Night Vision (NV). It achieved improvements of 48%, 15%, and 100% in total edge transfer, SS, and NV, respectively. The architecture was implemented in the Xilinx tool and consumes 4% of the resources. It was also analyzed in the Synopsys tool with 90 nm CMOS technology. This algorithm provides lower accuracy and lower fusion efficiency.

Bavirisetti and Dhulli [19] proposed two-scale image fusion using saliency detection. The method was used for the saliency extraction process, which can highlight the significant information. This work gave better results than the multi-scale fusion technique. However, the method failed to process medical images well.

Pemmaraju et al. [20] presented wavelet based image fusion using an FPGA. The method was implemented in Xilinx EDK 10.1 using a Spartan 3E. This FPGA contains combinational blocks that are flexible for high speed applications. The architecture contains memory, flip-flops, and LUTs. The method was applied to multi-focus image fusion. The DWT does not provide stationary outputs, and the low frequency component has lower efficiency.

Yang et al. [21] proposed multi-model image fusion based on fuzzy logic. With the help of type-2 fuzzy logic, NSCT was applied to pre-registered source images to obtain the low and high bands. The low frequency bands are processed by a local energy algorithm. The fused image was obtained by applying the inverse NSCT to all sub bands. The accuracy, contrast, and versatility were also evaluated. The main drawback of this method is low spatial resolution.

Bhaskar and Munde [22] proposed an FPGA implementation of image fusion using the Non-Subsampled Shearlet Transform (NST). The input image was separated into individual image coefficients using NST. Different rules were applied to fuse the high and low bands, and the fused image was obtained with the inverse NST. The method was implemented in Xilinx System Generator and MATLAB. It reduced the power, but its hardware utilization is high.

Agarwal and Bedi [23] presented hybrid image fusion for medical diagnosis. In this paper, wavelet and Curvelet transforms were used to perform the IF. The segmented blocks were fused into sub bands using the Curvelet transform. The resolution of the fused image is too low, which degrades image quality.

Sanjay et al. [24] proposed IF based on DWT and type-2 fuzzy logic. In this paper, CT and MRI images were fused with the help of a hybrid method. The fused low level bands and high level bands were reconstructed by performing the IDWT. This hybrid algorithm fails to make use of more logic functions and to analyze the hardware utilization.

#### **3. DWT-PCA-IF architecture**

Image Fusion is one of the important processes for obtaining more information from different images. The overall process of image fusion is shown in **Figure 1**.

• The input CT image is read into MATLAB and each pixel is converted to a binary value. These binary values are stored in a text file.

• The same process is applied to the MRI image.

• The binary values of both the CT and MRI images undergo the DWT process, which gives four frequency components: Low High (LH), High Low (HL), High High (HH), and Low Low (LL).

• These frequency components are passed to the MSR. Only the high frequency components are required in this operation.

• Thus the HH, HL, and LH frequency components undergo the MSR operation, which gives three results. These three results are given to the IDWT process along with the low frequency component (LL).

• After the IDWT, both results are given to the PCA component, which gives the fused image.

• DWT, MSR, IDWT and PCA are implemented in Verilog, and the final output is written to a text file.

• With the help of MATLAB, these binary values are converted back to pixels, which shows the fused image.

**Figure 1.** *Block diagram of entire process.*

#### **3.1 DWT architecture**

To analyze a signal, the wavelet transform converts it from the time domain to the frequency domain. The DWT is implemented using two major blocks, namely the Filter Bank (FB) and the Lifting Scheme (LS). The DWT is a decimated wavelet transform, in which the size of the image is halved at each scale. The wavelet transform makes it easy to convert spatial domain inputs into the frequency domain [25]. High pass and low pass coefficient series are obtained from the input series *y*<sub>0</sub>, *y*<sub>1</sub>, …, *y*<sub>*n*</sub>. The high pass and low pass coefficients are given by Eqs. (1) and (2).

$$H\_i = \sum\_{n=0}^{l-1} (2j - n) \cdot s\_n(z) \tag{1}$$

$$L\_i = \sum\_{n=0}^{l-1} (2j - n) \cdot t\_n(z) \tag{2}$$

where the wavelet filters are denoted *s*<sub>*n*</sub>(*z*) and *t*<sub>*n*</sub>(*z*), the length of the filter is denoted *l*, and *j* = 0, …, [*n*/2] − 1.

The spatial domain DWT is applied in two directions. First, the 1D-DWT is applied along the horizontal axis, and its results are then passed to the 1D-DWT along the vertical axis. Four parts, named *LL*, *LH*, *HL* and *HH*, are obtained from the 2D-DWT.

The two-dimensional DWT is applied to all the rows and columns of an image. If the input image is of size 2<sup>*k*</sup> × 2<sup>*k*</sup> pixels, at level *L* + 1 its size will be 2<sup>*k*</sup>/2 × 2<sup>*k*</sup>/2. Various decomposition methods can be used with wavelets over an image. The DWT is applied to the input image, which is decomposed into four sub images, called sub bands. The *LL* sub band is the coarse level sub image, and *HH*, *LH*, and *HL* are the diagonal, vertical and horizontal components of the image, respectively. The decomposition of the input image into these four components is shown in **Figure 2**. A higher-level 2D-DWT is developed from the *LL* (low pass) component for multi-resolution analysis.

**Figure 2.** *Discrete wavelet transform.*

Let us assume the input image is *Y*.

Here, *Y* is split into two bands, *Y*<sub>o</sub> and *Y*<sub>e</sub>.

$$\mathbf{Y\_o = [Y(1), \ Y(3), \ Y(5) \dots Y(2n - 1)]} \tag{3}$$

$$\mathbf{Y\_e = [Y(2), \ Y(4), \ Y(6) \dots Y(2n)]} \tag{4}$$

$$\mathbf{Q\_1(n)} = \mathbf{Y\_o(n)} + \mathbf{a(Y\_e(n) + Y\_e(n+1))}\tag{5}$$

$$\mathbf{V\_1(n)} = \mathbf{Y\_e(n)} + \mathbf{b(Q\_1(n) + Y\_e(n+1))}\tag{6}$$

$$\mathbf{dc(n)} = \mathbf{L \cdot Q\_1(n)}$$

Here, Q<sub>1</sub>(n) is scaled by *L*.
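The lifting relations in Eqs. (3) to (6) and the scaling step can be traced in software with a minimal sketch such as the one below. It follows the equations as printed; the coefficient values `a`, `b` and `scale`, the function name `lifting_step`, and the boundary rule (the last even sample is repeated where Y<sub>e</sub>(n + 1) is not available) are assumptions made for illustration and are not taken from the chapter.

```python
def lifting_step(y, a=-0.5, b=0.25, scale=1.0):
    """One lifting stage following Eqs. (3)-(6) as printed in the text.

    a, b and scale are placeholder values (assumptions), not the
    coefficients of the chapter's hardware design.
    """
    if len(y) % 2 != 0:
        raise ValueError("input length must be even")
    y_o = y[0::2]  # Y(1), Y(3), ... in the chapter's 1-based indexing, Eq. (3)
    y_e = y[1::2]  # Y(2), Y(4), ... Eq. (4)
    n = len(y_e)

    def ye(i):
        # Assumed boundary rule: repeat the last even sample.
        return y_e[min(i, n - 1)]

    # Eq. (5): Q1(n) = Yo(n) + a * (Ye(n) + Ye(n + 1))
    q1 = [y_o[i] + a * (y_e[i] + ye(i + 1)) for i in range(n)]
    # Eq. (6): V1(n) = Ye(n) + b * (Q1(n) + Ye(n + 1))
    v1 = [y_e[i] + b * (q1[i] + ye(i + 1)) for i in range(n)]
    # Scaling step: dc(n) = L * Q1(n)
    dc = [scale * q for q in q1]
    return q1, v1, dc
```

For example, `lifting_step([1, 2, 3, 4], scale=2.0)` splits the input into the two bands `[1, 3]` and `[2, 4]` before applying the two lifting equations and the scaling.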

The 2D-DWT and 1D-DWT architectures are shown in **Figures 3** and **4**. The control signals are *clk* and *rst*. The odd and even inputs are *odd*\_*in* [7:0] and *even*\_*in* [7:0]. These two inputs are fed to the line buffer, which performs the even and odd extraction; its outputs are passed to the parallel-in parallel-out (PIPO) register, which captures the data. This block generates four outputs, which are given to the lifting block. After the lifting block, the final outputs are the detail coefficient *dc*\_*out* [27:0] and the significance coefficient *sc\_out* [23:0].

**Figure 3.** *2D-DWT architecture.*

**Figure 4.** *1D-DWT architecture.*
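The row-then-column structure of the 2D-DWT described in this section can be illustrated with the short sketch below. It is not the chapter's lifting hardware: a simple averaging Haar filter pair is swapped in as an assumption to keep the example small, and the helper names (`haar_1d`, `dwt2`) as well as the sub band naming order are illustrative only.

```python
def haar_1d(seq):
    """Split an even-length sequence into (low, high) halves.

    Averaging Haar filters are an assumption made for this sketch.
    """
    low = [(seq[i] + seq[i + 1]) / 2 for i in range(0, len(seq), 2)]
    high = [(seq[i] - seq[i + 1]) / 2 for i in range(0, len(seq), 2)]
    return low, high


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def vertical_pass(block):
    """Apply the 1D transform to every column of a block."""
    low_cols, high_cols = [], []
    for col in transpose(block):
        low, high = haar_1d(col)
        low_cols.append(low)
        high_cols.append(high)
    return transpose(low_cols), transpose(high_cols)


def dwt2(image):
    """One level of 2D-DWT: 1D transform on the rows, then on the columns."""
    row_low, row_high = [], []
    for row in image:  # horizontal pass
        low, high = haar_1d(row)
        row_low.append(low)
        row_high.append(high)
    ll, lh = vertical_pass(row_low)    # low-pass rows  -> LL and LH
    hl, hh = vertical_pass(row_high)   # high-pass rows -> HL and HH
    return ll, lh, hl, hh
```

Each returned sub band has half the rows and half the columns of the input, which matches the halving of the image size at each scale. Conventions for which mixed band is called *LH* and which *HL* vary between references, so the labels here should be treated as one possible choice.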

#### **3.2 Maximum selection rule**

The MSR diagram is shown in **Figure 5**. This rule applies to the high frequency components, so the *HH*, *HL*, and *LH* frequency values undergo the MSR operation. The DWT output values of both images are connected to a MUX, which chooses the maximum value.
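The selection performed by the comparator and MUX can be sketched in software as an element-wise maximum over the corresponding sub bands of the two images. The function names, the dictionary layout of the sub bands, and the optional `use_magnitude` flag (selection by absolute value, a variant often used in wavelet fusion) are assumptions for illustration; the chapter only states that the maximum value is chosen.

```python
def msr(band_a, band_b, use_magnitude=False):
    """Element-wise maximum selection between two equally sized sub bands."""
    key = abs if use_magnitude else (lambda v: v)
    return [
        [a if key(a) >= key(b) else b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(band_a, band_b)
    ]


def fuse_high_bands(ct_bands, mri_bands):
    """Apply the MSR to the three high frequency sub bands only.

    ct_bands and mri_bands are hypothetical dicts keyed by "LH", "HL", "HH";
    the LL band is deliberately left out, as described in the text.
    """
    return {name: msr(ct_bands[name], mri_bands[name]) for name in ("LH", "HL", "HH")}
```

With `use_magnitude=False` the sketch mirrors a plain maximum comparator; switching the flag on keeps large negative coefficients, which may or may not match the hardware described here.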

These outputs are given to the IDWT, which converts the frequency domain back into the time domain.

**Figure 5.** *Maximum selection rule diagram.*

#### **3.3 PCA architecture**

The architecture of PCA is shown in **Figure 6**. It contains a control engine, covariance matrix, MUX, multiplier, adder, and comparator. The covariance matrix is calculated from the detected spike waveform and is referred to as the PC spike waveform. The MAC address is used for the distilling and orthogonalization process to improve the PCA efficiency. The comparator and right shift are used for the shift procedure and level checking. The entire algorithm is split into four processing units, and the data is stored in register files. A Finite State Machine (FSM) is used for scheduling and allocating the resources during the PCA processing. The FSM is very effective for controlling the remaining signals [26]. These outputs are used to perform the image fusion. The binary output of the fused architecture is read in MATLAB to display the fused image.

**Figure 6.** *PCA architecture.*
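As a software reference for the covariance-based step, the sketch below shows one commonly used formulation of PCA image fusion: the 2 × 2 covariance matrix of the two inputs is computed, and the components of its dominant eigenvector, normalized to sum to one, are used as fusion weights. The chapter does not spell out its weighting rule, so this formulation, the function names, and the use of component magnitudes to keep the weights non-negative are all assumptions.

```python
import math


def pca_weights(x, y):
    """Return fusion weights (w_x, w_y) from the 2x2 covariance of x and y.

    x and y are flattened pixel (or coefficient) sequences of equal length.
    """
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two equal-length inputs with at least 2 samples")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    c_xx = sum((a - mean_x) ** 2 for a in x) / (n - 1)
    c_yy = sum((b - mean_y) ** 2 for b in y) / (n - 1)
    c_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    # Largest eigenvalue of the symmetric matrix [[c_xx, c_xy], [c_xy, c_yy]].
    lam = (c_xx + c_yy) / 2 + math.sqrt(((c_xx - c_yy) / 2) ** 2 + c_xy ** 2)
    if abs(c_xy) < 1e-12:
        # Uncorrelated inputs: the dominant axis is the one with larger variance.
        vec = (1.0, 0.0) if c_xx >= c_yy else (0.0, 1.0)
    else:
        # Eigenvector for lam is (c_xy, lam - c_xx); magnitudes are used so
        # that the weights stay non-negative (an assumption of this sketch).
        vec = (abs(c_xy), lam - c_xx)
    total = vec[0] + vec[1]
    return vec[0] / total, vec[1] / total


def pca_fuse(x, y):
    """Weighted sum of the two inputs using the PCA-derived weights."""
    w_x, w_y = pca_weights(x, y)
    return [w_x * a + w_y * b for a, b in zip(x, y)]
```

In this formulation the input with the larger variance receives the larger weight, and two identical inputs receive equal weights of 0.5 each.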

#### **4. Experimental results and discussion**

This section presents and discusses the experimental simulation results of the proposed methodology in terms of performance measures. The performance of the proposed methodology was evaluated on both ASIC and FPGA.

#### **4.1 Discussion**

The input images (CT and MRI) are shown in **Figures 7** and **8**. These images are converted to binary values, which are shown in **Figures 9** and **10**.

**Figure 7.** *Input CT image.*

**Figure 8.** *Input MRI image.*

**Figure 9.** *Binary value of CT image.*

**Figure 10.** *Binary value of MRI image.*

The ASIC performance of the different methods is tabulated in **Table 1**, which compares Existing-I [18], Existing-II [20], Existing-III [22], and DWT-PCA-IF. All the methods were implemented in the Cadence RTL Compiler with 180 and 45 nm technology, and the results are tabulated. The table shows that DWT-PCA-IF provides better performance than the previous existing architectures.

| Technology | Method | Area (mm<sup>2</sup>) | Power (mW) | Delay (ms) |
|---|---|---|---|---|
| 180 nm | Existing-I [18] | 8.471 | 387.1 | 180 |
| 180 nm | Existing-II [20] | 7.321 | 345.71 | 158 |
| 180 nm | Existing-III [22] | 6.214 | 314.21 | 143 |
| 180 nm | DWT-PCA-IF | 5.145 | 298.25 | 124 |
| 45 nm | Existing-I [18] | 3.014 | 198.25 | 104 |
| 45 nm | Existing-II [20] | 2.987 | 168.12 | 101 |
| 45 nm | Existing-III [22] | 2.158 | 148.687 | 98 |
| 45 nm | DWT-PCA-IF | 1.982 | 111.21 | 91 |

**Table 1.** *Comparison of area, power, and delay for different methods.*

#### **4.2 Comparative analysis**

In this work, three papers have been compared with proposed method. A. Mishra, S. Mahapatra, and S. Banerjee [18], applied modified Frei-chen operator based IF for real time applications. Scalable decomposition was used to perform the fusion operation which was implemented in Virtex 4 FPGA. The overall architecture RTL was too complex to perform the IF algorithm which caused more area. Pemmaraju et al. [20], implemented IF based on DWT using FPGA. This algorithm was implemented in Xilinx EDK 10.1 FPGA Spartan 3E hardware. There is no explanation of RTL architecture, and .ucf file. Due to use of wavelet, the power consumption is too high. Bhaskar and Munde [22] performed IF based on nonsubsampled shearlet transform. Xilinx system generator was used to implement this

The ASIC performance of the different methods are tabulated in **Table 1**. In this table, values of ASIC performance of the Existing-I [18], existing-II [20], existing-

The comparison of ASIC performances is tabulated in **Table 1**. Here, all the methods are implemented and the results are tabulated. All the methods are implemented in the cadence RTL compiler with 180 and 45 nm technology. From this table, it's clear that DWT-PCA-IF provides better performances when com-

III [22], and DWT-PCA-IF are compared.

**Figure 8.** *Input MRI image.*

*Field Programmable Gate Arrays (FPGAs) II*

**Figure 9.**

**92**

*Binary value of CT image.*

pared to previous existing architectures.

design with MATLAB tool. The fused image affected by more noise and it require more hardware utilization.

Comparison graphs of area, power, and delay are shown in **Figures 11**–**13**, where the dark blue bars represent the DWT-PCA-IF architecture. All three ASIC metrics are reduced by the hybrid algorithm.
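As a worked check on the ASIC figures, the relative reduction obtained by DWT-PCA-IF can be computed directly from the values in Table 1. The following Python sketch is not part of the original work; the function and variable names are hypothetical, and only the numbers are taken from the table.

```python
# Values copied from Table 1: (area in mm^2, power in mW, delay in ms).
TABLE_1 = {
    "180 nm": {
        "Existing-I [18]": (8.471, 387.1, 180),
        "Existing-II [20]": (7.321, 345.71, 158),
        "Existing-III [22]": (6.214, 314.21, 143),
        "DWT-PCA-IF": (5.145, 298.25, 124),
    },
    "45 nm": {
        "Existing-I [18]": (3.014, 198.25, 104),
        "Existing-II [20]": (2.987, 168.12, 101),
        "Existing-III [22]": (2.158, 148.687, 98),
        "DWT-PCA-IF": (1.982, 111.21, 91),
    },
}
METRICS = ("area", "power", "delay")


def reduction_percent(baseline, proposed):
    """Percentage by which `proposed` is smaller than `baseline`."""
    return 100.0 * (baseline - proposed) / baseline


def asic_reductions(table, proposed="DWT-PCA-IF"):
    """Map (technology, method) to the per-metric reduction in percent."""
    result = {}
    for tech, rows in table.items():
        ref = rows[proposed]
        for method, values in rows.items():
            if method == proposed:
                continue
            result[(tech, method)] = {
                name: round(reduction_percent(base, new), 2)
                for name, base, new in zip(METRICS, values, ref)
            }
    return result


if __name__ == "__main__":
    for key, reductions in asic_reductions(TABLE_1).items():
        print(key, reductions)
```

With these numbers, the area reduction relative to Existing-I [18] at 180 nm comes out at about 39.3%, and the delay reduction at about 31.1%.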

The FPGA performance is tabulated in **Table 2**. Virtex 4 and Virtex 5 devices are used to evaluate the number of LUTs, flip-flops, and slices, together with the operating frequency. These values show that the DWT-PCA-IF architecture achieves better FPGA performance on every parameter.
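The claim that DWT-PCA-IF is better on every FPGA parameter can be checked mechanically against Table 2. The sketch below is an illustration only: the helper names are hypothetical, and it assumes that fewer LUTs, flip-flops, and slices are better while a higher frequency is better.

```python
# Values copied from Table 2: (LUT, flip-flops, slices, frequency in MHz).
TABLE_2 = {
    "Virtex 4": {
        "Existing-I [18]": (4038, 4852, 2857, 250.3),
        "Existing-II [20]": (4002, 4657, 2654, 289.64),
        "Existing-III [22]": (3541, 4214, 2011, 314.21),
        "DWT-PCA-IF": (3014, 3987, 1968, 355.14),
    },
    "Virtex 5": {
        "Existing-I [18]": (3104, 4125, 1964, 185.41),
        "Existing-II [20]": (3014, 4032, 1847, 193.21),
        "Existing-III [22]": (2987, 3987, 1752, 255.14),
        "DWT-PCA-IF": (2741, 3789, 1648, 287.96),
    },
}
# True means a smaller value is better (assumed convention).
LOWER_IS_BETTER = {"lut": True, "flip_flop": True, "slices": True, "freq_mhz": False}
FIELDS = tuple(LOWER_IS_BETTER)


def best_on_every_metric(rows, proposed="DWT-PCA-IF"):
    """Return True if `proposed` strictly beats all other rows on each field."""
    ref = rows[proposed]
    others = [v for name, v in rows.items() if name != proposed]
    for i, field in enumerate(FIELDS):
        values = [row[i] for row in others]
        if LOWER_IS_BETTER[field]:
            if ref[i] >= min(values):
                return False
        elif ref[i] <= max(values):
            return False
    return True


def frequency_gain_percent(rows, proposed="DWT-PCA-IF"):
    """Frequency gain of `proposed` over the fastest other design, in percent."""
    best_existing = max(v[3] for name, v in rows.items() if name != proposed)
    return round(100.0 * (rows[proposed][3] - best_existing) / best_existing, 2)


if __name__ == "__main__":
    for device, rows in TABLE_2.items():
        print(device, best_on_every_metric(rows), frequency_gain_percent(rows))
```

For both devices the check returns `True`; on Virtex 4 the frequency gain over the fastest existing design is roughly 13%.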

Comparison graphs of LUTs, flip-flops, slices, and frequency are shown in **Figures 14**–**17**; the hardware utilization is evaluated from these FPGA results. The RTL schematic diagrams of the top module and of the 2D and 1D DWT are shown in **Figures 18** and **19**.

**Figure 11.** *Comparison of area performance for different methods.*

**Figure 12.** *Comparison of power performance for different methods.*

**Figure 13.** *Comparison of delay performance for different methods.*

| Devices | Method | LUT | Flip flop | Slices | Frequency (MHz) |
| --- | --- | --- | --- | --- | --- |
| Virtex 4 | Existing-I [18] | 4038 | 4852 | 2857 | 250.3 |
| | Existing-II [20] | 4002 | 4657 | 2654 | 289.64 |
| | Existing-III [22] | 3541 | 4214 | 2011 | 314.21 |
| | DWT-PCA-IF | 3014 | 3987 | 1968 | 355.14 |
| Virtex 5 | Existing-I [18] | 3104 | 4125 | 1964 | 185.41 |
| | Existing-II [20] | 3014 | 4032 | 1847 | 193.21 |
| | Existing-III [22] | 2987 | 3987 | 1752 | 255.14 |
| | DWT-PCA-IF | 2741 | 3789 | 1648 | 287.96 |

#### **Table 2.**

*Comparison of FPGA performances for different methods.*

**Figure 14.** *Comparison of LUT for different methods.*

**Figure 15.** *Comparison of flip flop for different methods.*

**Figure 16.** *Comparison of slices for different methods.*

The performance evaluation for the different methods is given in **Table 3**. The fused medical image is evaluated in terms of mean, standard deviation (SD), entropy, and mutual information (MI). From this table, it is clear that DWT-PCA gives better performance than the existing methods. Finally, the fused image is shown in **Figure 20**. The RTL schematics above are taken from the Xilinx tool.
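The chapter does not state the formulas behind these four measures. A minimal Python sketch using the standard histogram-based definitions is given here for orientation; the function names are hypothetical, images are assumed to be lists of rows of integer grey levels, and reporting MI as the sum of the fused image's mutual information with each source image is a common convention that is assumed here, not confirmed by the source. Values computed this way need not match Table 3.

```python
from collections import Counter
from math import log2, sqrt


def flatten(img):
    """Turn a list of pixel rows into one flat list of grey levels."""
    return [p for row in img for p in row]


def entropy_bits(pixels):
    """Shannon entropy of the grey-level histogram, in bits."""
    n = len(pixels)
    return -sum((c / n) * log2(c / n) for c in Counter(pixels).values())


def mutual_information_bits(xs, ys):
    """Mutual information between two equally sized pixel lists, in bits."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # p(x,y) * log2(p(x,y) / (p(x) p(y))) written with raw counts.
    return sum((c / n) * log2(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def fusion_metrics(fused, src_a, src_b):
    """Mean, SD, entropy of the fused image and its summed MI with both sources."""
    f, a, b = flatten(fused), flatten(src_a), flatten(src_b)
    mean = sum(f) / len(f)
    sd = sqrt(sum((p - mean) ** 2 for p in f) / len(f))  # population SD
    return {
        "mean": mean,
        "sd": sd,
        "entropy": entropy_bits(f),
        "mi": mutual_information_bits(f, a) + mutual_information_bits(f, b),
    }


if __name__ == "__main__":
    img = [[0, 0], [255, 255]]           # hypothetical 2x2 test image
    print(fusion_metrics(img, img, img))
```

A higher entropy and a higher MI are usually read as the fused image carrying more information from its sources, which is how the comparison in Table 3 is interpreted.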

**Figure 17.** *Comparison of frequency for different methods.*

**Figure 18.** *Top module of DWT architecture.*

#### **5. Conclusion**

The proposed architecture has been designed to reduce hardware utilization. In this work, the DWT-PCA-IF architecture performs image fusion on medical images such as MRI and CT in order to obtain more information. The hybrid VLSI architecture provided a better fused image than previous works. The DWT-PCA-IF architecture was implemented in Verilog. The DWT and PCA methods were used to reduce power and area consumption. The ASIC and FPGA performance was analyzed for different architectures. In 180 nm technology, the DWT-PCA-IF architecture achieved an area of 5.145 mm<sup>2</sup>, a power of 298.25 mW, and a delay of 124 ms. On Virtex 4, the proposed architecture used 3014 LUTs, 3987 flip-flops, and 1968 slices at a frequency of 355.14 MHz. For the fused image, a mean of 55.658, an SD of 53.14, an entropy of 9.621, and an MI of 3.141 were obtained. In the future, different kinds of optimization algorithms will be designed to improve the ASIC and FPGA performance.

**Figure 19.** *Internal schematic of 2D-DWT.*

| Image | Performance | DWT [2] | Haar [3] | Kekre's wavelet [7] | DTCWT [13] | PCA [24] | DWT-PCA |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Fused image | Mean | 44.25 | 32.53 | 32.41 | 45.14 | 53.22 | 55.65 |
| | SD | 40.14 | 36.07 | 34.82 | 51.24 | 37.44 | 53.14 |
| | Entropy | 8.145 | 5.97 | 5.9108 | 47.21 | 6.63 | 9.621 |
| | MI | 0.147 | 0.39 | 0.5541 | 2.12 | 0.2832 | 3.141 |

#### **Table 3.**

*Performance evaluation for different methods.*

**Figure 20.** *Fused image.*

#### **Acknowledgements**

At the outset, I would like to take this opportunity to convey my gratitude to the IntechOpen publishing house and its team for their consistent support at every step in bringing out this chapter in their book. The review process and the valuable review remarks helped a lot in meeting the standards of this book, and IntechOpen has enriched my writing skills. I would like to extend my sincere gratitude to my Ph.D. supervisor Dr. Rajesh K. Panakala and Dr. P. Rajesh Kumar for their valuable guidance and continuous encouragement in publishing this chapter. I would like to thank my family members for their love and support, and the management of our college, PVP Siddhartha Institute of Technology, for their constant encouragement to carry out my research work.

#### **Author details**

Surya Prasada Rao Borra<sup>1</sup>\*, Rajesh K. Panakala<sup>1</sup> and Pullakura Rajesh Kumar<sup>2</sup>

1 Prasad V. Potluri Siddhartha Institute of Technology, Kanuru, A.P., India

2 Andhra University College of Engineering, Visakhapatnam, A.P., India

\*Address all correspondence to: suryaborra1679@gmail.com

© 2020 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

### **References**

[1] Mahajan S, Singh A. A comparative analysis of different image fusion techniques. IPASJ International Journal of Computer Science. 2014;**2**(1):008-015

[2] Hussain DK, Reddy CL, Kumar VA. Implementation of medical image fusion using DWT process on FPGA. International Journal of Computer Applications Technology and Research. 2013;**2**(6):676-679

[3] Phanindra P, Babu JC, Shree VU. VLSI implementation of medical image fusion using Haar transform. International Journal of Scientific and Engineering Research. 2013;**4**(9):1437-1442

[4] Yang B, Li S. Pixel-level image fusion with simultaneous orthogonal matching pursuit. Information Fusion. 2012;**13**(1):10-19

[5] Pavithra C, Bhargavi S. Fusion of two images based on wavelet transform. International Journal of Innovative Research in Science, Engineering and Technology. 2013;**2**(5):1814-1819

[6] Jose B, Kumar BS. Design of 2-D DWT VLSI architecture for image processing. International Journal of Engineering Research and Technology. 2014;**3**(4):692-696

[7] Kekre HB, Sarode T, Dhannawat R. Implementation and comparison of different transform techniques using Kekre's wavelet transform for image fusion. International Journal of Computer Applications. 2012;**44**(10):41-48

[8] Olmos AM, Botella G, Castillo E, Morales DP, Banqueri J, García A. A reconstruction method for electrical capacitance tomography based on image fusion techniques. Digital Signal Processing. 2012;**22**(6):885-893

[9] Bhatnagar G, Wu QMJ, Liu Z. Directive contrast based multimodal medical image fusion in NSCT domain. IEEE Transactions on Multimedia. 2013;**15**(5):1014-1024


[10] Fei Y, Wei G, Zongxi S. Medical image fusion based on feature extraction and sparse representation. International Journal of Biomedical Imaging. 2017;**2017**:1-11

[11] Tank VP, Shah DD, Vyas TV, Chotaliya SB, Manavadaria MS. Image fusion based on wavelet and Curvelet transform. IOSR Journal of VLSI and Signal Processing. 2013;**1**(5):32-36

[12] Sharmila K, Rajkumar S, Vijayarajan V. Hybrid method for multimodality medical image fusion using discrete wavelet transform and entropy concepts with quantitative analysis. In: Proceedings of International Conference on Communications and Signal Processing. 2013. pp. 489-493

[13] Gurjar R. Hybrid image fusion implemented in DTCWT. International Journal of Engineering Technology and Computer Research. 2014;**2**(1):688-692

[14] Bhosle DS, Gorde KS. Image registration and wavelet based hybrid image fusion. IOSR Journal of VLSI and Signal Processing. 2014;**4**(2):1-5

[15] Suraj AA, Francis M, Kavya TS, Nirmal TM. Discrete wavelet transform based image fusion and de-noising in FPGA. Journal of Electrical Systems and Information Technology. 2014;**1**(1):72-81

[16] Kaur R, Kaur S. An approach for image fusion using PCA and genetic algorithm. International Journal of Computer Applications. 2016;**145**(6):54-59


[17] Besiris D, Tsagaris V, Fragoulis N, Theoharatos C. An FPGA-based hardware implementation of configurable pixel-level color image fusion. IEEE Transactions on Geoscience and Remote Sensing. 2012;**50**(2):362

**References**

[1] Mahajan S, Singh A. A comparative analysis of different image fusion techniques. IPASJ International Journal of Computer Science. 2014;**2**(1):008-015

*Field Programmable Gate Arrays (FPGAs) II*

[9] Bhatnagar G, Wu QMJ, Liu Z. Directive contrast based multimodal medical image fusion in NSCT domain. IEEE Transactions on Multimedia. 2013;

[10] Fei Y, Wei G, Zongxi S. Medical image fusion based on feature extraction and sparse representation. International Journal of Biomedical Imaging. 2017;

[11] Tank VP, Shah DD, Vyas TV, Chotaliya SB, Manavadaria MS. Image fusion based on wavelet and Curvelet transform. IOSR Journal of VLSI and Signal Processing. 2013;**1**(5):32-36

[12] Sharmila K, Rajkumar S, Vijayarajan V. Hybrid method for multimodality medical image fusion using discrete wavelet transform and entropy concepts with quantitative analysis. In: Proceedings of International Conference on Communications and Signal Processing. 2013. pp. 489-493

[13] Gurjar R. Hybrid image fusion implemented in DTCWT. International Journal of Engineering Technology and Computer Research. 2014;**2**(1):688-692

[14] Bhosle DS, Gorde KS. Image registration and wavelet based hybrid image fusion. IOSR Journal of VLSI and

Signal Processing. 2014;**4**(2):1-5

72-81

54-59

[15] Suraj AA, Francis M, Kavya TS, Nirmal TM. Discrete wavelet transform based image fusion and de-noising in FPGA. Journal of Electrical Systems and Information Technology. 2014;**1**(1):

[16] Kaur R, Kaur S. An approach for image fusion using PCA and genetic algorithm. International Journal of Computer Applications. 2016;**145**(6):

**15**(5):1014-1024

**2017**:1-11

[2] Hussain DK, Reddy CL, Kumar VA. Implementation of medical image fusion

[3] Phanindra P, Babu JC, Shree VU. VLSI implementation of medical image

International Journal of Scientific and Engineering Research. 2013;**4**(9):

[4] Yang B, Li S. Pixel-level image fusion with simultaneous orthogonal matching pursuit. Information Fusion. 2012;**13**(1):

[5] Pavithra C, Bhargavi S. Fusion of two images based on wavelet transform. International Journal of Innovative Research in Science, Engineering and Technology. 2013;**2**(5):1814-1819

[6] Jose B, Kumar BS. Design of 2-D DWT VLSI architecture for image processing. International Journal of Engineering Research and Technology.

[7] B. Kekre H, Sarode T, Dhannawat R. Implementation and comparison of different transform techniques using kekre's wavelet transform for image fusion. International Journal of Computer Applications. 2012;**44**(10):

[8] Olmos AM, Botella G, Castillo E, Morales DP, Banqueri J, García A. A reconstruction method for electrical capacitance tomography based on image

fusion techniques. Digital Signal Processing. 2012;**22**(6):885-893

2014;**3**(4):692-696

41-48

**100**

using DWT process on FPGA. International Journal of Computer Applications Technology and Research.

fusion using Haar transform.

2013;**2**(6):676-679

1437-1442

10-19

[18] Mishra A, Mahapatra S, Banerjee S. Modified Frei-Chen operator-based infrared and visible sensor image fusion for real-time applications. IEEE Sensors Journal. 2017;**17**(14):4639-4646

[19] Bavirisetti DP, Dhuli R. Two-scale image fusion of visible and infrared images using saliency detection. Infrared Physics & Technology. 2016;**76**:52-64

[20] Pemmaraju M, Mashetty SC, Aruva S, Saduvelly M, Edara BB. Implementation of image fusion based on wavelet domain using FPGA. In: Proceedings of International Conference on Trends in Electronics and Informatics. 2017. pp. 500-504

[21] Yang Y, Que Y, Huang S, Lin P. Multimodal sensor medical image fusion based on type-2 fuzzy logic in NSCT domain. IEEE Sensors Journal. 2016;**16**(10):3735-3745

[22] Bhaskar PC, Munde MV. FPGA implementation of non-subsampled Shearlet transform for image fusion. In: Proceedings of International Conference on Computing, Communication, Control and Automation. 2017. pp. 1-6

[23] Agarwal J, Bedi SS. Implementation of hybrid image fusion technique for feature enhancement in medical diagnosis. Human-Centric Computing and Information Sciences. 2015;**5**(1):3

[24] Sanjay AR, Soundrapandiyan R, Karuppiah M, Ganapathy R. CT and MRI image fusion based on discrete wavelet transform and Type-2 fuzzy logic. International Journal of Intelligent Engineering and Systems. 2017;**10**(3):355-362

[25] Surya PRB, Panakala RK, Kumar PR. Hybrid image fusion algorithm using DWT maximum selection rule and PCA. International Journal of Scientific and Engineering Research;**8**(8):814-820

[26] Surya PRB, Panakala RK, Kumar PR. Qualitative analysis of MRI and enhanced low dose CT scan image fusion. In: Proceedings of International Conference on Advanced Computing and Communication Systems. 2017. pp. 1752-1757

## *Edited by George Dekoulis*

This edited volume, *Field Programmable Gate Arrays (FPGAs) II*, is a collection of reviewed and relevant research chapters offering a comprehensive overview of recent developments in the field of Computer and Information Science. The book comprises single chapters authored by various researchers and edited by an expert active in the Computer and Information Science research area. Each chapter is complete in itself, but all are united under a common research topic. This publication aims to provide a thorough overview of the latest research efforts by international authors on Computer and Information Science and to open new possible research paths for further novel developments.

Published in London, UK © 2020 IntechOpen © Hello I'm Nik / unsplash
