**4.1. Hardware implementation**

To verify the functionality of the 1024‐point FPP‐FFT processor, VHDL code was developed for the complete design. A register transfer level (RTL) behavioral description of the processor was generated and downloaded to an FPGA for prototyping. For the ASIC implementation, the library cells and the constraint file were then attached, and the high performance FFT was passed through gate‐level synthesis to complete the post‐simulation stage. The back‐end implementation used the Silterra 0.18 µm and Mimos 0.35 µm technology libraries: the generated netlist, together with the constraint file, was carried through floor planning and place and route. The implementation process is summarized in **Figure 20**.

The specification of the high‐tech 1024‐point FPP‐FFT processor, taken from the Xilinx ISE synthesis report, is given in **Table 4**.

As stated in **Table 4**, the high‐tech FFT processor operates at a maximum clock frequency of 227.7 MHz with a total latency of 5131 clock cycles (**Figure 21**). This agrees with the computation complexity of (*N*/2) log<sub>2</sub> *N* + 11, which gives 512 × 10 + 11 = 5131 for *N* = 1024.
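The latency figure can be checked with a few lines of Python. This is only an illustrative calculation, not part of the original design flow: the function name `fft_latency_cycles` and the `overhead` parameter are hypothetical, and the conversion to microseconds simply assumes the processor runs at the maximum clock frequency reported in Table 4.

```python
def fft_latency_cycles(n: int, overhead: int = 11) -> int:
    """Latency in clock cycles, (N/2) * log2(N) + overhead, for a power-of-two N."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two")
    stages = n.bit_length() - 1          # log2(n) for a power of two
    return (n // 2) * stages + overhead


N = 1024
F_MAX_HZ = 227.7e6                       # maximum clock frequency from Table 4

cycles = fft_latency_cycles(N)
latency_us = cycles / F_MAX_HZ * 1e6     # assumes operation at F_MAX_HZ

print(cycles)                            # 5131
print(round(latency_us, 2))              # 22.53
```

Under that assumption, one 1024‐point transform takes roughly 22.5 µs.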

The place and route (PAR) process was completed, and the processor was routed successfully on the silicon chip (**Figure 22**).

The 1024‐point FPP‐FFT processor was then optimized in the Silterra 0.18 µm and Mimos 0.35 µm technologies, and its power consumption and die size were measured at the maximum clock frequency. **Table 5** shows the optimization results for both technology libraries.

**Figure 20.** Flowchart of hardware implementation.

**3.2. Advantages of 1024‐point parallel pipeline FFT structure**


The design of the 1024‐point Radix II FPP‐FFT processor is based on smart sub‐blocks, each of which was optimized. The processor benefits from four features: (i) a shared memory stores both the input and the output data, which makes the system a single chip and reduces hardware complexity; (ii) every arithmetic unit is designed to operate within one clock cycle, which increases the maximum clock frequency; (iii) the butterfly structure is parallel and pipelined, which minimizes the delay caused by the FFT calculations; and (iv) a strong controller, working together with the address generator unit, removes the need for *N* butterfly units, since the Radix II calculation is carried out within a single butterfly unit, which reduces power consumption and area and avoids system complexity. The architecture of the high performance processor is optimized so that the system maintains a reasonable clock rate with a low latency of (*N*/2) log<sub>2</sub> *N* + 11. The throughput of the operation is limited by the amount of available logic in the target device.

**4. 1024 point FPP‐FFT implementation**

Section 4 details the implementation of the introduced 1024‐point floating‐point parallel pipeline Radix II FFT algorithm. The hardware implementation of the algorithm as a system on chip (SoC) is presented here.
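To make the single‐butterfly Radix II scheme concrete, the following is a minimal software sketch in Python. It is not the authors' VHDL design and does not model the pipeline, the shared memory, or the floating‐point hardware; the names `butterfly` and `fft_radix2` are hypothetical. It only illustrates that one butterfly routine, reused under control of an index generator, performs (*N*/2) log<sub>2</sub> *N* butterfly operations for an *N*‐point transform.

```python
import cmath


def butterfly(a: complex, b: complex, w: complex) -> tuple[complex, complex]:
    """One radix-2 butterfly: returns (a + w*b, a - w*b)."""
    t = w * b
    return a + t, a - t


def fft_radix2(x: list[complex]) -> tuple[list[complex], int]:
    """Iterative radix-2 decimation-in-time FFT; also returns the butterfly count."""
    n = len(x)
    if n < 2 or n & (n - 1):
        raise ValueError("length must be a power of two")
    stages = n.bit_length() - 1
    # Bit-reversal permutation of the input samples.
    data = [0j] * n
    for i in range(n):
        rev = int(format(i, f"0{stages}b")[::-1], 2)
        data[rev] = complex(x[i])
    calls = 0
    half = 1
    while half < n:                       # one pass per stage, log2(n) stages
        for start in range(0, n, 2 * half):
            for k in range(half):
                w = cmath.exp(-1j * cmath.pi * k / half)   # twiddle factor
                i, j = start + k, start + k + half
                data[i], data[j] = butterfly(data[i], data[j], w)
                calls += 1
        half *= 2
    return data, calls


# A unit impulse transforms to an all-ones spectrum.
spectrum, calls = fft_radix2([1.0] + [0.0] * 1023)
print(calls)                              # 5120 = (1024 / 2) * log2(1024)
```

The butterfly count of 5120 matches the (*N*/2) log<sub>2</sub> *N* term of the latency expression; the additional 11 cycles in the reported latency are not modeled here.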


**Table 4.** 1024‐point FPP‐FFT specification.


**Figure 21.** 1024‐point FPP‐FFT processor output signal.

**Figure 22.** Chip layout of high‐tech FFT processor.



To conclude, after FPGA implementation and ASIC optimization, and within the available software and hardware resources, the high‐tech 1024‐point Radix II FPP‐FFT processor was implemented and tested in FPGA prototyping using the Xilinx ISE software and Synopsys CAD tools. **Figure 23** shows the FPGA board used, and **Table 6** summarizes the design properties.

**Figure 23.** FPGA implementation of high‐tech FFT processor.


**Table 6.** High‐tech 1024 point FFT specification.


| FPP‐FFT specification | Silterra 0.18 µm technology | Mimos 0.35 µm technology |
|---|---|---|
| Active core area (mm<sup>2</sup>) | 2.32 × 2.32 | 4.256 × 4.256 |
| Power consumption (mW) | 640 | 1198 |

**Table 5.** Optimized power consumption and die area size in different technology library.
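The Table 5 entries can be turned into total core areas with a short calculation. This is an illustrative sketch only: it assumes the "2.32 × 2.32" and "4.256 × 4.256" entries are side lengths in millimetres of a square core, and the power density it prints is a derived figure that does not appear in the original results. The names `TABLE5` and `core_area_mm2` are hypothetical.

```python
# Values as reported in Table 5 (side length in mm, power in mW).
TABLE5 = {
    "Silterra 0.18 um": {"side_mm": 2.32, "power_mw": 640},
    "Mimos 0.35 um": {"side_mm": 4.256, "power_mw": 1198},
}


def core_area_mm2(side_mm: float) -> float:
    """Area of a square core, assuming the table lists its side length."""
    return side_mm * side_mm


for name, row in TABLE5.items():
    area = core_area_mm2(row["side_mm"])
    density = row["power_mw"] / area      # derived, not a reported figure
    print(f"{name}: area = {area:.2f} mm^2, power density = {density:.1f} mW/mm^2")
```

Under these assumptions the Silterra 0.18 µm core occupies about 5.38 mm<sup>2</sup> and the Mimos 0.35 µm core about 18.11 mm<sup>2</sup>.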
