**10.4 Hardware In the Loop (HIL)**

150 Wireless Communications and Networks – Recent Advances

The FFT algorithm itself has not been optimized because of the data dependency among the inner and outer loops. Additional pipeline stages will need to be implemented in order to break the loop dependency implicit in the direct implementation of the FFT. This proves the point that the designer has to guide the tool by writing the C code in such a way that the hardware can be inferred.

Fig. 13. Different solutions by selecting different architectural constraints.

Another simple tradeoff was explored by increasing the frequency of operation from 100 MHz to 500 MHz, as shown in Figure 16. We can observe that the area remained almost constant, while the latency cycles increased by 3%; with respect to the 200 MHz implementation baseline, the latency cycles increased by 19%. We can interpret these numbers as follows: the logic required to implement the FFT had a longer critical path, which required additional cycles, but since the clock was increased 2.5x, the latency time was still reduced by about 2.0x (2.5/1.19 ≈ 2.1). This demonstrates that the relationship between the parameters is not linear and depends on the implementation produced under the particular constraints.

Regarding power, increasing the frequency by 2.5x will have an impact on power consumption. At the same time, if the block is 2.0x faster, we can consider reusing the FFT for some other part of the OFDM processor, for example computing both the IFFT and the FFT on the same hardware by time-sharing it, rather than having two cores that perform the two operations independently.
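The idea of time-sharing one FFT core for both transforms rests on a standard identity: the inverse transform of X equals the conjugate of the forward transform of the conjugate of X, scaled by 1/N. The following Python sketch illustrates the identity only; `fft_core` is a hypothetical software stand-in for the shared hardware block (written as a direct DFT for brevity) and is not the implementation discussed in this chapter.

```python
import cmath

def fft_core(x):
    """Hypothetical stand-in for the shared forward FFT core (direct O(N^2) DFT)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def ifft_via_fft(spectrum):
    """Inverse transform that reuses the forward core:
    conjugate the input, run the forward FFT, conjugate the output, scale by 1/N."""
    n = len(spectrum)
    y = fft_core([v.conjugate() for v in spectrum])
    return [v.conjugate() / n for v in y]

# Round trip: forward transform followed by the inverse built on the same core.
samples = [1 + 0j, 2 - 1j, 0 + 3j, -1 + 0.5j]
spectrum = fft_core(samples)
recovered = ifft_via_fft(spectrum)
max_error = max(abs(a - b) for a, b in zip(samples, recovered))
print("max round-trip error:", max_error)
```

In hardware, the two conjugations amount to negating the imaginary part at the input and output of the core, and the 1/N scaling is a shift when N is a power of two, so the extra logic is small compared with a second FFT core.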

Fig. 14. Graphical view plotting Area.

Hardware in the loop has become a buzzword among designers who want to run their algorithms at full speed, or at least hundreds or thousands of times faster than an RTL or gate-level simulation. In SoCs, a simulation can take days, weeks and sometimes months, depending on the level of detail included in the top-level simulation. That is why it is important to be able to replace each block by its behavioural, RTL or gate-level model in order to refine the level of simulation control and granularity.

Rather than talking about ASIC emulators, which are not traditionally available to small companies or universities, we will take a poor man's approach and show how we can integrate hardware into our computations to be able to speed up the testing and processing of algorithms.

Let's take a closer look at the first level of implementation, which is generating HDL code automatically from a Simulink model. Each block, or a set of a few blocks, of the entire communication system can be implemented in hardware, as was demonstrated in Section 10.1. So far, we have used an Altera Stratix III FPGA to do system-level hardware testing of the Fast Fourier Transform block in the OFDM communication model. For this purpose we have used the Hardware in the Loop (HIL) block provided by the Altera DSP Builder library. This block acts as a link between Simulink and the actual hardware we want to configure.

In modern digital communication systems, the current trend is to implement a pipelined FFT to generate orthogonal sub-carriers. A pipelined FFT generates an output every clock cycle, which helps in real-time applications such as digital communication systems, where data is fed continuously. We have designed Simulink models that implement the FFT using butterfly diagrams built from simple Simulink blocks, as well as a pipelined FFT that uses the advanced block set from DSP Builder. In this section we focus on the pipelined FFT for the reasons mentioned above. For more information on the architecture of the pipelined FFT implemented, refer to (Shousheng & Torkelson, 1998).
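To make the butterfly structure mentioned above concrete, here is a minimal software sketch of an iterative radix-2 decimation-in-time FFT. It only illustrates how stages of butterflies combine pairs of samples; it is not the pipelined architecture of the cited reference, nor the Simulink or DSP Builder model, and the function names are hypothetical.

```python
import cmath

def bit_reverse_permute(x):
    """Reorder the input so that the in-place butterflies produce natural-order output."""
    n = len(x)
    bits = n.bit_length() - 1
    out = [0j] * n
    for i in range(n):
        rev = int(format(i, "0{}b".format(bits))[::-1], 2) if bits else 0
        out[rev] = x[i]
    return out

def butterfly(a, b, w):
    """Radix-2 butterfly: one complex multiply by the twiddle w, one add, one subtract."""
    t = w * b
    return a + t, a - t

def fft_radix2(x):
    """Iterative radix-2 FFT; the length must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    data = bit_reverse_permute([complex(v) for v in x])
    size = 2
    while size <= n:  # one pass per stage, log2(n) stages in total
        half = size // 2
        for start in range(0, n, size):
            for k in range(half):
                w = cmath.exp(-2j * cmath.pi * k / size)  # twiddle factor
                i, j = start + k, start + k + half
                data[i], data[j] = butterfly(data[i], data[j], w)
        size *= 2
    return data

# A unit impulse has a flat spectrum: every output bin equals 1.
impulse_spectrum = fft_radix2([1, 0, 0, 0, 0, 0, 0, 0])
print(impulse_spectrum)
```

In this software form the stages run one after another on the same buffer. A pipelined hardware FFT instead dedicates logic to each stage so that a new sample can enter on every clock cycle.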

The hardware implementation was done using Altera's Quartus II version 10.1 and DSP Builder version 10.1. Care must be taken when designing a Simulink model that combines blocks from both the advanced and the standard block sets of DSP Builder. We created this model in layers. The lower level consists of the device block, which holds the information about the FPGA available on the hardware platform (Stratix III), and the functional blocks that form the FFT. On the top level, however, we could only use the signal and control blocks from the advanced block set; the other blocks had to be placed at the lowest level of the design hierarchy.

We make use of the signal compiler and the testbench from the standard block set on the top level. The signal compiler is used to create a Quartus II project, start synthesis, and launch place and route after generating the HDL code. The testbench is used to compare the block-level simulations in Simulink with the HDL simulations in ModelSim. Input and output blocks are inserted before and after the subsystem that contains the advanced block set. These blocks have external type parameters that convert from floating point, or any other format handled by Simulink, to fixed point, since the FPGA implementation can only be configured for fixed point. These blocks act as the boundary between the advanced and the standard block sets. The procedure to convert the FFT model to HDL, configure the FPGA with the HDL code, and run it from Simulink is detailed below.
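The floating-point to fixed-point conversion performed at the input and output boundary blocks can be pictured with a short Python sketch. The 16-bit word with 12 fractional bits, the round-to-nearest rule and the saturation behaviour are assumptions chosen for illustration; the actual format is whatever is set in the block parameters, and the function names are hypothetical.

```python
import math

def to_fixed(value, frac_bits=12, word_bits=16):
    """Quantize a float to a signed fixed-point integer code.
    Assumed behaviour: round to nearest (half up) and saturate on overflow."""
    scale = 1 << frac_bits
    lowest = -(1 << (word_bits - 1))
    highest = (1 << (word_bits - 1)) - 1
    code = math.floor(value * scale + 0.5)
    return max(lowest, min(highest, code))

def to_float(code, frac_bits=12):
    """Convert a fixed-point integer code back to a float (output boundary)."""
    return code / (1 << frac_bits)

# Quantization error for an in-range value is at most half an LSB.
original = 0.1
code = to_fixed(original)
restored = to_float(code)
half_lsb = 1.0 / (1 << 13)
print(code, restored, abs(restored - original) <= half_lsb)

# Out-of-range values saturate instead of wrapping around.
print(to_fixed(100.0), to_fixed(-100.0))
```

A comparison between a floating-point reference and the fixed-point hardware result, such as the one the testbench performs, therefore has to allow for an error that depends on the chosen word length.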

Fig. 17. Hardware In the Loop (HIL) Simulink simulation, actual code runs on the FPGA.

We first run the signal compiler block on the top level to generate the HDL code and create a Quartus II project. We then compile the design with Quartus II using the compile option in the signal compiler block. At this point we have created a Quartus II project for the model and synthesized its HDL code. Next, save a copy of this model and instantiate a HIL block, found in the standard block set of the Altera DSP Builder library, on the top layer of the new model. Open the HIL block and enter the path of the Quartus II project that was created earlier. This generates the proper ports for the HIL block. Connect these ports to the appropriate signals. Configure the simulation in burst mode to obtain a high simulation speed. In the next menu entry of the HIL block, compile the Quartus II project again, then scan the JTAG chain in order to recognize the FPGA device and program it. When we simulate this model, it runs at a remarkable speed compared with the native Simulink simulation. Figure 17 above shows the model in which the advanced block set has been replaced with a HIL block. This example was modified from the one supplied by Altera so that the FFT runs on the FPGA platform and is controlled by the Simulink simulation. We are in the process of converting some other algorithms into hardware following the same methodology, in order to create custom hardware acceleration blocks (Altera, 2007).
