In FPGAs the pool of resources is fixed. Depending on the particular algorithm, a design may fit better in one of the different FPGA families offered by the various vendors. Datapath architectures can be instantiated very efficiently on FPGAs, since most of the building blocks included in these devices are designed for very high performance digital signal processing algorithms. We will talk about the tradeoffs when FPGA utilization is low and high, and the effort required for place and route (P&R) as well as timing closure.

**8. ASIC implementation**

Most wireless communication algorithms have two versions: one for wireless infrastructure, which needs high performance and where power is important but not critical because the equipment is always connected to an external power source, and another for mobile wireless devices, where performance is a requirement but power has to be optimized to make the device usable, power efficient and competitive. In this section we will explore these two types of implementation in application-specific integrated circuits (ASICs). We will give an example of a turbo code interleaver/de-interleaver that has been implemented and verified using simulation and an FPGA platform, and the changes required to take it to an ASIC implementation.

**9. Hardware acceleration**

Sometimes it is not possible to evaluate an algorithm using regular simulation techniques because of the computing power required. SoC designs are a good example of this constraint: not all blocks can be implemented and verified at the gate level in simulation, because such simulations take from hours to weeks. In these cases it is common to use FPGAs as hardware accelerators or ASIC emulators. ESL tools are very efficient at generating this type of block, which can be instantiated for either FPGA or ASIC; the only real differences are the characterized libraries used and the system clock frequency.

The basic requirement when designing custom datapath components is to create hardware accelerators that can work as standalone blocks. Normally these components become part of a large SoC. Many current embedded products are composed of a microcontroller such as an ARM core, a standard bus such as AMBA, and a series of Intellectual Property (IP) blocks that realize specific functions requiring high performance and low power. This is mostly true of cellular mobile devices, while for base stations a dedicated Digital Signal Processor (DSP) could be used, since throughput is a more important constraint than power consumption. It is worth mentioning that these designs could be done in the same technology geometry but with different characteristics: a base station would most likely use a high-performance process with lower threshold voltages and higher leakage, while a mobile device will be constrained to a medium-performance, very low leakage process with higher and probably variable threshold voltages.

Some examples of systems that are designed as hardware accelerators in cellular technologies are equalizers and Viterbi, Turbo and LDPC decoders.
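The turbo code interleaver/de-interleaver named as the ASIC example is a typical datapath block of this kind. As a rough illustration only, the sketch below shows a generic row/column block interleaver and its inverse in C. It is an assumption made for illustration, not the interleaver of any particular cellular standard nor the author's design, and the function names `block_interleave` and `block_deinterleave` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical block interleaver: symbols are written row by row into a
 * rows x cols array and read out column by column. Both buffers must hold
 * rows * cols symbols. */
void block_interleave(const uint8_t *in, uint8_t *out, size_t rows, size_t cols)
{
    assert(in != out); /* the permutation is not done in place */
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            out[c * rows + r] = in[r * cols + c];
}

/* Inverse permutation: restores the order seen before interleaving. */
void block_deinterleave(const uint8_t *in, uint8_t *out, size_t rows, size_t cols)
{
    assert(in != out);
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            out[r * cols + c] = in[c * rows + r];
}
```

Passing a buffer through `block_interleave` and then `block_deinterleave` with the same dimensions returns the original sequence. In hardware the same permutation would typically be realized as an address generator in front of a memory, which is why such blocks map naturally onto both FPGA and ASIC datapaths.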


The question is which functions will run in software and which in hardware. This lies in the gray area of hardware/software partitioning. Several specifications need to be considered before making an educated decision. In theory, anything that can be done in hardware can be done in software and vice versa (given, of course, an infinitely fast processor with enormous bus bandwidth and a large number of I/Os). We must carefully evaluate the hardware components to be implemented, since no field upgrade is possible once an ASIC has been manufactured; we need to find the equilibrium where a firmware patch could still remove any anomaly not detected at verification and validation time.

In particular, the author worked for many years in teams focused on hardware accelerators, but all these components were part of a SoC in which an ARM processor was traditionally used with a standard interconnect such as AMBA (ARM, 2011) or OCP (OCP, 2011), and the hardware accelerators were mapped as peripherals in the processor memory space. The ASIC design was first simulated, then emulated on a large FPGA platform at a constrained speed, and only then could the ASIC finally be developed.
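To make the idea of an accelerator mapped as a peripheral in the processor memory space more concrete, here is a minimal sketch in C. Everything in it is an assumption for illustration: the register layout `accel_regs_t`, the bit definitions, the population-count function standing in for the accelerated operation, and the software model `accel_model_step` that plays the role of the hardware so that the sketch can run on a host PC.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register map of an accelerator peripheral. */
typedef struct {
    volatile uint32_t ctrl;    /* bit 0: START, written by software */
    volatile uint32_t status;  /* bit 0: DONE, set by the hardware  */
    volatile uint32_t operand; /* input word                        */
    volatile uint32_t result;  /* output word                       */
} accel_regs_t;

#define ACCEL_CTRL_START  (1u << 0)
#define ACCEL_STATUS_DONE (1u << 0)

/* On a real SoC the register block sits at a fixed bus address chosen by
 * the system integrator. Here an ordinary variable stands in for it. */
accel_regs_t accel_model;

/* Software stand-in for the hardware: when START is set, compute the
 * result (a population count, chosen arbitrarily) and raise DONE. */
static void accel_model_step(accel_regs_t *regs)
{
    if (regs->ctrl & ACCEL_CTRL_START) {
        uint32_t x = regs->operand;
        uint32_t ones = 0;
        while (x != 0) {
            ones += x & 1u;
            x >>= 1;
        }
        regs->result = ones;
        regs->status |= ACCEL_STATUS_DONE;
        regs->ctrl &= ~ACCEL_CTRL_START;
    }
}

/* Driver code as the processor would run it: write the operand, start
 * the accelerator, poll for completion, read the result. */
uint32_t accel_run(accel_regs_t *regs, uint32_t operand)
{
    assert(regs != 0);
    regs->status = 0;
    regs->operand = operand;
    regs->ctrl = ACCEL_CTRL_START;
    while (!(regs->status & ACCEL_STATUS_DONE)) {
        accel_model_step(regs); /* real hardware would advance on its own */
    }
    return regs->result;
}
```

Calling `accel_run(&accel_model, 0xF0F0u)` returns the number of set bits in the operand. The `volatile` qualifiers matter on real hardware, because they stop the compiler from caching register reads while the driver polls the status bit. The same driver can be exercised unchanged against a simulation model, an FPGA emulation, or the final ASIC, as long as the register map is kept identical.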

In academia we are more involved with FPGA designs, and in particular the platforms used for teaching include the option of a soft-core processor. In the author's case the platform is Altera and the soft-core processor is the Nios II. It is interesting to find that a C-to-RTL application exists that allows functions implemented in software to be converted into hardware accelerators. The application is C2H (Altera, 2011b); although the author has not been able to test it, it looks promising, since it allows the exploration of different hardware/software partitions that could impact the total silicon area, performance, power and cost of a particular application (Frazer, 2008). In FPGA design this could make it possible to trade cost against performance by moving back and forth between migration devices that are pin compatible but vary in the number of logic elements, the number of I/O pins and cost. An equivalent tool from Xilinx, AutoESL (Xilinx, 2011a), generates RTL code from C/C++/SystemC.
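As a hedged illustration of the kind of function a C-to-RTL flow might take as input, the sketch below separates a small compute kernel from the software wrapper that calls it. The names `fir_mac`, `fir_mac_checked` and `FIR_TAPS` are hypothetical, no tool-specific pragmas or APIs are shown, and whether a given tool accepts this exact code is an assumption that would have to be checked against its documentation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FIR_TAPS 4 /* hypothetical, fixed at compile time */

/* Candidate kernel for hardware: a multiply-accumulate over a fixed
 * number of taps, with no recursion, no dynamic memory and no library
 * calls. The 64-bit accumulator cannot overflow for 16-bit inputs. */
int64_t fir_mac(const int16_t *samples, const int16_t *coeffs)
{
    int64_t acc = 0;
    for (int i = 0; i < FIR_TAPS; i++) {
        acc += (int64_t)samples[i] * (int64_t)coeffs[i];
    }
    return acc;
}

/* Software-side wrapper: argument checking stays on the processor, so
 * the kernel above remains a pure datapath. */
int64_t fir_mac_checked(const int16_t *samples, const int16_t *coeffs)
{
    assert(samples != NULL && coeffs != NULL);
    return fir_mac(samples, coeffs);
}
```

Keeping the kernel free of control-heavy code is one way to explore different partitions: the same `fir_mac` can run as plain software on the soft-core processor first, and be moved to hardware later if profiling shows it dominates the run time.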
