## **5.2 Maximum operating frequency**

Even when converting an algorithm from floating point to fixed point and identifying candidate architectures is straightforward, the final underlying technology must be taken into account to determine the maximum operating frequency and, in some cases, the required level of parallelism and/or pipelining. An algorithm designed for an FPGA will often run without major modifications on an ASIC, but the reverse is not always true. FPGAs are widely used to perform ASIC emulation, yet maintaining two different versions of the algorithm, one per technology, makes little sense, since this could invalidate the overall algorithm validation. Sometimes the same code can be run, but in slow motion on the FPGA, if real-time constraints are not required. If real time is a factor, only some of the low-throughput modes may be run on the FPGA platform, while the rest are simulated for the ASIC.
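The tradeoff between a technology's maximum clock rate and the required parallelism can be sketched with a back-of-envelope calculation. The helper below is this sketch's own (the function name, rates, and frequencies are illustrative assumptions, not from the text):

```python
import math

def required_parallelism(sample_rate_hz, f_max_hz, samples_per_cycle=1):
    """Estimate how many parallel processing lanes are needed to sustain
    a target sample rate when the technology caps the clock at f_max.
    Illustrative helper; names and parameters are assumed for this sketch."""
    return math.ceil(sample_rate_hz / (f_max_hz * samples_per_cycle))

# Example: a 1 GS/s stream on an FPGA limited to ~250 MHz needs 4 lanes,
# while an ASIC clocking the same logic at 1 GHz needs only 1.
print(required_parallelism(1e9, 250e6))  # -> 4
print(required_parallelism(1e9, 1e9))   # -> 1
```

This is why the same RTL may need to be re-architected (unrolled or pipelined) when moving from one technology to the other, even though the algorithm is unchanged.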

## **5.3 Power consumption**

Power consumption in mobile devices is a crucial part of algorithm selection, and it is tightly coupled to the architecture implementation, the operating frequency, the underlying technology, the supply voltage, and the gate-level node toggle rates, to give some examples. In this section we cover some of the important features to consider when designing power-optimized algorithm implementations.

When designing digital systems, it would be convenient if a magic button existed that reduced power consumption to the minimum. Unfortunately, no such button exists: power savings start at the system-level design and depend on the architecture selection, the RTL implementation, the operating frequency, the integrated-circuit technology chosen, the clock-gating methodology, the use of multi-*Vdd* and multi-*Vth* technologies, and leakage, to name some of the most important factors. In reality, power savings are achieved in small steps, starting with efficiency at the system and RTL design levels. One power-saving criterion is: if you do not have to toggle a signal, don't do it! Dynamic power consumption is a function of the toggle rate, the frequency of operation, the load capacitance, and the power supply voltage. On average, gate-level nodes switch at around 10% to 12%, whereas an RTL-level estimate often assumes toggle rates close to 50%, which would mean that all units are in use all the time and no hardware resource is wasted.
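The dependence described above is captured by the classic CMOS dynamic-power relation, P = α·C·Vdd²·f, where α is the node toggle rate. A minimal sketch (the capacitance, supply, and clock values are assumed for illustration) shows how much an optimistic 50% RTL-style activity estimate overstates power compared to a realistic ~10% gate-level toggle rate:

```python
def dynamic_power(alpha, c_farads, vdd, f_hz):
    """Classic CMOS dynamic (switching) power: P = alpha * C * Vdd^2 * f,
    where alpha is the node toggle (activity) rate."""
    return alpha * c_farads * vdd ** 2 * f_hz

# Same netlist, two activity assumptions: a realistic ~10% gate-level
# toggle rate versus a pessimistic 50% RTL-style estimate.
c_total = 2e-9        # 2 nF aggregate switched capacitance (assumed)
vdd, f = 0.9, 500e6   # 0.9 V supply, 500 MHz clock (assumed)
p_real = dynamic_power(0.10, c_total, vdd, f)
p_pess = dynamic_power(0.50, c_total, vdd, f)
print(p_real, p_pess)  # the 50% estimate is 5x the 10% one
```

The quadratic dependence on *Vdd* is also why supply-voltage reduction, discussed later in this section, is one of the most effective power-saving levers.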

When deciding the fixed-point representation, every bit of precision counts towards the total power consumption, as do the number of gate levels between registers and the load capacitance of each node. If we decide to include saturation and/or rounding, additional gates are required to perform these operations. This additional hardware can be worth its cost if it reduces the bit precision relative to a design whose wide dynamic range guarantees no overflow even for signal excursions that are very large but very infrequent. So what is the best tradeoff between complexity, fixed-point precision, internal normalizations, and processing? There is no single solution to the problem. The best approach is to statistically characterize the signals being handled to find their probability distributions, and based on these to determine the dynamic range to be used, whether saturation/wrapping and truncation/rounding should be used, and which of the methods mentioned in section 3 to apply.
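The statistical characterization can be sketched as follows. The helper and signal model below are assumptions for illustration (a mostly small signal with rare large excursions); the point is that covering, say, 99% of samples and saturating the rest needs several fewer bits than covering the absolute worst case:

```python
import math
import random

def int_bits_for_range(samples, coverage=0.99):
    """Signed integer bits needed so that `coverage` of the samples fit
    without overflow; the rare tail is handled by a saturating unit
    instead of widening every datapath bus. Illustrative sketch."""
    mags = sorted(abs(s) for s in samples)
    bound = mags[min(len(mags) - 1, int(coverage * len(mags)))]
    return 1 + math.ceil(math.log2(bound + 1))  # +1 for the sign bit

random.seed(0)
# A signal that is usually small but has rare, large excursions (assumed model):
sig = [random.gauss(0, 10) if random.random() < 0.999 else random.gauss(0, 1000)
       for _ in range(100_000)]
full = int_bits_for_range(sig, 1.0)   # cover the absolute worst case
sat = int_bits_for_range(sig, 0.99)   # cover 99%, saturate the rest
print(full, sat)  # the saturating design needs noticeably fewer bits
```

Each bit saved shrinks every adder, multiplier, register, and bus it passes through, which is exactly where the saturation hardware pays for itself.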

Power consumption depends on the circuit layout as well. While old technologies used to be characterized in terms of gate delays, input capacitance, and output load driving capacitance, the game has changed: modern technologies must take into account the effects of interconnection delays due to distributed resistance, inductance, and capacitance. The power consumption estimate is not final until the circuit has been placed and routed and the transistors have been sized. If an FPGA implementation is sought, a similar approach is taken, but control is coarser due to the huge number of paths that signals must traverse to be routed among all the resources.
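A first-order feel for why distributed interconnect matters comes from the Elmore delay of a wire modeled as a chain of RC segments (a textbook approximation, not a sign-off model; the per-segment values below are assumed): delay grows roughly quadratically with wire length, so layout decisions feed directly back into the achievable frequency and power.

```python
def elmore_delay(r_per_seg, c_per_seg, n_segments):
    """First-order Elmore delay of a wire split into n RC segments:
    each segment's resistance drives all downstream capacitance, so
    total delay is r*c*n*(n+1)/2 and grows ~quadratically with length."""
    return sum(r_per_seg * c_per_seg * (n_segments - i) for i in range(n_segments))

# Doubling the wire length (same R and C per segment) roughly
# quadruples its Elmore delay:
short = elmore_delay(10, 1e-15, 100)  # assumed 10 ohm, 1 fF per segment
long = elmore_delay(10, 1e-15, 200)
print(long / short)  # close to 4
```

This is why pre-layout estimates, whether for ASIC or FPGA, must be treated as provisional until routing is known.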

Other important factors are the power supply *Vdd* and the threshold voltage *Vth* of the transistors. These two factors control the voltage excursion of the signals and, most importantly, the operating region of the transistors. Most digital logic design rules assume that the transistors operate in saturation; power is consumed while transitioning through the active region, which is the region you want to leave as fast as possible. A transistor operating in the saturation regime has a quadratic transconductance relation between the current *I* and the input gate voltage *Vg*. When a transistor is not in saturation, it can be in the linear region or even in sub-threshold. A transistor in the latter has an exponential, rather than quadratic, transconductance relation. While this is the most power-efficient operating regime, it is also the slowest. Many circuits that need very low power consumption can work in sub-threshold, but they face large variability and precision constraints. Most of these designs involve linear analog-mode operation.
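The quadratic-versus-exponential contrast can be made concrete with the idealized textbook current models (all device parameters below, `k`, `i0`, `n`, and the 0.4 V threshold, are assumed values for illustration):

```python
import math

def i_saturation(vg, vth, k=1e-4):
    """Square-law saturation current: I = (k/2)*(Vg - Vth)^2 for Vg > Vth.
    Idealized textbook model with an assumed transconductance parameter k."""
    return 0.5 * k * max(vg - vth, 0.0) ** 2

def i_subthreshold(vg, vth, i0=1e-7, n=1.5, vt=0.026):
    """Weak-inversion (sub-threshold) current: exponential in Vg.
    i0, slope factor n, and thermal voltage vt are assumed values."""
    return i0 * math.exp((vg - vth) / (n * vt))

vth = 0.4  # assumed threshold voltage
# Below threshold, current is exponential in Vg: raising the gate by
# 100 mV multiplies the current by roughly a decade (about 13x here,
# since one decade corresponds to n*vt*ln(10) ~ 90 mV).
ratio = i_subthreshold(0.3, vth) / i_subthreshold(0.2, vth)
print(ratio)
```

That exponential sensitivity is both the appeal (tiny currents, tiny power) and the curse (huge variability with *Vth* and temperature) of sub-threshold operation.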

So what is the secret formula for designing power-efficient devices? The answer is discipline! Try to save as much as possible at each level in the design hierarchy. In software, put the processor to sleep if there is nothing important to do. In hardware, do not toggle nodes that do not need to toggle, gate the clocks so you can cut power in unused blocks, reduce the power supply *Vdd* to the minimum that allows efficient operation of the algorithm, and design using just the right number of bits. Further techniques for low-power CMOS design have been published; good overviews are given in (Chandrakasan & Brodersen, 1998) and (Sanchez-Sinencio & Andreou, 1999).
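The payoff of the sleep/clock-gating discipline is easy to quantify with a duty-cycle average (the power figures and 5% duty cycle below are assumed for illustration):

```python
def average_power(p_active, p_sleep, duty_cycle):
    """Average power of a duty-cycled block: active for a fraction of
    the time, asleep (clock-gated or power-gated, leakage only) otherwise."""
    return duty_cycle * p_active + (1.0 - duty_cycle) * p_sleep

# A block that burns 80 mW when active but only 0.5 mW when gated,
# and is busy 5% of the time, averages under 5 mW:
p_avg = average_power(80e-3, 0.5e-3, 0.05)
print(p_avg)  # ~4.5 mW
```

The same arithmetic applies at every level of the hierarchy, from an idle CPU core down to a single gated register bank, which is why the small savings compound.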
