**3.3. Challenges**

Although the CONV layer algorithm is simple, the enormous amount of data and computation poses serious challenges for hardware accelerators. One challenge is limited off-chip memory bandwidth. A CNN accelerator generally achieves high parallelism by increasing the number of processing elements (PEs), which improves its computational performance. However, the resulting volume of data accesses puts pressure on memory bandwidth. Another challenge is that frequent off-chip memory access consumes a large amount of energy.
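To make the bandwidth pressure concrete, the following is a minimal back-of-the-envelope sketch. It assumes a worst case with no on-chip reuse, where every MAC fetches both of its operands (one input value and one weight) from off-chip memory on every cycle. The function name, the PE counts, the 200 MHz clock, and the 16-bit operand width are all hypothetical values chosen for illustration and do not come from any specific accelerator.

```python
def required_bandwidth_gbps(num_pes, clock_hz, bytes_per_operand, operands_per_mac=2):
    """Worst-case off-chip bandwidth in GB/s, assuming no on-chip data reuse.

    Each PE is assumed to perform one MAC per cycle and to fetch
    `operands_per_mac` operands from off-chip memory for it.
    """
    bytes_per_second = num_pes * clock_hz * operands_per_mac * bytes_per_operand
    return bytes_per_second / 1e9


if __name__ == "__main__":
    clock_hz = 200e6        # hypothetical 200 MHz accelerator clock
    bytes_per_operand = 2   # hypothetical 16-bit fixed-point data
    for num_pes in (64, 256, 1024):
        bw = required_bandwidth_gbps(num_pes, clock_hz, bytes_per_operand)
        print(f"{num_pes:5d} PEs -> {bw:7.1f} GB/s without data reuse")
```

Under these assumptions the required bandwidth grows linearly with the number of PEs, which is why adding PEs alone does not translate into higher performance once the off-chip interface saturates.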

**Figure 10**, taken from reference [22], shows the energy cost of each level of the memory hierarchy, normalized to the cost of one multiply-accumulate (MAC) operation; the data are extracted from a commercial 65-nm process. The energy cost of a DRAM access is much higher than that of an on-chip buffer access or a MAC operation. Therefore, a large number of DRAM accesses leads to high power consumption. In addition, the large memory footprint caused by the enormous amount of data is an unavoidable challenge.

**Figure 10.** Normalized energy cost of each level of the memory hierarchy [22].

To address these problems, previous works have proposed several optimization methods. One is reducing data precision. Several studies show that appropriately reducing the data precision of a CNN model has almost no impact on image recognition accuracy. This suggests that reducing data precision is an effective way to reduce the memory footprint and the computational hardware resources required. In addition, data reuse [7, 8, 23] is another optimization direction for CNN accelerators. Because most of the data in a CNN are used repeatedly, data reuse is an efficient way to reduce memory accesses, and consequently the memory bandwidth requirement and power consumption. In the next section, we introduce these two optimization methods in detail.
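As a rough illustration of why these two directions help, the sketch below computes, for a single CONV layer, the weight memory footprint at two data precisions and the number of times each weight is used. The layer dimensions (64 input channels, 128 output channels, 3×3 kernels, 56×56 output feature map) and the function names are hypothetical, and the reuse count assumes a standard convolution in which every weight contributes to every output position.

```python
def conv_weight_count(in_ch, out_ch, kernel):
    """Number of weights in a standard CONV layer (bias ignored)."""
    return out_ch * in_ch * kernel * kernel


def conv_mac_count(in_ch, out_ch, kernel, out_h, out_w):
    """Number of MAC operations: each weight is applied at every output position."""
    return conv_weight_count(in_ch, out_ch, kernel) * out_h * out_w


def footprint_bytes(num_values, bits_per_value):
    """Storage needed for `num_values` values at the given precision."""
    return num_values * bits_per_value // 8


if __name__ == "__main__":
    # Hypothetical layer shape, chosen only for illustration.
    in_ch, out_ch, kernel, out_h, out_w = 64, 128, 3, 56, 56

    weights = conv_weight_count(in_ch, out_ch, kernel)
    macs = conv_mac_count(in_ch, out_ch, kernel, out_h, out_w)

    for bits in (32, 8):
        print(f"{bits:2d}-bit weights: {footprint_bytes(weights, bits)} bytes")

    # Each weight is used once per output position, so a weight kept
    # on chip can replace this many off-chip fetches.
    print(f"uses per weight: {macs // weights}")
```

In this sketch, moving from 32-bit to 8-bit data cuts the weight footprint by a factor of four, and keeping a weight on chip lets it serve every output position of the feature map instead of being fetched again for each one.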
