**2.1. CNN development overview**

In 2012, the champion of the ILSVRC image classification task was a CNN model, namely AlexNet, which achieved a top-5 accuracy of 84.7% with 5 convolutional layers and 3 fully connected layers [2]. ResNet, which won first place in ILSVRC 2015 with 152 layers, achieved 96.43% top-5 accuracy, exceeding human-level accuracy [3]. Although making a CNN model deeper can improve its performance, CNN computation involves an enormous amount of arithmetic and data, which puts heavy pressure on the computing hardware. The traditional CPU has become a limitation for CNNs: lacking parallel computing capability, a CPU delivers poor computing performance and high power consumption on CNN workloads. It is therefore necessary to find better hardware to replace the CPU for CNN computing. As a result, more and more hardware platforms have been designed and used for CNN computing, such as FPGA designs [7, 8], GPU designs [9], and ASIC designs [10, 11]. These designs aim to accelerate CNN computation, improve computing performance, and reduce energy consumption. Designing and optimizing a specific CNN hardware accelerator has become one of the most popular topics.

In this chapter, we propose a typical architecture of a CNN accelerator and introduce two optimization methods for it: reducing data precision and data reuse. Generally, reducing data precision affects the accuracy of the CNN model, while data reuse requires extra on-chip buffers in the accelerator. We focus on these two optimization methods and analyze their impact on the CNN accelerator in depth. We investigate the impact of reduced data precision on CNN accuracy across several references [12–16] and find that reducing data precision appropriately has almost no impact on the accuracy of the CNN model. This suggests that reducing data precision is a feasible way to optimize a CNN accelerator, bringing benefits such as lower hardware resource usage, a smaller memory footprint, and lower power consumption. In addition, we analyze three factors that influence data reuse: loop execution order, reuse strategy, and parallelism strategy. Based on this analysis, we enumerate all legal design possibilities and find the optimal hardware design with low off-chip memory access and a small buffer size. In this way, we can optimize our CNN accelerator for high performance, a small memory footprint, and low power consumption.

This chapter is organized as follows:

• Review the history and development of convolutional neural networks.

• Describe the architecture and the principle of CNN. A real-life CNN, namely VGG-16, will be presented.

• Present a typical architecture of a CNN accelerator implemented on an embedded FPGA platform and introduce the workflow of a CNN accelerator. Present a typical processing element of a CNN accelerator.

• Propose and analyze two optimization methods for the CNN accelerator, namely reducing data precision and data reuse. Based on the analysis, find the optimal design space for the CNN accelerator by enumerating all legal design possibilities.
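As a rough illustration of the reduced-data-precision idea mentioned above, the following sketch quantizes a set of weights to 8-bit integers with a symmetric linear scheme. This is our own minimal example, not the accelerator's actual quantization method; the function names and the specific scheme are illustrative assumptions.

```python
# Minimal sketch (illustrative only): symmetric linear quantization of
# floating-point weights to 8-bit integer codes plus one scale factor.
def quantize_int8(weights):
    """Map floats to int8 codes and a shared scale (symmetric scheme)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from the integer codes."""
    return [v * scale for v in q]

weights = [0.50, -0.31, 0.127, -0.9, 0.004]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Each weight's round-trip error is bounded by scale / 2, which hints at
# why moderate precision reduction barely affects model accuracy while
# shrinking memory footprint and arithmetic cost.
max_err = max(abs(a - b) for a, b in zip(weights, restored))
assert max_err <= scale / 2 + 1e-12
```

In hardware terms, each 8-bit weight occupies a quarter of the storage of a 32-bit float, and integer multiply-accumulate units are much cheaper than floating-point ones, which is the source of the resource and power savings discussed above.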

Convolutional neural network (CNN) is one type of deep neural network. The first CNN model, namely LeNet-5, was proposed by LeCun in 1998 [1] and was used for handwritten digit recognition. However, due to the enormous amount of computation required for training, CNN research stayed quiet for some time. In 2012, a breakthrough occurred: a group from the University of Toronto used a deep CNN, namely AlexNet, to win first place in the ILSVRC 2012 image classification task. Its top-5 error rate reached 15.3%, compared to 26.2% for the second place, a drop of approximately 10 percentage points [2]. They improved the CNN algorithm in several aspects, such as deepening the model and using ReLU as the activation function, and they trained the model on two GPUs. Their effort resulted in a great leap for deep neural networks, and CNNs have developed rapidly since then. In 2015, the ImageNet champion, namely ResNet, achieved 96.43% top-5 accuracy, exceeding human-level accuracy [3]; in that work, the CNN model was deepened to 152 layers. In the latest ILSVRC 2017, the champion achieved a top-5 error rate of 2.251% [17]. Thanks to this rapid development, CNN is making a great impact on many application areas, such as image and video recognition, speech recognition, and game playing.
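To make the loop-execution-order factor of the data-reuse analysis concrete, here is a minimal direct-convolution loop nest. All names, shapes, and the specific loop order are our own illustrative assumptions, not the chapter's actual accelerator design; the comments note which data stays resident under this ordering, which is exactly the trade-off the design-space enumeration explores.

```python
# Minimal sketch (illustrative only): a direct convolution loop nest.
# With this loop order, one filter stays resident across all output
# positions (weight reuse); swapping the m and (y, x) loops instead
# favors input-feature-map reuse. The choice determines how much data
# must be buffered on chip versus refetched from off-chip memory.
def conv2d(ifmap, weights):
    """ifmap: C x H x W nested lists; weights: M x C x K x K filters."""
    C, H, W = len(ifmap), len(ifmap[0]), len(ifmap[0][0])
    M, K = len(weights), len(weights[0][0])
    OH, OW = H - K + 1, W - K + 1
    out = [[[0.0] * OW for _ in range(OH)] for _ in range(M)]
    for m in range(M):          # filter m is loaded once here ...
        for y in range(OH):     # ... and reused at every output position
            for x in range(OW):
                acc = 0.0
                for c in range(C):
                    for i in range(K):
                        for j in range(K):
                            acc += weights[m][c][i][j] * ifmap[c][y + i][x + j]
                out[m][y][x] = acc
    return out
```

Enumerating the legal permutations of such loops, together with tiling and parallelization choices, yields the design space in which the analysis above searches for the configuration with the lowest off-chip memory traffic and buffer size.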
