**1. Introduction**

Convolutional neural networks (CNNs) have developed rapidly in recent years. Owing to their outstanding performance in image recognition, CNNs are widely used in image classification [1–3]. Following this success, CNNs have also been studied and applied in many other fields of artificial intelligence, such as speech recognition [4, 5] and game playing [6].

Increasing the depth of a CNN by adding layers is a common and effective way to improve image-recognition accuracy. For instance, the ILSVRC 2012 champion, a CNN model named AlexNet, achieved a top-5 accuracy of 84.7% on the image classification task with 5 convolutional layers and 3 fully connected layers [2]. ResNet, which won first place in ILSVRC 2015 and reached 96.43% accuracy, exceeding human-level accuracy, consists of 152 layers [3]. Although deepening a CNN model improves its performance, CNN computation involves an enormous amount of arithmetic and data, which puts heavy pressure on the computing hardware. Traditional CPUs have become a limitation for CNNs: lacking parallel computing, CPU-based CNN computation yields poor performance and high power consumption, so better hardware is needed in its place. Accordingly, more and more hardware has been designed and used for CNN computation, such as FPGA designs [7, 8], GPU designs [9], and ASIC designs [10, 11]. These designs aim to accelerate CNN computation, improve performance, and reduce energy consumption. Designing and optimizing specific CNN hardware accelerators has become a popular research topic.

> © 2016 The Author(s). Licensee InTech. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

**2. Background of convolution neural networks**

**2.1. CNN development overview**

A convolutional neural network (CNN) is a type of deep neural network. The first CNN model, LeNet-5, was proposed by LeCun in 1998 and applied to handwritten digit recognition [1]. However, owing to the enormous amount of computation required for training, CNNs lay dormant for some time. A breakthrough came in 2012, when a group from the University of Toronto won first place in the ILSVRC 2012 image classification task with a deep CNN named AlexNet. Its top-5 error rate was 15.3%, compared with 26.2% for the runner-up, a drop of roughly 10 percentage points [2]. They improved the CNN algorithm in several respects, such as deepening the model and using ReLU as the activation function, and trained their model on 2 GPUs. Their effort produced a great leap for deep neural networks, and CNNs have developed rapidly since. In 2015, the ImageNet champion, ResNet, reached 96.43% top-5 accuracy, exceeding human-level accuracy, by deepening the CNN model to 152 layers [3]. In the latest ILSVRC 2017, the champion achieved a top-5 error of 2.251% [17]. With this rapid development, CNNs are making a great impact in many application areas, such as image and video recognition, speech recognition, and game playing.

**2.2. CNN basic**

CNNs have an inference process for recognition and a back-propagation process for training. Since training a CNN takes a long time, many CNN applications complete training off-line in advance and then deploy the trained CNN in terminals to execute tasks. The inference process in terminals therefore has more pressing demands on speed and power. In this work, we focus on the inference process of CNNs and explore how to speed it up with a hardware accelerator.

**Figure 1** shows the simplified structure of a typical CNN. A typical CNN model usually consists of several parts, including convolutional layers (CONV), nonlinear activation functions, pooling layers, and fully connected layers.

**Figure 1.** Structure of a typical convolutional neural network.

*Optimizing of Convolutional Neural Network Accelerator*, http://dx.doi.org/10.5772/intechopen.75796
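To make the layer types of a typical CNN concrete, the following is a minimal NumPy sketch of a CONV, ReLU, max-pooling, and fully connected stage chained together. The shapes, single-channel input, and function names are illustrative only; real CNN layers operate on multi-channel tensors with strides and padding.

```python
import numpy as np

def conv2d(x, w):
    """Valid 2-D convolution of a single-channel image x with kernel w."""
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

def relu(x):
    """Nonlinear activation: clamp negatives to zero."""
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    """Non-overlapping max pooling with a size x size window."""
    oh, ow = x.shape[0] // size, x.shape[1] // size
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = x[i * size:(i + 1) * size, j * size:(j + 1) * size].max()
    return out

def fully_connected(x, w, b):
    """Flatten the feature map and apply one dense layer."""
    return w @ x.ravel() + b

# Chain CONV -> ReLU -> POOL -> FC on a toy 8x8 input.
rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))
kernel = rng.standard_normal((3, 3))
fc_w = rng.standard_normal((10, 9))    # pooled 3x3 map flattens to 9 values
fc_b = rng.standard_normal(10)

feat = max_pool(relu(conv2d(image, kernel)))   # 8x8 -> 6x6 -> 3x3
scores = fully_connected(feat, fc_w, fc_b)     # 10 class scores
print(scores.shape)   # (10,)
```

Even this toy pipeline shows why accelerators matter: the CONV stage alone is a deep loop nest of multiply-accumulate operations, which dominates the compute cost in real models.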

In this chapter, we propose a typical architecture for a CNN accelerator and introduce two optimization methods: reducing data precision and data reuse. In general, reducing data precision affects the accuracy of the CNN model, while data reuse adds extra on-chip buffering to the accelerator. We focus on these two optimization methods and analyze their impact on the accelerator in depth. From several studies of the effect of reduced data precision on CNN accuracy [12–16], we find that moderately reducing data precision has almost no impact on model accuracy. This suggests that reducing data precision is a feasible way to optimize a CNN accelerator, with benefits such as lower hardware resource usage, a smaller memory footprint, and lower power consumption. In addition, we analyze three factors that influence data reuse: loop execution order, reuse strategy, and parallelism strategy. Based on this analysis, we enumerate all legal design possibilities and identify the hardware design with low off-chip memory access and a small buffer size. In this way, we can build a CNN accelerator with high performance, a small memory footprint, and low power consumption.
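The precision-reduction idea can be illustrated with a simple uniform fixed-point quantizer. This is a generic sketch, not the chapter's specific scheme: the per-tensor scaling rule and the bit-widths tried below are assumptions chosen for illustration.

```python
import numpy as np

def quantize(x, bits):
    """Uniformly quantize x to signed integer codes with `bits` bits,
    using a per-tensor scale derived from the maximum magnitude."""
    qmax = 2 ** (bits - 1) - 1            # e.g. 127 for 8 bits
    scale = np.max(np.abs(x)) / qmax
    q = np.round(x / scale).astype(np.int32)
    return q, scale

def dequantize(q, scale):
    """Recover an approximation of the original values."""
    return q * scale

rng = np.random.default_rng(42)
w = rng.standard_normal(10000).astype(np.float32)  # stand-in for a weight tensor

for bits in (16, 8, 4):
    q, s = quantize(w, bits)
    err = np.abs(dequantize(q, s) - w).max()
    print(f"{bits:2d}-bit max abs error: {err:.5f}")
```

The printed errors shrink as the bit-width grows, and with uniform quantization the worst-case error is bounded by half the scale step. This is the kind of trade-off the surveyed references examine: narrow enough to save hardware and memory, wide enough that model accuracy is essentially unchanged.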

This chapter is organized as follows:

