*2.2.4. Fully connected layer*

Fully connected (FC) layers are usually located near the end of a CNN model. The operation of an FC layer can be expressed by Eq. (1): *Ni* input features are fully connected to an output feature vector of length *No* by *Ni* × *No* weights. Here *Ni* and *No* are the lengths of the input and output feature vectors, so the number of weights is *Ni* × *No*. *Biaso* is the bias term of the *o*th output feature.

$$
Out_o = \sum_{i=1}^{N_i} In_i \times W_{io} + Bias_o \tag{1}
$$

FC layers usually flatten the input feature maps into a feature vector. Finally, the output feature vector is read by a classifier, typically a Softmax function, which generates the probability of each label.
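Eq. (1) is, in effect, a matrix-vector product plus a bias, followed by a Softmax classifier. A minimal pure-Python sketch (the variable names and toy values are ours, not from the chapter):

```python
import math

def fc_layer(inputs, weights, biases):
    """Eq. (1): Out_o = sum_{i=1}^{Ni} In_i * W_io + Bias_o."""
    Ni, No = len(inputs), len(biases)
    return [sum(inputs[i] * weights[i][o] for i in range(Ni)) + biases[o]
            for o in range(No)]

def softmax(v):
    """Turn the FC output vector into per-label probabilities."""
    m = max(v)                       # subtract max for numerical stability
    exps = [math.exp(x - m) for x in v]
    s = sum(exps)
    return [e / s for e in exps]

# Toy example: Ni = 2 inputs, No = 3 outputs -> 2 x 3 = 6 weights
inputs = [1.0, 2.0]
weights = [[0.5, -1.0, 0.0],         # weights[i][o], shape Ni x No
           [0.25, 0.0, 1.0]]
biases = [0.0, 0.1, -0.5]
out = fc_layer(inputs, weights, biases)    # [1.0, -0.9, 1.5]
probs = softmax(out)                       # probabilities summing to 1
```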

Optimizing of Convolutional Neural Network Accelerator

http://dx.doi.org/10.5772/intechopen.75796

#### **2.3. A real-life CNN**

Next, we introduce a real-life CNN, namely VGG-16 [20]. In ILSVRC 2014, VGG-16 won second place in the image classification task with a top-5 error of 7.4%. **Figure 6** [21] shows the architecture of the VGG-16 model. VGG-16 consists of 13 CONV layers in 5 groups, 5 max pooling layers, and 3 FC layers. The network receives a 224 × 224 input image with three channels, which passes through the series of layers. Finally, the network generates an output feature vector of depth 1000, which represents the likelihoods of 1000 categories. Compared to other CNN models, one characteristic of VGG-16 is that the kernel size of every CONV layer is 3 × 3.

In fact, each CONV layer of VGG-16 involves a large amount of data and operations. For a more intuitive understanding, we present detailed statistics for VGG-16. **Figure 7(a)** shows the amount of Input data, Weights and Output data for each CONV layer of VGG-16.

**Figure 6.** Architecture of VGG-16 model [21].

The pooling layer has two major functions. First, pooling aggregates information from the input feature maps and reduces their size and dimension, which greatly reduces both the size of the feature maps and the amount of computation in the CNN. Second, the pooling layer increases invariance to small shifts.
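The two pooling variants shown in Figure 5 can be sketched in plain Python (the function name `pool2d` and the 2 × 2 window are our choices for illustration, not from the chapter):

```python
def pool2d(fmap, size=2, mode="max"):
    """Non-overlapping pooling over a 2-D feature map (list of lists).
    mode: 'max' for max pooling, 'avg' for average pooling."""
    h, w = len(fmap), len(fmap[0])
    out = []
    for r in range(0, h - size + 1, size):
        row = []
        for c in range(0, w - size + 1, size):
            window = [fmap[r + i][c + j] for i in range(size) for j in range(size)]
            row.append(max(window) if mode == "max" else sum(window) / len(window))
        out.append(row)
    return out

fmap = [[1, 3, 2, 0],
        [4, 2, 1, 1],
        [0, 1, 5, 6],
        [2, 2, 7, 8]]
print(pool2d(fmap, mode="max"))  # [[4, 2], [2, 8]]
print(pool2d(fmap, mode="avg"))  # [[2.5, 1.0], [1.25, 6.5]]
```

With a 2 × 2 window, each spatial dimension of the feature map is halved, which is the size-reduction benefit described above.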


**Figure 5.** Two type of pooling layer: max pooling and average pooling.


**Figure 7.** Statistic for each CONV layer of VGG-16: (a) amount of Input, Weight and Output for different CONV layers; (b) amount of operations for different CONV layers.

The 13 bars represent CONV layers 1 to 13. The height of each bar represents the total amount of data received and generated by that CONV layer, consisting of Input data, Weights and Output data. We can see that the amounts of Input, Weights and Output vary greatly across CONV layers. For example, CONV2 requires about 0.035 million weights, while CONV9 requires about 2.25 million weights, approximately 64 times as many as CONV2. **Figure 7(b)** shows the total number of operations (multiply-add operations) for each CONV layer of VGG-16; the height of each bar represents the total for that layer. For example, CONV1 performs about 87 million multiply-add operations and CONV2 about 1764 million. Thus, although the CONV operation has the same form in every layer, the amount of data and computation involved varies greatly between different CONV layers.
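These per-layer figures can be reproduced from the well-known VGG-16 configuration. A small sketch (the `cfg` list encodes the published architecture; reading the quoted weight counts with a "million" of 2^20 is our interpretation):

```python
# VGG-16 CONV configuration: (in_channels, out_channels, output_size) per layer.
# All kernels are 3x3; the 224x224 resolution is halved after each of the 5 groups.
cfg = [(3, 64, 224), (64, 64, 224),
       (64, 128, 112), (128, 128, 112),
       (128, 256, 56), (256, 256, 56), (256, 256, 56),
       (256, 512, 28), (512, 512, 28), (512, 512, 28),
       (512, 512, 14), (512, 512, 14), (512, 512, 14)]

K = 3  # kernel size
for n, (cin, cout, s) in enumerate(cfg, start=1):
    weights = K * K * cin * cout     # weight count for this layer
    macs = weights * s * s           # one K*K*cin dot product per output element
    print(f"CONV{n}: {weights} weights, {macs} multiply-adds")
```

Dividing the weight counts by 2^20 reproduces the 0.035 million (CONV2) and 2.25 million (CONV9) figures and their 64x ratio, and CONV2's multiply-add count divided by 2^20 gives 1764 million; CONV1's raw count is about 87 million.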
