A CNN is usually composed of several kinds of layers: the convolutional layer (CONV), the nonlinear activation function, the pooling layer, and the fully connected layer (FC). These layers are executed one after another: the CNN reads an input image, passes it through a series of CONV layers, nonlinear activation functions, and pooling layers, and generates output feature maps. These feature maps are turned into a feature vector in the FC layers. Finally, the feature vector is read by a classifier, the input image is assigned to the most probable category, and the probabilities of each class are generated.

#### *2.2.1. Convolutional layer*

The convolutional layer (CONV) is the most critical layer of a CNN. The operations of the CONV layers usually constitute more than 90% of the total CNN operations [18]; in AlexNet, for example, the CONV layers account for 85–90% of the total operations. Therefore, the discussion and analysis in this work focus on the CONV layers.

A CONV layer acts as a feature extractor: it extracts features from the input image with its convolutional kernels. When the CONV layer reads the input image and convolves it with the weight kernels (generated previously by training), feature information such as corners and lines is extracted into the output feature maps.

**Figure 2** shows a typical CONV layer. The CONV layer takes a set of C input feature maps and convolves them with M sets of weight kernels (one set per output channel, each set containing C kernels) to obtain M output feature maps. C is the number of input channels, M is the number of output channels, H and W are the height and width of the input feature maps, R and S are the height and width of the kernels, and E and F are the height and width of the output feature maps. Usually H equals W, R equals S, and E equals F. As the figure shows, the input feature maps are convolved iteratively by a window that shifts across them with the weight kernel to produce the output feature maps. The kernel shifts with a fixed stride, and in most cases the stride is 1.

The computing process of a CONV layer can be expressed with the pseudo code in **Figure 3**: the whole computation is executed as a loop nest. Every pixel of the output feature maps is a sum of products of input pixels and weights. After all loops have been executed, all pixels of the output feature maps have been obtained.
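Written as a formula (a sketch assuming stride 1 and no padding, so that E = H − R + 1 and F = W − S + 1), each output pixel is

$$\mathrm{Output}[m][e][f]=\sum_{c=0}^{C-1}\sum_{i=0}^{R-1}\sum_{j=0}^{S-1}\mathrm{Input}[c][e+i][f+j]\times \mathrm{Weight}[m][c][i][j],$$

for 0 ≤ m < M, 0 ≤ e < E, and 0 ≤ f < F.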

**Figure 2.** Diagram of convolutional layer.

#### *2.2.2. Nonlinear activation function*


A CNN usually applies a nonlinear activation function after each CONV layer or FC layer. The main purpose of the activation function is to introduce nonlinearity into the CNN. In general, an activation function should meet two conditions: it should be nonlinear and differentiable. Conventional nonlinear activation functions used in CNNs, such as sigmoid and tanh, are shown in **Figure 4(a)**. However, these activation functions lead to long training times. In recent years the Rectified Linear Unit (ReLU) [19] has become more and more popular among CNN models. The expression of ReLU is f(x) = max(0, x), as shown in **Figure 4(b)**. Compared with the conventional activation functions, the computation of ReLU is much simpler, which makes training faster. In addition, since many of the outputs of ReLU are equal to 0, ReLU makes the CNN model sparse.
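As a minimal sketch (the function name and flat array layout are our own assumptions, not from the chapter), ReLU applied elementwise to a feature map in C looks like:

```c
#include <stddef.h>

/* Apply ReLU, f(x) = max(0, x), in place to a feature map stored as a
 * flat array of `len` floats. Negative activations become 0, which is
 * the source of the sparsity mentioned above. */
void relu_inplace(float *fmap, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (fmap[i] < 0.0f)
            fmap[i] = 0.0f;
    }
}
```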

#### *2.2.3. Pooling layer*

A pooling layer is usually attached to a CONV layer. As **Figure 5** shows, the type of pooling is usually maximum or average. A pooling layer reads the input feature maps and computes the maximum or average value of every sub-area of the input feature maps, thereby obtaining lower-dimension feature maps. Usually the stride of the pooling window is equal to its size.

```c
for (m = 0; m < M; m++) {              /* output channels          */
  for (e = 0; e < E; e++) {            /* output feature map rows  */
    for (f = 0; f < F; f++) {          /* output feature map cols  */
      for (c = 0; c < C; c++) {        /* input channels           */
        for (i = 0; i < R; i++) {      /* kernel rows              */
          for (j = 0; j < S; j++) {    /* kernel cols              */
            Output[m][e][f] += Input[c][e + i][f + j] * Weight[m][c][i][j];
          }
        }
      }
    }
  }
}
```

**Figure 3.** Pseudo code of the computing process of the convolutional layer.
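For concreteness, the loop nest of Figure 3 can be wrapped into a self-contained C function. This is only a sketch under our own assumptions (stride 1, no padding, flat row-major arrays, and a hypothetical name `conv_layer`), not the chapter's implementation:

```c
/* Naive CONV layer: C input channels of size H x W, M output channels of
 * size E x F (E = H - R + 1, F = W - S + 1 for stride 1, no padding),
 * and M x C weight kernels of size R x S. Arrays are flat, row-major. */
void conv_layer(const float *input,   /* C * H * W     */
                const float *weight,  /* M * C * R * S */
                float *output,        /* M * E * F     */
                int C, int H, int W, int M, int R, int S)
{
    int E = H - R + 1;
    int F = W - S + 1;

    for (int m = 0; m < M; m++)
        for (int e = 0; e < E; e++)
            for (int f = 0; f < F; f++) {
                float sum = 0.0f;
                for (int c = 0; c < C; c++)
                    for (int i = 0; i < R; i++)
                        for (int j = 0; j < S; j++)
                            sum += input[(c * H + e + i) * W + f + j]
                                 * weight[((m * C + c) * R + i) * S + j];
                output[(m * E + e) * F + f] = sum;
            }
}
```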

**Figure 4.** Nonlinear activation functions: (a) sigmoid and tanh; (b) Rectified Linear Unit (ReLU).

**Figure 5.** Two types of pooling layer: max pooling and average pooling.

The pooling layer has two major functions. First, pooling aggregates information from the input feature maps and reduces their size and dimension; as a result, both the size of the feature maps and the amount of computation in the CNN are reduced greatly. Second, the pooling layer increases invariance to small shifts of the input.
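As a minimal sketch of the pooling computation (our own function name; a 2 x 2 window with the stride equal to the window size, matching the common case described above):

```c
/* 2 x 2 max pooling over one H x W feature map (H and W even), with the
 * stride equal to the pooling size. The output is (H/2) x (W/2);
 * replacing the max with an average gives average pooling. */
void max_pool_2x2(const float *in, float *out, int H, int W)
{
    for (int e = 0; e < H / 2; e++)
        for (int f = 0; f < W / 2; f++) {
            float m = in[(2 * e) * W + 2 * f];
            for (int i = 0; i < 2; i++)
                for (int j = 0; j < 2; j++) {
                    float v = in[(2 * e + i) * W + 2 * f + j];
                    if (v > m)
                        m = v;
                }
            out[e * (W / 2) + f] = m;
        }
}
```

With a 2 x 2 window and stride 2, each feature map shrinks from H x W to (H/2) x (W/2), so the number of pixels, and the downstream computation on them, drops by a factor of 4.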
