**7. Summary and future trends**

In this chapter, we have conducted a hierarchically structured survey of the main components of CNNs, from the low level to the high level: convolution operations, convolutional layers, architecture design, and loss functions. In addition to introducing recent advances in these aspects of CNNs, we have also discussed advanced applications based on three types of architectures, namely the encoder, the encoder-decoder, and GANs, from which we can see that CNNs have made numerous breakthroughs and achieved state-of-the-art results in computer vision, natural language processing, and speech recognition, most notably the impressive results produced by GANs.

From the above analyses, we can conclude that current development in CNNs mainly focuses on designing new architectures and loss functions, because these two aspects are the core considerations when applying CNNs to various types of tasks. On the other hand, the fundamental ideas behind these diverse applications are very similar, as summarized above.

However, current deep learning still has many shortcomings. The first problem is the requirement for large-scale datasets: constructing a labeled dataset is very time-consuming and expensive, particularly in domains such as medicine. Therefore, we need to pay much more attention to semi-supervised and unsupervised learning. The second disadvantage is the high computational cost of training deep CNNs, as current standard CNN structures become deeper and deeper and usually consist of millions of parameters. The third issue is that applying CNNs to a task is not an easy job and usually requires professional skill and experience, because training a network involves many hyper-parameters to tune, such as the number of kernels in each layer, the size of the kernels, the total number of layers, and the learning rate; the sketch below makes these knobs explicit.
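As a concrete illustration of these tuning knobs, the following is a minimal PyTorch sketch in which the hyper-parameters listed above appear as explicit arguments; the framework choice and all values are our own illustrative assumptions, not something prescribed by this chapter.

```python
# Minimal, illustrative CNN builder exposing the hyper-parameters named
# above: number of layers, kernels per layer, kernel size, learning rate.
# All values are placeholder assumptions for demonstration only.
import torch
import torch.nn as nn

def build_cnn(num_layers=3, kernels_per_layer=32, kernel_size=3,
              in_channels=3, num_classes=10):
    layers = []
    channels = in_channels
    for _ in range(num_layers):
        layers += [
            nn.Conv2d(channels, kernels_per_layer, kernel_size,
                      padding=kernel_size // 2),  # keep spatial size
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                      # halve spatial size
        ]
        channels = kernels_per_layer
    layers += [nn.AdaptiveAvgPool2d(1), nn.Flatten(),
               nn.Linear(channels, num_classes)]
    return nn.Sequential(*layers)

model = build_cnn(num_layers=4, kernels_per_layer=64, kernel_size=3)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # learning rate
```

Each of these arguments typically has to be tuned jointly (often by grid or random search over validation performance), which is what makes applying CNNs demanding in practice.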

Future work should also focus on deep learning theory, since solid theory supporting current neural models is still lacking. Unlike machine learning algorithms such as support vector machines, whose mathematical underpinnings are explicit, it is usually very hard to fully understand why a deep network achieves such excellent performance on a task. Therefore, based on the current development of deep learning, we identify three trends to work on in the future: neural topologies, such as graph neural networks; uncertainty estimation, such as Bayesian neural networks; and privacy preservation. A brief sketch of one practical route to uncertainty estimation follows.
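To make the uncertainty-estimation trend concrete, here is a minimal sketch of Monte Carlo dropout, a widely used practical approximation to Bayesian neural networks in which dropout is kept active at prediction time and several stochastic forward passes are averaged; the model, shapes, and sample count are illustrative assumptions, not part of this chapter.

```python
# Illustrative Monte Carlo dropout sketch: dropout stays active at test
# time and multiple stochastic forward passes are averaged; the spread of
# the predictions serves as a crude uncertainty estimate.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(16, 64), nn.ReLU(),
    nn.Dropout(p=0.5),           # remains stochastic during sampling below
    nn.Linear(64, 1),
)

def mc_dropout_predict(model, x, num_samples=50):
    model.train()  # keep dropout layers active (no gradient updates occur)
    with torch.no_grad():
        samples = torch.stack([model(x) for _ in range(num_samples)])
    # Mean = prediction; standard deviation = per-output uncertainty proxy.
    return samples.mean(dim=0), samples.std(dim=0)

x = torch.randn(8, 16)           # a batch of hypothetical inputs
mean, std = mc_dropout_predict(model, x)
```

The standard deviation across passes gives only a rough per-output uncertainty; fully Bayesian treatments instead place explicit distributions over the network weights.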
