**5. Loss functions**

Before introducing the loss functions, we need to understand that the ultimate goal of training a neural network F(*X*; Θ) is to find a suitable set of parameters Θ such that the model achieves good performance on unseen samples (i.e., the test dataset). The typical way to search for Θ in machine learning is to use a loss function as the criterion during training. In other words, training a neural network is equivalent to optimizing the loss function by back-propagation. More precisely, a loss function outputs a scalar value that measures the difference between the predicted result and the true label for a single sample. During training, our goal is to minimize this scalar value averaged over *m* training samples (i.e., the cost function). Therefore, as shown in **Figure 1**, loss functions play a significant role in constructing CNNs.

$$\mathcal{J} = \frac{1}{m} \sum_{i=1}^{m} \mathcal{L}_i \tag{11}$$

where $\mathcal{L}_i$ denotes the loss for training sample *i*, and $\mathcal{J}$ is often known as the cost function, which is simply the mean of the losses over *m* training samples (i.e., usually a batch of *m* training samples is fed into a CNN during each iteration of training).
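To make Eq. (11) concrete, the following minimal NumPy sketch computes per-sample losses $\mathcal{L}_i$ and averages them into the cost $\mathcal{J}$ over a batch of *m* samples. Cross-entropy is used here purely for illustration, and the function and array names are our own assumptions, not from the text.

```python
import numpy as np

def cost(probs, labels):
    """Compute J = (1/m) * sum_i L_i over a batch, as in Eq. (11).

    probs:  (m, C) array of predicted class probabilities for m samples
    labels: (m,)   array of integer true labels
    Cross-entropy is chosen here only as an example per-sample loss.
    """
    m = probs.shape[0]
    # Per-sample loss L_i = -log p_i[y_i]; small epsilon avoids log(0)
    per_sample = -np.log(probs[np.arange(m), labels] + 1e-12)
    # The cost J is the mean of the per-sample losses over the batch
    return per_sample.mean()

# Toy batch of m = 3 samples and C = 2 classes
probs = np.array([[0.9, 0.1],
                  [0.2, 0.8],
                  [0.6, 0.4]])
labels = np.array([0, 1, 1])
print(cost(probs, labels))  # a single scalar value J
```

In practice, a training loop minimizes this scalar by back-propagating its gradient with respect to Θ after each batch.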


Note that there are numerous variants of loss functions in the deep learning literature; however, the fundamental theories behind them are very similar. We group them into two categories, namely Divergence Loss Functions and Margin Loss Functions, and introduce six typical and classic loss functions that are commonly used for training neural networks.
