*Advances and Applications in Deep Learning*

very complex behavior in the training dataset. The Boltzmann machine is used for classification and dimensionality reduction.

**Figure 4.**

*Deep belief network.*

**2.5 Restricted Boltzmann machine (RBM)**

The RBM, introduced in 1986 by Smolensky, has two layers of visible and hidden units, with no visible-visible or hidden-hidden connections. It can learn a probability distribution over a collection of datasets. The applications of the RBM are feature learning, collaborative filtering, dimensionality reduction, and classification.

**2.6 Convolutional neural network (CNN)**

In a CNN, the layers are locally connected to the input layer as well as to each other: each neuron of a subsequent layer has a specific function and is responsible for only a part of the input. The CNN is now widely used for remote sensing, computer vision, audio, and text processing [10].

**2.7 Deep auto-encoder**

Like the other deep models, the deep auto-encoder has many hidden layers. The difference between a simple auto-encoder and a deep auto-encoder is that the simple auto-encoder has one hidden layer, while the deep auto-encoder has many. Training a deep auto-encoder is normally more complex: you first train one hidden layer to reconstruct the input data, and its output is then used to train the next hidden layer, and so on. Some applications of the deep auto-encoder are image extraction, image generation, recommendation systems, and sequence-to-sequence prediction.
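The layer-wise training described for the deep auto-encoder can be sketched in code. This is a minimal illustration, not the chapter's own method: it uses plain linear auto-encoder layers, and all function names (`train_autoencoder`, `pretrain_stack`) and hyperparameters are illustrative assumptions.

```python
import numpy as np

def train_autoencoder(X, n_hidden, lr=0.05, epochs=200, seed=0):
    """Train one linear auto-encoder layer by gradient descent on the
    mean squared reconstruction error; return the encoder weights."""
    rng = np.random.default_rng(seed)
    n_in = X.shape[1]
    W = rng.normal(scale=0.1, size=(n_in, n_hidden))   # encoder weights
    V = rng.normal(scale=0.1, size=(n_hidden, n_in))   # decoder weights
    for _ in range(epochs):
        H = X @ W                      # encode the input
        E = H @ V - X                  # reconstruction error
        gV = 2 * H.T @ E / len(X)      # gradient w.r.t. decoder
        gW = 2 * X.T @ (E @ V.T) / len(X)  # gradient w.r.t. encoder
        V -= lr * gV
        W -= lr * gW
    return W

def pretrain_stack(X, layer_sizes):
    """Greedy layer-wise pretraining: each layer is trained to
    reconstruct the codes produced by the layer below it."""
    weights, codes = [], X
    for n_hidden in layer_sizes:
        W = train_autoencoder(codes, n_hidden)
        weights.append(W)
        codes = codes @ W              # feed the codes to the next layer
    return weights, codes
```

For example, `pretrain_stack(X, [4, 2])` trains an 8-to-4 layer on the raw data and then a 4-to-2 layer on the resulting codes, mirroring the one-layer-at-a-time procedure described above.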

**2.8 Gradient descent (GD)**

GD is used to reduce the overall cost function; it is considered an optimization algorithm and is widely used for determining the coefficients of a function in machine learning. When it is not possible to estimate the parameters analytically, then
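As a minimal sketch of how GD reduces a cost function, the example below fits the coefficients of a linear model by repeatedly stepping against the full-batch gradient of the mean squared error. The cost function and all names here are illustrative assumptions, not the chapter's own implementation.

```python
import numpy as np

def cost(theta, X, y):
    """Illustrative cost: mean squared error of a linear model X @ theta."""
    return np.mean((X @ theta - y) ** 2)

def gradient(theta, X, y):
    """Gradient of the mean squared error over the WHOLE dataset."""
    return 2 * X.T @ (X @ theta - y) / len(y)

def gradient_descent(X, y, lr=0.1, epochs=500):
    theta = np.zeros(X.shape[1])
    for _ in range(epochs):
        theta -= lr * gradient(theta, X, y)  # one step per full pass
    return theta
```

Because every step touches every sample, each update is accurate but its cost grows linearly with the dataset size, which motivates the stochastic variant in the next section.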


**2.9 Stochastic gradient descent (SGD)**

Just like GD, SGD is also an optimization algorithm, but GD is used when the dataset is small, while SGD is usually used when the dataset is large, because GD becomes very costly when applied to a large number of samples.
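The difference from full-batch GD can be sketched as follows: SGD updates the parameters from one randomly chosen sample at a time, so each update is cheap regardless of the dataset size. This is a minimal illustration for the same linear least-squares cost as above; the function name and hyperparameters are assumptions for the sketch.

```python
import numpy as np

def sgd(X, y, lr=0.05, epochs=50, seed=0):
    """Stochastic gradient descent for a linear least-squares cost:
    each update uses the gradient of a SINGLE sample's squared error."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(y)):   # visit samples in random order
            err = X[i] @ theta - y[i]
            theta -= lr * 2 * err * X[i]    # gradient from one sample only
    return theta
```

Each update costs O(d) instead of O(n·d), which is why SGD remains practical when the dataset is too large for full-batch GD; the price is noisier steps, often mitigated in practice by mini-batches.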
