2. The DBN model for time series forecasting

#### 2.1 The structure of the model

The time series forecasting model is given as follows:

$$\mathbf{x}_{t+1} = f(\mathbf{x}_t, \mathbf{x}_{t-1}, \dots, \mathbf{x}_{t-n+1}) \tag{1}$$

Here $t = 1, 2, 3, \dots, T$ denotes the time, $n$ is the dimensionality of the input of the function $f$, $\mathbf{x}_t$ is the time series data, and $\mathbf{x}_{t+1}$ is the unknown future data as well as the output of the model.

A deep belief net (DBN) composed of restricted Boltzmann machines (RBMs) and a multilayer perceptron (MLP) is shown in Figure 1.
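Eq. (1) frames forecasting as supervised regression from the $n$ most recent values to the next one. As a minimal sketch (the toy series and window size are illustrative), the training pairs can be built with a sliding window:

```python
import numpy as np

def make_lagged_pairs(series, n):
    """Build (input, target) pairs for Eq. (1): x_{t+1} = f(x_t, ..., x_{t-n+1}).

    Each input row holds the n most recent values (newest first);
    the target is the next value of the series.
    """
    X, y = [], []
    for t in range(n - 1, len(series) - 1):
        X.append(series[t - n + 1:t + 1][::-1])  # x_t, x_{t-1}, ..., x_{t-n+1}
        y.append(series[t + 1])
    return np.array(X), np.array(y)

series = np.sin(np.linspace(0, 6.28, 50))
X, y = make_lagged_pairs(series, n=4)
print(X.shape, y.shape)  # (46, 4) (46,)
```

Every model discussed below (MLP, DBN, and the SGA-trained variants) consumes pairs of exactly this shape.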

Time Series Analysis - Data, Methods, and Applications

(RNNs); vector quantization; fuzzy logic; and ensemble methods. As the organizers commented, different prediction precisions were reported even though similar prediction methods were used, owing to the know-how and experience of the authors. So the development of time series forecasting by ANN is still on the way.

As a kind of classifier, or a kind of function approximator, the advantages of the ANN are brought out by nonlinear transforms of the input space. In fact, units (or neurons) with nonlinear firing functions connected to each other usually produce a higher-dimensional output space and various feature spaces in the network. Additionally, as a connectionist system, it does not require designing fixed mathematical models for different nonlinear phenomena; one only adjusts the weights of the connections between units. According to the report of NN3—the Artificial Neural Networks and Computational Intelligence Forecasting Competition [5]—there had been more than 5000 publications on time series forecasting using ANN by 2007.

To find suitable parameters of an ANN, such as the weights of the connections between neurons, the error back-propagation (BP) algorithm [6] is generally utilized in the training process. However, because every sample (a pair of input data and output data) is used in the BP method, noisy data influence the optimization of the model, and the robustness of the model to unknown input becomes weak. Another problem of ANN models is how to determine the structure of the network, i.e., the number of layers and the number of neurons in each layer. To overcome these problems of BP, Kuremoto et al. [7] adopted a reinforcement learning (RL) method, "stochastic gradient ascent (SGA)" [8], to adjust the connection weights of units, and particle swarm optimization (PSO) to find the optimal structure of the ANN. SGA, proposed by Kimura and Kobayashi, improves on Williams' REINFORCE algorithm [9], which uses rewards to modify the stochastic policies (likelihood). In the SGA learning algorithm, the accumulated modification of policies, named the "eligibility trace," is used to adjust the parameters of the model (see Section 2). In the case of time series forecasting, the reward of the RL system can be defined by a suitable error zone instead of the distance (error) between the output of the model and the teacher data that is used in the BP learning algorithm. So the sensitivity to noisy data can be reduced, and the robustness to unknown data may be raised. As a deep learning method for time series forecasting, Kuremoto et al. [10] first applied Hinton and Salakhutdinov's deep belief net (DBN), a kind of stacked auto-encoder (SAE) composed of multiple restricted Boltzmann machines (RBMs) [11]. An improved DBN for time series forecasting, composed of multiple RBMs and a multilayer perceptron (MLP) [6], was proposed in [12].

The improved DBN with RBMs and MLP [6] is preferable to the conventional DBN [5] for time series forecasting because it uses a continuous output unit, whereas the conventional one has a binary-valued unit in the output layer. As with the RL method SGA applied to MLP, RBFN, and the self-organized fuzzy neural network (SOFNN) [7], the prediction precision of a DBN trained with SGA may also be raised compared to the BP learning algorithm. Furthermore, the prediction precision can be raised further by a hybrid model that first forecasts the future data with the linear model ARIMA and then corrects the forecast with the predicted error given by an ANN trained on the error time series [13, 14]. In this chapter, we concentrate on introducing the DBN composed of multiple RBMs and an MLP, and we show the higher efficiency of the RL learning method SGA for the DBN [15, 16] compared to the conventional learning method BP, using the results of time series forecasting experiments. Several kinds of benchmark data, including the artificial time series data CATS [3], natural phenomenon time series data provided by Aalto University [18], and TSDL [18], were used in the experiments.
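The hybrid scheme of [13, 14] mentioned above can be illustrated with a small sketch. The chapter's hybrid uses ARIMA plus an ANN trained on the error time series; here, purely as stand-ins, a least-squares AR(p) model plays the linear stage and a ridge regression on lagged residuals plays the error model, just to show the two-stage "forecast, then correct by predicted error" structure:

```python
import numpy as np

rng = np.random.default_rng(0)
series = np.sin(np.linspace(0, 12.56, 200)) + 0.05 * rng.standard_normal(200)

p = 4  # AR order (illustrative choice)

# --- Stage 1: linear AR(p) model fitted by least squares ---
X = np.column_stack([series[p - 1 - k:len(series) - 1 - k] for k in range(p)])
y = series[p:]
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
linear_pred = X @ coef
residuals = y - linear_pred  # the error time series

# --- Stage 2: model the errors (the chapter trains an ANN on them;
# a ridge regression on lagged residuals stands in for it here) ---
q = 3
R = np.column_stack([residuals[q - 1 - k:len(residuals) - 1 - k] for k in range(q)])
r_target = residuals[q:]
rcoef = np.linalg.solve(R.T @ R + 1e-3 * np.eye(q), R.T @ r_target)

# Hybrid forecast = linear forecast + predicted error
hybrid_pred = linear_pred[q:] + R @ rcoef
print(np.mean((y[q:] - hybrid_pred) ** 2)
      <= np.mean((y[q:] - linear_pred[q:]) ** 2))  # True
```

The correction stage can only reduce the in-sample error of the linear stage; whether it helps out-of-sample depends on how much structure remains in the residuals.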
#### 2.2 RBM

The restricted Boltzmann machine (RBM) is a kind of probabilistic generative neural network composed of two layers of units: a visible layer and a hidden layer (see Figure 2).

Units of the two layers are connected to each other with weights $w_{ij} = w_{ji}$, where $i = 1, 2, \dots, n$ and $j = 1, 2, \dots, m$ index the units of the visible layer and the hidden layer, respectively. The outputs of the units $v_i$, $h_j$ are binary, i.e., 0 or 1, except for the initial values of the visible units, which are given by the input data. The probabilities that a hidden unit and a visible unit take the value 1 are given by the following:

$$p(h_j = 1 \mid \mathbf{v}) = \frac{1}{1 + \exp\left(-b_j - \sum_{i=1}^{n} w_{ji} v_i\right)} \tag{2}$$

$$p(v_i = 1 \mid \mathbf{h}) = \frac{1}{1 + \exp\left(-b_i - \sum_{j=1}^{m} w_{ij} h_j\right)} \tag{3}$$
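Eqs. (2) and (3) can be read as one step of alternating Gibbs sampling between the two layers. A minimal sketch with illustrative, untrained parameters:

```python
import numpy as np

rng = np.random.default_rng(42)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative RBM sizes and random (untrained) parameters
n, m = 6, 4                            # visible / hidden units
W = 0.1 * rng.standard_normal((m, n))  # w_ji, shared in both directions (w_ij = w_ji)
b_h = np.zeros(m)                      # hidden biases b_j
b_v = np.zeros(n)                      # visible biases b_i

def sample_h(v):
    """Eq. (2): p(h_j = 1 | v), then draw a binary h."""
    p = sigmoid(b_h + W @ v)
    return (rng.random(m) < p).astype(float), p

def sample_v(h):
    """Eq. (3): p(v_i = 1 | h), then draw a binary v."""
    p = sigmoid(b_v + W.T @ h)
    return (rng.random(n) < p).astype(float), p

v0 = (rng.random(n) < 0.5).astype(float)  # initial visible state from data
h0, ph0 = sample_h(v0)
v1, pv1 = sample_v(h0)                    # one Gibbs step (k = 1)
```

One such up-down pass is exactly the k = 1 sampling used by the learning rules below.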

Figure 1. The structure of DBN for time series forecasting.

Figure 2. The structure of RBM.

Here $b_i$, $b_j$ are the biases of the units. The learning rules of the RBM are given as follows:

$$\Delta w_{ij} = \varepsilon \left( \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{model}} \right) \tag{4}$$

$$\Delta b_i = \varepsilon \left( \langle v_i \rangle - \langle \tilde{v}_i \rangle \right) \tag{5}$$

$$\Delta b_j = \varepsilon \left( \langle h_j \rangle - \langle \tilde{h}_j \rangle \right) \tag{6}$$

where $0 < \varepsilon < 1$ is a learning rate; $\langle v_i h_j \rangle_{\text{data}}$, $\langle v_i \rangle$, and $\langle h_j \rangle$ indicate the expectations of the first Gibbs sampling (k = 0), and $\langle v_i h_j \rangle_{\text{model}}$, $\langle \tilde{v}_i \rangle$, and $\langle \tilde{h}_j \rangle$ those of the kth Gibbs sampling; in practice it works when k = 1.

#### 2.3 MLP

The multilayer perceptron (MLP) is the most popular neural network; it is generally composed of three layers of units: an input layer, a hidden layer, and an output layer (see Figure 3).

The output of the output unit $y = f(\mathbf{z})$ and of a hidden unit $z_j = f(\mathbf{x})$ is given by the following logistic sigmoid functions:

$$f(y) = \frac{1}{1 + \exp\left(-\sum_{j=1}^{K+1} w_j z_j\right)} \tag{7}$$

$$f(z_j) = \frac{1}{1 + \exp\left(-\sum_{i=1}^{n+1} v_{ji} x_i\right)} \tag{8}$$

Here $n$ is the dimensionality of the input, $K$ is the number of hidden units, and $x_{n+1} = 1.0$, $z_{K+1} = 1.0$ are the support units of the biases $v_{j(n+1)}$, $w_{K+1}$.

The learning rules of the MLP using the error back-propagation (BP) method [5] are given as follows:

$$\Delta w_j = -\varepsilon \left( y - \tilde{y} \right) y \left( 1 - y \right) z_j \tag{9}$$

$$\Delta v_{ji} = -\varepsilon \left( y - \tilde{y} \right) y \left( 1 - y \right) w_j z_j \left( 1 - z_j \right) x_i \tag{10}$$

where $0 < \varepsilon < 1$ is a learning rate and $\tilde{y}$ is the teacher signal.

The learning algorithm of the MLP using BP is as follows:

Step 1. Observe an input $\mathbf{x}_t = (x_t, x_{t-1}, \dots, x_{t-n+1})$.

Step 2. Predict a future data $y_t = x_{t+1}$ according to Eqs. (7) and (8).

Step 3. Calculate the modification of the connection weights, $\Delta w_j$, $\Delta v_{ji}$, according to Eqs. (9) and (10).

Step 4. Modify the connections: $w_j \leftarrow w_j + \Delta w_j$; $v_{ji} \leftarrow v_{ji} + \Delta v_{ji}$.

Step 5. For the next time step $t + 1$, return to Step 1.

Figure 3. The structure of MLP.

Training Deep Neural Networks with Reinforcement Learning for Time Series Forecasting. DOI: http://dx.doi.org/10.5772/intechopen.85457

#### 2.4 The training method of DBN

As in the training process proposed in [10], the training of the DBN is performed in two steps. The first, pretraining, utilizes the learning rules of the RBM, i.e., Eqs. (4)–(6), for each RBM independently. The second step is a fine-tuning process using the pretrained parameters of the RBMs and the BP algorithm. These processes are shown in Figure 4 and Eqs. (11)–(13).

$$E = \frac{1}{2} \sum_{t=1}^{T} \left( y_t - \tilde{y}_t \right)^2 \tag{11}$$

$$\Delta w_{ji}^{L} = -\varepsilon \sum_{i} \frac{\partial E}{\partial w_{ji}^{L+1}} \, w_{ji}^{L+1} \, h_j^L \left( 1 - h_j^L \right) v_i^L \tag{12}$$

$$\Delta b_{j}^{L} = -\varepsilon \sum_{i} \frac{\partial E}{\partial w_{ji}^{L+1}} \, w_{ji}^{L+1} \, h_j^L \left( 1 - h_j^L \right) \tag{13}$$

Figure 4. The training of DBN by BP method.

In the case of reinforcement learning (RL), the output is decided by a probability distribution, e.g., the Gaussian distribution $y \sim \pi(\mu, \sigma^2)$. So the output units are the mean $\mu$ and the standard deviation $\sigma$ instead of one unit $y$:

$$\mu = \sum_j w_{\mu j} z_j \tag{14}$$

$$\sigma = \frac{1}{1 + \exp\left(-\sum_j w_{\sigma j} z_j\right)} \tag{15}$$

$$\pi\left(\mu, \sigma^2\right) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{\left( y - \mu \right)^2}{2\sigma^2}\right) \tag{16}$$

The learning algorithm of stochastic gradient ascent (SGA) [7] is as follows.

Step 1. Observe an input $\mathbf{x}_t = (x_t, x_{t-1}, \dots, x_{t-n+1})$.

Step 2. Predict a future data $y_t = x_{t+1}$ according to a probability $y_t \sim \pi(\mathbf{x}_t; \mathbf{w})$ with ANN models constructed by the parameters $\mathbf{w} = (w_{\mu j}, w_{\sigma j}, w_{ij}, v_{ji})$.

Step 3. Receive a scalar reward/punishment $r_t$ by calculating the prediction error.

Step 4. Modify the connections: $w_j \leftarrow w_j + \Delta w_j$; $v_{ji} \leftarrow v_{ji} + \Delta v_{ji}$.

Step 5. For the next time step $t + 1$, return to Step 1.
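The Gaussian policy of Eqs. (14)–(16) and the prediction/reward steps can be sketched as follows; the weights and the error-zone threshold are illustrative, not the chapter's trained values:

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hidden-layer outputs z_j and illustrative (untrained) output weights
K = 5
z = rng.random(K)
w_mu = 0.1 * rng.standard_normal(K)
w_sigma = 0.1 * rng.standard_normal(K)

mu = w_mu @ z                  # Eq. (14): linear mean unit
sigma = sigmoid(w_sigma @ z)   # Eq. (15): sigmoid unit, so 0 < sigma < 1
y = rng.normal(mu, sigma)      # prediction sampled from pi(mu, sigma^2), Eq. (16)

# Step 3: scalar reward from the prediction error. An error-zone reward
# (the 0.1 threshold and reward values are illustrative) replaces the
# squared error used by BP:
y_teacher = 0.3
r = 1.0 if abs(y - y_teacher) < 0.1 else -0.1
```

Because the prediction is sampled rather than computed deterministically, the reward — not a per-sample error gradient — drives the weight updates, which is what weakens the influence of noisy samples.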

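The two-step training of the DBN can be sketched as follows: CD-1 pretraining of each RBM by Eqs. (4)–(6) with k = 1, stacking the hidden activations as the next RBM's input. Layer sizes and the random toy data are illustrative; using the activation probabilities for the expectations is a common contrastive-divergence choice, and the fine-tuning stage would then proceed with BP as in Eqs. (11)–(13):

```python
import numpy as np

rng = np.random.default_rng(3)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, m, eps=0.1, epochs=50):
    """CD-1 pretraining of one RBM: Eqs. (2)-(6) with k = 1 Gibbs sampling."""
    n = data.shape[1]
    W = 0.1 * rng.standard_normal((m, n))
    b_v, b_h = np.zeros(n), np.zeros(m)
    for _ in range(epochs):
        for v0 in data:
            ph0 = sigmoid(b_h + W @ v0)                 # Eq. (2)
            h0 = (rng.random(m) < ph0).astype(float)
            pv1 = sigmoid(b_v + W.T @ h0)               # Eq. (3)
            v1 = (rng.random(n) < pv1).astype(float)
            ph1 = sigmoid(b_h + W @ v1)
            # Probabilities stand in for the expectations <.>:
            W += eps * (np.outer(ph0, v0) - np.outer(ph1, v1))  # Eq. (4)
            b_v += eps * (v0 - v1)                      # Eq. (5)
            b_h += eps * (ph0 - ph1)                    # Eq. (6)
    return W, b_v, b_h

# Stack two RBMs: hidden activations of the first feed the second.
data = (rng.random((40, 8)) < 0.5).astype(float)
W1, bv1, bh1 = train_rbm(data, m=6)
h1 = sigmoid(bh1 + data @ W1.T)
W2, bv2, bh2 = train_rbm((rng.random(h1.shape) < h1).astype(float), m=4)
```

After pretraining, the weights `W1`, `W2` initialize the lower layers of the network, and BP fine-tuning adjusts the whole stack toward the forecasting targets.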
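Putting Eqs. (7)–(10) and Steps 1–5 together, the BP training loop of the MLP can be sketched end-to-end (the toy series, layer sizes, and learning rate are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy series scaled into (0, 1), with lagged inputs as in Eq. (1)
series = 0.5 + 0.4 * np.sin(np.linspace(0, 12.56, 120))
n, K, eps = 4, 8, 0.5
X = np.array([series[t - n + 1:t + 1][::-1] for t in range(n - 1, len(series) - 1)])
T = series[n:]                               # teacher signals

v = 0.1 * rng.standard_normal((K, n + 1))    # v_ji, extra column for x_{n+1} = 1
w = 0.1 * rng.standard_normal(K + 1)         # w_j, extra weight w_{K+1}

for epoch in range(200):
    for x, t in zip(X, T):
        xb = np.append(x, 1.0)               # support unit x_{n+1} = 1.0
        z = sigmoid(v @ xb)                  # Eq. (8)
        zb = np.append(z, 1.0)               # support unit z_{K+1} = 1.0
        y = sigmoid(w @ zb)                  # Eq. (7)
        delta = (y - t) * y * (1 - y)
        v -= eps * np.outer(delta * w[:K] * z * (1 - z), xb)  # Eq. (10)
        w -= eps * delta * zb                # Eq. (9)

pred = np.array([sigmoid(w @ np.append(sigmoid(v @ np.append(x, 1.0)), 1.0))
                 for x in X])
print(float(np.mean((pred - T) ** 2)))       # training MSE, small after convergence
```

The same loop, with the weight updates replaced by the reward-driven SGA modifications, gives the RL-trained counterpart compared in the experiments.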
