# 3. The experiments and results

## 3.1 CATS benchmark time series data

The CATS time series is an artificial benchmark dataset for a forecasting competition with ANN methods [3, 4]. The series consists of 5000 data points, among which 100 are missing (hidden by the competition organizers). The missing data lie in five blocks:

• Elements 981 to 1000

• Elements 1981 to 2000

• Elements 2981 to 3000

• Elements 3981 to 4000

• Elements 4981 to 5000

The mean square error E<sub>1</sub> is used as the measure of prediction precision in the competition; it is computed from the 100 missing data and their predicted values as follows:

$$E_1 = \left\{ \sum_{t=981}^{1000} \left( y_t - \overline{y}_t \right)^2 + \sum_{t=1981}^{2000} \left( y_t - \overline{y}_t \right)^2 + \sum_{t=2981}^{3000} \left( y_t - \overline{y}_t \right)^2 + \sum_{t=3981}^{4000} \left( y_t - \overline{y}_t \right)^2 + \sum_{t=4981}^{5000} \left( y_t - \overline{y}_t \right)^2 \right\} / 100 \tag{28}$$

where $\overline{y}_t$ is the long-term prediction of the missing value $y_t$. The CATS time series data is shown in Figure 6.
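As a concrete reading of Eq. (28), a minimal sketch of the E<sub>1</sub> computation (assuming `y_true` and `y_pred` hold the full 5000-point series; the function name is ours, not the competition's):

```python
def cats_e1(y_true, y_pred):
    """Mean square error E1 of Eq. (28): the average squared error
    over the 100 missing points of the five CATS blocks.
    The series are 1-indexed in the text, hence the t - 1 below."""
    blocks = [(981, 1000), (1981, 2000), (2981, 3000),
              (3981, 4000), (4981, 5000)]
    total = 0.0
    for lo, hi in blocks:
        for t in range(lo, hi + 1):
            total += (y_true[t - 1] - y_pred[t - 1]) ** 2
    return total / 100.0
```

For identical series E<sub>1</sub> is 0; a constant offset of 1 at every point gives E<sub>1</sub> = 1.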

The prediction results for the different blocks of CATS data are shown in Figure 7. Compared to the conventional learning method of DBN, i.e., Hinton's RBM unsupervised learning method [6, 8] followed by back-propagation (BP), the proposed method, which uses the reinforcement learning method SGA instead of BP, showed its superiority in terms of the average prediction precision E<sub>1</sub> (see Figure 7f).

In addition, the proposed method, DBN with SGA, yielded the highest prediction precision (by the E<sub>1</sub> measure) compared to all previous studies, such as MLP with BP, the best prediction of the CATS competition IJCNN'04 [4], the conventional DBNs with BP [9, 11], and hybrid models [13]. The details are shown in Table 1.
Figure 7. The prediction results of different methods for CATS data: (a) block 1; (b) block 2; (c) block 3; (d) block 4; (e) block 5; and (f) results of the long-term forecasting.

Training Deep Neural Networks with Reinforcement Learning for Time Series Forecasting

DOI: http://dx.doi.org/10.5772/intechopen.85457

The meta-parameters obtained by the random search method are shown in Table 2. We also found that the learning MSE, i.e., that given by the one-ahead prediction results, showed worse convergence for the proposed method than for conventional BP training. Figure 8 shows the learning MSE of the two methods for the first block: the MSE given by BP converged over a long training process, whereas SGA gave an unstable prediction MSE. However, in line with the basic idea of a sparse model, the better long-term prediction results of the proposed method suggest that it avoids the over-fitting problem, which is caused by a model that fits the training samples too strictly and loses its robustness on unknown data.
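The difference between the one-ahead learning MSE and the long-term prediction can be sketched as iterated forecasting, where each output is fed back as the next input (the `model` callable and window handling below are our own illustrative assumptions, not the chapter's DBN code):

```python
def long_term_forecast(model, history, horizon):
    """Iterated (long-term) forecasting: each prediction is fed back
    as an input for the next step, so errors can accumulate, unlike
    one-ahead prediction where the true past values are always given.
    `model` maps a fixed-length input window to the next value."""
    window = list(history)
    predictions = []
    for _ in range(horizon):
        y_hat = model(window)
        predictions.append(y_hat)
        window = window[1:] + [y_hat]  # slide the window over its own output
    return predictions

# Toy model: predict the mean of the current window.
preds = long_term_forecast(lambda w: sum(w) / len(w), [1.0, 2.0, 3.0], 3)
```

A model that over-fits the training windows can look good on one-ahead MSE yet degrade quickly under this feedback loop, which is the behavior discussed above.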

## 3.2 Real time series data

Three types of natural phenomenon time series data provided by Aalto University [17] were used in the one-ahead forecasting experiments on real time series data.
Figure 6. CATS benchmark data.

• Sunspot number: Monthly averages of sunspot numbers from A.D. 1749 to the present, 3078 values
The prediction results for these three datasets are shown in Figure 9. The short-term prediction errors are shown in Table 3. DBN with the SGA learning method showed its priority in all cases.
The efficiency of random search in finding the optimal meta-parameters, i.e., the structures of the RBM and the MLP, the learning rates, the discount factor, etc., which are explained in Section 2.5, is shown in Figure 10 for the case of DBN with the SGA learning algorithm. The random search results are shown in Table 4.
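The random search procedure can be sketched as follows (the search space below is a hypothetical stand-in for the meta-parameters of Section 2.5; `evaluate` would train a DBN and return its validation MSE):

```python
import random

# Hypothetical ranges standing in for the meta-parameters of Section 2.5.
SPACE = {
    "n_rbms":        lambda: random.randint(1, 3),
    "units":         lambda: random.choice([2, 5, 10, 20]),
    "learning_rate": lambda: 10 ** random.uniform(-4, -1),
    "discount":      lambda: random.uniform(0.8, 1.0),
}

def random_search(evaluate, n_trials=50, seed=0):
    """Sample meta-parameter sets at random; keep the one with the
    lowest score returned by `evaluate` (e.g., validation MSE)."""
    random.seed(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        params = {name: draw() for name, draw in SPACE.items()}
        score = evaluate(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

With a toy objective such as `lambda p: abs(p["learning_rate"] - 0.01)`, the search quickly closes in on learning rates near 0.01; in the chapter, the objective is the trained model's prediction error.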

We also used seven types of natural phenomenon time series data from TSDL [18]. The data to be predicted were chosen based on [19] and are named Lynx, Sunspots, River flow, Vehicles, RGNP, Wine, and Airline. The short-term (one-ahead) prediction results are shown in Figure 11 and Table 5.
From Table 5, it can be confirmed that SGA showed its priority over BP except in the cases of Vehicles and Wine. From Table 6, an interesting result of the random search for meta-parameters is that the structures of the DBN differed across datasets, not only in the number of units in each layer but also in the number of RBMs. In the case of the SGA learning method, the numbers of layers for Sunspots, River flow, and Wine were larger than for the DBN using BP learning.
# 4. Discussions

The experiment results showed that the DBN composed of multiple RBMs and an MLP is the state-of-the-art predictor, compared to all conventional methods, in the case of CATS data. Furthermore, training the DBN with the RL method SGA may be more efficient for real time series data than using the conventional BP algorithm. Here, let us glance back at the development of this useful deep learning method.

• Why is the DBN composed of multiple RBMs and an MLP [11, 13] better than the DBN with multiple RBMs only [9]?

The output of the last RBM of the DBN, i.e., a hidden unit of the last RBM, has a binary value during the pretraining process. So the weights of the connections between that unit and the units of the visible layer of the last RBM are affected with lower complexity than when using multiple units with continuous values, i.e., an MLP, or so-called full connections in a deep learning architecture.

• Why is SGA more efficient than BP?

Generally, the training process of an ANN with BP uses the mean square error as the loss function, so every sample, including noisy data, affects the learning process and its results. Meanwhile, SGA uses a reward, which may be defined by an error zone, to modify the model.

• How active are RL methods in ANN training?

In 1992, Williams proposed adopting an RL method named REINFORCE to modify artificial neural networks [8]. In 2008, Kuremoto et al. showed that the RL method SGA is more efficient than the conventional BP method in the case of time series forecasting [6]. Recently, researchers at DeepMind Ltd. adopted RL into deep neural networks, which resulted in the famous game software AlphaGo [20–23].
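The contrast between BP's squared-error gradient and a reward-weighted update can be made concrete with a minimal REINFORCE/SGA-style step for a linear Gaussian policy. This is our own sketch, not the chapter's DBN update; the reward definition (1 inside an error zone, 0 outside) is only illustrative:

```python
import random

def sga_step(w, x, y_true, sigma=0.1, lr=0.01, zone=0.2):
    """One REINFORCE/SGA-style update for a linear Gaussian policy.
    The output mu = w . x is the mean of a Gaussian from which the
    prediction is sampled; the reward is 1 if the prediction falls
    inside the error zone and 0 otherwise, so outliers do not drag
    the weights the way a squared-error loss under BP would."""
    mu = sum(wi * xi for wi, xi in zip(w, x))
    y_hat = random.gauss(mu, sigma)                  # stochastic prediction
    reward = 1.0 if abs(y_hat - y_true) < zone else 0.0
    # Characteristic eligibility: d log N(y_hat; mu, sigma^2) / d w_i
    elig = [(y_hat - mu) / sigma ** 2 * xi for xi in x]
    return [wi + lr * reward * ei for wi, ei in zip(w, elig)]  # ascent on reward
```

When the sampled prediction falls outside the zone, the reward is zero and the weights are left untouched, which is exactly how noisy samples are prevented from distorting the model.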
