**3. Methodology**

We propose a gamut of predictive models built on deep learning architectures. We train, validate, and test the models on the historical price records of a well-known stock listed on the NSE, viz. *Century Textiles*. The historical prices of the Century Textiles stock from 31st Dec 2012 (a Monday) to 9th Jan 2015 (a Friday) are collected at 5-minute intervals using the Metastock tool [56]. We carry out the training and validation of the models using the stock price data from 31st Dec 2012 to 30th Dec 2013. The models are tested on the records of the remaining period, i.e., from 31st Dec 2013 to 9th Jan 2015. To maintain uniformity in the sequence, we organize the entire dataset as a sequence of daily records arranged on a weekly basis from Monday to Friday. After the dataset is organized suitably, we split it into two parts – the training set and the test set. The training dataset consists of 19500 records, while the test data contains 20500 tuples. Every record has five attributes – *open*, *high*, *low*, *close*, and *volume*. We have not considered any adjusted attribute (e.g., adjusted close or adjusted volume) in our analysis.

In this chapter, we present ten regression models for stock price forecasting using a deep learning approach. For the univariate models, the objective is to forecast the future values of the variable *open* based on its past values. For the multivariate models, the task is to predict the future values of *open* using the historical values of all five attributes in the stock data. The models are tested following an approach known as *multi-step prediction using walk-forward validation* [22]. In this method, we use the training data to construct the models. The models are then used to predict the daily *open* values of the stock prices for the coming week. As a week completes, we include the actual stock price records of that week in the training dataset. With this extended training dataset, the *open* values are forecasted with a forecast horizon of 5 days, so that the forecast for the days of the next week is available. This process continues until all the records in the test dataset are processed.
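The walk-forward procedure described above can be sketched in a few lines. The following is a simplified illustration only: `make_forecast` stands in for a trained model (in the chapter, a CNN or LSTM), and the price values are invented for the example.

```python
# Sketch of multi-step prediction with walk-forward validation.

def make_forecast(history, horizon=5):
    # Placeholder model: naively repeat the last observed value.
    # In the chapter, this is a trained CNN or LSTM network.
    return [history[-1]] * horizon

def walk_forward(train, test_weeks, horizon=5):
    history = list(train)           # training data, extended week by week
    forecasts = []
    for week in test_weeks:         # each week is a list of 5 daily open values
        forecasts.append(make_forecast(history, horizon))
        history.extend(week)        # actual values join the training data
    return forecasts

train = [100.0, 101.0, 102.0, 103.0, 104.0]
test_weeks = [[105.0, 106.0, 107.0, 108.0, 109.0],
              [110.0, 111.0, 112.0, 113.0, 114.0]]
preds = walk_forward(train, test_weeks)
```

Note that the first week is forecast from the original training data only, while the forecast for the second week also sees the first test week's actual values.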

The suitability of CNNs for building predictive models of future stock prices was demonstrated in our previous work [22]. In the current work, we present a gamut of deep learning models built on the CNN and LSTM architectures and illustrate their effectiveness in solving the same problem.

### *Design and Analysis of Robust Deep Learning Models for Stock Price Prediction DOI: http://dx.doi.org/10.5772/intechopen.99982*

CNNs perform two critical functions for extracting rich feature sets from input data: (1) *convolution* and (2) *pooling or sub-sampling* [57]. The convolution operation extracts a rich set of features from the input, while sub-sampling summarizes the salient features within a given locality of the feature space. The result of the final sub-sampling in a CNN is passed on to possibly multiple dense layers. These fully connected layers learn from the extracted features and provide the network with its power of prediction.
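The two operations can be illustrated numerically with a single filter in plain Python. The kernel weights and input values below are invented for the example; in a trained CNN they are learned from data.

```python
# 1-D convolution (valid padding, stride 1) followed by max-pooling.

def conv1d(x, kernel, bias=0.0):
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k)) + bias
            for i in range(len(x) - k + 1)]

def max_pool(x, size=2):
    # Non-overlapping windows; a trailing partial window is dropped.
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]

x = [1.0, 3.0, 2.0, 5.0, 4.0]        # input of length 5
feat = conv1d(x, [1.0, 0.0, -1.0])   # kernel size 3 -> 3 output values
pooled = max_pool(feat)              # length 3 -> length 1
```

The convolution shortens a length-5 input to 3 values, and pooling summarizes each local window by its maximum; a real convolutional layer simply repeats this with many filters in parallel.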

LSTM is an adapted form of the *recurrent neural network* (RNN) and can interpret and then forecast sequential data such as text and numerical time series [57]. The network has the ability to memorize information on its past states in designated memory cells. Access to these memory cells is controlled by structures called *gates*. The information on the past states stored in the memory cells is aggregated suitably at the forget gate, which removes the irrelevant information. The input gate, on the other hand, receives the information available to the network at the current timestamp. Using the information at the input and forget gates, the network computes the predicted value of the target variable. The predicted value at each timestamp is made available through the output gate of the network [57].
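In the standard formulation (a generic LSTM description, not parameters specific to the models below), the gates at timestamp *t* compute

$$i\_t = \sigma(W\_i x\_t + U\_i h\_{t-1} + b\_i), \quad f\_t = \sigma(W\_f x\_t + U\_f h\_{t-1} + b\_f), \quad o\_t = \sigma(W\_o x\_t + U\_o h\_{t-1} + b\_o)$$

$$c\_t = f\_t \odot c\_{t-1} + i\_t \odot \tanh(W\_c x\_t + U\_c h\_{t-1} + b\_c), \quad h\_t = o\_t \odot \tanh(c\_t)$$

where $\sigma$ is the sigmoid function, $\odot$ denotes element-wise multiplication, $c\_t$ is the memory-cell state, and $h\_t$ is the output at timestamp *t*.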

The deep learning-based models we present in this chapter differ in their design, structure, and dataflows. Our proposition includes four models based on the CNN architecture and six models built on the LSTM network architecture. The models are named following a convention: the first part of a model's name indicates the model type (CNN or LSTM), the second part indicates the nature of the input data (univariate or multivariate), and the third part is an integer indicating the size of the input to the model (5 or 10 days). The ten models are as follows:

(i) CNN\_UNIV\_5 – a CNN model with an input of the univariate *open* values of the stock price records of the last week;

(ii) CNN\_UNIV\_10 – a CNN model with an input of the univariate *open* values of the stock price records of the last couple of weeks;

(iii) CNN\_MULTV\_10 – a CNN model with an input of multivariate stock price records consisting of the five attributes of the last couple of weeks, where each variable is passed through a separate channel of a CNN;

(iv) CNN\_MULTH\_10 – a CNN model with the last couple of weeks' multivariate input data, where each variable is used in a dedicated CNN and then combined in a multi-headed CNN architecture;

(v) LSTM\_UNIV\_5 – an LSTM model with the univariate *open* values of the last week as the input;

(vi) LSTM\_UNIV\_10 – an LSTM model with the last couple of weeks' univariate *open* values as the input;

(vii) LSTM\_UNIV\_ED\_10 – an LSTM model with encoding and decoding ability, with the univariate *open* values of the last couple of weeks as the input;

(viii) LSTM\_MULTV\_ED\_10 – an LSTM model based on encoding and decoding of the multivariate stock price data of five attributes of the last couple of weeks as the input;

(ix) LSTM\_UNIV\_CNN\_10 – a model with an encoding CNN and a decoding LSTM, with the univariate *open* values of the last couple of weeks as the input;

(x) LSTM\_UNIV\_CONV\_10 – a model with a convolutional block for encoding and an LSTM block for decoding, with the univariate *open* values of the last couple of weeks as the input.

We now present a brief discussion of the model designs. All the hyperparameters (i.e., the number of nodes in a layer, the size of a convolutional, LSTM, or pooling layer, etc.) used in the models are optimized using grid search. However, we do not discuss parameter optimization issues in this work.

### **3.1 The CNN\_UNIV\_5 model**

This CNN model is based on a univariate input of the *open* values of the last week's stock price records. The model forecasts the following five values in the sequence as the predicted daily *open* index for the coming week. The model input has a shape of (5, 1), as the five values of the last week's daily *open* index are used as the input. Since the input to the model is quite small, a solitary convolutional block and a subsequent max-pooling block are deployed. The convolutional block has a feature space dimension of 16 and a filter (i.e., kernel) size of 3. The kernel slides over the five input values in three positions, and at each position it extracts 16 features. Hence, the output data shape of the convolutional block is (3, 16). The max-pooling layer reduces the size of the feature maps by a factor of two, transforming the data shape to (1, 16). The result of the max-pooling layer is converted into a one-dimensional array by a flattening operation. This one-dimensional vector is then passed through a *dense layer* block and fed into the final output layer of the model. The output layer yields the five forecasted *open* values in sequence for the coming week. A batch size of 4 and 20 epochs are used for training the model. The *rectified linear unit* (ReLU) activation function and the Adam optimizer of the *gradient descent algorithm* are used in all layers except the final output layer, in which the sigmoid activation function is used. The same activation functions and optimizer are used in all the models. The schematic architecture of the model is depicted in **Figure 1**.
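The shape arithmetic above can be checked directly, assuming valid padding with stride 1 for the convolution and non-overlapping pooling windows of size 2:

```python
# Shape bookkeeping for CNN_UNIV_5.

def conv_out_len(n_in, kernel):
    # Valid padding, stride 1: each kernel position yields one output value.
    return n_in - kernel + 1

def pool_out_len(n_in, size=2):
    # Non-overlapping max-pooling windows.
    return n_in // size

n = 5                          # univariate open values of the last week
n = conv_out_len(n, kernel=3)  # -> 3; with 16 features, shape (3, 16)
n = pool_out_len(n)            # -> 1; shape (1, 16)
flat = n * 16                  # flatten -> a vector of 16 values
# flat feeds the dense block, whose final layer emits the 5 forecasts
```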

We now compute the number of trainable parameters in the CNN\_UNIV\_5 model. As the role of the input layer is merely to supply the input data to the network, no learning is involved in the input layer. There is no learning in the pooling layers either, as all they do is compute local aggregates of the features, and the flatten layer likewise involves no learning. Hence, in a CNN model, trainable parameters are involved only in the convolutional layers and the dense layers. The number of trainable parameters (*n1*) in a one-dimensional convolutional layer is given by (1), where *k* is the kernel size, and *d* and *f* are the sizes of the feature space in the previous layer and the current layer, respectively. Since each element in the feature space of the current layer has a bias, the term 1 is added in (1)

**Figure 1.** *The schematic architecture of the model CNN\_UNIV\_5.*


$$n\_1 = (k \ast d + 1) \ast f \tag{1}$$

The number of parameters (*n2*) in a dense layer of a CNN is given by (2), in which *pcurrent* and *pprevious* refer to the node counts in the current layer and the previous layer, respectively. The second term on the right-hand side of (2) accounts for the *bias* terms of the nodes in the current layer.

$$n\_2 = \left(p\_{current} \ast p\_{previous}\right) + 1 \ast p\_{current} \tag{2}$$

The computation of the number of parameters in the CNN\_UNIV\_5 model is presented in **Table 1**. It is observed that the model involves 289 trainable parameters. The number of parameters in the convolutional layer is 64, while the two dense layers involve 170 and 55 parameters, respectively.
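Equations (1) and (2) can be verified directly against these counts; the layer sizes below follow the CNN\_UNIV\_5 description above, with a first dense layer of 10 nodes inferred from the reported counts:

```python
# Trainable-parameter counts for CNN_UNIV_5 using Eqs. (1) and (2).

def conv_params(k, d, f):
    # Eq. (1): kernel size k, d input features, f output features (one bias each).
    return (k * d + 1) * f

def dense_params(p_prev, p_curr):
    # Eq. (2): weights plus one bias per node in the current layer.
    return p_prev * p_curr + p_curr

conv = conv_params(k=3, d=1, f=16)   # 64
dense1 = dense_params(16, 10)        # 170 (flattened 16 values -> 10 nodes)
dense2 = dense_params(10, 5)         # 55  (10 nodes -> 5 forecast outputs)
total = conv + dense1 + dense2       # 289
```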

### **3.2 The CNN\_UNIV\_10 model**

This model is based on a univariate input of the *open* values of the last couple of weeks' stock price data. The model computes the five forecasted daily *open* values in sequence for the coming week. The structure and the data flow of this model are identical to those of the CNN\_UNIV\_5 model. However, the input of the model has a shape of (10, 1). We use 70 epochs and a batch size of 16 for training the model. **Figure 2** shows the architecture of the model CNN\_UNIV\_10. The computation of the number of parameters in the model is exhibited in **Table 2**.

It is evident from **Table 2** that CNN\_UNIV\_10 involves 769 trainable parameters. The parameter counts for the convolutional layer and the two dense layers are 64, 650, and 55, respectively.
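The same formulas explain why the longer input changes only the first dense layer: a length-10 input leaves a flattened feature vector of 64 values (rather than 16), under the same valid-padding and pooling assumptions as before.

```python
# Parameter counts for CNN_UNIV_10 using Eqs. (1) and (2).

def conv_params(k, d, f):
    return (k * d + 1) * f             # Eq. (1)

def dense_params(p_prev, p_curr):
    return p_prev * p_curr + p_curr    # Eq. (2)

flat = ((10 - 3 + 1) // 2) * 16        # conv -> 8, pool -> 4, 16 features: 64
total = conv_params(3, 1, 16) + dense_params(flat, 10) + dense_params(10, 5)
```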

### **3.3 The CNN\_MULTV\_10 model**

This CNN model is built on the input of the last couple of weeks' multivariate stock price records. The five variables of the stock price time series are passed to the CNN through five separate channels. The model uses a couple of convolutional layers, each with a feature map size of 32 and a filter size of 3; that is, each convolutional layer extracts 32 features from its input using a kernel of size 3. The input to the model has a shape of (10, 5), indicating ten records, each having the five features of the stock price data. After the first convolutional operation, the shape of the data is transformed to (8, 32). The value 32 corresponds to the number of features extracted, while the value 8 is the output length given by *l* = (*n* - *k*) + 1, where *n* = 10 is the input length and *k* = 3 is the kernel size; hence, *l* = 8.

**Table 1.** *Computation of the number of parameters in the model CNN\_UNIV\_5.*

**Figure 2.** *The architecture of the model CNN\_UNIV\_10.*

**Table 2.** *The number of parameters in the model CNN\_UNIV\_10.*

Similarly, the output data shape of the second convolutional layer is (6, 32). A max-pooling layer reduces the feature map size by a factor of two, producing an output data shape of (3, 32). The max-pooling block's output is then passed on to a third convolutional layer with a feature map size of 16 and a kernel size of 3. Following the same computation rule, the data shape of the output from the third convolutional layer becomes (1, 16). Finally, another max-pooling block receives the results of the final convolutional layer. This block does not reduce the feature space, since its input data shape is already (1, 16); hence, the output of the final max-pooling layer remains (1, 16). A flatten operation follows, converting the 16 feature maps of one value each into a single array of 16 values. The output of the flatten operation is passed on to a fully connected block having 100 nodes. Finally, the output block with five nodes computes the predicted daily *open* index of the coming week. The number of epochs and the batch size used in training the model are 70 and 16, respectively. **Figure 3** depicts the CNN\_MULTV\_10 model. **Table 3** shows the computation of the number of trainable parameters involved in the model.
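The layer-by-layer output lengths for this pipeline can be traced with the same shape rules (valid padding, stride 1, non-overlapping pooling):

```python
# Output lengths through CNN_MULTV_10, feature counts noted in comments.

def conv_out_len(n, k=3):
    return n - k + 1        # valid padding, stride 1

def pool_out_len(n, size=2):
    return n // size        # non-overlapping max-pooling

n1 = conv_out_len(10)       # first conv:  (8, 32)
n2 = conv_out_len(n1)       # second conv: (6, 32)
n3 = pool_out_len(n2)       # max-pool:    (3, 32)
n4 = conv_out_len(n3)       # third conv:  (1, 16)
# the final max-pool cannot shrink a length-1 feature map, so (1, 16) remains
```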

From **Table 3**, it is observed that the total number of trainable parameters in the model CNN\_MULTV\_10 is 7373. The three convolutional layers *conv1d\_4*, *conv1d\_5*, and *conv1d\_6* involve 512, 3104, and 1552 parameters, respectively. It is to be noted that the value of *k* for the first convolutional layer, *conv1d\_4*, is multiplied by a factor of five, since there are five attributes in the input data to this layer. The two dense layers, *dense\_3* and *dense\_4*, include 1700 and 505 parameters, respectively.

**Figure 3.** *The schematic architecture of the model CNN\_MULTV\_10.*

**Table 3.** *The number of parameters in the model CNN\_MULTV\_10.*
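These counts again follow from Eqs. (1) and (2); the five input attributes enter the first convolutional layer as the previous-layer feature size *d*, which is equivalent to multiplying the kernel term by five.

```python
# Parameter counts for CNN_MULTV_10 via Eqs. (1) and (2).

def conv_params(k, d, f):
    return (k * d + 1) * f             # Eq. (1)

def dense_params(p_prev, p_curr):
    return p_prev * p_curr + p_curr    # Eq. (2)

conv1 = conv_params(3, 5, 32)    # 512: d = 5 input attributes (channels)
conv2 = conv_params(3, 32, 32)   # 3104
conv3 = conv_params(3, 32, 16)   # 1552
dense1 = dense_params(16, 100)   # 1700: flattened 16 values -> 100 nodes
dense2 = dense_params(100, 5)    # 505: 100 nodes -> 5 forecast outputs
total = conv1 + conv2 + conv3 + dense1 + dense2   # 7373
```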
