**3. The properties and structure of Jordan pi-sigma neural network (JPSN)**

The structure of the JPSN is quite similar to that of the ordinary PSNN. The main difference is that the architecture of the JPSN has a recurrent link from the output layer back to the input layer. This structure captures the temporal dynamics of the time-series process and allows the network to compute in a more parsimonious way (Hussain & Liatsis, 2002). The architecture of the proposed JPSN is shown in Figure 2 below.

Fig. 2. The architecture of JPSN

Referring to Figure 2, the output of the JPSN at time *t+1* is obtained by passing the product of the summing units' activations through the transfer function:

$$y(t+1) = f\left(\prod_{k} h_k(t+1)\right), \qquad h_k(t+1) = \sum_{j} w_{kj}\, z_j(t) \tag{1}$$

where

$x(t)$ - the input nodes at the $t$-th time
$w_{kj}$ - the trainable weights
$h_k(t+1)$ - the summing units
$y(t+1)$ - the output at time $t+1$
$y(t)$ - the output at time $t$
$f$ - the activation function

$Z^{-1}$ denotes the time delay operation.


Weights from the input layer $x(t)$ to the summing-unit layer are tunable, while weights between the summing-unit layer and the output layer are fixed to 1. The tuned weights are used for network testing, to see how well the network model generalizes on unseen data.

Let the number of external inputs to the network be $M$ and the number of outputs be 1. Let $x_m(t)$ be the $m$-th external input to the network at time $t$. The overall input at time $t$ is the concatenation of $y(t)$ and $x_k(t)$, $k = 1, \ldots, M$, and is referred to as $z(t)$, where:

$$z_k(t) = \begin{cases} x_k(t) & \text{if } \quad 1 \le k \le M \\ 1 & \text{if } \quad k = M+1 \\ y(t) & \text{if } \quad k = M+2 \end{cases} \tag{2}$$

Meanwhile, weights from $z(t)$ to the summing units are set to 1 in order to reduce the complexity of the network.
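To illustrate Equation (2), the assembly of the overall input $z(t)$ can be sketched as follows. This is a minimal NumPy sketch, not from the chapter; the function name and argument layout are illustrative.

```python
import numpy as np

def build_z(x_t, y_prev):
    """Assemble the overall input z(t) of Equation (2).

    x_t    : the M external inputs x_1(t), ..., x_M(t)
    y_prev : the previous network output y(t), fed back through the Jordan link
    """
    # k = 1..M: external inputs; k = M+1: bias term 1; k = M+2: recurrent output
    return np.concatenate([np.asarray(x_t, dtype=float), [1.0], [y_prev]])
```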

The proposed JPSN combines the properties of both the PSNN and the Recurrent Neural Network (RNN) so that better performance can be achieved. When utilizing the newly proposed JPSN as a predictor for one-step-ahead prediction, the previous input values are used to predict the next elements in the data. Since networks with recurrent connections hold several advantages over the ordinary feedforward MLP, especially in dealing with time-series problems, adding these dynamic properties to the PSNN may allow the network to outperform both the ordinary feedforward MLP and the ordinary PSNN. Additionally, the unique architecture of the JPSN may also avoid the combinatorial explosion of higher-order terms as the network order increases.

#### **3.1 Learning algorithm of JPSN**

The supervised learning used in the JPSN can be performed with the standard backpropagation (BP) gradient descent algorithm (Rumelhart *et al.*, 1986), with the recurrent link from the output layer back to the input-layer nodes. Since the same weights are used for all networks, the learning algorithm starts by initialising the weights to small random values before training them. The JPSN is trained adaptively, in which the errors produced are calculated; the error of each output unit is defined as:

$$e_j(t) = d_j(t) - y_j(t) \tag{3}$$

where $d_j(t)$ denotes the target output at time $t+1$. At each time $t+1$, the output of each $y_j(t)$ is determined, and the error $e_j(t)$ is calculated as the difference between the actual value expected from each unit $j$ and the predicted value $y_j(t)$.

Generally, the JPSN operates in the following steps.

For each training example:


1. Calculate the output.

$$y(t) = f\left(\prod\_{L=1}^{k} h\_L(t)\right) \tag{4}$$

where $h_L(t)$ can be calculated as:

$$h_L(t) = \sum_{m=1}^{M} w_{Lm}\, x_m(t) + w_{L(M+1)} + w_{L(M+2)}\, y(t-1) = \sum_{m=1}^{M+2} w_{Lm}\, z_m(t) \tag{5}$$

where $h_L(t)$ represents the activation of the $L$-th summing unit at time $t$, and $y(t-1)$ is the previous network output. The unit's transfer function $f$ is the sigmoid activation function, which bounds the output to the range [0, 1].
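As a sketch of this forward pass (Equations (4) and (5)), assuming the $z(t)$ layout of Equation (2) and a hypothetical weight matrix `W` with one row per summing unit (neither the names nor the shapes are from the chapter):

```python
import numpy as np

def sigmoid(x):
    # The transfer function f, bounded to the output range [0, 1]
    return 1.0 / (1.0 + np.exp(-x))

def jpsn_forward(W, x_t, y_prev):
    """One JPSN forward pass, Equations (4)-(5).

    W      : (K, M+2) array of trainable weights, one row per summing unit
    x_t    : the M external inputs at time t
    y_prev : the previous output y(t-1), fed back through the Jordan link
    """
    z = np.concatenate([np.asarray(x_t, dtype=float), [1.0], [y_prev]])  # Equation (2)
    h = W @ z                # Equation (5): summing-unit activations h_L(t)
    y = sigmoid(np.prod(h))  # Equation (4): product of the h_L, then f
    return y, h
```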

2. Compute the output error at time $t$ using the standard Mean Squared Error (MSE) by minimising the following index:

$$E\_k = \frac{1}{n\_{tr}} \sum\_{i=1}^{n\_{tr}} \left( y\_i - z\_{ki} \right)^2 \tag{6}$$

where $z_{ki}$ denotes the output of the $k$-th node with respect to the $i$-th data point, and $n_{tr}$ is the number of training samples. This step is repeated for all nodes on the current layer.
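Equation (6) is a plain mean of squared residuals over the $n_{tr}$ training examples; a one-function sketch (names illustrative, not from the chapter):

```python
import numpy as np

def mse_index(y_targets, z_outputs):
    # Equation (6): E_k = (1/n_tr) * sum_i (y_i - z_ki)^2
    y_targets, z_outputs = np.asarray(y_targets), np.asarray(z_outputs)
    return np.mean((y_targets - z_outputs) ** 2)
```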

3. By adapting the BP gradient descent algorithm, compute the weight changes:

$$
\Delta w_j = \eta \left( \prod_{i \neq j}^{m} h_{ji} \right) x_k \tag{7}
$$

where $h_{ji}$ is the output of the summing unit and $\eta$ is the learning rate.


4. Update the weight:

$$
w_i = w_i + \Delta w_i \tag{8}
$$

5. To accelerate the convergence of the error in the learning process, a momentum term $\alpha$ is added to the weight update of Equation (8). The values of the weights for the interconnections between neurons are then calculated and can be numerically expressed as:

$$
w_i = w_i + \alpha\, \Delta w_i \tag{9}
$$

where the value of $\alpha$ is a user-selected positive constant, $0 \le \alpha \le 1$.
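Putting steps 3 to 5 together, a hypothetical update routine is sketched below. The gradient term follows the product form of Equation (7) combined with the sigmoid derivative, and the momentum handling is one common reading of Equations (8) and (9), not necessarily the authors' exact scheme:

```python
import numpy as np

def jpsn_update(W, z, h, y, d, eta=0.1, alpha=0.5, prev_dW=None):
    """One BP gradient-descent update for the JPSN, Equations (7)-(9).

    W : (K, M+2) trainable weights    z : overall input, Equation (2)
    h : summing-unit outputs h_L(t)   y : network output, d : target
    eta : learning rate               alpha : momentum constant in [0, 1]
    """
    e = d - y                    # error signal, Equation (3)
    f_prime = y * (1.0 - y)      # derivative of the sigmoid f
    dW = np.zeros_like(W)
    for L in range(W.shape[0]):
        # Product of the other summing units' outputs, as in Equation (7)
        prod_others = np.prod(np.delete(h, L))
        dW[L] = eta * e * f_prime * prod_others * z
    if prev_dW is not None:
        dW += alpha * prev_dW    # momentum term, Equation (9)
    return W + dW, dW            # weight update, Equation (8)
```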

6. The JPSN algorithm is terminated when all the stopping criteria (training error, maximum epoch and early stopping) are satisfied. If not, repeat from step 1.

The utilisation of product units in the output layer indirectly incorporates the higher-order capabilities of the JPSN while using a small number of weights and processing units. The JPSN has the topology of a fully connected two-layered feedforward network. Considering that the fixed weights are not tuneable, it can be said that the summing layer is not "hidden" as in the case of the MLP; such a network topology, with only one layer of tuneable weights, may reduce the training time.
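To make the whole procedure concrete, a schematic training loop that checks the stopping criteria of step 6 might look as follows. It reuses the `jpsn_forward` and `jpsn_update` sketches above, and all parameter values are illustrative rather than taken from the chapter:

```python
import numpy as np

def train_jpsn(W, X, D, eta=0.1, alpha=0.5, target_error=1e-3, max_epoch=1000):
    """Schematic JPSN training loop; terminates on the step-6 criteria."""
    prev_dW, y_prev = None, 0.0
    for epoch in range(max_epoch):                  # criterion: maximum epoch
        sq_errors = []
        for x_t, d in zip(X, D):
            z = np.concatenate([np.asarray(x_t, dtype=float), [1.0], [y_prev]])
            y, h = jpsn_forward(W, x_t, y_prev)     # step 1: output
            sq_errors.append((d - y) ** 2)          # step 2: error
            W, prev_dW = jpsn_update(W, z, h, y, d, eta, alpha, prev_dW)  # steps 3-5
            y_prev = y                              # Jordan feedback for next t
        if np.mean(sq_errors) <= target_error:      # criterion: training error
            break                                   # (early stopping on a validation
    return W                                        #  set would be checked here too)
```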

To purify the data for further processing, the contaminating effects of outlying objects on the data need to be identified and removed. Therefore, in this study, a Max-Min Normalization technique was used so that the data can be distributed evenly and scaled into an acceptable range. In order to avoid computational problems, the range was set between the upper and lower bounds of the network transfer function, which is often the monotonically increasing sigmoid function, $f(x) = \frac{1}{1 + e^{-x}}$ (Cybenko, 1989), bounded between [0, 1].

Let $A$ be the temperature data of the Batu Pahat region, and let $\min A$ and $\max A$ indicate the minimum and maximum values of data $A$. Max-Min Normalization maps a value $v$ of data $A$ to $v'$ in the range $[\mathrm{new\_min}\,A, \mathrm{new\_max}\,A]$, and can be implemented using the following equation:

$$v' = \frac{v - \min A}{\max A - \min A}\,(\mathrm{new\_max}\,A - \mathrm{new\_min}\,A) + \mathrm{new\_min}\,A \tag{10}$$
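As a minimal sketch of Equation (10) (not from the chapter; the default bounds match the [0, 1] range of the sigmoid transfer function, and the sample values are illustrative):

```python
import numpy as np

def max_min_normalize(v, new_min=0.0, new_max=1.0):
    """Max-Min Normalization, Equation (10): map data v into [new_min, new_max]."""
    v = np.asarray(v, dtype=float)
    return (v - v.min()) / (v.max() - v.min()) * (new_max - new_min) + new_min

# Illustrative daily temperatures (degrees Celsius), scaled into [0, 1]
temps = [26.4, 27.1, 31.8, 29.5, 30.2]
print(max_min_normalize(temps))
```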

In data normalization, the statistical distribution values for each input and output are roughly uniform; therefore, removing the outliers should make the data more accurate. Figure 3 shows the daily temperature data of the Batu Pahat region before normalization, while Figure 4 shows the same data after normalization. Meanwhile, Figure 5 shows the frequency distribution of the temperature data for 5 years after the normalization process. From Figure 5, it can be seen that the histogram of the transformed data is symmetrical; therefore, it can be said that the temperature data for Batu Pahat (after normalization) is relatively uniform and closely follows the normal distribution, and is thus suitable as the network inputs.

Fig. 3. Daily Temperature Data of Batu Pahat Region (before normalization)
