**Self Adaptive Genetic Algorithms for Automated Linear Modelling of Time Series**

Pedro Flores1, Larysa Burtseva2 and Luis B. Morales3 <sup>1</sup>*Universidad de Sonora* <sup>2</sup>*Universidad Autónoma de Baja California* <sup>3</sup>*Universidad Nacional Autónoma de México México*

#### **1. Introduction**


In this work a methodological proposal is developed to build linear models of Time Series (TS) by formulating the task of obtaining a good linear model as a nonlinear optimization problem with bounded variables. It is worth mentioning that the construction of these optimization problems borrows some ideas from the traditional statistical approach.

As a product of the methodology presented here, two heuristic algorithms for the treatment of TS are developed. They allow building several models for the same problem, and the accuracy of these models can be increased by increasing the number of terms in the model, something that does not happen with the traditional statistical approach. Thus, with these algorithms several candidate solutions can be obtained for the same problem, from which the one with the best forecasting results can be selected. In addition, the algorithms proposed in this work allow building linear models that are different from, but equivalent to, the classic Autoregressive (AR) and Autoregressive Moving Average (ARMA) models, with the added advantage that models can be obtained for non-stationary TS, and for TS with non-stationary variance, in cases where the traditional methodology does not work.

Since the optimization problems set out here may present multiple local minima, a special technique is needed to solve them. To this end, a version of the Self Adaptive Genetic Algorithms (SAGA) encoded on real numbers was developed, which finds satisfactory solutions for different problems without user intervention and without changes to the parameters of the code.

On the other hand, one of the principal points of this methodology is the fact that in many cases these linear models present a phenomenon here named 'forecasting delay', which allows the linear model obtained to be modified to find a more accurate forecast.

It is important to note that the first AR version of the algorithms developed for TS was tested on the examples of the international competition:

#### **"NN3 Artificial Networks & Computational Intelligence Forecasting Competition"**

referred to hereafter as NN3, which was held in 2006-2007 (http://www.neural-forecasting-competition.com/NN3/results.htm). This competition is


held annually to evaluate the accuracy of Computational Intelligence methods on diverse TS problems. In this edition the task was to forecast, with a single methodology, 18 future values for a set of series, most of which are measurements of real phenomena. The competition has two categories: NN3-Complete, with 111 problems, and NN3-Reduced, with 11 problems. Using models with only four terms, third place was obtained in the NN3-Complete category among 29 competitors, and sixth place in the NN3-Reduced category among 53 competitors. This competition and its examples will be referenced in various sections of this work. An analysis of the NN3 results can be found in (Crone, Hibon & Nikolopoulos, 2011).

#### **2. Methodology**

The forecasting process consists in calculating or predicting the value of some event that will happen in the future. To carry out this process adequately, the event data in question must be analyzed and a model built that incorporates the behavior patterns that have occurred in the past, under the assumption that they can happen again in the future. It is important to note that the interest is not in explaining how the mechanism that produces the events works, but in predicting their behavior.

TS models are used for studying the behavior of data that vary with time. The data can be interpreted as measurements of some value (observable variable) of a phenomenon, taken at equal and consecutive time intervals. There are several methods to construct TS models; an overview of the most important can be found in (Weigend & Gershenfeld, 1994), and (Palit & Popovic, 2005) gives an overview of the methodologies most used in the area of computational intelligence. One of the most used methods is based on considering the TS as a realization of a stochastic process. This approach is the basis of the statistical treatment of TS that can be found in (Box & Jenkins, 1976) and (Guerrero, 2003). Nowadays the construction of models for TS is an area of great development, as evidenced by the articles of the Journal of Time Series Analysis (http://www.wiley.com/bw/journal.asp?ref=0143-9782&site=1) and by the papers presented in international competitions on time series modelling such as NN3. Despite the existence of GA papers in which TS are used (Alberto et al., 2010; Battaglia & Protopapas, 2011; Chiogna, Gaetan & Masarotto, 2008; Hansen et al., 1999; Mateo, Sovilj & Gadea, 2010; Szpiro, 1997; Yadavalli et al., 1999), it is important to note that no reference was found to the use of SAGA for this purpose.

The data will be represented by {*Zt*}, with the implicit assumption that *t* takes the values 1, 2, ..., *N*, where the parameter *N* indicates up to what moment information is available. When a model for the data set is available, values for the TS can be estimated, which are denoted by {*Ft*}. For a model to be considered good, the values of {*Ft*} must be "similar" to those of {*Zt*}. The main purpose of this work is to build linear models for the data set that give good estimates of the *K* unknown values of the phenomenon under study at the moments *N* + 1, *N* + 2, ..., *N* + *K*.

In forecasting, when a TS with these *N* + *K* data is available, the set of the first *N* values is called the training set, and it is used to construct the model of the series and estimate its parameters. The set of the last *K* terms of the series is called the test set, and it is used to compare different models and choose the most suitable one. The particular interest here is in building automated Autoregressive models of order p (AR(p)). For a TS these are expressions of the form:


$$Z_t = \delta + \phi_1 Z_{t-1} + \phi_2 Z_{t-2} + \dots + \phi_p Z_{t-p} + a_t \tag{1}$$

Where *Zt* is the observable variable in question, *δ* and *φ<sup>i</sup>* are the parameters to be determined, and the variable *at* represents a random noise variable called the *residual*. Expression (1) means that to predict what will happen at time *t*, the *p* values previous to *t* are required; these values are called *delays* or *lags*.

In the classic theory of linear models the restriction is set that *at* represents white noise, but in this work this constraint was not included, which will allow AR expressions to be found for the residuals, with which it will be possible to increase the accuracy of the models.
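As a concrete illustration of the residuals just mentioned, the following minimal Python sketch (the function name and 0-based indexing are our own, not from the chapter) computes {*at*} from expression (1) for a candidate *δ* and *φ<sup>i</sup>*; a second AR model could then be fitted to this residual series:

```python
def residuals(z, delta, phi):
    """Residuals a_t = Z_t - (delta + phi_1*Z_{t-1} + ... + phi_p*Z_{t-p}),
    per expression (1), computed for t = p .. N-1 (0-based lists)."""
    p = len(phi)
    return [z[t] - (delta + sum(phi[i] * z[t - 1 - i] for i in range(p)))
            for t in range(p, len(z))]
```

For example, a series that follows *Zt* = 1 + *Zt*−1 exactly gives all-zero residuals.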

The interest in this type of models originates in the fact that they represent the most important information about the behavior of the series, eliminating the noise that may appear. It should also be added that, for these models, it is important that only a number of terms fixed in advance appears in expression (1). This will allow finding models for a TS while controlling the accuracy of the approximation of the series according to the number of terms used.

**Problem 1:** If {*Zt*} is the original TS and {*Ft*} is the forecast obtained in the form

$$F_t = \delta + \phi_1 Z_{t-1} + \phi_2 Z_{t-2} + \dots + \phi_p Z_{t-p} \tag{2}$$

with *t* > *p*.

It is necessary to find the values for *δ* and *φ<sup>i</sup>* that minimize the function:

$$\sqrt{\sum_{i=p}^{N} (Z_i - F_i)^2} \tag{3}$$

This function will be called the *Root of the Sum of Squares* (RSS). It should be added that, for speed of calculation, it is preferable to use the square of this function, which yields the same results.
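A minimal Python sketch of the forecast (2) and the squared RSS objective (3), assuming 0-based lists (function names are illustrative, not from the chapter):

```python
def ar_forecast(z, delta, phi):
    """One-step forecasts F_t of expression (2) for t = p .. N-1 (0-based)."""
    p = len(phi)
    return [delta + sum(phi[i] * z[t - 1 - i] for i in range(p))
            for t in range(p, len(z))]

def rss_squared(z, delta, phi):
    """Square of the RSS objective (3): sum of (Z_t - F_t)^2 over the series."""
    p = len(phi)
    f = ar_forecast(z, delta, phi)
    return sum((z[t] - f[t - p]) ** 2 for t in range(p, len(z)))
```

For a series that satisfies the model exactly, the objective is 0, so minimizing it over *δ* and the *φ<sup>i</sup>* within their bounds is precisely Problem 1.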

In this initial formulation the construction of the model is presented as if a linear interpolation problem were being solved, and given that the values of *δ* and *φ<sup>i</sup>* will not be arbitrary but will be sought within certain intervals, methods are needed to solve Problem 1 that also work with bounded variables.

The RSS function can have multiple local optima, and to solve this problem an original version of the SAGA algorithms was developed, which solves real nonlinear optimization problems with bounded variables. A self-adaptive version was selected because it is desired to automate as much as possible the process of building these models.

#### **3. Self adaptive genetic algorithms**

The SAGA algorithms were developed by Thomas Bäck (Bäck, 1992a, 1992b) and have the characteristic that they search on their own for the best parameters for their operation. In them, the parameters that will be self-adaptive are encoded in the representation of the individual, so they are altered by the actions of the genetic operators. With this, the best


values of these parameters will produce better individuals, which have a higher probability of surviving and, in consequence, will spread the best parameter values throughout the whole population. There are several versions of SAGA that differ especially in the parameters that are adjusted automatically (Eiben et al., 1999). In this work four self-adaptive parameters are used: *individual probability of crossing pc*, *repetition of crossing rc*, *individual probability of mutation pm* and *repetition of mutation rm*, as presented in (4). The selection of these parameters and their values is based on the idea that the genetic operations of crossing and mutation can be multiple, but cannot have very large values. A binary version of this algorithm had already been used by one of the authors of this work in other problems (Flores, 1999; Garduño, 2000; Garduño, 2001; Sanchez, 2004), and that version, as well as the one presented here (with real-number representation), is, according to the literature reviewed, original.

The individuals proposed as solutions to these problems will have, in addition, four more components representing the values of: *individual probability of crossing pc*, *repetition of crossing rc*, *individual probability of mutation pm* and *repetition of mutation rm*. This section of the individual is called the section of the self-adaptive parameters, and with it, the entire individual is represented by:

$$(\delta, \phi_1, \phi_2, ..., \phi_p, p_c, r_c, p_m, r_m) \tag{4}$$

The above is necessary so that, in this model, the probabilities of crossing and mutation are characteristic of each individual (not of the population, as is traditional in GA); in addition, it is considered that the crossing and the mutation can be multiple, that is to say, can operate several times at the same time. The multiple crossing and mutation are repetitions of the crossing and mutation used in GA when individuals are represented by vectors of real components. The way of operating with these parameters is similar to that presented in (Bäck, 1992a, 1992b).

The limits used in the code of this work for the self-adaptive parameters are: *individual probability of crossing pc* varies in the interval (0.5, 0.95); *repetition of crossing rc* in (1.0, 4.0), which means that individuals can be crossed from one to three times; *individual probability of mutation pm* varies in (0.5, 0.85); and *repetition of mutation rm* in (1.0, 5.0), which means that an individual can be mutated from one to four times. The limits of these self-adaptive parameters were chosen on the basis of experience in other works (Flores, 1999; Garduño, 2000; Garduño, 2001; Sanchez, 2004), where they proved to give good results.
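The individual of expression (4), with the intervals quoted above, can be sketched as follows; the helper names and the uniform random initialization are our assumptions, not details given in the chapter:

```python
import random

# Bounds for the four self-adaptive parameters, as quoted in the text.
SA_BOUNDS = [(0.5, 0.95),  # individual probability of crossing p_c
             (1.0, 4.0),   # repetition of crossing r_c
             (0.5, 0.85),  # individual probability of mutation p_m
             (1.0, 5.0)]   # repetition of mutation r_m

def random_individual(model_bounds):
    """Build one individual per expression (4): the model parameters
    (delta, phi_1, ..., phi_p) followed by the four self-adaptive
    parameters, each drawn uniformly within its interval.

    model_bounds: list of (low, high) pairs for delta, phi_1, ..., phi_p.
    """
    bounds = list(model_bounds) + SA_BOUNDS
    return [random.uniform(lo, hi) for lo, hi in bounds]
```

For an AR(3) model, `random_individual([(-1.0, 1.0)] * 4)` would return a vector of length 8: four model parameters plus *pc*, *rc*, *pm*, *rm*.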

The procedures of crossing and mutation are detailed below.

#### **3.1 Crossing and mutation**

Given two individuals, the crossing is realized taking as probability of crossing the average of their individual crossing probabilities. Once it has been decided that the individuals cross, the integer part of the average individual crossing repetition is taken, and that is the number of times they cross. The crossing of two individuals consists of exchanging the coordinates of both vectors from a certain coordinate chosen at random. The multiple crossing is the result of applying this procedure several times to the same vectors.

For the mutation, the individual probability of mutation of the individual is taken, and according to it, it is decided whether the individual mutates or not. Once it has been decided that an individual mutates, it is mutated as many times as the integer part of its repetition of mutation. To apply the mutation to an individual, a coordinate of the vector is chosen at random, and its value is changed to another (also chosen at random) within the limits established for that coordinate. The multiple mutation is the application of this procedure to the same individual several times.
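A sketch of the two operators as just described; the index arguments locating *pc*, *rc*, *pm*, *rm* inside the vector and the function names are our own conventions, not from the chapter:

```python
import random

def crossover(a, b, pc_idx, rc_idx):
    """Multiple crossing: decide with the average of the two individuals'
    p_c, then swap the tails of both vectors from a random cut point,
    repeated int(average r_c) times."""
    a, b = a[:], b[:]
    if random.random() < (a[pc_idx] + b[pc_idx]) / 2:
        for _ in range(int((a[rc_idx] + b[rc_idx]) / 2)):
            cut = random.randrange(1, len(a))
            a[cut:], b[cut:] = b[cut:], a[cut:]
    return a, b

def mutate(ind, bounds, pm_idx, rm_idx):
    """Multiple mutation: with probability p_m, redraw a randomly chosen
    coordinate within its (low, high) bounds, repeated int(r_m) times."""
    ind = ind[:]
    if random.random() < ind[pm_idx]:
        for _ in range(int(ind[rm_idx])):
            j = random.randrange(len(ind))
            ind[j] = random.uniform(*bounds[j])
    return ind
```

Note that tail-swapping preserves the multiset of coordinates across the pair, and mutation keeps every coordinate inside its declared interval.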

#### **3.2 Use of the self adaptive genetic algorithms**


Since SAGA are random, it is common to perform several runs on the same problem and choose the best result among them. These algorithms are applied in three stages to solve our problems. In the first two stages the important variables for solving the problem are defined, and in the third stage the solution proper is calculated. It is important to note that the individuals have two parts, but in this section only the first part of the individual, which corresponds to the Autoregressive components, is considered. The procedure based on SAGA that is performed to obtain a solution to the problem is as follows.

In the first stage SAGA are used to explore the space of solutions and then to define which variables among *δ*, *φ*1, *φ*2, ..., *φp* are the most important for the problem in question. For this, 10 repetitions of 1000 iterations each were done, and with the solutions of the repetitions a vector is constructed as the sum of the absolute values of *δ*, *φ*1, *φ*2, ..., *φp* (see Figure 1).

Fig. 1. Solution using all variables.

In this first stage the aim is to explore the space of solutions, and for that, 10 repetitions are performed considering all the variables. Then, with the 10 solutions obtained, a vector is built by adding the 10 solutions with all their components taken in absolute value, and it is assumed that the largest values of these components correspond to the most important variables.
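The stage-1 aggregation can be sketched as follows, assuming each solution is the vector (*δ*, *φ*1, ..., *φp*) returned by one repetition (helper names are illustrative):

```python
def importance_vector(solutions):
    """Stage 1: sum the absolute values of each component across
    the solutions returned by the repetitions."""
    return [sum(abs(s[j]) for s in solutions)
            for j in range(len(solutions[0]))]

def most_important(solutions, m):
    """Indices of the m components with the largest summed |value|."""
    imp = importance_vector(solutions)
    return sorted(range(len(imp)), key=lambda j: imp[j], reverse=True)[:m]
```

For example, with two solutions `[1.0, -2.0, 0.5]` and `[-1.0, 2.0, 0.25]`, the importance vector is `[2.0, 4.0, 0.75]` and the two most important components are indices 1 and 0.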

In the second stage SAGA are applied to find solutions considering only the important variables of the problem. For this, it is defined in advance how many variables are required (this will be seen in detail below), and those corresponding to the larger values of the first stage are chosen. In this stage 5 repetitions are performed, each one finishing when the optimum has not been modified in the last 200 iterations. Of these 5 repetitions the best result obtained is chosen (see Figure 2).

In this second stage, only the variables that had the greatest values in the autoregressive part of the individual are considered, and for them the original intervals of their values are kept; for all the other variables in this part of the individual, the upper and lower limits are set to zero. In this stage 5 repetitions are performed and from them the one with the lowest value of RSS is chosen.
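A sketch of the stage-2 bound clamping just described (hypothetical helper, assuming bounds are given as (low, high) pairs):

```python
def stage2_bounds(original_bounds, important):
    """Keep the original interval for the important variables and
    collapse every other variable's interval to (0.0, 0.0), so that
    the search effectively removes it from the model."""
    keep = set(important)
    return [b if j in keep else (0.0, 0.0)
            for j, b in enumerate(original_bounds)]
```

Collapsing an interval to (0.0, 0.0) forces the corresponding coefficient to zero, which is how the unimportant terms are dropped without changing the representation of the individual.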

In the third stage it will be found the solution in which only are taken into account the important variables obtained in the previous stage. For this are extended the boundaries


of the variables of the solution obtained in the previous stage whose absolute value is greater than 0.01. The upper limits of the variables considered are the nonzero values obtained in the previous best solution scaled by more than 1.0, and the lower limits by less than 1.0. The upper and lower limits of the other variables of the autoregressive part of the individual are set to zero. With these limits the problem is solved once, running until the optimum does not improve in 250 iterations. Since GA are stochastic, 5 runs were performed for each problem and the best of them was chosen.

Fig. 2. Solution using the most important values.

The main characteristics of the SAGA version used in this work that make it original are:

• Real coding is used for the variables of the problem. This allows simpler code that can easily pass from one stage to another of those presented here.
• The probabilities of crossover and mutation are characteristics of each individual, and the crossover and mutation procedures are established on the basis of these individual characteristics.
• The repetitions of crossover and mutation are multiple, though the values taken are not very big.
• A control mechanism was introduced that prevents the proliferation of copies of the best individual within the population, thus eliminating the risk of premature convergence.


Above all, the last three features are inspired by the fact that nature's behavior is flexible rather than rigid, and therefore more variability should be allowed within the SAGA.
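A toy sketch of per-individual control parameters of this kind (the class and the drift rule are illustrative assumptions, not the authors' exact operators):

```python
import random

class Individual:
    """Real-coded individual that carries its own crossover and
    mutation probabilities, so the operators adapt per individual."""
    def __init__(self, genes, pc, pm):
        self.genes = list(genes)
        self.pc = pc  # personal crossover probability
        self.pm = pm  # personal mutation probability

def mutate(ind, low=-1.0, high=1.0, rng=random):
    """Mutate each gene with the individual's own pm, then let pm
    itself drift slightly so the adaptation is inherited."""
    child = Individual(ind.genes, ind.pc, ind.pm)
    for i in range(len(child.genes)):
        if rng.random() < child.pm:
            child.genes[i] = rng.uniform(low, high)
    # self-adaptation: the control parameter also moves, within bounds
    child.pm = min(1.0, max(0.01, child.pm + rng.uniform(-0.05, 0.05)))
    return child
```

The point of the sketch is only that `pm` travels with the individual instead of being a single global constant.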

The main disadvantage of using SAGA is the higher computational cost compared with traditional versions, but the advantage obtained is that, with the same code, it is possible to solve automatically all the TS modelling problems presented here.

#### **4. Autoregressive models**

The TS linear models are important because there are many applications where linear estimations are sufficient, and they have wide use in industrial situations. On the other hand, they are also important because other forecasting methodologies build on them (Medeiros & Veiga, 2000, 2005). The classic reference for the treatment of linear models is (Box & Jenkins, 1976).

In the specific case of the AR models for TS that concern us, the value at a certain time is calculated as a linear expression of the values of a certain number of previous measurements, as described in (Box & Jenkins, 1976). The AR models developed here do not require the stochastic process of the residuals {*at*} associated with them to be white noise. The latter allows that, once a good AR model of the TS is built, another AR model can be built for the residuals {*at*}, which together with the original one yields the equivalent of an ARMA model, but with greater forecasting possibilities.

On the other hand, to solve problem 1 it is first necessary to address the following questions, taking into account that an AR model with K terms must be found, where K is established beforehand:

1. How many terms p must be considered?
2. In what intervals do the coefficients of the linear expression lie?
3. Which K terms are most appropriate to solve this problem?
4. What are the values of these K terms that minimize the function (3)?


The following summarizes the results of the BJ methodology that is used in our proposal.

#### **4.1 Main results of the Box-Jenkins methodology**

Univariate TS were analyzed by the Box-Jenkins (BJ) methodology, starting from the formulation of difference equations with an additive random component called white noise. For these BJ models, the conditions under which the stationarity property of the series holds, and the scheme to follow to determine the parameters of the particular model, were studied.

The most general model is called ARMA(p,q) (AutoRegressive Moving Average) and indicates the presence of autoregressive components both in the observable variable {*Zt*} and in the white noise {*at*}. A particular class of model for stationary series corresponds to the autoregressive models AR(p) (denoted AR), represented by the expression:

$$Z\_t = \delta + \phi\_1 Z\_{t-1} + \phi\_2 Z\_{t-2} + \dots + \phi\_p Z\_{t-p} + a\_t \tag{5}$$

When the series is stationary *δ* and *φ<sup>i</sup>* are constants that satisfy the following relations:

$$|\phi\_1| < 1,\tag{6}$$

$$\mu = \frac{\delta}{1 - \sum \phi\_i}$$

and

$$\sum \phi\_i < 1$$


where *μ* represents the average of the series {*Ft*}. The relations in (6) are a consequence of the stationarity property and can be consulted in (Box & Jenkins, 1976).
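The mean relation in (6) translates into a one-line check (the function name is ours):

```python
def ar_mean(delta, phis):
    """Mean of a stationary AR process per (6): mu = delta / (1 - sum(phi_i))."""
    s = sum(phis)
    if s >= 1:
        raise ValueError("sum(phi_i) must be < 1 for stationarity")
    return delta / (1 - s)
```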

The correlation structure presented by a TS related to an AR model, for observations separated by k time units, is given by the autocorrelation function:

$$p\_k = \phi\_1 p\_{k-1} + \phi\_2 p\_{k-2} + \dots + \phi\_p p\_{k-p}$$

where *pk* is the autocorrelation for data of the series separated *k* time units. From the initial conditions that this difference equation satisfies, the following behaviors are possible: exponential or sinusoidal decay. This permits determining whether a series is stationary or not.
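The recursion for *pk* can be iterated directly; for an AR(1) with *φ*1 = 0.5 it reproduces the exponential decay just mentioned (the helper name is ours):

```python
def theoretical_acf(phis, initial, n):
    """Iterate p_k = phi_1*p_{k-1} + ... + phi_p*p_{k-p}, starting from
    the first p autocorrelations given in 'initial'."""
    p = list(initial)
    while len(p) < n:
        p.append(sum(f * p[-i - 1] for i, f in enumerate(phis)))
    return p[:n]
```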

The most general model is the ARIMA(p, d, q) model (AutoRegressive Integrated Moving Average), which also covers nonstationary series, to which differences of order d are applied to make them stationary:

$$
\phi\_p(B)\nabla^d z\_t = \delta + \theta\_q(B)a\_t
$$

Where *φp*(*B*), *θq*(*B*) and *B* are operators that satisfy the following relations:

$$\phi\_p(B)z\_t = (1 - \phi\_1 B - \phi\_2 B^2 - \dots - \phi\_p B^p)z\_t = z\_t - \phi\_1 z\_{t-1} - \dots - \phi\_p z\_{t-p}$$

and

$$\theta\_q(\mathcal{B})a\_t = (1 - \theta\_1 \mathcal{B} - \theta\_2 \mathcal{B}^2 - \dots - \theta\_q \mathcal{B}^q)a\_t = a\_t - \theta\_1 a\_{t-1} - \dots - \theta\_q a\_{t-q}$$

$$\mathcal{B}^k z\_t = z\_{t-k}$$

$$\nabla^d z\_t = (1 - \mathcal{B})^d z\_t$$
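On a finite series, the operators B^k and ∇^d act as plain shifts and successive differences; a sketch with list-based series (values before the start of the series are simply dropped):

```python
def backshift(z, k):
    """B^k z: the series whose value at position t is z_{t-k};
    on a finite list this keeps z_0 .. z_{n-k-1}."""
    return list(z[:-k]) if k else list(z)

def difference(z, d=1):
    """nabla^d z = (1 - B)^d z, applied as d successive first differences."""
    for _ in range(d):
        z = [z[i] - z[i - 1] for i in range(1, len(z))]
    return z
```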

Similarly, there is a general model that considers the presence of seasonality, that is, a short-term cyclic movement of period s, modeled by the expression:

$$
\phi\_P(B^s)\phi\_p(B)\nabla^d z\_t = \delta + \theta\_Q(B^s)\theta\_q(B)a\_t,
$$

where *φP*(*Bs*) and *θQ*(*Bs*) are polynomial operators similar to those above, but with powers that are multiples of *s*, {*at*} are the residuals at time *t*, and *θ<sup>t</sup>* are their components in the moving-average part.

BJ methodology satisfies the following stages:

(a). Identification of a possible model among the ARIMA-type models. To accomplish this, it is first necessary to determine whether the series is stationary or not. When an observed series is not stationary, the difference operator is applied:

$$\nabla z\_t = z\_t - z\_{t-1}$$

as many times as necessary to reach stationarity. To avoid overdifferencing, the variances of the newly obtained series are calculated, choosing the one with the smallest value.

When a series is stationary in its mean but its variance is increasing or decreasing, according to the BJ methodology a transformation (generally logarithmic) should be applied to stabilize the variance. It is important to notice that this is not necessary in our proposal.

Given a stationary series, the behavior patterns of the autocorrelation function and the partial autocorrelation function indicate the possible numbers of parameters p and q that the model should have.

Besides stationarity, there is another property required of ARIMA models, called invertibility, which permits representing the series as an autoregressive model of infinite extension that satisfies the condition:

$$\lim\_{i \to \infty} \phi\_i = 0$$

The above allows an expression of the form (1) for the series to be obtained with a finite number of terms. This means that only the ARIMA models that have the invertibility property can be approximated by an AR model of the form (1).

(b). Estimation of the model parameters by means of nonlinear estimation techniques.

(c). Checking that the model provides an adequate fit and that the basic assumptions implied in the model are satisfied, through analysis of the behavior of the residuals. It is important to mention that our proposal does not need such analysis, because the residuals do not correspond, in general, to white noise.

(d). Use of the model.



Next, the characteristics of the proposed heuristic algorithms are presented. Note that these algorithms are used to build AR models of TS, since the ARMA models are built from these.

#### **4.2 Proposed algorithms**

The heuristic algorithms built in this work are based on the following assumptions:

(a). Regardless of the original series type (stationary or nonstationary), the model sought will always be of the AR form presented in (1).

(b). To determine how many delays p are required, it is first necessary to choose the difference series that will be used to estimate them; afterwards, the number of delays is defined according to the behavior of the sample autocorrelation function of the chosen difference series. This marks a difference from the BJ methodology, which determines the number of delays from the information provided by both the autocorrelation function and the partial autocorrelation function, together with the hypothesis that the random component is white noise. A consequence of this choice is that in the models developed here {*at*} will not be white noise.

(c). The conditions in (6) are relaxed: although they are satisfied for stationary series, in this work they will be applied to series that may not be stationary.


It is necessary to add that the heuristic algorithms presented here allow the treatment of series with trend and time-dependent variance, since they do not require the conditions traditionally imposed on TS, such as being stationary or of stationary variance, or resulting from applying a logarithmic transformation or moving averages.

The first algorithm that we propose builds a linear approximation for the difference series (of first, second or third order) that could be stationary. Then, from this linear approximation, and using Result 1, another linear model is built for the original series.

#### **4.2.1 First algorithm**

In this stage, it is first decided which series to work with among the original series, the first differences, the second differences and, in our case, also the third differences. To decide this, the series with the lowest variance is chosen, which we consider an indication of a stationary series (Box & Jenkins, 1976).
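This choice of working series can be sketched as follows (function names are ours; the first difference of a series with a linear trend comes out constant, hence with the lowest variance):

```python
def variance(z):
    m = sum(z) / len(z)
    return sum((x - m) ** 2 for x in z) / len(z)

def choose_series(z, max_order=3):
    """Return (order, series): the difference order in 0..max_order whose
    series has the lowest variance, taken as the stationary candidate."""
    best_order, best, best_var = 0, list(z), variance(z)
    current = list(z)
    for d in range(1, max_order + 1):
        current = [current[i] - current[i - 1] for i in range(1, len(current))]
        v = variance(current)
        if v < best_var:
            best_order, best, best_var = d, list(current), v
    return best_order, best
```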


Once the series to work with has been chosen, the number of terms necessary for its linear approximation is estimated based on the autocorrelation function. In this work, 30 values of the autocorrelation function were calculated, and two cases were used to select how many terms are required: if the function is decreasing, a value of 4 is taken; otherwise, the lag at which the first maximum of this function is observed is chosen (see Figure 3). With this procedure, if the series presents seasonality and the period is smaller than 30, the models built here can appropriately represent that seasonality.
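The term-count rule can be sketched as follows; `sample_acf` and `number_of_terms` are our names, and the local-maximum scan is one reasonable reading of "the value at which the first maximum is observed":

```python
def sample_acf(z, max_lag=30):
    """Sample autocorrelations r_1 .. r_max_lag."""
    n, m = len(z), sum(z) / len(z)
    c0 = sum((x - m) ** 2 for x in z)
    return [sum((z[t] - m) * (z[t + k] - m) for t in range(n - k)) / c0
            for k in range(1, max_lag + 1)]

def number_of_terms(acf, default=4):
    """Take 'default' terms if the ACF only decreases; otherwise the
    (1-based) lag of its first local maximum."""
    for k in range(1, len(acf) - 1):
        if acf[k] > acf[k - 1]:           # the decrease has stopped
            while k + 1 < len(acf) and acf[k + 1] >= acf[k]:
                k += 1                     # climb to the local maximum
            return k + 1
    return default
```

For a monthly seasonal series the first ACF maximum typically falls at lag 12, which is then the number of terms proposed.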

Fig. 3. Possible autocorrelation function graphs.

With this information, the limits of the coefficient intervals for the chosen series are built: all the *φ<sup>i</sup>* are taken in [−1, 1], except the independent term *δ*, whose limits are set between zero and the average value of the series. The reason for these limits follows from the equations presented in (6). With all the previous information, the proposal of the number *p* of terms required and the limits of their coefficients is complete. From this information, **problem 1** is solved by applying the SAGA in the first two stages described in section 3.2, based on the following:

**Result 1.** If {*yt*} is the difference series of {*xt*}, with terms *yt* = *xt* − *xt*−1, and it has a model

$$y\_t = h\_0 + h\_1 y\_{t-1} + h\_2 y\_{t-2} + \dots + h\_k y\_{t-k}$$

then

$$x\_{t} = h\_{0} + (1 + h\_{1})x\_{t-1} + (h\_{2} - h\_{1})x\_{t-2} + \dots + (h\_{k} - h\_{k-1})x\_{t-k} - h\_{k}x\_{t-k-1} \tag{7}$$

is a model for the series {*xt*}.

From this result two important consequences are obtained:

• The model for the series {*xt*} has one term more than that for the series {*yt*}.
• If a coefficient of *yt* has a value between −1.0 and 1.0, the corresponding coefficient of *xt* may not be in this range.


Applying **Result 1** as many times as necessary, a model for the original series can be obtained, and stage three of section 3.2 is applied to this model to obtain a linear model for the TS. Note that if an AR model is available for some difference series, the model built for the original series has more terms than that for the difference series; so if *K* terms are needed for the original series, then models with fewer than *K* terms must be found for the difference series.
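Result 1 (eq. 7) is easy to apply mechanically; a sketch, with `undifference_model` as our own name, mapping the coefficients [h0, h1, ..., hk] of the difference-series model to those of the original-series model:

```python
def undifference_model(h):
    """Given h = [h0, h1, ..., hk] for y_t = h0 + h1*y_{t-1} + ... + hk*y_{t-k},
    with y_t = x_t - x_{t-1}, return the k+2 coefficients of the implied
    model for x_t (eq. 7): [h0, 1+h1, h2-h1, ..., hk-h_{k-1}, -hk]."""
    h0, phi = h[0], h[1:]
    g = [h0, 1.0 + phi[0]]
    g += [phi[i] - phi[i - 1] for i in range(1, len(phi))]
    g.append(-phi[-1])
    return g
```

Note that the first consequence above falls out directly: the returned model always has one term more than its input.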

#### **4.2.2 Second algorithm**


The second algorithm takes from the BJ methodology only the estimation of how many terms are necessary in the linear approximation of the difference series that could be stationary; from this, the number of terms to be used for the original series is determined.

From this point on, the stages presented in section 3.2 are applied, taking the limits of all the coefficients in [−1, 1], but always working with the original series. There is no result that justifies the use of these limits, and only one reference has been found (Cortez et al., 2004) where they are used. On the other hand, it is a fact that a high percentage of the NN3 cases gave better results with this algorithm than with the first; as an example, the second algorithm outperformed the first in 46 of the 111 examples of NN3-Complete.

#### **5. NN3 results**

The international competition *NN3 Artificial Neural Network & Computational Intelligence Forecasting Competition 2007* aims at assessing the latest methodologies for the forecasting of TS. This competition is open to methods based on Neural Networks, Fuzzy Logic, Genetic Algorithms and other areas of artificial intelligence. The problems in question are presented in two groups, called NN3-Complete (with 111 examples of TS) and NN3-Reduced (with 11 examples), and the purpose of the competition is to obtain the best models for each example of the two sets using the same methodology. The notation of this section is similar to that used in NN3.

To evaluate the performance of a model on some example *s*, the forecast F is estimated and the performance is measured with the average of the *Symmetric Mean Absolute Percent Error* (SMAPE) over all the values of the series. The SMAPE measures the symmetric absolute error, in percent, between the real values of the original series Z and the forecast F over all observations *t* of the test set of size *n* for each series *s*, with SMAPE equal to:

$$\frac{1}{n}\sum\_{t=1}^{n}\frac{|z\_t - f\_t|}{(z\_t + f\_t)/2} \* 100\tag{8}$$

and finally it is averaged over all examples in the same set of data. Other measures of forecasting accuracy of a model can be found in (Hyndman & Koehler, 2006).
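Equation (8) translates directly into code (a sketch; the function name is ours):

```python
def smape(actual, forecast):
    """Symmetric Mean Absolute Percent Error, eq. (8):
    the mean over t of |z_t - f_t| / ((z_t + f_t) / 2), times 100."""
    n = len(actual)
    return sum(abs(z - f) / ((z + f) / 2)
               for z, f in zip(actual, forecast)) * 100.0 / n
```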

This indicator can evaluate the performance of different methodologies applied to the same data set, and the methodology that produces the lowest value is considered the best. On the set NN3-Complete the best result was 14.84%, while the algorithms developed in this work obtained 16.31%. On NN3-Reduced the results were 13.07% and 15.00%, respectively. However, it is possible to build linear models with the methodology presented in this work that improve these results, because:

• Although the competition was intended to determine the best model for each example, in this work an AR model with 4 terms was found for every example. It is expected that, by dividing the series into a training set and a test set, models with higher forecasting capacity can be found that improve the results obtained.
• ARMA models, which include the behavior of the residuals, were not used, nor the advancement of forecasting, which substantially improve the results.


Fig. 5. Example 102.

Fig. 6. Example 103.

Fig. 7. Example 104.

Fig. 8. Example 105.

In the first part of this section, Fig. 10 is presented as an example of the error obtained with our methodology for a particular series; from its behavior it can be concluded that it is not white noise. Note that when white-noise tests were applied to the errors obtained with this methodology, they were not observed to be white noise.


To build the models for the NN3 competition, several activities were conducted. First, the NN3-Reduced problems were worked on: with the two algorithms developed, 50 runs of each algorithm were performed on each example, looking for linear models with 4 terms. Table 1 presents the resulting linear expressions and the calculated RSS.

After reviewing the behavior of the 50 solutions of these examples, it was concluded that five runs were enough to obtain satisfactory results. For this reason, only five runs were performed for the examples of NN3-Complete using each algorithm, and the best of these was chosen. The results for the NN3-Complete examples are not presented.


Table 1. Linear models for the NN3-REDUCED.

#### **5.1 NN3 graphs**

In this section are showed some of the graphs of the series obtained with the best result of some heuristic algorithms here presented. The values correspondent to the last 18 points on the graph are the result of the forecasting obtained on having evaluated the expressions of the linear models that appear in Table 1.

Fig. 4. Example 101.

#### **6. ARMA models for time series**

In this section the methodology already developed is applied to obtain AR components of the error series obtained by subtracting from the original series the values that are assigned by the AR model. With this is obtained a new model by adding these two components, thus it is obtained the equivalent in our methodology of the traditional ARMA models.

Fig. 5. Example 102.

12 Will-be-set-by-IN-TECH

To build the NN3 competition models were conducted several activities. First it was worked the NN3-Reduced problems where, with the two algorithms developed, were realized 50 runs of every algorithm in each example looking for linear models with 4 terms. Table 1 presents

After reviewing the behavior of the 50 solutions of these examples it was concluded that five runs were enough to obtain satisfactory results. For this reason only five runs were realized for the examples of the NN3-Complete using each algorithm and it was chosen the best of

Problem Linear Model RSS *Ft* = 1269.3358 + 0.3467*Zt*−<sup>1</sup> + 0.6978*Zt*−<sup>12</sup> − 0.2921*Zt*−<sup>13</sup> 1713.55 *Ft* = 1.9987 + 0.9218*Zt*−<sup>1</sup> + 0.9574*Zt*−<sup>12</sup> − 0.8792*Zt*−<sup>13</sup> 5440.262 *Ft* = 1.9989 + 0.9218*Zt*−<sup>1</sup> + 0.8124*Zt*−<sup>12</sup> − 0.3734*Zt*−<sup>13</sup> 80019.738 *Ft* = 9.113 + 0.7252*Zt*−<sup>1</sup> + 0.8316*Zt*−<sup>12</sup> − 0.5592*Zt*−<sup>13</sup> 7321.538 *Ft* = 1.998 + 0.9099*Zt*−<sup>1</sup> + 0.3104*Zt*−<sup>11</sup> − 0.2225*Zt*−<sup>13</sup> 1513.984 *Ft* = 2821.9541 + 0.2673*Zt*−<sup>2</sup> − 0.1699*Zt*−<sup>7</sup> + 0.3422*Zt*−<sup>12</sup> 4464.87 *Ft* = 0.9978 + 0.7937*Zt*−<sup>1</sup> + 0.3152*Zt*−<sup>12</sup> − 0.1125*Zt*−<sup>13</sup> 1387.011 *Ft* = 2000.5819 + 0.2885*Zt*−<sup>2</sup> − 0.1456*Zt*−<sup>4</sup> + 0.2379*Zt*−<sup>5</sup> 10417.433 *Ft* = 1.9988 + 0.9951*Zt*−<sup>1</sup> 2297.306 *Ft* = 1863.0699 + 0.2520*Zt*−<sup>1</sup> − 0.1058*Zt*−<sup>5</sup> + 0.2359*Zt*−<sup>11</sup> 18593.279 *Ft* = 474.1106 + 0.2420*Zt*−<sup>11</sup> − 0.3319*Zt*−<sup>12</sup> + 0.2688*Zt*−<sup>13</sup> 7248.281

In this section are showed some of the graphs of the series obtained with the best result of some heuristic algorithms here presented. The values correspondent to the last 18 points on the graph are the result of the forecasting obtained on having evaluated the expressions of the

In this section the methodology already developed is applied to obtain AR components of the error series obtained by subtracting from the original series the values that are assigned by the AR model. With this is obtained a new model by adding these two components, thus it is

obtained the equivalent in our methodology of the traditional ARMA models.

the results of linear expressions and calculation of RSS.

Table 1. Linear models for the NN3-REDUCED.

linear models that appear in Table 1.

**6. ARMA models for time series**

**5.1 NN3 graphs**

Fig. 4. Example 101.

these. The results of the NN3-Complete examples are not presented.

Fig. 6. Example 103.

Fig. 7. Example 104.

Fig. 8. Example 105.
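The subtraction described here can be sketched as follows. The in-sample values of a Table-1-style linear model are computed and subtracted from the original series to produce the error series to which the second AR model is fitted; the coefficients and data below are illustrative, not taken from an NN3 series:

```python
# Hedged sketch of the first stage of the two-stage construction:
#   F_t = delta + sum_i phi_i * Z_{t-i},   a_t = Z_t - F_t.

def ar_forecasts(z, delta, coeffs):
    """In-sample values F_t = delta + sum over lags of c * z[t - lag]."""
    p = max(coeffs)  # largest lag: first t with all lags available
    return [
        delta + sum(c * z[t - lag] for lag, c in coeffs.items())
        for t in range(p, len(z))
    ]

def error_series(z, delta, coeffs):
    """a_t = Z_t - F_t, the series the second AR model is fitted to."""
    p = max(coeffs)
    f = ar_forecasts(z, delta, coeffs)
    return [z[t] - f[t - p] for t in range(p, len(z))]

if __name__ == "__main__":
    z = [10.0, 11.0, 13.0, 12.0, 15.0, 16.0]      # synthetic series
    a = error_series(z, delta=1.0, coeffs={1: 0.9})  # toy AR(1) model
    print([round(v, 2) for v in a])
```

Fitting another AR model of the same form to `a` and adding the two components gives the ARMA-like model described in the next subsection.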

Therefore, AR models can be built for these error series; such models can adequately describe the error, and considering the two models together yields greater forecasting capability.

#### **6.1 Building of the ARMA models**

The most general models used in this work are the Autoregressive Moving Average models ARMA(p, q), which contain autoregressive components both in the observable variable *Z<sub>t</sub>* and in the error *a<sub>t</sub>*, where:

$$a_t = Z_t - \left(\delta + \sum_{i=1}^{p}\phi_i Z_{t-i}\right)$$

and

$$F_t = \delta + \sum_{i=1}^{p}\phi_i Z_{t-i} + \sum_{j=1}^{q}\gamma_j a_{t-j}$$

Once the AR model for a series has been obtained, an ARMA model can be built by finding another AR model for the series of errors *a<sub>t</sub>* between the original series and its AR model. Adding to the AR model an additional component with the autoregressive terms of the error gives the complete ARMA model. Fig. 10 shows an example of the error series.

Fig. 10. Example of a TS corresponding to the error.

The procedure to build the ARMA models is carried out in two stages. First an AR model is built for the original series; then the error series *a<sub>t</sub>* is considered and another AR model is found for it. In both procedures the most important step is to define how many terms are required for each model.

From now on the ARMA notation for a series changes: it will be indicated whether a term belongs to the AR or the MA part of the expression, and the constants *φ<sub>i</sub>* and *γ<sub>j</sub>* will represent the terms of the corresponding expression; in other words, the terms *F<sub>t−i</sub>* and *a<sub>t−j</sub>* will not be written.

#### **7. The forecasting delay phenomenon**

Analyzing the graphs of the models built with this methodology for the examples of the NN3-Complete, a phenomenon was detected that visually appears as if the graph of the model were almost identical to the original series, but displaced one unit to the right. This phenomenon was observed in 20 examples of the NN3-Complete: 51, 64, 66, 74, 80, 82, 83, 84, 85, 86, 88, 89, 90, 91, 92, 95, 100, 105, 107 and 109.

Given that the first 50 examples of the competition corresponded to series of 50 values (apparently built by experts) and the last 61 examples were series of 150 terms (seemingly from real phenomena), it was estimated that 34% of the real examples of the NN3 present this behavior. From this information we can assume that this phenomenon appears in a large percentage of the models built with this methodology and, for this reason, the models built with this methodology will give better results when applied to these series. Fig. 11 shows an example of this phenomenon, corresponding to the AR model of example 74 obtained with the methodology of this work.

Fig. 11. Example 74 of the NN3-Complete.

This phenomenon was called *forecasting delay* (FD) in this work, since it is equivalent to forecasting at a certain moment what happened at the previous moment.

#### **8. The procedure of advancement of forecasting**

The FD phenomenon can be exploited by displacing the graph of the linear models obtained one unit to the left. This procedure was defined as *advancement of forecasting* (AF) and is formalized next.
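The displacement described above can be sketched as a re-indexing of the forecast series. This is only an illustrative interpretation of AF, ahead of the formal definition; the data are made up:

```python
# Hedged sketch of the advancement-of-forecasting (AF) idea: since an
# FD model forecasts at time t roughly what happened at t-1, shifting
# its forecast series one unit to the left realigns it with the series.

def advance_forecast(forecasts):
    """Shift the forecast series one unit to the left.

    The forecast originally issued for time t+1 is used at time t; the
    last position is left as None because no later forecast exists yet.
    """
    return forecasts[1:] + [None]

if __name__ == "__main__":
    z = [10, 12, 15, 14, 18]        # illustrative series
    f = [None, 10, 12, 15, 14]      # a delayed forecast: f[t] == z[t-1]
    print(advance_forecast(f))      # → [10, 12, 15, 14, None]
```

After the shift, the forecast values line up with the observations they were effectively reproducing, which is the intended effect of AF.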
