Time Series and Artificial Neural Networks


#### Chapter 1

## Time Series from Clustering: An Approach to Forecast Crime Patterns

Miguel Melgarejo, Cristian Rodriguez, Diego Mayorga and Nelson Obregón

#### Abstract

This chapter presents an approach to forecast criminal patterns that combines the time series from clustering method with a computational intelligence-based prediction. In this approach, clusters of criminal events are parametrized according to simple geometric prototypes. Cluster dynamics are captured as a set of time series. The size of this set corresponds to the number of clusters multiplied by the number of parameters per cluster. One of the main drawbacks of clustering is the difficulty of defining the optimal number of clusters. The chapter also deals with this problem by introducing a validation index of dynamic partitions of crime events that relates the optimal number of clusters with the foreseeability of time series by means of non-linear analysis. The method as well as the validation index was tested over two cases of reported urban crime. Our results showed that crime clusters can be predicted by forecasting their representative time series using an evolutionary adaptive neural fuzzy inference system. Thus, we argue that the foreseeability of these series can be anticipated satisfactorily by means of the proposed index.

Keywords: crime, crime pattern theory, fuzzy clustering, neuro-fuzzy systems, evolutionary computation

#### 1. Introduction

A crime is an event that emerges from opportunities configured by the interaction of offenders, victims, and the surrounding environment [1]. The process behind a crime event has a decisional nature which evaluates the benefits and risks for the offender [2]. Because of the social value in understanding and preventing crime, it has been studied from different perspectives. It has been noted that crime is a complex phenomenon [3] since criminal activity is connected to the complex dimension of social systems and their actors [4].

Environmental criminology recognizes that crime is not uniformly distributed over space, time, or society [5]. Finding rules that explain the non-randomness of crime dynamics has become an intense research area. Typically, stochastic process analysis has been used in the study of crime dynamics [6–8]. However, other epistemological approaches, like fuzzy set theory [9], non-classical topology [10], and complex network theory [11] have been also applied to describe this phenomenon.

Crime pattern theory points out that crime forms patterns in space, time, and society [5, 10]. A pattern is the interconnection between objects, rules, or processes. This interconnection can be either physical or conceptual. This theory deals with the problem of forecasting when and where a criminal event will occur. The observation of patterns can come from evidence or theoretical considerations. Therefore, the analysis of criminal patterns can be described in terms of agents, rules, or clusters of events taking as a reference the structure of the urban form [12].

DOI: http://dx.doi.org/10.5772/intechopen.89561

Crime is ubiquitous in modern cities [13]. It has been observed that there are urban areas with high crime concentration [8]. Therefore, the term space-time dynamics of crime indicates how crime patterns evolve in time and space. A recent work [11] approaches the analysis of crime dynamics by suggesting that robberies are geographically correlated with urban form. A previous work addressed a similar perspective by means of fuzzy topology [10]. In this case, it was observed that crime patterns correlate to the fuzzy edges of neighborhoods, disperse into vulnerable neighborhoods, and concentrate on some main roads.

Some techniques for non-hierarchical clustering have been employed to detect spatial patterns of criminal activity [14], including basic fuzzy clustering (i.e., Fuzzy C-means algorithm) [9]. However, the problems of criminal directionality and crime dynamics using this perspective have been recently addressed in a work by Mayorga et al. [15]. Fuzzy clustering algorithms for spatio-temporal data have been introduced recently. However, these algorithms are mainly focused on clustering of time series (CTS). An enhanced version of the Fuzzy C-means algorithm was proposed to consider both spatial and temporal components of data [16]. This algorithm deals with the clustering of time series produced by spatial sources (i.e., sensors, monitoring stations, etc.). The method is well adjusted to cluster time series that come from a structured sampling of spatial variables. However, when analyzing criminal events, time series provided by sensors are not available, only discrete points of criminal activity in space and time. Thus, the algorithm is not suitable for analyzing this kind of data. Another spatio-temporal fuzzy clustering algorithm was reported in a study by Ji et al. [17]. Clusters are assigned over time series by introducing a switching function that establishes the correspondence between a section of a time series and a cluster in the partition. This algorithm also assumes that time series are available, which is not how spatio-temporal criminal data are collected and studied.

This work deals with clustering the dynamics of criminal events and forecasting them. It contributes by introducing a method that: (1) uses a clustering reorganization algorithm that tracks the dynamics of crime clusters by producing time series of their geometric parameters, (2) sets the optimal number of clusters by minimizing a fuzzy partition index that quantifies the predictability of time series, and (3) forecasts the time series by means of evolutionary-fuzzy predictors. Conceptually, the main contribution of this work is focused on strengthening the time series from clustering (TSC) method in forecasting crime patterns and connecting the quality of a dynamic partition of crime events with concepts of non-linear analysis.

The method is applied over two independent study cases: (1) house burglary in San Francisco, USA and (2) cellular phone robbery in Bogota DC, Colombia. A comparison between the best and worst situations predicted by the introduced index is also provided in each case, giving a preliminary validation. Results reveal that our approach is promising in terms of prediction capabilities, which motivates its application to forecast spatial crime patterns over fine temporal scales. Hence, this method may be considered as a working tool in the practice of predictive policing [18].

The chapter is organized as follows: Section 2 describes our method in detail, reviews our clustering reorganization algorithm, introduces the dynamic fuzzy partition quality index, and describes the forecasting approach. Section 3 presents our results for the two study cases. Section 4 discusses our main findings and presents some conclusions.

Recent Trends in Artificial Neural Networks - From Training to Prediction

### 2. Method

The overall TSC method is depicted in Figure 1. Reported spatial events are organized and filtered in frames according to a time scale (i.e., daily, weekly, etc.). A clustering algorithm is applied over these data frames in order to detect and track spatial patterns. These clusters are synthesized in terms of relevant spatial variables computed from each data frame, resulting in several time series. This process is repeated until the maximum number of clusters is reached. Non-linear signal analysis is used to compute a foreseeability index for all time series. The optimal number of clusters is selected as the one that minimizes this index. Selected time series are characterized in order to perform their forecasting. These stages are described in detail as follows.

#### 2.1 Data organization

Reported criminal events are grouped in time frames. These events must have a specified date, and longitude and latitude of occurrence to be organized. Time can be defined daily, weekly, monthly, etc., depending on the number of events in the database. Time frames can be generated independently (i.e., without sharing any events) or with some amount of overlapping (i.e., sharing some events).
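The frame-building step can be sketched as follows; the function and parameter names are illustrative assumptions, not the authors' implementation:

```python
# Group (date, longitude, latitude) events into consecutive time frames.
# overlap_days = 0 gives independent frames; overlap_days > 0 makes
# consecutive frames share the events in the overlapping window.
from datetime import date, timedelta

def build_time_frames(events, start, frame_days=7, overlap_days=0):
    events = sorted(events, key=lambda e: e[0])
    last = max(e[0] for e in events)
    span = timedelta(days=frame_days)
    step = timedelta(days=frame_days - overlap_days)
    frames, frame_start = [], start
    while frame_start <= last:
        frame_end = frame_start + span
        frames.append([e for e in events if frame_start <= e[0] < frame_end])
        frame_start += step
    return frames

events = [
    (date(2024, 1, 1), -122.41, 37.77),
    (date(2024, 1, 5), -122.43, 37.76),
    (date(2024, 1, 9), -122.40, 37.78),
]
frames = build_time_frames(events, start=date(2024, 1, 1), frame_days=7)
print([len(f) for f in frames])  # → [2, 1]
```

With `overlap_days=3` the same data produce three overlapping weekly frames, since each frame re-uses the last three days of the previous one.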

#### Figure 1.

The time series from clustering method uses a clustering reorganization algorithm over reported data to produce clusters of crime events that evolve in time and space. The evolution is represented by means of time series of the clusters' parameters. A designed index relates the number of clusters with the foreseeability of time series by using non-linear analysis.

#### 2.2 Clustering reorganization algorithm

Following data organization, clustering of criminal events into a given number of clusters is performed. The result is a set of time series that represents the spatio-temporal dynamics of crime. The number of time series equals the predefined number of clusters C multiplied by the number of parameters per cluster. It is important to ensure that these resulting time series have spatial and temporal structure. The clustering reorganization algorithm (CRA) was proposed in [15]. This algorithm ensures that the order of the identified clusters remains the same throughout time. In the CRA, the clusters are identified by the Fuzzy C-Means algorithm (FCM) [19]. The outline of clusters can also be determined by several other clustering algorithms, such as Gustafson-Kessel [20], among others. The choice of clustering algorithm is related to the possible prototypes that can be found in the data frames. These prototypes are related to the urban form where the events take place [11].


The FCM and other similar algorithms initialize their parameters randomly. In this case, the centers of the clusters, their order, and the membership values vary from one clustering experiment to another. Because of this random initialization, the identified clusters will not remain in the same zone, nor will the order of the created fuzzy partitions be preserved throughout time.

Therefore, if a study of C clusters is carried out, there must be certainty that the order of the partitions is maintained. It was considered for this case that the spatio-temporal trends in crime tend to be regular due to normal population behavior. Thus, if a cluster Cl is identified in a time frame, then in all future time frames clusters must be identified near that previously identified cluster. The Euclidean distance between the centers of the clusters of the different time frames (time frame 0 and time frame k) is taken as the basis for this evaluation. The CRA obtains a time series from clustering in five steps: Initialization, Iteration, Distance Evaluation, Discarding Non-minima, and Assignment.

First, in the initialization stage, the CRA requires the number of clusters, the k-th time frame that the dataset is in, and the center guide matrix. The center guide determines the order of the clusters in the future time frames. As a second step, the CRA ensures that the number of clusters is maintained throughout all time frames. In this step, the algorithm verifies that, for each reorganization process carried out, each center identified in time frame k is assigned to a different cluster in the center guide (time frame 0). If the number of clusters is not maintained, this stage is responsible for executing the FCM algorithm once more to re-evaluate the organization of the clusters.

Third, in the distance evaluation stage, the process to measure the Euclidean distance $d_{ij}^{(k)}$ between the centers of the clusters identified in time frames $t_0$ and $t_k$, where $i$ and $j$ represent the cluster order of the $k$-th frame and the first frame, respectively, takes place as follows:

$$d\_{ij}^{(k)} = \sqrt{\left(x\_j^{(0)} - x\_i^{(k)}\right)^2 + \left(y\_j^{(0)} - y\_i^{(k)}\right)^2}, \qquad i, j = 1, 2, \ldots, C. \tag{1}$$

where $d_{ij}^{(k)}$ is the Euclidean distance between the center of cluster j in time 0 and the center of cluster i in time k. Based on this distance evaluation, a criterion for the reorganization is established, where the cluster order is assigned according to the minimum distance of each cluster to the centers in the center guide matrix.

Even though clusters will tend to occupy the same zones, in certain cases some clusters may be identified farther away from their usual locations. To prevent the CRA from confusing the cluster organization, the "discarding non-minima" stage is introduced. In this stage, the $i$-th center $c_{r_i}$ in time frame k is matched to the closest $j$-th center $c_{r_j}$ in time frame 0, and this match selects the order in which the $i$-th cluster is organized.

Finally, the assignment stage takes place once the iteration condition is met. The centers and the membership values have already been assigned in the correct order, according to the identification of the clusters in the first time frame. This stage assigns the correct order to the variables of the centers of the clusters as well as to the membership value of each event. By doing so for each time frame, once the whole time window is processed there will be a set of time series that represents the spatio-temporal dynamics of the groups that have been identified by the clustering algorithm.
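As a rough illustration, the distance-evaluation, discarding-non-minima, and assignment stages can be read as a nearest-center matching between each frame and the center guide. The sketch below is a simplified greedy minimal-distance pairing under assumed function and variable names, not the authors' exact implementation:

```python
# Greedy sketch of the CRA reordering: each center found in frame k is
# matched to the nearest center of the frame-0 guide (distances as in
# Eq. (1)), discarding non-minimal pairings.
import math

def reorganize(guide, centers):
    """Return `centers` reordered so that index i matches guide cluster i."""
    pairs = sorted(
        (math.dist(g, c), j, i)
        for j, g in enumerate(guide)
        for i, c in enumerate(centers)
    )
    order, used_guide, used_centers = {}, set(), set()
    for _, j, i in pairs:  # smallest distances first: discard non-minima
        if j not in used_guide and i not in used_centers:
            order[j] = i
            used_guide.add(j)
            used_centers.add(i)
    return [centers[order[j]] for j in range(len(guide))]

guide = [(0.0, 0.0), (10.0, 10.0)]        # centers of time frame 0
centers_k = [(9.5, 10.2), (0.3, -0.1)]    # FCM returned them permuted
print(reorganize(guide, centers_k))  # [(0.3, -0.1), (9.5, 10.2)]
```

With this pairing, the cluster that drifted near the origin keeps index 0 across frames, so the series X(t), Y(t), R(t) of each cluster stay consistently labeled.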

An example of three time series that surrogate the dynamics of a cluster is depicted in Figure 2. The spatial dynamics of a cluster is presented as a circle that moves in different frames and whose radius also changes. The dynamics of this cluster is summarized by three time series: $X(t)$, the displacement of the cluster centroid in dimension x; $Y(t)$, the displacement of the cluster centroid in dimension y; and $R(t)$, the radius of the cluster, measured as the Euclidean distance from its centroid to the farthest criminal event that belongs to the cluster.
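The per-frame parameters behind these series can be computed directly from the events assigned to a cluster. This is a minimal sketch assuming the membership of events to the cluster is already known (e.g., from FCM); the sample coordinates are invented for illustration:

```python
# Summarize one cluster per frame by centroid (X, Y) and radius R, where R is
# the distance from the centroid to the farthest event in the cluster.
import math

def cluster_parameters(points):
    """Return (cx, cy, r) for a list of (x, y) event coordinates."""
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    r = max(math.hypot(p[0] - cx, p[1] - cy) for p in points)
    return cx, cy, r

# One cluster tracked over three frames yields three time series.
frames = [
    [(0.0, 0.0), (2.0, 0.0)],   # frame 0
    [(1.0, 1.0), (3.0, 1.0)],   # frame 1
    [(2.0, 2.0), (6.0, 2.0)],   # frame 2
]
X, Y, R = zip(*(cluster_parameters(f) for f in frames))
print(X)  # (1.0, 2.0, 4.0)
print(R)  # (1.0, 1.0, 2.0)
```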

#### 2.3 Non-linear signal processing and optimal number of clusters

#### 2.3.1 The MIG index


The memory, information, and geometry (MIG) index is proposed in this work as a scalar that quantifies the predictability of a time series obtained from the CRA. The index is constructed as follows:

$$q = h^I \frac{G}{M} \tag{2}$$

The predictability of a time series becomes more plausible as the MIG index is minimized. M evaluates the amount of non-linear correlation inside the time series (i.e., Memory). I is an indicator of how much information is produced by the signal. G accounts for the size of the phase space in which the dynamics can be embedded (i.e., Geometry).

In practical terms, the MIG index for a time series S tð Þ can be evaluated from statistics grounded in non-linear signal processing and chaos theory as:

$$q\_S = q(\mathcal{S}(t)) = e^{\lambda} \frac{D}{\tau} \tag{3}$$

where λ corresponds to the estimated Largest Lyapunov Exponent (LLE) of $S(t)$, D is the size of the embedding space of the series obtained from the estimation of the False Nearest Neighbors in the attractor dynamics of the signal, and τ is the correlation lag located at the first minimum of the average mutual information of $S(t)$. A detailed explanation of the computation of these quantities can be found in [21], while a practical estimation of LLEs is proposed in [22].

#### Figure 2.

Example of time series that describe a cluster's dynamics. Parameters of a circular cluster tracked by the CRA evolve in time, producing three time series that correspond to: horizontal displacement $X(t)$, vertical displacement $Y(t)$, and radius of the cluster $R(t)$.
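Once λ, D, and τ have been estimated (with external non-linear time-series tools, whose estimators are not reproduced here), Eq. (3) reduces to a one-line computation. A minimal sketch, with the function name being an assumption:

```python
# MIG index of a series, per Eq. (3): q_S = e^lambda * D / tau.
# Lower values indicate a more foreseeable series.
import math

def mig_index(lam, D, tau):
    """lam: estimated LLE; D: embedding dimension (FNN); tau: AMI lag."""
    return math.exp(lam) * D / tau

# A non-diverging series (lam = 0) with a small embedding and long memory
# scores lower (more predictable) than a chaotic, high-dimensional one.
print(mig_index(0.0, 2, 4))                          # 0.5
print(mig_index(0.9, 6, 2) > mig_index(0.0, 2, 4))   # True
```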

#### 2.3.2 Optimal number of clusters

The dynamics of the $i$-th cluster are represented by three signals (i.e., $X_i(t)$, $Y_i(t)$, and $R_i(t)$); the MIG index $q_C$ for the dynamics of a partition with C clusters is then computed as follows:

$$q\_C = \frac{1}{\mathcal{C}} \sum\_{i=1}^{\mathcal{C}} q\_{X\_i} + \frac{1}{\mathcal{C}} \sum\_{i=1}^{\mathcal{C}} q\_{Y\_i} + \frac{1}{\mathcal{C}} \sum\_{i=1}^{\mathcal{C}} q\_{R\_i} \tag{4}$$


The optimal number of clusters Copt is obtained by minimizing qC:

$$C_{opt} = \arg\min_{C_j}\left(q_{C_j}\right), \qquad C_j = 2, 3, \dots, C_{\max} \tag{5}$$

where Cmax is the largest number of clusters considered in the optimization process.

The minimum MIG index $q_{C_{opt}}$ suggests that, on average, the time series produced when $C_j = C_{opt}$ are more predictable than in any other case. Thus, the dynamics should be studied over $3 \cdot C_{opt}$ signals.
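Eqs. (4) and (5) can be sketched as follows; `partition_mig` and `optimal_clusters` are hypothetical helper names, and the per-series MIG indices are assumed to be precomputed:

```python
import numpy as np

def partition_mig(cluster_qs):
    """Eq. (4): q_C is the sum of the per-parameter MIG averages over
    the C clusters. `cluster_qs` holds one (q_X, q_Y, q_R) tuple per
    cluster."""
    q = np.asarray(cluster_qs, dtype=float)   # shape (C, 3)
    return float(q.mean(axis=0).sum())        # mean over clusters, summed over X, Y, R

def optimal_clusters(q_by_c):
    """Eq. (5): C_opt = argmin_{C_j} q_{C_j} for C_j = 2, ..., C_max.
    `q_by_c` maps each candidate cluster count C_j to its q_C."""
    return min(q_by_c, key=q_by_c.get)
```

In practice `q_by_c` would be filled by clustering the event frames for each candidate C and scoring the resulting 3·C series with the MIG index.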

#### 2.4 Time series characterization

The MIG index is proposed as an a priori estimate of the predictability of a time series produced by the CRA. To corroborate its validity, the signals belonging to the most and least predictable scenarios should be forecast; however, forecasting all of them is computationally expensive. Instead, these signals are characterized through several statistics (independent of the MIG index) in order to select a few as the most representative signals for each scenario.

#### 2.4.1 Characterization

Characterization is performed through a selected group of statistics: the first and second estimated statistical moments of the distribution p of the series $S(t)$, the Shannon entropy, and the number of lags beyond which the autocorrelation of $S(t)$ is effectively zero. The estimators [23] of the first two statistical moments are computed as follows:

$$\mu = \frac{1}{L} \sum\_{k=1}^{L} \mathbf{S}\_k \tag{6}$$

$$\sigma = \frac{1}{L - 1} \sum\_{k=1}^{L} |\mathbf{S}\_k - \mu|^2 \tag{7}$$

where $S_k$ is the discrete representation of $S(t)$ and L is the number of samples. The third measure, the entropy described in (8), is related to the information uncertainty inside the series. A high entropy value is interpreted as a sign of low predictability.

Time Series from Clustering: An Approach to Forecast Crime Patterns DOI: http://dx.doi.org/10.5772/intechopen.89561

$$E = -\sum\_{i=1}^{L} p(S\_i) \log \left( p(S\_i) \right) \tag{8}$$

The autocorrelation function is shown in Eq. (9). The proposed measure represents a confidence bound on the number of regressors with a linear dependence on the signal. It establishes the degree of possible foreseeability on the basis of the signal's periodic behavior, given that crime forecasting is often posed in terms of stochastic processes [24].

$$r\_h = \frac{c\_h}{c\_0} \tag{9}$$

$$c\_h = \frac{1}{L-1} \sum\_{k=1}^{L-h} \left(\mathbb{S}\_k - \overline{\mathbb{S}}\right) \left(\mathbb{S}\_{k+h} - \overline{\mathbb{S}}\right)$$

where $\overline{S}$ is the average of the signal. The number of lags $L_g$ is obtained when $c_h$ drops below $0.2\,c_0$ for $h > 0$.
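A minimal sketch of the four-statistic characterization (Eqs. (6)–(10)) might look as follows; the histogram estimate of p(S) and its bin count are assumptions, since the chapter does not specify how the distribution is estimated:

```python
import numpy as np

def characterize(s, bins=16, acf_threshold=0.2):
    """Four-statistic description vector of a series (Eqs. (6)-(10)):
    mean, unbiased variance, Shannon entropy of a histogram estimate
    of p(S), and the lag Lg at which the autocovariance c_h first
    drops below 0.2 * c_0."""
    s = np.asarray(s, dtype=float)
    L = len(s)
    mu = s.mean()                               # Eq. (6)
    sigma = s.var(ddof=1)                       # Eq. (7), 1/(L-1) normalization
    counts, _ = np.histogram(s, bins=bins)
    p = counts[counts > 0] / L
    entropy = -np.sum(p * np.log(p))            # Eq. (8)
    centered = s - mu
    c0 = np.dot(centered, centered) / (L - 1)
    lg = L
    for h in range(1, L):                       # Eq. (9), stop at 0.2 * c_0
        ch = np.dot(centered[:L - h], centered[h:]) / (L - 1)
        if ch < acf_threshold * c0:
            lg = h
            break
    return np.array([mu, sigma, entropy, lg])   # Eq. (10)
```

The returned vector corresponds to $v_S$ in Eq. (10) and feeds the representative-signal selection described next.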

#### 2.4.2 Series analysis


The chosen statistics are applied over time series whereby a four-dimensional vector (Eq. (10)) is obtained per signal:

$$v\_S = [\mu, \sigma, E, L\_g] \tag{10}$$

where S is the class identification of the corresponding series: X—displacement of the cluster in dimension x, Y—displacement of the cluster in dimension y, and R—radius of the cluster. These vectors are arranged in matrices as in Eq. (11). In this way, the analysis is done for each matrix separately:

$$M\_{\mathbb{S}} = \begin{pmatrix} \upsilon\_{\mathbb{S}\_1} \\ \vdots \\ \upsilon\_{\mathbb{S}\_{\mathbb{C}}} \end{pmatrix} \tag{11}$$

where C represents the number of clusters.

• Normalization: the aim is to find the most representative signals (i.e., X, Y, R). The statistics describe each series, but most of them depend on the scale, which prevents a comparison by value. Consequently, a difference of one order of magnitude could be significant in the lags of autocorrelation, but insignificant in the mean. Therefore, normalization is a way to establish a common reference. In this case, the normalization interval is (0, 1], avoiding the value 0. Each $M_S$ matrix is independently normalized.

• Finding the average vectors: an average vector is built from the normalized matrices. As in the last step, this process is applied independently. It is important to emphasize that, for a given clustering, there is an average vector $\overline{v}_S$ for each $M_S$:

$$\overline{v}\_{S} = \frac{1}{C} \sum\_{i=1}^{C} v\_{S\_i} \tag{12}$$

• Determining distances: the distance between each row of the matrices $M_S$ and the corresponding average vector is computed. Thereby, the Euclidean distance is calculated for all vectors $v_{S_i}$ as follows:

$$d\_{\overline{v}\_S v\_{S\_i}} = \left( (\overline{v}\_S - v\_{S\_i})(\overline{v}\_S - v\_{S\_i})^T \right)^{1/2} \tag{13}$$

The matrices $D_S$ are organized by collecting the distances calculated from the matrices $M_S$.

$$D\_S = \begin{pmatrix} d\_{\overline{v}\_S v\_{S\_1}} \\ \vdots \\ d\_{\overline{v}\_S v\_{S\_C}} \end{pmatrix} \tag{14}$$

• The fittest vectors: finally, the series corresponding to the position of the minimum value in each matrix $D_S$ is chosen as the most representative signal. Thus, there are three representative signals obtained from $D_X$, $D_Y$, and $D_R$, which are referenced as $X_f(t)$, $Y_f(t)$, and $R_f(t)$, respectively. These three signals are taken in the forecasting stage as representations of the clustering dynamics for a given number of clusters. In particular, the representative signals for the best and worst conditions of the MIG index are considered in this work.
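The selection steps above can be sketched as follows for one class S; mapping the normalization interval (0, 1] onto [0.1, 1.0] is an illustrative choice, since the chapter does not state the exact transform it uses to avoid the value 0:

```python
import numpy as np

def pick_representative(vectors):
    """Select the representative series of one class S (X, Y or R).
    `vectors` is the C x 4 matrix M_S of Eq. (11); each row is a
    characterization vector v_Si. Returns the row index of the
    fittest vector (Eqs. (12)-(14))."""
    m = np.asarray(vectors, dtype=float)
    # Normalization into (0, 1], applied independently per statistic (column);
    # the 0.1..1.0 mapping is an assumption that keeps values away from 0.
    lo, hi = m.min(axis=0), m.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    m = (m - lo) / span * 0.9 + 0.1
    v_bar = m.mean(axis=0)                    # Eq. (12), average vector
    d = np.linalg.norm(m - v_bar, axis=1)     # Eq. (13), Euclidean distances
    return int(np.argmin(d))                  # Eq. (14), minimum entry of D_S
```

Running this once per class yields the three representative signals $X_f(t)$, $Y_f(t)$, and $R_f(t)$.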

#### 2.5 Evolutionary-fuzzy forecasting of representative time series

Forecasting of representative time series is carried out by means of a custom memetic algorithm [25, 26]. The heuristic was proposed in [27] as the combination of the differential evolution algorithm [28] and the adaptive neuro-fuzzy inference system (ANFIS) [29]. The general flow graph of the algorithm is shown in Figure 3.

#### 2.5.1 Differential memetic neuro-fuzzy algorithm

A memetic algorithm allows one to take advantage of both global and local search. In this manner, the optimization space can be explored widely and deeply. The differential evolution algorithm (highlighted in orange) is chosen as the global optimizer on account of its use for optimizing multidimensional real-valued functions. In addition, it has been successfully applied to model optimization of complex systems [30]. Several variants of this algorithm have been proposed in the literature, taking into account the multiplicity of elitism strategies, chromosome representations, and mutation operators. ANFIS (highlighted in pink) is implemented as the supervised learning strategy of the memetic algorithm. It uses the gradient of the objective function, on the basis of its differentiability, to search for solutions around a locality of the optimization landscape. ANFIS is supported on Sugeno-type fuzzy systems with Gaussian membership functions in the inputs [29, 31]. In addition, the proposed fuzzy system has m rules, which also corresponds to the number of membership functions in each input and the number of singleton fuzzy sets in the output. Each rule relates the r-th group of membership functions in the inputs with the i-th singleton value in the output [30].
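For reference, one generation of the global-search layer could be sketched as classic DE/rand/1/bin; this is a generic differential evolution step, not necessarily the chapter's exact variant, and the ANFIS local-search step is omitted:

```python
import numpy as np

def de_step(pop, fit, f=0.5, cr=0.9, rng=None):
    """One generation of DE/rand/1/bin: for each target vector, build a
    mutant from three distinct other members, cross it over with the
    target, and keep the trial only if it does not score worse.
    `pop` is an (n, dim) array; `fit` maps a vector to a score to minimize."""
    if rng is None:
        rng = np.random.default_rng(0)
    n, dim = pop.shape
    new = pop.copy()
    for i in range(n):
        others = [j for j in range(n) if j != i]
        a, b, c = pop[rng.choice(others, 3, replace=False)]
        mutant = a + f * (b - c)                 # differential mutation
        mask = rng.random(dim) < cr
        mask[rng.integers(dim)] = True           # force at least one gene from the mutant
        trial = np.where(mask, mutant, pop[i])
        if fit(trial) <= fit(pop[i]):            # greedy one-to-one selection
            new[i] = trial
    return new
```

The greedy selection guarantees the best fitness in the population never worsens from one generation to the next, which is why the memetic algorithm can safely interleave these steps with ANFIS refinement.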

As shown in the flow graph in Figure 3, when the population is adapted by ANFIS, it is necessary to re-evaluate the solutions and later, according to the fitness function, select them.

#### Figure 3.


The evolutionary algorithm used to tune fuzzy forecasters. This algorithm uses the script of the differential evolution algorithm with a step of local search provided by an adaptive neural fuzzy inference system.

This re-evaluation must be carried out since ANFIS relies on the RMSE to evaluate the solutions, but the memetic algorithm relies on the fitness function. Thus, an interesting individual for ANFIS may not necessarily be good according to its fitness score.

In order to guarantee population diversity and avoid the fast convergence induced by ANFIS, the diversity measure considered is the Fano factor [26], which calculates the average variation of the mean of the membership functions according to Eq. (15). The population is re-initialized if the Fano factor is greater than a given threshold:

$$\hat{F} = \left(\frac{1}{n \cdot m} \sum\_{i=1}^{n \cdot m} \frac{\frac{1}{j-1} \sum\_{l=1}^{j} \left(D\_{li} - \overline{D}\_{i}\right)^{2}}{\overline{D}\_{i}}\right) \cdot \frac{1}{gen} \tag{15}$$

where

$$\overline{D}\_i = \frac{1}{j} \sum\_{l=1}^{j} D\_{li}$$

Here, $n \cdot m$ is the number of membership functions in the inputs, $D_{li}$ is the average of the i-th membership function of the l-th individual, j is the number of individuals in the population, and gen is the current generation.
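Reading Eq. (15) as a variance-to-mean (Fano) ratio per membership-function parameter, averaged over the n·m parameters and damped by the generation counter, a sketch could be:

```python
import numpy as np

def fano_factor(D, gen):
    """Eq. (15): average variance-to-mean ratio of the membership-function
    means across the population, divided by the generation counter.
    D has shape (j individuals, n*m functions); D[l, i] is the mean of
    the i-th membership function of the l-th individual."""
    D = np.asarray(D, dtype=float)
    mean = D.mean(axis=0)              # D_bar_i over the j individuals
    var = D.var(axis=0, ddof=1)        # unbiased variance, 1/(j-1)
    return float((var / mean).mean() / gen)
```

A large value means the membership functions still disagree widely across individuals (high diversity); the 1/gen term relaxes the re-initialization criterion as the search matures.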

#### 2.5.2 Fitness function

The root mean squared error (RMSE) index used by ANFIS may not be appropriate for guiding the global search of an evolutionary algorithm [30] while dealing with complex behaviors. Thus, a problem-oriented fitness function to evaluate candidate solutions must be designed. This function is significant because it provides the criteria to judge a solution and its performance. In addition, it is imperative to include several performance indicators to check solutions in a more precise and proper manner. This process is carried out by minimizing the following expression:

$$f = \frac{(1 - \text{NSE}) \ast \text{MAE}}{\text{POCID}} \tag{16}$$


The fitness function is composed of three indexes. Each one has been chosen to guide the evolution regarding features in the best solution. Thus, many possible solutions that do not have a near-zero mean error but can forecast the series properly, will be explored.

The Nash-Sutcliffe efficiency (NSE) is conceived as an overall performance measure. It is a normalized statistic that determines the relative magnitude of the residual variance compared to the measured data variance [32]. A value of 1 corresponds to a perfect match of the model $\hat{d}$ to the observed data d.

$$NSE = 1 - \frac{\sum\_{k=1}^{N} \left(d\_k - \hat{d}\_k\right)^2}{\sum\_{k=1}^{N} \left(d\_k - \overline{d}\right)^2} \tag{17}$$

where $\overline{d}$ is the mean of the observed series.

The mean absolute error (MAE) gives a clear interpretation of the absolute difference between the series and its forecast, relative to the series [33]. It is chosen instead of RMSE because the series has several peaks, and it is necessary to play down their importance in the forecasting; otherwise, the fitness function would penalize errors at the highest values more heavily than errors near the mean.

$$MAE = \frac{1}{N} \sum\_{k=1}^{N} \left| \frac{d\_k - \hat{d}\_k}{d\_k} \right| \tag{18}$$

Prediction of change in direction (POCID) is a non-linear index included to ensure that a forecast that does not fit the series well, but follows its direction changes reliably, still gets a good score [34]. Since it enters the fitness function as a multiplicative inverse, the fitness grows as the POCID score moves away from its optimal value.

$$POCID = \frac{\sum\_{k=1}^{N} B\_k}{N} \tag{19}$$


$$B\_k = \begin{cases} 1, & \text{if } (d\_k - d\_{k-1})\left(\hat{d}\_k - \hat{d}\_{k-1}\right) \ge 0 \\ 0, & \text{otherwise} \end{cases} \tag{20}$$

NSE = 1 and MAE = 0 represent the two roots of f; either is sufficient to consider a solution the best, while POCID operates as a non-linear penalizer.
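Putting Eqs. (16)–(20) together, a sketch of the fitness evaluation could be as follows (note that POCID = 0 would make f undefined, a degenerate case the sketch does not guard against):

```python
import numpy as np

def fitness(d, d_hat):
    """Eq. (16): f = (1 - NSE) * MAE / POCID, to be minimized.
    `d` is the observed series, `d_hat` the forecast; d must be
    nonzero for the relative MAE of Eq. (18)."""
    d, d_hat = np.asarray(d, float), np.asarray(d_hat, float)
    nse = 1.0 - np.sum((d - d_hat) ** 2) / np.sum((d - d.mean()) ** 2)  # Eq. (17)
    mae = np.mean(np.abs((d - d_hat) / d))                              # Eq. (18)
    b = (d[1:] - d[:-1]) * (d_hat[1:] - d_hat[:-1]) >= 0                # Eq. (20)
    pocid = b.mean()                                                    # Eq. (19)
    return (1.0 - nse) * mae / pocid
```

A perfect forecast yields NSE = 1 and MAE = 0, so f = 0; any miss in level or direction pushes f above zero.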

#### 3. Results

The method outlined in the previous section was applied to forecast criminal patterns in two cities: San Francisco, USA and Bogota, Colombia. Results obtained from evolved fuzzy predictors are described in this section. In addition, evidence is presented from these cases about the validity of the MIG index for setting the optimal number of clusters of a dynamic partition of crime events.

#### 3.1 Results for San Francisco, USA

#### 3.1.1 Data organization

The criminal dataset for the city of San Francisco was obtained online through the open data services provided by the local government. The dataset used in this work registers about 70,000 criminal events between the years 2003 and 2015. Each criminal register contains attributes such as latitude, longitude, time, date, and type of crime, among others. For the purposes of this study, only house burglary registers were considered, and from each register only latitude, longitude, and date were taken into account. Each spatial register was projected by means of the Universal Transverse Mercator system, taking the location of the city of San Francisco as reference. Relative distances were expressed in meters. Time frames of criminal activity were created by aggregating the crime events of the last 7 days. Frames iterate from one day to the next. A total of 3195 time frames were produced.
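The sliding 7-day aggregation can be sketched as follows; the inclusive window bounds are an assumption about how the chapter builds its frames:

```python
from datetime import date, timedelta

def time_frames(events, start, end, window_days=7):
    """Aggregate point events into overlapping daily frames: each frame
    holds the events of the last `window_days` days, and the window
    slides one day at a time. `events` is a list of (x, y, day) tuples
    with projected coordinates in meters."""
    frames = []
    day = start
    while day <= end:
        lo = day - timedelta(days=window_days - 1)
        frames.append([(x, y) for x, y, d in events if lo <= d <= day])
        day += timedelta(days=1)
    return frames
```

Each frame is then clustered, and the CRA tracks the cluster prototypes from one frame to the next to produce the time series.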

#### 3.1.2 Optimal number of clusters

Optimization of the MIG index was carried out over time series from clustering with C = 2, …, 16, as shown in Figure 4. There is no clear criterion for setting the maximum number of clusters to be considered in the optimization of any clustering validation index. In this case, the maximum was determined by taking into account that San Francisco is divided into 10 police districts. Thus, the optimization was computed considering just three additional clustering settings (i.e., C = 12, 16, 20). It was observed for these values that the MIG index diverged too fast, since at C = 20 it reached a value greater than the first maximum by about two orders of magnitude. The optimal MIG index is obtained for C = 4, whose value is around 30% of the maximum. However, the index found at C = 5 is quite similar to the optimum, so a similar predictability potential is expected for these two cases.

The difference between the extreme cases may be explained by examining Figure 5. Note that for $C_{opt} = 4$, big areas are covered, so clusters concentrate a greater number of crime events. Therefore, the spatial variability of clusters in the optimal case is less subject to fluctuations than that of the worst case. In other words, the dynamics of clusters in the optimal case would exhibit the lowest level of disorder.


#### Table 1.

Features of representative time series obtained from the TSC method for house burglaries in the city of San Francisco, USA, considering four clusters.

| S | μ (Rep) | μ (Mean) | σ · 10⁸ (Rep) | σ · 10⁸ (Mean) | E · 10⁹ (Rep) | E · 10⁹ (Mean) | Lg (Rep) | Lg (Mean) | DS (Norm) |
|---|---|---|---|---|---|---|---|---|---|
| x [1] | 0.323 | 0.323 | 1.139 | 1.534 | 4.692 | 4.821 | 9 | 12.75 | 0.052 |
| y [2] | 0.0810 | 0.0806 | 2.3229 | 4.230 | 1.765 | 2.4516 | 7 | 11 | 0.337 |
| R [3] | 5.23e-04 | 5.14e-04 | 1.5880 | 1.574 | 4.292 | 4.4145 | 7 | 19.75 | 0.044 |

These series are a representative sample of the entire set, which contains 12 time series (four clusters, each with three parameters).

#### Table 2.

Parameters of the evolutionary fuzzy predictor used for the city of San Francisco, USA.

| Parameter | Range | Step | Value |
|---|---|---|---|
| Number of generations | 50–500 | 50 | 250 |
| Population size | 10–50 | 10 | 20 |
| Epochs number | 1–20 | 1 | 5 |
| Number of regressors | 3–12 | 1 | 8 |
| Number of rules | 4–24 | 4 | 8 |

The first three parameters were used to configure the differential evolution algorithm, whereas the last two were used to adjust the Adaptive Neural Fuzzy Inference System.

#### Figure 4.

MIG index computed for house burglaries in the city of San Francisco, USA. The minimum of the MIG index appears at four clusters, where it is expected that time series from clustering are the most predictable.

#### Figure 5.

An example of cluster dynamics for house burglaries in the city of San Francisco, USA. In this example, four clusters were tracked with the clustering reorganization algorithm (CRA).


#### 3.1.3 Characterization of time series

According to the MIG index, only clustering results for C = 4 and C = 16 groups were considered. These cases represent the most and least predictable dynamic partitions. It is necessary to choose representative time series (Rep) in the two scenarios, denoted as the optimal scenario (i.e., minimum MIG index) and the control scenario (i.e., maximum MIG index). Table 1 summarizes some results from this process. In column S, the field "[i]" refers to the cluster number. As shown in this table, the selected series come from different clusters, since the first aim of the forecasting in this study is a preliminary validation of the MIG index rather than an exhaustive forecast. Representative series (Rep) are selected by finding the minimum distance between their characterization vector and the mean vector of a class S (i.e., X, Y, and R) according to Eqs. (10)–(14).

#### 3.1.4 Forecasting results

Once the series were selected, the memetic algorithm was configured to compute the forecasting with 70% of the data (i.e., 2236 days) and the validation with the other 30% of the data (i.e., 959 days). In order to initialize the algorithm in such a way that the search space could be explored widely at the least possible computational cost, while guaranteeing the simplicity of the solutions, a set of experiments was launched for the combinations of parameters presented in Table 2. The number of generations and the population size were selected as the maximum values at which a difference of at least 0.1% between consecutive generations was found in the fitness function of the fittest solutions. The number of epochs was chosen to avoid frequent loss of diversity, so that the Fano factor stayed close to the selected threshold. Regarding the fuzzy predictor, the number of regressors in the inputs and the number of rules were selected by finding the maximum values at which the Pearson coefficient between the forecast and real data did not change by more than 2%.
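The generation-count rule (keep the largest number of generations at which consecutive generations still improve the best fitness by at least 0.1%) can be sketched as below. The fitness trace is invented for illustration only:

```python
def stall_generation(best_fitness, tol=0.001):
    """Return the first generation whose relative change with respect to the
    previous one falls below tol (0.1%); the chapter keeps the largest
    generation count that is still above this threshold."""
    for g in range(1, len(best_fitness)):
        prev, cur = best_fitness[g - 1], best_fitness[g]
        if abs(cur - prev) / abs(prev) < tol:
            return g
    return len(best_fitness)

# Invented best-fitness trace of a differential evolution run (minimization):
trace = [1.00, 0.80, 0.70, 0.65, 0.6495, 0.6494]
print(stall_generation(trace))  # → 4
```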

From the selected configuration values, and owing to the random initialization of the memetic algorithm, over 10,000 experiments were run per series. Table 3 summarizes the performance of the best fuzzy predictors found by the memetic algorithm. Pearson's correlation coefficient between the model and the data was greatest for the optimal scenario in both training and validation; the same holds for the Nash coefficient. The relative difference between the Pearson coefficients of the optimal and control scenarios was about 17% in validation, whereas for the Nash index the relative difference was 26%. Regarding the estimated LLE, smaller values were obtained for the optimal scenario, including one negative LLE. Visual results are depicted in Figure 6. The selected time series exhibited an interesting texture, more accentuated in the control scenario, where more peaks appeared randomly and recurrence was not easily observed.
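The two statistical indices used throughout this section can be computed as below (standard definitions of Pearson's correlation and the Nash–Sutcliffe efficiency [32]); the observation and forecast vectors are toy numbers, not the chapter's series:

```python
import math

def pearson(obs, sim):
    """Pearson correlation between observed and simulated series."""
    n = len(obs)
    mo, ms = sum(obs) / n, sum(sim) / n
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    ss = math.sqrt(sum((s - ms) ** 2 for s in sim))
    return cov / (so * ss)

def nash_sutcliffe(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 is no better
    than predicting the observed mean."""
    mo = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mo) ** 2 for o in obs)
    return 1.0 - num / den

obs = [3.0, 5.0, 4.0, 6.0, 7.0]   # toy observations
sim = [2.8, 5.2, 3.9, 6.3, 6.6]   # toy forecast
print(round(pearson(obs, sim), 3), round(nash_sutcliffe(obs, sim), 3))
```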


These series are a representative sample of the entire set, which contains 12 time series (four clusters, each with three parameters).

#### Table 1.


Recent Trends in Artificial Neural Networks - From Training to Prediction


Features of representative time series obtained from the TSC method for house burglaries in the city of San Francisco, USA, considering four clusters.


The first three parameters were used to configure the differential evolution algorithm whereas the last two were used to adjust the Adaptive Neural Fuzzy Inference System.

#### Table 2.

Parameters of the evolutionary fuzzy predictor used for the city of San Francisco, USA.


#### Table 3.

Forecasting indices for representative time series of two dynamic partitions (house burglaries in the city of San Francisco, USA).


#### Figure 6.

Forecasting results for the three representative time series of the cluster dynamics (house burglaries in the city of San Francisco, USA). Series were obtained from a dynamic partition with four clusters.

#### 3.2 Results for Bogota, Colombia

#### 3.2.1 Data organization

The dataset for the city of Bogota, Colombia was provided by the Non-Governmental Organization (NGO) "Fundacion ideas para la paz" and contains about 25,000 events of cellular phone robbery registered between 2012 and 2015. The criminal registers contain X-coordinates, Y-coordinates, and dates of events. No transformation was required, since the coordinates were already expressed in a geographical system adapted to the city. All differential spatial measurements were processed in meters. As in the previous case, time frames were created by aggregating the criminal events recorded in the last 7 days. A total of 1417 time frames was generated.
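The time-frame construction (each frame aggregates the events recorded over the last 7 days, sliding one day at a time) can be sketched as follows; the event register is a toy example, not the NGO dataset:

```python
from datetime import date, timedelta

def weekly_frames(events, start, end):
    """One time frame per day: the (x, y) positions of all events
    recorded in the 7-day window ending on that day.

    events: list of (x, y, date) tuples.
    """
    frames = []
    day = start
    while day <= end:
        window_lo = day - timedelta(days=6)          # 7-day inclusive window
        frame = [(x, y) for x, y, d in events if window_lo <= d <= day]
        frames.append(frame)
        day += timedelta(days=1)
    return frames

# Toy register of events (coordinates in meters, dates are hypothetical):
events = [
    (100.0, 200.0, date(2012, 1, 1)),
    (110.0, 190.0, date(2012, 1, 3)),
    (400.0, 500.0, date(2012, 1, 9)),
]
frames = weekly_frames(events, date(2012, 1, 1), date(2012, 1, 10))
print(len(frames), len(frames[0]), len(frames[8]))  # → 10 1 2
```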

#### 3.2.2 Optimal number of clusters

According to Figure 7, the MIG index for the Bogota series exhibited a global minimum at Copt = 3, showing non-convex behavior with respect to the number of clusters.

#### Figure 7.

MIG index computed for cellular phone robbery in the city of Bogota, Colombia. The minimum of the MIG index appears at three clusters, where it is expected that time series from clustering are the most predictable.

Note the similarity between the optimal MIG index and the one obtained for C = 12. The optimization process was carried out for C = 2, …, 20; however, the last result, with C = 20, was omitted since the MIG index grew excessively.
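The cluster-number search can be read as a one-dimensional scan: evaluate the validation index for every candidate C and keep the minimizer. The sketch below treats the MIG index as a black-box function (its definition is given earlier in the chapter); the score profile here is invented, with its minimum placed at C = 3 to mirror the Bogota case:

```python
def optimal_clusters(index_fn, c_min=2, c_max=20):
    """Evaluate a cluster-validity index for each candidate C and return
    the C that minimizes it, together with the whole score profile."""
    scores = {c: index_fn(c) for c in range(c_min, c_max + 1)}
    return min(scores, key=scores.get), scores

# Invented, non-convex index profile with its global minimum at C = 3:
profile = {c: abs(c - 3) * 0.1 + (0.05 if c % 5 == 0 else 0.0)
           for c in range(2, 21)}
best, scores = optimal_clusters(profile.__getitem__)
print(best)  # → 3
```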

A sample of the cluster dynamics for the optimal case is presented in Figure 8. However, it is not straightforward to infer the result of the optimization process from this figure. The centroids of the clusters are moving, as can be noticed from the changing shape of the connecting polygon. Note also that the radii of the clusters changed in size, so different areas were covered from one time frame to another.

#### 3.2.3 Characterization of time series

The characterization of the time series of the dynamic partitions is summarized in Table 4. The field "[i]" refers to the cluster number. For these series, it is assumed a priori that the three-group clustering is more predictable than the five-group clustering, considering the respective entropies. This result is in accordance with the MIG index optimization. Regarding the radius series, the number of lags is higher with respect to the rest of the series collection. Representative series (Rep) were selected by finding the minimum distance between their characterization vector and the mean vector of a class S (i.e., X, Y, and R) according to Eqs. (10)–(14).

#### 3.2.4 Forecasting results

Table 5 presents the combination of parameters that were used for running the optimization of the fuzzy forecasters.

#### Figure 8.

An example of cluster dynamics for cellular phone robbery in the city of Bogota, Colombia. In this example, three clusters were tracked with the clustering reorganization algorithm (CRA).


Table 3 (data):

| S | Pearson (training), C = 4 | C = 16 | Pearson (validation), C = 4 | C = 16 | Nash, C = 4 | C = 16 | LLE, C = 4 | C = 16 |
|---|---|---|---|---|---|---|---|---|
| x | 0.6099 | 0.3986 | 0.6942 | 0.4581 | 0.3430 | 0.1534 | 0.0339 | -0.0359 |
| y | 0.5447 | 0.7114 | 0.4982 | 0.5546 | 0.2908 | 0.5021 | -0.0118 | 0.0796 |
| R | 0.8353 | 0.6645 | 0.7803 | 0.6260 | 0.6296 | 0.2779 | 0.0388 | 0.0677 |
| Average | 0.6633 | 0.5915 | 0.6576 | 0.5463 | 0.4212 | 0.3111 | x | x |

Dynamic partitions with minimum (4 clusters) and maximum (16 clusters) MIG index were considered.



These series are a representative sample of the entire set, which contains nine time series (three clusters, each with three parameters).

#### Table 4.

Features of representative time series obtained from the TSC method for cellular phone robbery in the city of Bogota, Colombia, considering three clusters.


The first three parameters were used to configure the differential evolution algorithm, whereas the last two were used to adjust the Adaptive Neural Fuzzy Inference System.


#### Table 5.

Parameters of the evolutionary-fuzzy predictor used for the city of Bogota, Colombia.

In this case, 70% of the generated samples (i.e., 992 days) were used for training, and the remaining 30% (i.e., 425 days) for validation. The best forecasting results for the Bogota series are presented in Table 6. The Pearson correlation coefficient between the model and real data was greater for the optimal scenario (i.e., three clusters) than for the control scenario (i.e., five clusters) in both training and validation; the relative differences were about 18% and 12%, respectively. In terms of the Nash index, the optimal scenario exhibited the highest performance, with a relative difference of 31%. Estimated LLEs were smaller in the optimal scenario, with a negative LLE in the case of the X(t) signal. Figure 9 depicts visual results. Predicted signals in the optimal scenario are correlated with the real ones; signals in the control scenario exhibited more random peaks, although the recurrence is similar to the optimal case.
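The "relative difference" quoted here and in Section 3.1.4 is consistent with (optimal − control) / optimal applied to the table averages; for instance, the roughly 12% validation figure can be reproduced from the Table 6 Pearson averages:

```python
def relative_difference(optimal, control):
    """Relative difference between scenarios: (optimal - control) / optimal."""
    return (optimal - control) / optimal

# Average validation Pearson coefficients from Table 6 (C = 3 vs. C = 5):
print(round(100 * relative_difference(0.6900, 0.6010), 1))  # → 12.9
```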


#### Table 6.

Forecasting indices for representative time series of two dynamic partitions of cellular phone robbery in the city of Bogota, Colombia.

| Signal | Pearson (training), C = 3 | C = 5 | Pearson (validation), C = 3 | C = 5 | Nash, C = 3 | C = 5 | LLE, C = 3 | C = 5 |
|---|---|---|---|---|---|---|---|---|
| x | 0.8443 | 0.7388 | 0.9154 | 0.6833 | 0.6546 | 0.5025 | -1.5102 | -0.3230 |
| y | 0.7238 | 0.6833 | 0.4009 | 0.4755 | 0.5069 | 0.2795 | 1.5157 | 1.5100 |
| R | 0.8879 | 0.5025 | 0.7538 | 0.6441 | 0.7740 | 0.5499 | 1.2457 | 2.1515 |
| Average | 0.8186 | 0.6708 | 0.6900 | 0.6010 | 0.6452 | 0.4440 | x | x |

Dynamic partitions with minimum (three clusters) and maximum (five clusters) MIG index were considered.

Table 5 (data):

| Parameter | Range | Step | Value |
|---|---|---|---|
| Number of generations | 100–400 | 50 | 200 |
| Population size | 10–50 | 10 | 30 |
| Epochs number | 1–20 | 1 | 4 |
| Number of regressors | 3–12 | 1 | 8 |
| Number of rules | 4–24 | 4 | 8 |

Table 4 (data):

| S | μ × 10^4 (Rep) | (Mean) | σ × 10^6 (Rep) | (Mean) | E × 10^-5 (Rep) | (Mean) | Lg (Rep) | (Mean) | DS (Norm) |
|---|---|---|---|---|---|---|---|---|---|
| x [3] | 9.8 | 9.7 | 8.115 | 7.251 | 1.036 | 1.4336 | 7 | 9 | 0.251 |
| y [3] | 10.9 | 10.4 | 11.951 | 11.265 | 0.894 | 0.755 | 6 | 8 | 0.099 |
| R [3] | 9.8 | 0.6 | 3.854 | 4.4214 | 1.723 | 1.484 | 163 | 158 | 0.15 |

#### Figure 9.


Forecasting results for the three representative time series of the cluster dynamics (cellular phone robbery in the city of Bogota, Colombia). Series were obtained from a dynamic partition with three clusters.

#### 4. Conclusion

Qualitatively speaking, similar results were obtained for the two study cases. In both, the forecasting of time series from the optimal scenario outperformed that of the control scenario in terms of two statistical indices (i.e., the Pearson correlation and Nash coefficients) and one index from chaos theory (i.e., the LLE). Therefore, we argue that the MIG index might be useful to evaluate the goodness of a dynamic partition of crime events. Moreover, it may give a reliable a priori insight into the predictability of series synthesized from TSC. Hence, this method produces coherent series that preserve a temporal structure, which a forecasting method can take advantage of.

TSC series for both cities exhibited interesting behavior in terms of their texture. No evident periodicity was observed, and abundant peaks appeared over the observed time window. The positive LLEs computed for these signals reveal a possibly chaotic nature of the phenomena. The chaotic texture of these series points to non-stationary spatial crime patterns that evolve continuously, producing information. Thus, this observation reflects a footprint of complexity for urban crime, as noted in previous studies [35].
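The role of the largest Lyapunov exponent as a chaos indicator can be illustrated on a system where it is known in closed form. This is not the Rosenstein small-data estimator the chapter uses [22]; it is only a demonstration that a positive exponent signals exponential divergence of nearby trajectories while a negative one signals stability, using the logistic map as the standard textbook example:

```python
import math

def lyapunov_logistic(r, n=20000, x0=0.4):
    """Average of log|f'(x)| along an orbit of the logistic map
    f(x) = r*x*(1 - x); this average is the map's Lyapunov exponent."""
    x, acc = x0, 0.0
    for _ in range(n):
        acc += math.log(abs(r * (1.0 - 2.0 * x)))  # log of local stretching
        x = r * x * (1.0 - x)
    return acc / n

print(lyapunov_logistic(4.0))  # positive (chaotic regime; theory gives ln 2)
print(lyapunov_logistic(2.5))  # negative (orbit settles on a fixed point)
```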

As future work, the proposed method will be tested on other dynamic phenomena characterized by non-uniform sampling of relevant variables in both space and time. The method will also be refined by considering other techniques, such as auto-encoder deep neural networks.

#### Conflict of interest

The authors declare no conflict of interest.


#### Author details

Miguel Melgarejo<sup>1</sup> \*†, Cristian Rodriguez1†, Diego Mayorga1† and Nelson Obregón2†

1 Laboratory for Automation and Computational Intelligence, Universidad Distrital Francisco José de Caldas, Bogotá DC, Colombia

2 Water Institute, Pontifical Xaverian University, Bogotá DC, Colombia

\*Address all correspondence to: mmelgarejo@udistrital.edu.co

† All authors contributed equally.

© 2019 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/ by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Time Series from Clustering: An Approach to Forecast Crime Patterns DOI: http://dx.doi.org/10.5772/intechopen.89561

#### References

[1] Felson M. Routine activity approach. In: Environmental Criminology and Crime Analysis. Abingdon, UK: Routledge; 2008. pp. 70-77

[2] Cornish D, Clarke R. The rational choice perspective. In: Environmental Criminology and Crime Analysis. Abingdon, UK: Routledge; 2008. pp. 21-45

[3] D'Orsogna M, Perc M. Statistical physics of crime: A review. Physics of Life Reviews. 2014;12:1-21

[4] Perc M, Donnay K, Helbing D. Understanding recurrent crime as system-immanent collective behavior. PLoS One. 2013;8:e76063

[5] Brantingham PJ, Brantingham P. Crime pattern theory. In: Environmental Criminology and Crime Analysis. New York: Willan Publishing; 2008. pp. 78-93

[6] Short M et al. A statistical model of criminal behavior. M3AS. 2008;18: 1249-1267

[7] Rey S, Mack E, Koschinsky J. Exploratory space-time analysis of burglary patterns. Journal of Quantitative Criminology. 2012;28(3): 509-531

[8] Mohler G. Marked point process hotspot maps for homicide and gun crime prediction in Chicago. International Journal of Forecasting. 2014;30(3):491-497

[9] Grubesic T. On the application of fuzzy clustering for crime hot spot detection. Journal of Quantitative Criminology. 2006;22(1):77-105

[10] Brantingham P et al. Crime analysis at multiple scales of aggregation: A topological approach. In: Putting Crime in Its Place. New York: Springer; 2009. pp. 87-107

[11] Davies T, Johnson S. Examining the relationship between road structure and burglary risk via quantitative network analysis. Journal of Quantitative Criminology. 2015;31(3):481-507

[12] Malleson N, Andresen M. Spatiotemporal crime hotspots and the ambient population. Crime Science. 2015;4(10):1-8

[13] Bettencourt L et al. Urban scaling and its deviations: Revealing the structure of wealth, innovation and crime across cities. PLoS One. 2010; 5(11):20-22

[14] Murray A, Grubesic T, Leitner M. Exploring spatial patterns of crime using non-hierarchical cluster analysis. In: Crime Modeling and Mapping Using Geospatial Technologies. Vol. 8. Netherlands: Springer; 2013. pp. 105-124

[15] Mayorga D, Melgarejo M, Obregon N. A fuzzy clustering based method for the spatiotemporal analysis of criminal patterns. In: 2016 IEEE International Conference on Fuzzy Systems; 2016. pp. 738-744

[16] Izakian H, Pedrycz W, Jamal I. Clustering spatio-temporal data: An augmented fuzzy C-means. IEEE Transactions on Fuzzy Systems. 2013; 21(5):855-868

[17] Ji M, Xie F, Ping Y. A dynamic fuzzy cluster algorithm for time series. Abstract and Applied Analysis. 2013; 2013:1-7

[18] Hardyns W, Rummens A. Predictive policing as a new tool for law enforcement? Recent developments and challenges. European Journal on Criminal Policy and Research. 2018;24:201

[19] Bezdek JC. Pattern Recognition with Fuzzy Objective Function Algorithms.


New York: Springer Science+Business Media; 2013

[20] Babuska R. Fuzzy modeling for control. In: International Series in Intelligent Technologies. Netherlands: Springer; 1998

[21] Abarbanel H. Analysis of Observed Chaotic Data. New York: Springer; 1996

[22] Rosenstein MT et al. A practical method for calculating largest Lyapunov exponents from small data sets. Physica D. 1993;65:117-134

[23] Ayyub B, McCuen R. Probability, Statistics, and Reliability for Engineers and Scientists. Boca Raton: Chapman and Hill/CRC Press; 2002. pp. 65-72

[24] Ting S, Gang C. Selection of fuzzy time series model based on autocorrelation theory. In: 29th Chinese Control and Decision Conference; Chongqing, China; 2017. pp. 4365-4369

[25] Neri F, Cotta C. Memetic algorithms and memetic computing optimization: A literature review. Swarm and Evolutionary Computation. 2012;2:1-14

[26] Moscato P, Cotta C. A gentle introduction to memetic algorithms. In: Glover F, Kochenberger G, editors. Handbook of Metaheuristics. Boston, MA: Kluwer; 2003. pp. 105-144

[27] Rodriguez C, Mayorga D, Melgarejo M. Forecasting time series from clustering by a memetic differential fuzzy approach: An application to crime prediction. In: 2017 IEEE Symposium Series on Computational Intelligence (SSCI); Honolulu, HI; 2017

[28] Storn R, Price K. Differential evolution: A simple and efficient heuristic for global optimization over continuous spaces. Journal of Global Optimization. 1997;11:341-359

[29] Jang J. ANFIS: Adaptive-network-based fuzzy inference system. IEEE Transactions on Systems, Man, and Cybernetics. 1993;23(3):665-685

[30] Chivata J et al. Complex system modeling using TSK fuzzy cellular automata and differential evolution. In: 2013 IEEE International Conference on Fuzzy Systems; Hyderabad; 2013. pp. 1-5

[31] Mendel J. Fuzzy logic systems for engineering: A tutorial. Proceedings of the IEEE. 1995;83(3):345-377

[32] Nash J, Sutcliffe J. River flow forecasting through conceptual models. Part I: A discussion of principles. Journal of Hydrology. 1970;10(3):282-290

[33] Chai T, Draxler R. Root mean square error (RMSE) or mean absolute error (MAE)? Arguments against avoiding RMSE in the literature. Geoscientific Model Development. 2014;7:1247-1250

[34] Silva D, Alves G, Mattos Neto P, Ferreira T. Measurement of fitness function efficiency using data envelopment analysis. Expert Systems with Applications. 2014;41(16): 7147-7160

[35] Oliveira M, Bastos-Filho C, Menezes R. The scaling of crime concentration in cities. PLoS One. 2017; 12:113


**Chapter 2**

Encountered Problems of Time Series with Neural Networks: Models and Architectures

*Paola Andrea Sánchez-Sánchez, José Rafael García-González and Leidy Haidy Perez Coronell*

**Abstract**

The growing interest in the development of forecasting applications with neural networks is evidenced by the more than 10,000 research articles published in the literature. However, the large number of factors involved in configuring the network, the training process, validation and forecasting, and the data sample, all of which must be determined in order to achieve an adequate network model for forecasting, makes neural networks an unstable technique, given that any change in training or in some parameter produces great changes in the prediction. In this chapter, an analysis is made of the problems surrounding the factors that affect the construction of neural network models, which often produce inconsistent results, and the fields that require additional research are highlighted.

**Keywords:** time series, prediction of neural networks, learning algorithms

**1. Introduction**

Time series forecasting has received a lot of attention in recent decades, due to the growing need for effective tools that facilitate decision making and overcome the theoretical, conceptual, and practical limitations of traditional approaches. The classification of forecasting methods from a statistical point of view has, in general, two aspects: one oriented to causal methods, such as regression and intervention models, and the other to time series methods, which include moving averages, exponential smoothing, ARIMA models, and neural networks. Under this current, forecasting is oriented only to the task of predicting behavior, prioritizing forward vision and thus bypassing many important steps in the model construction process, whereas modeling is oriented to finding the global structure, models, and formulas that explain the behavior of the data-generating process and can be used to predict trends of future behavior (long term) as well as to understand the past. This last vision allows the construction of models with a solid foundation, under which the forecast is seen as an additional step.

The representation of time series with nonlinear dynamics has acquired great weight in recent decades, because many authors agree that real-world series exhibit nonlinear behavior, and the approximation that can be achieved with linear models is inadequate [1–3]. Although approximations have been made with statistical models (an extensive compilation of these is

Recent Trends in Artificial Neural Networks - From Training to Prediction


## Encountered Problems of Time Series with Neural Networks: Models and Architectures

*Paola Andrea Sánchez-Sánchez, José Rafael García-González and Leidy Haidy Perez Coronell*

#### **Abstract**

The growing interest in the development of forecasting applications with neural networks is reflected in the more than 10,000 research articles published in the literature. However, the large number of factors involved in the configuration of the network, the training process, validation and forecasting, and the data sample, all of which must be determined to achieve an adequate network model for forecasting, makes neural networks an unstable technique, given that any change in training or in some parameter can produce large changes in the prediction. This chapter analyzes the factors that affect the construction of neural network models and that often lead to inconsistent results, and highlights the fields that require additional research.

**Keywords:** time series, neural network prediction, learning algorithms

#### **1. Introduction**

Time series forecasting has received a lot of attention in recent decades due to the growing need for effective tools that facilitate decision making and overcome the theoretical, conceptual, and practical limitations of traditional approaches. From a statistical point of view, forecasting methods fall broadly into two groups: causal methods, such as regression and intervention models, and time series methods, which include moving averages, exponential smoothing, ARIMA models, and neural networks. Under this view, forecasting is oriented only to the task of predicting behavior, prioritizing forward vision and thus skipping many important steps of the model construction process. Modeling, in contrast, is oriented to finding the global structure, model, and formulas that explain the behavior of the data-generating process and that can be used both to predict trends of future behavior (long term) and to understand the past. This latter vision allows the construction of models with solid foundations, under which the forecast is seen as one additional step.
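As a minimal illustration of two of the time series methods just mentioned, the sketch below computes one-step-ahead forecasts with a moving average and with simple exponential smoothing. The function names and the toy series are ours, chosen only for illustration:

```python
import numpy as np

def moving_average_forecast(y, window=3):
    """One-step-ahead forecast: the mean of the last `window` observations."""
    return float(np.mean(y[-window:]))

def exponential_smoothing_forecast(y, alpha=0.5):
    """Simple exponential smoothing: recent observations carry more weight."""
    level = y[0]
    for value in y[1:]:
        level = alpha * value + (1 - alpha) * level
    return float(level)

series = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0]
print(moving_average_forecast(series))        # → 13.0 (mean of 13, 12, 14)
print(exponential_smoothing_forecast(series)) # → 13.0 for this toy series
```

Both methods extrapolate only the local level of the series; ARIMA models and neural networks extend this by modeling richer autocorrelation structure.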

The representation of time series with nonlinear dynamics has gained great weight in recent decades, since many authors agree that real-world series exhibit nonlinear behavior and that the approximation achievable with linear models is inadequate [1–3]. Although approximations have been made with statistical models (an extensive compilation is presented in [4–6]), these restrict the representation to a functional form fixed a priori. Neural networks have proven to be a valuable tool here, since they can extract the unknown nonlinear dynamics linking the explanatory variables and the series without requiring any such assumptions.
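The inadequacy of linear approximations can be seen on a deterministic nonlinear series. In the sketch below (our own toy example, not taken from the cited works), a least-squares linear predictor fails on the logistic map, while a predictor matching the true quadratic form is essentially exact:

```python
import numpy as np

# logistic map: a deterministic series that looks random to linear tools
x = np.empty(200)
x[0] = 0.2
for t in range(199):
    x[t + 1] = 4.0 * x[t] * (1.0 - x[t])

lag, target = x[:-1], x[1:]

# best linear predictor x[t] ~ a*x[t-1] + b
A = np.column_stack([lag, np.ones_like(lag)])
coef, *_ = np.linalg.lstsq(A, target, rcond=None)
linear_mse = float(np.mean((target - A @ coef) ** 2))

# a quadratic predictor recovers the true dynamics almost exactly
Q = np.column_stack([lag**2, lag, np.ones_like(lag)])
qcoef, *_ = np.linalg.lstsq(Q, target, rcond=None)
quad_mse = float(np.mean((target - Q @ qcoef) ** 2))

print(linear_mse, quad_mse)  # linear error is large; quadratic error is ~0
```

The point is not the quadratic fix itself (here we knew the true functional form) but that, in practice, the form is unknown, which is precisely the gap neural networks aim to fill.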

The growing interest in the development of forecasting applications with neural networks is reflected in the publication of more than 10,000 research articles in the literature [7]. However, as stated by Zhang et al. [8], inconsistent results about the performance of neural networks in time series prediction are often reported. Many conclusions are obtained from empirical studies and thus present limited results that often cannot be extended to general applications and are not replicable. Cases where the neural network performs worse than linear statistical models or other models may be due to the series studied not presenting high volatility, to the comparison network not being adequately trained, to the model selection criterion not being comparable, or to the configuration used not suiting the characteristics of the data. Conversely, many publications that report superior performance of neural networks concern novel paradigms or extensions of existing methods, architectures, and training algorithms, but lack a reliable and valid evaluation of the empirical evidence of their performance. The large number of factors involved in the configuration of the network, the training process, validation and forecasting, and the data sample, all of which must be determined to achieve a suitable network model for forecasting, makes neural networks an unstable technique, given that any change in training or in some parameter produces large changes in the prediction [9]. This chapter analyzes the factors that affect the construction of neural network models and that often lead to inconsistent results.
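The instability claim can be reproduced with a deliberately small experiment: training the same network twice with different random initializations yields different forecasts. The tiny NumPy network below is a hypothetical sketch of ours, not the setup of any study cited here:

```python
import numpy as np

def train_mlp(X, y, hidden=8, epochs=500, lr=0.05, seed=0):
    """Tiny one-hidden-layer MLP trained by batch gradient descent."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.5, size=hidden)
    b2 = 0.0
    for _ in range(epochs):
        h = np.tanh(X @ W1 + b1)          # hidden activations
        pred = h @ W2 + b2                # linear output
        err = pred - y
        # backpropagation of the mean squared error
        gW2 = h.T @ err / len(y)
        gb2 = err.mean()
        gh = np.outer(err, W2) * (1 - h**2)
        gW1 = X.T @ gh / len(y)
        gb1 = gh.mean(axis=0)
        W2 -= lr * gW2; b2 -= lr * gb2
        W1 -= lr * gW1; b1 -= lr * gb1
    return W1, b1, W2, b2

def forecast(params, x):
    W1, b1, W2, b2 = params
    return float(np.tanh(x @ W1 + b1) @ W2 + b2)

# lagged training data built from a noisy-looking but smooth series
series = np.sin(0.3 * np.arange(60))
X = np.column_stack([series[i:i + 57] for i in range(3)])  # three lags
y = series[3:]

f0 = forecast(train_mlp(X, y, seed=0), series[-3:])
f1 = forecast(train_mlp(X, y, seed=1), series[-3:])
print(f0, f1)  # different initializations give different forecasts
```

Everything else being equal, only the random seed changed; the spread between `f0` and `f1` is a small-scale instance of the sensitivity to training factors discussed above.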

Empirical studies on predicting time series with particular characteristics, such as seasonal patterns, trends, and dynamic behavior, have been reported in the literature [10–12]; however, few contributions have been made to the development of systematic methodologies for representing time series with neural networks under specific conditions, which limits the modeling process to ad hoc techniques instead of scientific approaches that follow a replicable modeling methodology and process.

In the last decade, there have been a considerable number of isolated contributions focused on specific aspects, for which a unified vision has not been presented; Zhang et al. [8] provided a thorough review up to 1996. This chapter is an effort to evaluate the works proposed in the literature and to clarify their contributions and limitations in the task of forecasting with neural networks, highlighting the fields that require additional research.

Although some efforts aimed at formalizing time series forecasting models with neural networks have been carried out, few advances have been obtained at a theoretical level [13], which evidences the need for systematic research on modeling and forecasting time series with neural networks.

The objective of this chapter is to delve into the problem of forecasting time series with neural networks through an analysis of the contributions present in the literature and an identification of the difficulties underlying the forecasting task, thus highlighting open research fields.

#### **2. Motivation of the study**

Time series forecasting is considered a generic problem common to many disciplines and has been approached with different models [14]. Formally, its objective is to find a flexible mathematical functional form that approximates the data-generating process with sufficient precision, in such a way that it appropriately represents the different regular and irregular patterns the series may present and allows the constructed representation to extrapolate future behavior [15]. However, the choice of the appropriate model for each series depends on the characteristics of the time series, and its usefulness is associated with the degree of similarity between the dynamics of the generating process and the mathematical formulation made of it, under the premise that the data dictate the tool to be used [16].

As pointed out by Granger and Terasvirta [2], a model that relates a variable to its own history and/or to the history of other explanatory variables of its behavior can be constructed through a variety of alternatives. These depend both on the functional form by which the relationship is approximated and on the relationship between the variables. Although each modeler is free to choose the modeling tool, relations of a nonlinear order impose limitations on the use of certain types of tools; for the same reason, no single method is best for all cases. The question that arises is then how to properly specify the functional form in the presence of nonlinear relationships between the time series and the explanatory variables of its behavior.

The representation of time series with nonlinear dynamics has gained great weight in recent decades, since many authors agree that real-world series exhibit nonlinear behavior and that the approximation achievable with linear models is inadequate [1–3]. Series with these characteristics have been approached, among other ways, with statistical models, with combined or hybrid models, and with neural networks. The complexity of representing nonlinear relationships lies in the fact that, in most cases, there are not enough physical or economic laws to specify a suitable functional form for their representation.

The literature has proposed a wide range of statistical models for series with nonlinear behavior, such as bilinear models, threshold autoregressive (TAR) models, smooth transition autoregressive (STAR) models [17, 18], autoregressive conditional heteroscedasticity (ARCH) [19], and its generalized form (GARCH) [20]; a comprehensive compilation is presented in [4–6]. Although these models have proved useful in particular problems, they are not universally applicable, since they limit the form of nonlinearity present in the data to empirical specifications based on the available information [2]; their success in practical cases depends on the degree to which the model represents the characteristics of the series studied. Moreover, the formulation of each family of these models requires the specification of an appropriate type of nonlinearity, which is a difficult task compared to the construction of linear models, since there are many possibilities (a wide variety of possible nonlinear functions), more parameters to calculate, and more errors can be made [21, 22].
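To make the regime idea behind threshold models concrete, the following sketch simulates a two-regime TAR(1) process and produces a one-step forecast by selecting the autoregressive coefficient according to the last observation. The coefficients and threshold are illustrative choices of ours, not estimates from the cited literature:

```python
import numpy as np

def simulate_tar(n, phi_low=0.6, phi_high=-0.4, threshold=0.0, seed=42):
    """Two-regime TAR(1): the AR coefficient switches on the previous value."""
    rng = np.random.default_rng(seed)
    y = np.zeros(n)
    for t in range(1, n):
        phi = phi_low if y[t - 1] <= threshold else phi_high
        y[t] = phi * y[t - 1] + rng.normal(scale=0.1)
    return y

def tar_forecast(y, phi_low=0.6, phi_high=-0.4, threshold=0.0):
    """One-step forecast: pick the regime implied by the last observation."""
    phi = phi_low if y[-1] <= threshold else phi_high
    return phi * y[-1]

y = simulate_tar(200)
print(tar_forecast(y))
```

Note how the nonlinearity is confined to a pre-specified form (a hard switch between two linear regimes); this is exactly the restriction that makes such models powerful when the form is right and inapplicable when it is not.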

Likewise, in time series prediction it is universally accepted that no single method is best in all situations [23–25]. This is because real-world problems are often complex in nature, and a single model may not be able to capture all the different patterns. Empirical studies suggest that by combining different models, the accuracy of the representation can be better than in the individual case [26–28]. Therefore, the union of models with different characteristics increases the possibility of capturing different patterns in the data and provides a more appropriate representation of the time series. Hybrid modeling thus arises naturally as the union of similar or different techniques with complementary characteristics.

*DOI: http://dx.doi.org/10.5772/intechopen.88901*




In the forecasting literature, several combinations of methods have been proposed. However, many of them combine similar methods, which is why the traditional literature contains various studies on hybrid linear modeling techniques. Although such combinations have demonstrated their ability to improve the accuracy of the resulting representations, a more effective route could be based on combining models with different characteristics. Both theoretical and empirical evidence suggest that combining dissimilar models, or models that strongly disagree with each other, reduces model errors [29, 30] and, in addition, reduces model uncertainty [31]. A hybrid model is thus more robust to possible changes in the structure of the data.
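A minimal version of such a combination pairs a linear forecaster with a deliberately dissimilar nonlinear one and averages their outputs. Everything below (the analog-style pattern matcher, the equal weights) is an illustrative construction of ours, not a method from the cited studies:

```python
import numpy as np

def ar1_forecast(y):
    """Linear component: least-squares AR(1) one-step forecast."""
    x, t = np.asarray(y[:-1], dtype=float), np.asarray(y[1:], dtype=float)
    phi = float(x @ t / (x @ x))
    return phi * float(y[-1])

def nearest_pattern_forecast(y, k=3, m=2):
    """Nonlinear component: average what followed the k past windows
    most similar to the last m observations (a naive analog method)."""
    y = np.asarray(y, dtype=float)
    last = y[-m:]
    candidates = [(float(np.sum((y[i:i + m] - last) ** 2)), y[i + m])
                  for i in range(len(y) - m)]
    candidates.sort(key=lambda c: c[0])
    return float(np.mean([nxt for _, nxt in candidates[:k]]))

def hybrid_forecast(y):
    """Equal-weight combination of the two dissimilar forecasts."""
    return 0.5 * ar1_forecast(y) + 0.5 * nearest_pattern_forecast(y)

print(hybrid_forecast([1.0, 2.0, 3.0, 4.0, 5.0]))  # → 5.333... (AR(1) gives 6.67, the pattern matcher 4.0)
```

Equal weights are the simplest choice; in practice the combination weights themselves can be estimated, which is where much of the hybrid-modeling literature concentrates.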

Numerous applications based on combinations of linear models with computational intelligence have been proposed in the literature [32–39]. The main criticism of these works is that they do not contemplate the need to integrate subjective information into the models; moreover, like traditional statistical models, they require a preprocessing of the series aimed at eliminating its visible components, and they require the determination of a large number of parameters that are not economically explainable.

Neural networks, seen as a non-parametric nonlinear regression technique, have emerged as an attractive alternative to the problem posed, since they allow extracting the unknown nonlinear dynamics between the explanatory variables and the series without the need for any kind of assumption. Within this family of techniques, multilayer perceptron (MLP) networks, understood as nonlinear statistical regression models, have received great attention among researchers from the computational intelligence and statistics communities.

The attractiveness of neural networks for time series prediction lies in their ability to identify hidden dependencies, especially of a nonlinear order, from a finite sample, which has earned them recognition as universal function approximators [3, 40–42]. Perhaps their main advantage over other models is that they do not start from a priori assumptions about the functional relationship between the series and its explanatory variables, a highly desirable characteristic when the mechanism generating the data is unknown and unstable [43]. In addition, their high generalization capacity allows them to learn behaviors and extrapolate them, which leads to better forecasts [5].

For artificial intelligence, as well as for operations research, time series forecasting with neural networks is seen as an error-minimization problem, which consists of adjusting the parameters of the network so as to minimize the error between the real value and the obtained output. Although this criterion yields models whose output is increasingly close to the desired one, it comes at the expense of the parsimony of the model, since it leads to more complex representations (a large number of parameters). From the statistical point of view, a criterion based solely on error reduction is not optimal; a development oriented to the formalization of the model is necessary, which requires the fulfillment of properties that are not always taken into account, such as the stability of the estimated parameters, the coherence between the series and the model, the consistency with prior knowledge, and the predictive capacity of the model.
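One standard way to balance fit against parsimony, shown here only as an illustration of a penalized criterion rather than as the chapter's prescription, is an information criterion that charges each extra parameter:

```python
import numpy as np

def fit_ar(y, p):
    """Least-squares AR(p): returns coefficients and in-sample MSE."""
    y = np.asarray(y, dtype=float)
    Y = np.column_stack([y[i:len(y) - p + i] for i in range(p)])  # lag matrix
    t = y[p:]
    coef, *_ = np.linalg.lstsq(Y, t, rcond=None)
    mse = float(np.mean((t - Y @ coef) ** 2))
    return coef, mse

def aic(mse, n, k):
    """Akaike-style criterion: n*log(mse) + 2k penalizes extra parameters,
    so a small error gain no longer automatically favors the bigger model."""
    return n * np.log(mse) + 2 * k

coef, mse = fit_ar([1.0, 0.5, 0.25, 0.125, 0.0625, 0.03125], 1)
print(coef[0], mse)  # ≈ 0.5 and ≈ 0: this toy series is exactly AR(1)
```

Under a pure error-minimization view, a model with more lags (or more hidden units) would always look at least as good in sample; a penalized criterion of this kind is one concrete way of encoding the parsimony concern raised above.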

The evident interest in the use of neural networks for time series prediction has led to enormous research activity in the field. Crone and Kourentzes [7] report more than 5000 publications on time series prediction with neural networks (see also [39, 44, 45]), spanning journals in fields such as econometrics, statistics, engineering, and artificial intelligence; the topic has even been central to special issues, such as *Neurocomputing*'s "*Special issue on evolving solution with neural networks*" published in October 2003 [46] and the *International Journal of Forecasting*'s "*Special issue on forecasting with artificial neural networks and computational intelligence*" published in 2011.

In order to establish the relevance of time series prediction with neural networks, a search was made through *Science Direct* for the journals that publish articles related to the topic. **Table 1** and **Figure 1** present a compilation of the 10 journals with the most publications, relating the number of articles published in the periods 2015–2019, 2010–2014, 2005–2009, 2000–2004, and 1999 and earlier, identified using the keywords (*Forecasting* or *Prediction*, *Neural Networks*, and *Time Series*).

| Journal | 2015–2019 | 2010–2014 | 2005–2009 | 2000–2004 | 1999 and earlier | Total |
|---|---|---|---|---|---|---|
| *Energy* | 308 | 83 | 17 | 3 | 2 | 413 |
| *Applied Energy* | 297 | 90 | 10 | 4 | — | 401 |
| *Neurocomputing* | 254 | 148 | 88 | 46 | 37 | 573 |
| *Renewable and Sustainable Energy Reviews* | 241 | 74 | 14 | — | 1 | 330 |
| *Applied Soft Computing* | 238 | 132 | 41 | 5 | — | 416 |
| *Journal of Hydrology* | 233 | 166 | 102 | 34 | 5 | 540 |
| *Expert Systems with Applications* | 226 | 364 | 188 | 20 | 11 | 809 |
| *Procedia Computer Science* | 212 | 77 | — | — | — | 289 |
| *Renewable Energy* | 191 | 49 | 22 | 5 | 3 | 270 |
| *Energy Procedia* | 155 | 41 | — | — | — | 196 |
| **Total** | 2355 | 1224 | 482 | 117 | 59 | 4237 |

**Table 1.**
*Journals that publish time series forecast articles with neural networks. Articles identified using the keywords (forecasting or prediction, neural networks, and time series).*

An analysis of **Table 1** and **Figure 1** shows the following facts:

• The number of publications reported on the subject is increasing, the drastic growth reported in the last 5 years (2015–2019) being representative, which is evident in all the journals listed.

• There is greater participation by journals belonging or related to the fields of engineering and artificial intelligence.

• The journals with the highest numbers of published articles, *Neurocomputing*, *Applied Soft Computing*, *Procedia Computer Science*, and *Expert Systems with Applications*, are closely related to the topic, both through contributions in the field of neural networks and in time series forecasting.

Many comparisons have been made between neural networks and statistical models in order to measure the prediction performance of both approaches. As stated by Zhang et al. [8]:

"There are many inconsistent reports in the literature on the performance of ANNs for forecasting tasks. The main reason is that a large number of factors including network structure, training method, and sample data may affect the forecasting ability of the networks."


*Recent Trends in Artificial Neural Networks - From Training to Prediction*

In the forecasting literature, several combinations of methods have been proposed. However, many of them combine similar methods, which is why the traditional literature contains various studies on hybrid linear modeling techniques. Although such combinations have demonstrated their ability to improve the accuracy of the resulting representations, a more effective route may be based on combining models with different characteristics. Both theoretical and empirical evidence suggest that combining dissimilar models, or models that strongly disagree with one another, reduces model errors [29, 30] and, in addition, reduces model uncertainty [31]. The hybrid model is thus more robust for estimating possible changes in the structure of the data.

Numerous applications based on combinations of linear models with computational intelligence have been proposed in the literature [32–39]. The main criticism of these works is that they do not contemplate the need to integrate subjective information into the models. Like traditional statistical models, they require a preprocessing of the series aimed at eliminating its visible components, and they require the determination of a large number of parameters that are not economically explainable.

Neural networks, seen as a non-parametric non-linear regression technique, have emerged as an attractive alternative to this problem, since they can extract the unknown non-linear dynamics between the explanatory variables and the series without making any kind of assumption. From this family of techniques, the multi-layer perceptron (MLP), understood as a non-linear statistical regression model, has received great attention among researchers from the computational intelligence and statistics communities.

The attraction of neural networks for time series prediction is their ability to identify hidden dependencies, especially non-linear ones, from a finite sample, which has earned them recognition as universal function approximators [3, 40–42]. Perhaps their main advantage over other models is that they do not start from a priori assumptions about the functional relationship between the series and its explanatory variables, a highly desirable characteristic when the mechanism generating the data is unknown and unstable [43]. In addition, their high generalization capacity allows them to learn behaviors and extrapolate them, which leads to better forecasts [5].

For artificial intelligence, as for operations research, time series forecasting with neural networks is seen as an error-minimization problem: the parameters of the network are adjusted to minimize the error between the real value and the output obtained. Although this criterion yields models whose output is increasingly close to the desired one, it comes at the expense of the parsimony of the model, since it leads to more complex representations (with a large number of parameters). From the statistical point of view, a criterion based solely on error reduction is not optimal; a development oriented to the formalization of the model is necessary, which requires certain properties that are not always taken into account, such as the stability of the estimated parameters, the coherence between the series and the model, the consistency with prior knowledge, and the predictive capacity of the model.

The evident interest in the use of neural networks for time series prediction has generated enormous research activity in the field. Crone and Kourentzes [7] report more than 5000 publications on time series prediction with neural networks (see also [39, 44, 45]), in journals from fields such as econometrics, statistics, engineering, and artificial intelligence; the topic has even been central to special issues, such as the *Neurocomputing* "*Special issue on evolving solution with neural networks*" published in October 2003 [46], among others.

**Figure 1.** *Published articles for forecasting time series with neural networks.*



#### *Encountered Problems of Time Series with Neural Networks: Models and Architectures DOI: http://dx.doi.org/10.5772/intechopen.88901*

However, there is little systematic research on the modeling and prediction of time series with neural networks relative to the theoretical advances obtained [13], and this is perhaps the primary cause of the inconsistencies reported in the literature.

Many of the optimistic publications that indicate superior performance of neural networks are related to novel paradigms or to extensions of existing methods, architectures, and training algorithms, but lack a reliable and valid evaluation of the empirical evidence of their performance. Few contributions have been made to the systematic development of methodologies for representing time series with neural networks under specific conditions; the modeling process is thus limited to ad hoc techniques instead of scientific approaches that follow a replicable modeling methodology. A consequence of this is that, despite the empirical findings, neural network models are not fully accepted in many forecasting areas. The previous discussion suggests that, although progress has been made in the field, there are still open topics to investigate. The question of whether, why, and under what conditions neural network models are better is still valid.

#### **3. Difficulties in the prediction of time series with neural networks**

The design of an artificial neural network aims to ensure that, for certain inputs, the network generates a desired output. In addition to a suitable network topology (architecture), this requires a learning or training process that modifies the weights of the neurons until a configuration matching the relationship, as measured by some criterion, is found, thereby estimating the parameters of the network; this process is considered critical in the field of neural networks [8, 43]. Model selection is not a trivial task when forecasting with linear models, and it is particularly difficult with non-linear models such as neural networks. Because the set of parameters to be estimated is typically large, neural networks often suffer from over-training: they fit the training data very well but produce poor results in the forecast.
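The error-minimization view of training can be sketched with a toy example (illustrative only, not taken from the chapter): a single-parameter forecasting model whose weight is adjusted by gradient descent to minimize the mean squared forecast error.

```python
import numpy as np

# Toy illustration of training as error minimization: a one-parameter
# model y_hat[t] = w * y[t-1] whose weight is adjusted by gradient
# descent on the mean squared forecast error.
rng = np.random.default_rng(0)
y = np.empty(200)
y[0] = 1.0
for t in range(1, 200):                      # synthetic AR(1) series
    y[t] = 0.7 * y[t - 1] + 0.1 * rng.standard_normal()

w = 0.0                                      # the single "network" weight
for _ in range(500):
    err = y[1:] - w * y[:-1]                 # forecast residuals
    grad = -2.0 * np.mean(err * y[:-1])      # d(MSE)/dw
    w -= 1.0 * grad                          # gradient-descent update

print(round(w, 2))                           # near the true coefficient 0.7
```

A neural network does the same thing with many weights at once, which is why minimizing only the error tends to sacrifice parsimony.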

To mitigate over-training, the available data set is often divided into three parts: training, validation, and testing (or prediction). The training and validation sets are used to build the neural network model, which is then evaluated with the test set. The training set is used to estimate the parameters of a number of alternative neural network specifications (networks with different numbers of inputs and hidden neurons). The generalization capacity of each network is evaluated on the validation set, and the model that performs best there is selected as the final model. The validity and utility of the model are then tested using the test set; this last set is often used for forecasting purposes, evaluating the network's generalization capacity on unknown data.
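The three-way chronological split described above can be sketched as follows. This is an illustrative stand-in, not the chapter's procedure: linear autoregressions with different numbers of lags play the role of the "alternative neural network specifications", the candidate with the lowest validation error is selected, and only the winner is judged on the test set.

```python
import numpy as np

# Chronological train/validation/test split with model selection
# by validation error (toy setup with a noisy sinusoid).
rng = np.random.default_rng(1)
y = np.sin(np.arange(400) * 0.3) + 0.1 * rng.standard_normal(400)

def lag_matrix(series, p):
    # row t holds [y[t-1], ..., y[t-p]]; target is y[t]
    X = np.column_stack([series[p - k - 1:len(series) - k - 1] for k in range(p)])
    return X, series[p:]

def mse(a, b):
    return float(np.mean((a - b) ** 2))

n_train, n_val = 240, 80                     # remaining 80 points = test set
best = None
for p in (1, 2, 4, 8):                       # candidate numbers of inputs
    X, t = lag_matrix(y, p)
    tr = slice(0, n_train - p)
    va = slice(n_train - p, n_train + n_val - p)
    w, *_ = np.linalg.lstsq(X[tr], t[tr], rcond=None)   # fit on training set
    val_err = mse(X[va] @ w, t[va])                     # score on validation set
    if best is None or val_err < best[0]:
        best = (val_err, p, w)

val_err, p, w = best
X, t = lag_matrix(y, p)
test_err = mse(X[n_train + n_val - p:] @ w, t[n_train + n_val - p:])
print(p, round(val_err, 4), round(test_err, 4))
```

Note that the test error is computed exactly once, after selection, which is the discipline the text describes.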

Selecting the model with the best performance on the validation set, however, does not guarantee a good fit on the forecast set, and the amount of data assigned to each set can also affect performance; an overly large training set, for instance, can favor over-training. Granger [21] suggests that at least 20% of the data be used as a test set; however, there is no general guide on how to partition the observations so that optimal results are guaranteed.

Zhang et al. [22] state that the size of the training set has limited effects on the performance of the network: for the sizes they investigated, there is no significant difference in forecast performance. These results are perhaps due to the forecasting method used, since differences are small for one-step-ahead prediction but marked for multi-step forecasts, in which case large differences in the results are expected for different sizes of the training, validation, and test sets.


Such inconsistencies make neural networks an unstable method, given that any change in the training or in some parameter produces large changes in the prediction [9]. Some key factors where mixed results are presented are:

• Need for data preprocessing (scaling, transformation, simple and seasonal differentiation, etc.) [10–12, 47, 48].

• Criteria for the selection of input variables [15, 22].

• Criteria for the selection of the network configuration: complexity vs. parsimony (number of internal layers [40–42], neurons in each layer [22]).

• Estimation of the parameters (learning algorithms, stop criteria, etc.).

• Criteria for selecting the best model [43].

• Tests on the residuals; consistency of linear tests.

• Diagnostic tests and acceptance.

• Properties of the model: stability of the parameters, mean and variance of the series versus the model.

• Predictive capacity of the model.

• Presence of regular patterns, such as trend, seasonal, and cyclical patterns [10–12].

• Presence of irregular patterns, such as structural changes, atypical data, calendar-day effects, etc. [3, 49, 50].

Cases where the neural network performs worse than linear statistical models or other models may be due to the fact that the series studied do not present great disturbances, that the neural network used for comparison was not adequately trained, that the criterion for selecting the best model is not comparable, or that the configuration used is not adequate for the characteristics of the data. Many conclusions about the performance of neural networks are obtained from empirical studies, and thus present limited results that often cannot be extended to general applications.


As the criterion for selecting the best model, the minimization of some error function is often used, such as the mean squared error (MSE), the mean absolute deviation (MAD), cost functions [51], or even expert knowledge [52]. However, these measures do not perform equally, since they can favor or penalize certain characteristics of the data, and expert knowledge is not always easy to acquire. Approaches based on machine learning [53, 54] and meta-learning [55–59] have therefore been reported in the literature; they show advantages by allowing an automatic model-selection process based on the parallel evaluation of multiple network architectures, but they are limited to certain architectures and their implementation is complex. Other studies on the topic include Qi and Zhang [43], who investigate the well-known criteria AIC [60], BIC [61], the root mean squared error (RMSE), the mean absolute percentage error (MAPE), and directional accuracy (DA). This broad panorama of techniques for selecting the best model reflects that, despite the effort made, there is no strong criterion for an adequate selection.
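The error measures named above are easy to compare side by side on a toy forecast. DA rewards getting the sign of the change right, which the magnitude-based measures ignore (the numbers here are illustrative only):

```python
import numpy as np

# MSE, MAD, RMSE, MAPE, and directional accuracy (DA) on a toy forecast.
actual = np.array([10.0, 12.0, 11.0, 13.0, 14.0, 12.5])
forecast = np.array([10.5, 11.5, 11.5, 12.5, 14.5, 13.0])

err = actual - forecast
mse = np.mean(err ** 2)                        # mean squared error
mad = np.mean(np.abs(err))                     # mean absolute deviation
rmse = np.sqrt(mse)                            # root mean squared error
mape = 100 * np.mean(np.abs(err) / np.abs(actual))   # percentage error
# DA: fraction of steps where the predicted change has the right sign
da = np.mean(np.sign(np.diff(actual)) == np.sign(np.diff(forecast)))

print(mse, mad, rmse, round(mape, 2), da)
```

Here every point is off by the same 0.5, so MSE, MAD, and RMSE agree, while MAPE weights errors at low levels more heavily and DA only counts the one missed turning point.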

Another widespread criticism of neural networks is the high number of parameters that must be selected experimentally to generate the desired output: the selection of the input variables from a usually large set of possible inputs, the selection of the internal architecture of the network, and the estimation of the values of the connection weights. For each of these problems, different approaches have been proposed in the literature.

The selection of the input variables depends to a large extent on the knowledge that the modeler possesses about the time series, and it is the modeler's task to decide, according to some previously fixed criterion, whether each variable is needed in the model. Although there is no systematic way to determine the set of inputs accepted by the research community, recent studies have suggested rational procedures based on decision analysis or on traditional statistical methods such as the autocorrelation function [62]; however, the use of the latter is questioned, since these functions are based on linear approaches, and neural networks do not by themselves express the moving average (MA) components of the model. Mixed results about the benefits of including many or few input variables are also reported in the literature. Tang et al. [63] report benefits from using a large set of input variables, while Lachtermacher and Fuller [15] report the same for multi-step forecasting but the opposite for one-step-ahead forecasting. Zhang et al. [22] state that the number of input variables in a neural network model for prediction is much more important than the number of hidden neurons. Other techniques have also been proposed, based on heuristic analysis of the importance of each lag, statistical tests of non-linear dependence (Lagrange multipliers [64, 65], likelihood ratio [66], bispectrum [67]), model identification criteria such as AIC [5], or evolutionary algorithms [68, 69].
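One of the rational procedures mentioned above can be sketched as follows: rank candidate lags by their (linear) autocorrelation and keep the strongest ones as network inputs. This is an illustrative sketch, not the chapter's method, and, as the text notes, the ACF only captures linear dependence, so it is a starting point rather than a guarantee.

```python
import numpy as np

# Rank candidate lags by absolute autocorrelation on a series that is
# driven mainly by its second lag.
rng = np.random.default_rng(2)
e = rng.standard_normal(500)
y = np.empty(500)
y[:2] = 0.0
for t in range(2, 500):
    y[t] = 0.8 * y[t - 2] + 0.1 * e[t]       # lag-2 dependence

def autocorr(series, lag):
    return float(np.corrcoef(series[lag:], series[:-lag])[0, 1])

candidates = range(1, 9)
scores = {lag: abs(autocorr(y, lag)) for lag in candidates}
chosen = sorted(candidates, key=lambda lag: -scores[lag])[:3]
print(chosen)                                 # even lags dominate
```

On this series the procedure correctly surfaces lag 2 (and its harmonics at lags 4 and 6), while the odd lags score near zero.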

The selection of the internal configuration of the neural network (number of hidden layers and number of neurons in each layer) is perhaps the most difficult step in the construction of the model, and the one for which the most diverse approaches have been proposed in the literature, demonstrating the interest of the scientific community in solving this problem.

Regarding the number of hidden layers, in theory a neural network with a single hidden layer and a sufficient number of neurons can approximate any continuous function on a compact domain with arbitrary accuracy. In practice, however, some authors recommend using one hidden layer when the time series is continuous and two when there is some type of discontinuity [41, 42], while other research has shown that a network with two hidden layers can result in a more compact architecture and higher efficiency than networks with a single hidden layer [70–72]. Beyond that, increasing the number of hidden layers only increases computational time and the danger of over-training.


With respect to the number of hidden neurons, too small a number means the network cannot adequately learn the relationships in the data, while too large a number causes the network to memorize the data, with poor generalization and little utility for prediction. Some authors propose that the number of hidden neurons should be based on the number of input variables; however, this criterion is in turn related to the length of the time series and to the training, validation, and prediction sets. Given that the value of the weights in each neuron depends on the degree of error between the desired value and that predicted by the network, the selection of the optimal number of hidden neurons is directly associated with the training process used.
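The trade-off can be made concrete with a toy sketch (not the chapter's method): a one-hidden-layer MLP trained by plain gradient descent on a noisy curve, for several hidden sizes. Very few neurons underfit, while more neurons fit the training data more tightly, which is exactly what makes this choice interact with over-training.

```python
import numpy as np

# One-hidden-layer tanh MLP trained by batch gradient descent on a
# noisy sine; training MSE is compared across hidden-layer sizes.
rng = np.random.default_rng(3)
x = np.linspace(-1, 1, 60)[:, None]
t = np.sin(3 * x) + 0.05 * rng.standard_normal((60, 1))

def train_mlp(hidden, steps=3000, lr=0.05):
    w1 = rng.standard_normal((1, hidden)) * 0.5
    b1 = np.zeros(hidden)
    w2 = rng.standard_normal((hidden, 1)) * 0.5
    b2 = np.zeros(1)
    for _ in range(steps):
        h = np.tanh(x @ w1 + b1)             # hidden activations
        yhat = h @ w2 + b2                   # linear output layer
        d = 2 * (yhat - t) / len(x)          # d(MSE)/d(yhat)
        gw2, gb2 = h.T @ d, d.sum(0)
        dh = (d @ w2.T) * (1 - h ** 2)       # backprop through tanh
        gw1, gb1 = x.T @ dh, dh.sum(0)
        w1 -= lr * gw1; b1 -= lr * gb1
        w2 -= lr * gw2; b2 -= lr * gb2
    return float(np.mean((np.tanh(x @ w1 + b1) @ w2 + b2 - t) ** 2))

train_mse = {h: train_mlp(h) for h in (1, 4, 16)}
print(train_mse)                             # training error shrinks with size
```

A shrinking training error is of course not the goal by itself; judged on held-out data, the largest network would be the first candidate to over-train.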

The training of a neural network is an unconstrained non-linear minimization problem in which the weights of the network are iteratively modified in order to minimize the error between the desired output and the obtained one. Several training methods have been proposed in the literature, from the classical gradient descent techniques [73], which suffer from convergence problems, to adaptive dynamic optimization [74, 75], Quickprop [76], Levenberg–Marquardt [77], quasi-Newton methods, BFGS, and GRG2 [78], among others. The joint selection of the hidden neurons and the training process has also led to the development of fixed, constructive, and destructive methods. Those based on constructive algorithms have certain advantages over the others, since they allow evaluating, during training, the convenience of adding a new neuron to the network according to whether it decreases the error term, which makes them more efficient methods, although with a high computational cost [79]. Other developments, such as pruning algorithms [77, 80–82], Bayesian algorithms, methods based on genetic algorithms such as GANN, robust neural network ensembles and ensemble learning [83–86], and meta-learning [9, 87], have also shown good results in the task of finding the optimal architecture of the network; however, these methods are usually more complex and difficult to implement. Furthermore, none of them can guarantee finding the global optimum, and they are not universally applicable to all real forecasting problems, so designing a proper neural network remains difficult.
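One of the stop criteria implicit in the discussion above, early stopping on a held-out validation set, can be sketched in a few lines. The setup is hypothetical (a linear model trained by gradient descent stands in for the network); the point is the loop structure: keep the parameters from the iteration with the lowest validation error, and stop once no improvement has been seen for a while.

```python
import numpy as np

# Gradient descent with early stopping: track validation error and
# stop after 50 steps without improvement, keeping the best weights.
rng = np.random.default_rng(4)
X = rng.standard_normal((120, 6))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 0.0, 0.0])
y = X @ w_true + 0.3 * rng.standard_normal(120)

Xtr, ytr = X[:80], y[:80]                    # training set
Xva, yva = X[80:], y[80:]                    # validation set

w = np.zeros(6)
best_w, best_err, patience = w.copy(), np.inf, 0
for step in range(2000):
    grad = 2 * Xtr.T @ (Xtr @ w - ytr) / len(ytr)
    w -= 0.01 * grad
    val_err = float(np.mean((Xva @ w - yva) ** 2))
    if val_err < best_err - 1e-6:
        best_w, best_err, patience = w.copy(), val_err, 0
    else:
        patience += 1
        if patience >= 50:                   # no improvement for 50 steps
            break

print(step, round(best_err, 3))
```

The same skeleton applies unchanged when the gradient step is replaced by any of the training algorithms cited in the text.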

The efficiency of prediction with neural networks has been evidenced by the applications published in the literature; however, the power of the prediction is limited by the degree of stability of the time series, and it can fail when the series presents complex dynamic behaviors. Representations that use dynamic models, such as recurrent neural networks of the Elman and Jordan types [88–91], thus emerge as an alternative solution, since their ability to accumulate dynamic behaviors allows more adequate forecasts. Recurrence allows forward and backward (feedback) connections, forming cycles within the network architecture that use previous states as a basis for the current state and preserve an internal memory of the behavior of the data, which facilitates the learning of dynamic relationships. The main criticism, however, is that they require an efficient training algorithm capable of capturing the dynamics of the series, which makes their use computationally complex. Potentially useful models for series with dynamic behavior also arise from the combination of different architectures at the input of the multilayer perceptron.
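The internal-memory idea behind these architectures is visible even without training. The sketch below (illustrative weights, not a trained model) runs an Elman-style cell whose hidden state feeds back into itself, so an input seen early in the sequence still influences later outputs:

```python
import numpy as np

# Minimal Elman-style recurrent cell: the hidden state h is fed back
# into itself, giving the network an internal memory of past inputs.
rng = np.random.default_rng(5)
w_in = rng.standard_normal((1, 4)) * 0.5     # input  -> hidden
w_rec = rng.standard_normal((4, 4)) * 0.3    # hidden -> hidden (feedback)
w_out = rng.standard_normal((4, 1)) * 0.5    # hidden -> output

def run(sequence):
    h = np.zeros(4)                          # internal state
    outputs = []
    for u in sequence:
        h = np.tanh(u * w_in[0] + h @ w_rec) # state depends on the past
        outputs.append(float(h @ w_out[:, 0]))
    return outputs

a = run([1.0, 0.0, 0.0])                     # impulse at the start
b = run([0.0, 0.0, 0.0])                     # no input at all
print(a[-1], b[-1])                          # the early input still matters
```

With an all-zero input the state never leaves zero, while the single early impulse leaves a trace in every later output; this is the memory a feed-forward MLP lacks.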

The problem that arises goes beyond the simple estimation of each model in light of the characteristics of each series. Although it is recognized that much experience has been gained with multilayer perceptron neural networks, many theoretical, methodological, and empirical problems about the use of such models remain open. These general problems are related to aspects for which many of the recommendations given in the literature are contradictory [92–98].



### **4. Conclusions**

In this chapter, the need to have adequate models of neural networks for the prediction of time series has been identified, and this task has been exposed as a difficult, relevant, and timely problem.


A critical step in the forecasting process is the selection of the set of input variables. At this point, the decision of which lags of the series to include is fundamental to the result and depends on the available information and knowledge. In the absence of prior knowledge about the series, the choice of candidate lags should be based on a heuristic analysis of the importance of each lag, a statistical test of non-linear dependence, model identification criteria, or evolutionary algorithms; however, given these options, the mixed results reported in the literature show that there is no consensus about the appropriate approach.

As previously emphasized, the literature offers no clear indications about best practices for choosing the sizes of the training, validation, and test sets. Often the size is a predefined parameter in the construction of the neural network model, or it is chosen arbitrarily; however, no study demonstrates the effect of this decision, which, moreover, may be related to the forecasting method used.

Likewise, there is a close relationship between the selection of the internal configuration, especially the hidden neurons, and the training process of the neural network. The consensus about the use of a hidden layer when the data of the time series are continuous and two when there is discontinuity, and of the advantages of the functions sigmodia and hyperbolic tangent in the transfer of the hidden layer,

It is often used as a criterion for the selection of the best model based on the error of prediction, expert knowledge or criteria of information; however, the limitations that they manifest and the mixed results reported in their use, in addition to the limited results reported with other techniques, which do not allow conclusive

The consideration of characteristic factors of the time series that can affect the evolution of the neural network model such as the length of the time series, the frequency of the observations, the presence of regular and irregular patterns, and the scale of the data, must be included in the process of building the neural network model. The discussion of whether a preprocessing oriented to the stabilization of the series is necessary in non-linear models, and even more, in neural networks, is a topic that is still valid, and depends to a large extent on the type of data that is modeled. The abilities exhibited by neural networks allow, in the first instance, to avoid pre-processing via data transformation. However, it is not yet clear whether, under a correct network construction and training procedure, a prior process of elimination of seasonal trends and patterns is necessary. Scaling is always preferable given its advantages of reducing training patterns and leading to more accurate results. Likewise, the benefits that different neural network architectures have in relation to nonlinear relationships in the data have been discussed. Neural network models, by themselves, facilitate the representation of non-linear characteristics, without the need for a priori knowledge about such relationships, and such consideration is always desirable in models for real time series; however, it is not. In addition, their performance in the face of dynamic behavior in the data, the exposed architectures have been developed as an extension of neural network models and not explicitly as time series models, so there is no theoretical foundation for the construction of these, nor rigorous studies that allow to assess their performance in

*DOI: http://dx.doi.org/10.5772/intechopen.88901*


A critical step in the forecast process is the selection of the set of input variables. At this point, the decision of which lags of the series to include is fundamental for the result and depends on the available information and knowledge. In the absence of prior knowledge about the series, the choice of candidate lags to be included in the model can be based on a heuristic analysis of the importance of each lag, a statistical test of non-linear dependence, model identification criteria, or evolutionary algorithms; however, the mixed results reported in the literature for these options show that there is no consensus about the appropriate procedure for this purpose.
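One of the lag-selection heuristics mentioned above, ranking candidate lags by the strength of their linear autocorrelation, can be sketched as follows. `select_lags` is a hypothetical helper, not a procedure prescribed by the chapter, and as a linear heuristic it can miss purely non-linear dependence:

```python
import numpy as np

def select_lags(series, max_lag, k):
    """Rank candidate lags 1..max_lag by absolute autocorrelation and
    keep the top k. A linear heuristic: dependence that is purely
    non-linear will not be detected this way."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    denom = float(np.dot(x, x))
    acf = [abs(float(np.dot(x[:-lag], x[lag:])) / denom)
           for lag in range(1, max_lag + 1)]
    top = np.argsort(acf)[::-1][:k]          # indices of strongest lags
    return sorted(int(i) + 1 for i in top)   # lag numbers, ascending

# A seasonal series with period 12: seasonal lags should dominate.
t = np.arange(300)
lags = select_lags(np.sin(2 * np.pi * t / 12), max_lag=24, k=3)
```

On this purely periodic example the selected lags are multiples of the half-period, as expected from the cosine shape of the autocorrelation.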

As previously emphasized, the literature gives no clear indications about best practices for choosing the sizes of the training, test, and prediction sets. Often the size is a predefined parameter in the construction of the neural network model, or it is chosen arbitrarily; however, there is no study that demonstrates the effect of this decision, which, moreover, may depend on the forecasting method used.
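A common, though by no means validated, practice is a simple chronological partition; the helper name and the fractions below are illustrative defaults, not recommendations from the literature:

```python
def chronological_split(series, train_frac=0.7, val_frac=0.15):
    """Split a series into train/validation/test blocks that respect
    time order (shuffling would leak future values into training)."""
    n = len(series)
    i = int(n * train_frac)
    j = i + int(n * val_frac)
    return series[:i], series[i:j], series[j:]

train, val, test = chronological_split(list(range(100)))
```

The point of keeping the blocks contiguous and ordered is that every observation used for validation or testing lies strictly after the data the model was fitted on.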

Likewise, there is a close relationship between the selection of the internal configuration, especially the number of hidden neurons, and the training process of the neural network. The consensus about the use of one hidden layer when the data of the time series are continuous and two when there is some discontinuity, and about the advantages of the sigmoid and hyperbolic tangent functions as transfer functions for the hidden layer, reflects a deep investigation of such topics.
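The two transfer functions mentioned are closely related, which partly explains why the literature treats them as near-interchangeable: tanh(x) = 2*sigmoid(2x) - 1, so they differ in output range ((0, 1) versus (-1, 1)) rather than in representational power. A quick numerical check, illustrative only:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# tanh is a scaled and shifted sigmoid: tanh(x) = 2*sigmoid(2x) - 1,
# so the choice between them mainly affects the output range.
x = np.linspace(-4.0, 4.0, 9)
gap = float(np.max(np.abs(np.tanh(x) - (2.0 * sigmoid(2.0 * x) - 1.0))))
```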

The selection of the best model is often based on the prediction error, expert knowledge, or information criteria; however, the limitations these criteria exhibit and the mixed results reported in their use, in addition to the limited results reported with other techniques, do not allow definitive conclusions about their use.
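As an example of the information criteria mentioned (cf. Akaike [60] and Schwarz [61]), the following sketch scores linear autoregressions of different orders by AIC and BIC under a Gaussian-likelihood approximation. It is a proxy for neural model selection, not the chapter's procedure, and `aic_bic_for_ar` is a hypothetical helper:

```python
import numpy as np

def aic_bic_for_ar(series, p):
    """Fit an AR(p) model by least squares and return (AIC, BIC) under a
    Gaussian-likelihood approximation: n*log(RSS/n) plus a penalty of
    2 (AIC) or log(n) (BIC) per parameter."""
    x = np.asarray(series, dtype=float)
    X = np.column_stack([x[p - i - 1 : len(x) - i - 1] for i in range(p)])
    y = x[p:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ coef) ** 2))
    n = len(y)
    base = n * np.log(rss / n)
    return base + 2 * p, base + np.log(n) * p

# Simulated AR(2) data: the criteria should favour p = 2 over a
# too-small (p = 1) and a too-large (p = 6) model.
rng = np.random.default_rng(1)
x = np.zeros(500)
for t in range(2, 500):
    x[t] = 0.6 * x[t - 1] - 0.3 * x[t - 2] + rng.normal()
bic = {p: aic_bic_for_ar(x, p)[1] for p in (1, 2, 6)}
```

The BIC penalty log(n) per parameter grows with sample size, which is why it tends to select more parsimonious models than AIC.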

Characteristic factors of the time series that can affect the development of the neural network model, such as the length of the series, the frequency of the observations, the presence of regular and irregular patterns, and the scale of the data, must be considered in the process of building the model. The discussion of whether preprocessing oriented to the stabilization of the series is necessary in non-linear models, and even more so in neural networks, is a topic that is still open and depends to a large extent on the type of data being modeled. The abilities exhibited by neural networks allow, in the first instance, avoiding preprocessing via data transformation. However, it is not yet clear whether, under a correct network construction and training procedure, a prior process of eliminating trends and seasonal patterns is necessary. Scaling is always preferable, given its advantages of simplifying the training patterns and leading to more accurate results.
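The two preprocessing operations discussed, scaling and the elimination of regular seasonal patterns, can be sketched as follows; the helper names and the (-1, 1) target range (chosen to match tanh hidden units) are illustrative assumptions:

```python
import numpy as np

def minmax_scale(x, lo=-1.0, hi=1.0):
    """Scale to [lo, hi]; a (-1, 1) range pairs naturally with tanh
    hidden units."""
    x = np.asarray(x, dtype=float)
    return lo + (hi - lo) * (x - x.min()) / (x.max() - x.min())

def seasonal_difference(x, period):
    """Remove a regular seasonal pattern by differencing at the period."""
    x = np.asarray(x, dtype=float)
    return x[period:] - x[:-period]

t = np.arange(120)
raw = 50.0 + 10.0 * np.sin(2 * np.pi * t / 12) + 0.1 * t  # level + season + trend
scaled = minmax_scale(raw)               # bounded to [-1, 1]
deseason = seasonal_difference(raw, 12)  # season removed; the linear trend
                                         # survives as a constant step
```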

Likewise, the benefits that different neural network architectures offer in relation to non-linear relationships in the data have been discussed. Neural network models by themselves facilitate the representation of non-linear characteristics without the need for a priori knowledge about such relationships, and this property is always desirable in models for real time series. Regarding their performance in the face of dynamic behavior in the data, the architectures presented have been developed as extensions of neural network models and not explicitly as time series models, so there is neither a theoretical foundation for their construction nor rigorous studies that allow assessing their performance on time series with the stated characteristics.

*Recent Trends in Artificial Neural Networks - From Training to Prediction*

In summary, the following open problems were identified:

• There is no systematic way accepted in the literature to determine the appropriate set of inputs to the neural network.

• The effects that factors such as the partition into training, validation, and forecast sets, preprocessing, transfer function, etc., have on different forecasting methods are unknown or unclear.

• There are no clear indications that allow expressing a priori which transfer function should be used in the neural network model according to the characteristics of the time series.

• There are no empirical, methodological, or theoretical reasons to prefer a specific model among several alternatives.

• It is not clear when and how to transform the data before performing the modeling.

• There is no agreement on how to select the final model when several alternatives are considered.

• There is no clarity about the necessity of eliminating or not eliminating trend and seasonal components in neural network models.

• It is difficult to incorporate qualitative, subjective, and contextual information in the forecasts.

• There is little understanding of the statistical properties of different neural network architectures.

• There is no clarity about which are the most adequate procedures for the estimation, validation, and testing of different neural network architectures.

• There is no clarity about procedures oriented to the selection of neurons in the hidden layer that in turn minimize the training time of the network.

• There is no general guide to partition the set of observations into training, validation, and forecast sets in such a way that optimal results are guaranteed.

• There is no clarity on how to combine forecasts from several alternative models, and whether there are gains derived from this practice.

• There is no clarity in the criteria for evaluating the performance of different neural network architectures.

• There is no clarity about whether, and under what criteria, different architectures of neural networks allow the handling of dynamic behaviors in the data.

#### **Conflict of interest**

The authors declare no conflict of interest.

#### **Author details**

Paola Andrea Sánchez-Sánchez\*, José Rafael García-González and Leidy Haidy Perez Coronell Universidad Simón Bolívar, Barranquilla, Colombia

\*Address all correspondence to: psanchez9@unisimonbolivar.edu.co

© 2019 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/ by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


#### **References**


[1] Zhang P. An investigation of neural networks for linear time-series forecasting. Computers & Operations Research. 2001;**28**(12):1183-1202

[2] Granger C, Terasvirta T. Modelling Nonlinear Economic Relationships. Oxford: Oxford University Press; 1993

[3] Franses P, Van Dijk D. Non-Linear Time Series Models in Empirical Finance. UK: Cambridge University Press; 2000

[4] Tong H. Non-Linear Time Series: A Dynamical System Approach. Oxford: Oxford Statistical Science Series; 1990

[5] De Gooijer I, Kumar K. Some recent developments in non-linear modelling, testing, and forecasting. International Journal of Forecasting. 1992;**8**:135-156

[6] Peña D. Second-generation time-series models: A comment on 'Some advances in non-linear and adaptive modelling in time-series analysis' by Tiao and Tsay. Journal of Forecasting. 1994;**13**:133-140

[7] Crone S, Kourentzes N. Input-variable specification for neural networks - An analysis of forecasting low and high time series frequency. In: Proceedings of the International Joint Conference on Neural Networks (IJCNN'09); 2009

[8] Zhang P, Patuwo B, Hu M. Forecasting with artificial neural networks: the state of the art. International Journal of Forecasting. 1998;**14**(1):35-62

[9] Yu L, Wang S, Lai K. A neural-network-based nonlinear metamodeling approach to financial time series forecasting. Applied Soft Computing. 2009;**9**:563-574

[10] Franses P, Draisma G. Recognizing changing seasonal patterns using artificial neural networks. Journal of Econometrics. 1997;**81**(1):273-280

[11] Qi M, Zhang P. Trend time-series modeling and forecasting with neural networks. IEEE Transactions on Neural Networks. 2008;**19**(5):808-816

[12] Zhang P, Qi M. Neural network forecasting for seasonal and trend time series. European Journal of Operational Research. 2005;**160**:501-514

[13] Trapletti A. On Neural Networks as Time Series Models. Vienna University of Technology; 2000

[14] Kasabov N. Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering. 2nd ed. Cambridge, Massachusetts: The MIT Press; 1998

[15] Lachtermacher G, Fuller J. Backpropagation in time-series forecasting. Journal of Forecasting. 1995;**14**:381-393

[16] Meade N. Evidence for selection of forecasting methods. Journal of Forecasting. 2000;**19**:515-535

[17] Granger C, Anderson A. An Introduction to Bilinear Time Series Models. Gottingen: Vandenhoeck and Ruprecht; 1978

[18] Tong H, Lim K. Threshold autoregressive, limit cycles and cyclical data. Journal of the Royal Statistical Society, Series B. 1980;**42**(3):245-292

[19] Engle R. Autoregressive conditional heteroskedasticity with estimates of the variance of UK inflation. Econometrica. 1982;**50**:987-1008

[20] Bollerslev T. Generalised autoregressive conditional heteroscedasticity. Journal of Econometrics. 1986;**31**:307-327

[21] Granger C. Strategies for modelling nonlinear time-series relationships. Economic Record. 1993;**69**(206):233-238

[22] Zhang P, Patuwo E, Hu M. A simulation study of artificial neural networks for nonlinear time-series forecasting. Computers and Operations Research. 2001;**28**(4):381-396

[23] Chatfield C. What is the "best" method of forecasting? Journal of Applied Statistics. 1988;**15**:19-39

[24] Jenkins G. Some practical aspects of forecasting in organisations. Journal of Forecasting. 1982;**1**:3-21

[25] Makridakis S, Anderson A, Carbone R, Fildes R, Hibon M, Lewandowski R, et al. The accuracy of extrapolation (time series) methods: Results of a forecasting competition. Journal of Forecasting. 1982;**1**:111-153

[26] Clemen R. Combining forecasts: A review and annotated bibliography with discussion. International Journal of Forecasting. 1989;**5**:559-608

[27] Makridakis S, Chatfield C, Hibon M, Lawrence M, Mills T, Ord K, et al. The M2 competition: A real-time judgmentally based forecasting competition. Journal of Forecasting. 1993;**9**:5-22

[28] Newbold P, Granger C. Experience with forecasting univariate time series and the combination of forecasts (with discussion). Journal of the Royal Statistical Society. 1974;**137**:131-164

[29] Granger C. Combining forecasts - twenty years later. Journal of Forecasting. 1989;**8**:167-173

[30] Krogh A, Vedelsby J. Neural network ensembles, cross validation, and active learning. Advances in Neural Information Processing Systems. 1995;**7**:231-238

[31] Chatfield C. Model uncertainty and forecast accuracy. Journal of Forecasting. 1996;**15**:495-508

[32] Bates J, Granger C. The combination of forecasts. Operational Research Quarterly. 1969;**20**:451-468

[33] Davison M, Anderson C, Anderson K. Development of a hybrid model for electrical power spot prices. IEEE Transactions on Power Systems. 2002;**2**:17

[34] Luxhoj J, Riis J, Stensballe B. A hybrid econometric-neural network modeling approach for sales forecasting. International Journal of Production Economics. 1996;**43**:175-192

[35] Makridakis S. Why combining works? International Journal of Forecasting. 1989;**5**:601-603

[36] Palm F, Zellner A. To combine or not to combine? issues of combining forecasts. Journal of Forecasting. 1992;**11**:687-701

[37] Reid D. Combining three estimates of gross domestic product. Economica. 1968;**35**:431-444

[38] Winkler R. Combining forecasts: A philosophical basis and some current issues. International Journal of Forecasting. 1989;**5**:605-609

[39] Zhang P. Time series forecasting using a hybrid ARIMA and neural network model. Neurocomputing. 2003;**50**:159-175

[40] Cybenko G. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems. 1989;**2**:303-314

[41] Hornik K. Approximation capability of multilayer feedforward networks. Neural Networks. 1991;**4**:251-257

[42] Hornik K, Stinchcombe M, White H. Multilayer feedforward networks are universal approximators. Neural Networks. 1989;**2**(5):359-366


[43] Qi M, Zhang P. An investigation of model selection criteria for neural network time series forecasting. European Journal of Operational Research. 2001;**132**:666-680

[44] Adya M, Collopy F. How effective are neural networks at forecasting and prediction? A review and evaluation. Journal of Forecasting. 1998;**17**:481-495

[45] Hill T, O'Connor M, Remus W. Neural network models for time series forecasts. Management Science. 1996;**42**:1082-1092

[46] Fanni A, Uncini A. Special issue on evolving solution with neural networks. Neurocomputing. 2003;**55**(3-4):417-419

[47] Faraway J, Chatfield C. Time series forecasting with neural networks: a comparative study using the airline data. Applied Statistics. 1998;**47**:231-250

[48] Nelson M, Hill T, Remus T, O'Connor M. Time series forecasting using NNs: Should the data be deseasonalized first? Journal of Forecasting. 1999;**18**:359-367

[49] Hill T, Marquez L, O'Connor M, Remus W. Artificial neural networks for forecasting and decision making. International Journal of Forecasting. 1994;**10**:5-15

[50] Tkacz G, Hu S. Forecasting GDP Growth Using Artificial Neural Networks. Bank of Canada; 1999

[51] Tashman L. Out-of-sample tests of forecasting accuracy: An analysis and review. International Journal of Forecasting. 2000;**16**:437-450

[52] Adya M, Collopy F, Armstrong J, Kennedy M. Automatic identification of time series features for rule-based forecasting. International Journal of Forecasting. 2001;**17**(2):143-157

[53] Arinze B. Selecting appropriate forecasting models using rule induction. Omega-International Journal of Management Science. 1994;**22**(6):647-658

[54] Venkatachalan A, Sohl J. An intelligent model selection and forecasting system. Journal of Forecasting. 1999;**18**:167-180

[55] Giraud-Carrier R, Brazdil P. Introduction to the special issue on meta-learning. Machine Learning. 2004;**54**(3):187-193

[56] Santos P, Ludermir T, Prudencio R. Selection of time series forecasting models based on performance information. In: Proceedings of the 4th International Conference on Hybrid Intelligent Systems. 2004. pp. 366-371

[57] Santos P, Ludermir T, Prudencio R. Selecting neural network forecasting models using the zoomed-ranking approach. In: Proceedings of the 10th Brazilian Symposium on Neural Networks SBRN '08. 2008. pp. 165-170

[58] Soares C, Brazdil P. Zoomed Ranking – Selection of classification algorithms based on relevant performance information. Lecture Notes in Computer Science. 2000;**1910**:126-135

[59] Vilalta R, Drissi Y. A perspective view and survey of meta-learning. Journal of Artificial Intelligence Review. 2002;**18**(2):77-95

[60] Akaike H. A new look at statistical model identification. IEEE Transactions on Automatic Control. 1974;**9**:716-723

[61] Schwarz G. Estimating the dimension of a model. The Annals of Statistics. 1978;**6**:461-464

[62] Tang Z, Fishwick P. Feedforward neural nets as models for time series forecasting. ORSA Journal on Computing. 1993;**5**(4):374-385

[63] Tang Z, Almeida C, Fishwick P. Time series forecasting using neural networks vs Box-Jenkins methodology. Simulation. 1991;**57**(5):303-310

[64] Luukkonen R, Saikkonen P, Terasvirta T. Testing linearity in univariate time series models. Scandinavian Journal of Statistics. 1988;**15**:161-175

[65] Saikkonen P, Luukkonen R. Lagrange multiplier tests for testing non-linearities in time series models. Scandinavian Journal of Statistics. 1988;**15**:55-68

[66] Chan W, Tong H. On tests for nonlinearity in time series analysis. Journal of Forecasting. 1986;**5**:217-228

[67] Hinich M. Testing for Gaussianity and linearity of a stationary time series. Journal of Time Series Analysis. 1982;**3**:169-176

[68] Happel B, Murre J. The design and evolution of modular neural network architectures. Neural Networks. 1994;**7**:985-1004

[69] Schiffmann W, Joost M, Werner R. Application of genetic algorithms to the construction of topologies for multilayer perceptron. In: Proceedings of the International Conference on Artificial Neural Networks and Genetic Algorithms. 1993. pp. 675-682

[70] Srinivasan D, Liew A, Chang C. A neural network short-term load forecaster. Electric Power Systems Research. 1994;**28**:227-234

[71] Zhang X. Time series analysis and prediction by neural networks. Optimization Methods and Software. 1994;**4**:151-170

[72] Chester D. Why two hidden layers are better than one. In: Proceedings of the International Joint Conference on Neural Networks. 1990. pp. 1265-1268

[73] Bishop C. Neural Networks for Pattern Recognition. Oxford University Press; 1995

[74] Park D, El-Sharkawi M, Marks R, Atlas L. Electric load forecasting using an artificial neural network. IEEE Transactions on Power Systems. 1991;**6**(2):442-449

[75] Yu X, Chen G, Cheng S. Dynamic learning rate optimization of the backpropagation algorithm. IEEE Transactions on Neural Networks. 1995;**6**(3):669-677

[76] Fahlman S. Faster-learning variations of back-propagation: An empirical study. In: Proceedings of the 1988 Connectionist Models Summer School. 1989. pp. 38-51

[77] Cottrell M, Girard B, Girard Y, Mangeas M, Muller C. Neural modeling for time series: a statistical stepwise method for weight elimination. IEEE Transactions on Neural Networks. 1995;**6**(6):1355-1364

[78] Lasdon L, Waren A. GRG2 User's Guide. Austin: School of Business Administration, University of Texas; 1986

[79] Weigend A, Rumelhart D, Huberman B. Generalization by weight-elimination with application to forecasting. Advances in Neural Information Processing Systems. 1991;**3**:875-882

[80] Karnin E. A simple procedure for pruning back-propagation trained neural networks. IEEE Transactions on Neural Networks. 1990;**1**(2):239-245

[81] Reed R. Pruning algorithms – a survey. IEEE Transactions on Neural Networks. 1993;**4**(5):740-747

[82] Siestema J, Dow R. Neural net pruning – why and how. In: Proceedings of the IEEE International Conference on Neural Networks. Vol. 1. 1998

[83] Breiman L. Combining predictors. In: Combining Artificial Neural Nets—Ensemble and Modular Multi-Net Systems. Berlin: Springer; 1999

[84] Carney J, Cunningham P. Tuning diversity in bagged ensembles.

[85] Hansen L, Salamon P. Neural network ensembles. IEEE Transactions on Pattern Analysis and Machine Intelligence. 1990;**12**:993-1001

[86] Naftaly U, Intrator N, Horn D. Optimal ensemble averaging of neural networks. Network: Computation in Neural Systems. 1997;**8**:283-296

[87] Chan P, Stolfo S. Metalearning for multistrategy and parallel learning. In: Proceedings of the Second International Workshop on Multistrategy Learning.

[88] Connor J, Atlas L, Martin D. Recurrent networks and NARMA modeling. In: Advances in Neural Information Processing Systems. Morgan Kaufmann Publishers, Inc.

[89] Kuan C, Liu T. Forecasting exchange rates using feedforward and recurrent neural networks. Journal of Applied Econometrics. 1995;**10**:347-364

[91] Tenti P. Forecasting foreign exchange rates using recurrent neural networks. Applied Artificial Intelligence. 1996;**10**:567-581

[92] Caire P, Hatabian G, Muller C. Progress in forecasting by neural networks. In: Proceedings of the International Joint Conference on Neural Networks. Vol. 2. 1992. pp. 540-545

[93] Ong P, Zainuddin Z. Optimizing wavelet neural networks using modified cuckoo search for multi-step ahead chaotic time series prediction. Applied Soft Computing. 2019;**80**:374-386

[94] Zhang Y, Wang X, Tang H. An improved Elman neural network with piecewise weighted gradient for time series prediction. Neurocomputing. 2019;**359**:199-208

[95] Wang L, Wang Z, Qu H, Liu S. Optimal forecast combination based on neural networks for time series forecasting. Applied Soft Computing. 2018;**66**:1-17

[96] Lopez-Martin M, Carro B, Sanchez-Esguevillas A. Neural network architecture based on gradient boosting for IoT traffic prediction. Future Generation Computer Systems. 2019;**100**:656-673

[97] Zurbarán M, Sanmartin P. Efectos de la Comunicación en una Red Ad-Hoc. Investigación e Innovación en Ingenierías. 2016;**4**(1):26-31

[98] Tealab A. Time series forecasting using artificial neural networks methodologies: A systematic review. Future Computing and Informatics Journal. 2018;**3**(2):334-340

[90] Najand M, Bond C. Structural models of exchange rate determination. Journal of Multinational Financial Management. 2000;**10**:15-27

International Journal of Neural Systems.

Networks. 1993;**4**:740-747

pp. 325-333

pp. 31-50

2000;**10**:267-280

1993. pp. 150-165

1991;**119**:301-308

*Encountered Problems of Time Series with Neural Networks: Models and Architectures DOI: http://dx.doi.org/10.5772/intechopen.88901*

[81] Reed R. Pruning algorithms a survey. IEEE Transactions on Neural Networks. 1993;**4**:740-747

*Recent Trends in Artificial Neural Networks - From Training to Prediction*

Optimization Methods and Software.

[72] Chester D. Why two hidden layers are better than one. In: Proceedings of the International Joint Conference on Neural Networks. 1990. pp. 1265-1268

[73] Bishop C. Neural Networks for Pattern Recognition. Oxford University

[74] Pack D, El-Sharkawi M, Marks R, Atlas L. Electric load forecasting using an artificial neural network. IEEE Transactions on Power Systems.

[75] Yu X, Chen G, Cheng S. Dynamic learning rate optimization of the backpropagation algorithm. IEEE Transactions on Neural Networks.

[76] Falhman S. Faster-learning variations of back-propagation: An empirical study. In: de Proceedings of the 1988 Connectionist Models Summer

[77] Cottrell M, Girard B, Girard Y, Mangeas M, Muller C. Neural modeling for time series: a statistical stepwise method for weight elimination. IEEE Transactions on Neural Networks.

[78] Lasdon L, Waren A. GRG2 User's Guide. Austin: School of Business Administration, University of Texas;

Huberman B. Generalization by weight-

[80] Karnin E. A simple procedure for pruning back-propagation trained neural networks. IEEE Transactions on Neural Networks. 1990;**1**(2):239-245

[79] Weigend A, Rumelhart D,

elimination with application to forecasting. Advances in Neural Information Processing Systems.

1994;**4**:151-170

Press; 1995

1991;**6**(2):442-449

1995;**6**(3):669-677

School. 1989. pp. 38-51

1995;**6**(6):1355-1364

1986

1991;**3**:875-882

[62] Tang Z, Fishwick P. Feedforward neural nets as models for time series forecasting. ORSA Journal on Computing. 1993;**5**(4):374-385

[63] Tang Z, Almeida C, Fishwick P. Time series forecasting using neural networks vs Box-Jenkins methodology.

Simulation. 1991;**57**(5):303-310

[64] Luukkonen R, Saikkonen P, Terasvirta T. Testing linearity in univariate time series models. Scandinavian Journal of Statistics.

[65] Saikkonen P, Luukkonen R. Lagrange multiplier tests for testing non-linearities in time series models. Scandinavian Journal of Statistics.

[66] Chan W, Tong H. On tests for nonlinearity in time series analysis. Journal

[67] Hinich M. Testing for Gaussianity and linearity of a statistionary time series. Journal of Time Series Analysis.

[68] Happel B, Murre J. The design and evolution of modular neural network architectures. Neural Networks.

[70] Srinivasan D, Liew A, Chang C. A neural network short-term load forecaster. Electric Power Systems

of Forecasting. 1986;**5**:217-228

1988;**15**:161-175

1988;**15**:55-68

1982;**3**:169-176

1994;**7**:985-1004

1993. pp. 675-682

Research. 1994;**28**:227-234

[71] Zhang X. Time series analysis and prediction by neural networks.

[69] Schiffmann W, Joost M, Werner R. Application of genetic algorithms to the construction of topologies for multilayer perceptron. In: Proceedings of the International Conference on Artificial Neural Networks and Genetic Algorithms.

**38**

[82] Siestema J, Dow R. Neural net pruning – why and how. In: Proceedings of the IEEE International Conference on Neural Networks. Vol. 1. 1998. pp. 325-333

[83] Breiman L. Combining predictors de Combining Artificial Neural Nets—Ensemble and Modular Multi-Net Systems. Berlin: Springer; 1999. pp. 31-50

[84] Carney J, Cunningham P. Tuning diversity in bagged ensembles. International Journal of Neural Systems. 2000;**10**:267-280

[85] Hansen L, Salamon P. Neural network ensembles. IEEE Transactions on Pattern Analysis and Machine Intelligence. 1990;**12**:993-1001

[86] Naftaly U, Intrator N, Horn D. Optimal ensemble averaging of neural networks. Network: Computation in Neural Systems. 1997;**8**:283-296

[87] Chan P, Stolfo S. Metalearning for multistrategy and parallel learning. In: Proceedings of the Second International Workshop on Multistrategy Learning. 1993. pp. 150-165

[88] Connor J, Atlas L, Martin D. Recurrent Networks and NARMA Modeling de Advances in Neural Information Processing Systems. Morgan Kaufmann Publishers, Inc. 1991;**119**:301-308

[89] Kuan C, Liu T. Forecasting exchange rates using feedforwad and recurrent neural networks. Journal of Applied Econometrics. 1995;**10**:347-364

[90] Najand M, Bond C. Structural models of exchange rate determination. Journal of Multinational Financial Management. 2000;**10**:15-27

[91] Tenti P. Forecasting foreign exchange rates using recurrent neural networks. Applied Artificial Intelligence. 1996;**10**:567-581

[92] Caire P, Hatabian G, Muller C. Progress in forecasting by neural networks. In: Proceedings of the International Joint Conference on Neural Networks. Vol. 2. 1992. pp. 540-545

[93] Ong P, Zainuddin Z. Optimizing wavelet neural networks using modified cuckoo search for multi-step ahead chaotic time series prediction. Applied Soft Computing. 2019;**80**:374-386

[94] Zhanga Y, Wanga X, Tang H. An improved Elman neural network with piecewise weighted gradient for time series prediction. Neurocomputing. 2019;**359**:199-208

[95] Wang L, Wang Z, Qu H, Liu S. Optimal forecast combination based on neural networks for time series forecasting. Applied Soft Computing. 2018;**66**:1-17

[96] Lopez-Martin M, Carro B, Sanchez-Esguevillas A. Neural network architecture based on gradient boosting for IoT traffic prediction. Future Generation Computer Systems. 2019;**100**:656-673

[97] Zurbarán M, Sanmartin P. Efectos de la Comunicación en una Red Ad-Hoc. Investigación e Innovación en Ingenierías. 2016;**4**(1):26-31

[98] Tealab A. Time series forecasting using artificial neural networks methodologies: A systematic review. Future Computing and Informatics Journal. 2018;**3**(2):334-340


Section 2

## Metaheuristics and Artificial Neural Networks


#### Chapter 3

## Electric Transmission Network Expansion Planning with the Metaheuristic Variable Neighbourhood Search

Silvia Lopes de Sena Taglialenha and Rubén Augusto Romero Lázaro

### Abstract

This chapter presents a new method to solve the static long-term power transmission network expansion planning (TNEP) problem using the metaheuristic variable neighbourhood search (VNS). The TNEP is a large-scale, complex mixed-integer nonlinear programming problem that consists of determining the optimum expansion of the network to meet a forecasted demand. VNS changes the neighbourhood structure within a local search algorithm and makes implementation choices that integrate intensification and/or diversification strategies during the search process. The initial solution is obtained by a heuristic for the nonlinear mixed-integer model that takes both of Kirchhoff's laws into account (the transportation and DC models have been used). Several tests are performed on Garver's 6-bus, IEEE 24-bus and Southern Brazilian systems, demonstrating the applicability of the proposed method, and the results show that it performs well in comparison with some studies reported in the literature.

Keywords: transmission network expansion planning, variable neighbourhood search algorithm, metaheuristic algorithm, power system planning, combinatorial optimization

#### 1. Introduction

Due to the growth of electrical power consumption, the need to increase the power flow capacity of the existing transmission network is evident. This expansion can be planned in a dynamic or a static manner. The static long-term power transmission network expansion planning (TNEP) problem consists of determining the minimum-cost plan, which specifies the number and locations of transmission lines needed to meet a forecasted demand while satisfying the balance between generation and load and other operational constraints [1]. Transmission investments are very capital intensive and have long useful lives, so transmission investment decisions have a long-standing impact on the power system as a whole; therefore, TNEP has become an important component of power system planning, and its solution is used to guide future investment in transmission equipment.

The pioneering work on transmission expansion planning is reported in [2]; since then, the TNEP literature has grown vast and reports various solution methods whose choice depends on the mathematical model formulation [3]. A state of the art, obtained from a review of the most interesting models found in the international technical literature, is presented in [4]. In [5], TNEP is reviewed from different aspects such as modelling, solution methods, reliability, distributed generation, the electricity market, uncertainties, line congestion and reactive power planning. A critical review focusing on its most recent developments and a taxonomy of modelling decisions and solution methods for TNEP are presented in [6].


The most faithful mathematical modelling of the network operation would represent the problem through the relationships of the AC load flow, typically used for electric system operation analysis [1]. However, this modelling is difficult to use efficiently in transmission network planning, due to its non-convex and nonlinear nature. Consequently, the most accurate representation used in practice is the direct current (DC) model, which considers Kirchhoff's voltage (KVL) and current (KCL) laws only for the balance of active power flow. In this case, the resulting problem is a nonlinear mixed-integer program of high complexity for large systems, presenting a combinatorial explosion of the number of alternative solutions, with the extra difficulty of many local optima, which most of the time are of poor quality [3].

A more simplified model is the so-called transportation model (TM), which enforces only the KCL at all existing nodes [2]. In this case the resulting problem is an integer linear programming problem, which is normally easier to solve than the DC model although it maintains the combinatorial character of the original problem [3].

It is still possible to consider hybrid models which combine characteristics of the DC model and the transportation model. In this model it is assumed that KCL constraints are satisfied for all nodes of the network, whereas the constraint which represents Ohm's law (and indirectly KVL) is satisfied only by the existing circuits (and not necessarily by the added circuits) [3].

Technical literature related to the TNEP proposes many solution methods that can be classified into mathematical optimization, heuristic and metaheuristic approaches [7]. Techniques such as dynamic programming [8], linear programming [2], nonlinear programming [9], mixed-integer programming [10], branch and bound [11], hierarchical decomposition [12] and Benders decomposition [13] have been used and are categorized as mathematics-based approaches. However, these techniques demand large computing times due to the curse of dimensionality of this kind of problem. Heuristic methods emerged as an alternative to classical optimization methods, and their use has been very attractive since they are able to find good feasible solutions with less computational effort.

Some heuristic approaches have been proposed using constructive heuristic algorithms (CHA) [10, 14–16] and the forward-backward approach [17]. Metaheuristic methods emerged as an alternative to the two previous approaches, producing high-quality solutions with moderate computing time. Genetic algorithms [18, 19], greedy randomized adaptive search procedure [13], tabu search [20, 21], simulated annealing [20, 22], GRASP [23], scatter search [24] and grey wolf optimization algorithm [25] have been used to solve the TNEP problem, among other metaheuristic optimization techniques. It is important to point out that they cannot guarantee the global optimal solution to the TNEP problem.

A varied bibliography regarding the theory and application of metaheuristics can be found in [26, 27]. Other applications of metaheuristics appear in [28].

Electric Transmission Network Expansion Planning with the Metaheuristic Variable… DOI: http://dx.doi.org/10.5772/intechopen.87071

Considering that exact optimization methods for TNEP are not efficient for large-scale problems, this chapter presents a novel metaheuristic method based on the so-called variable neighbourhood search (VNS) to solve the TNEP problem considering the DC model. The VNS metaheuristic was introduced in the mid-1990s by Mladenovic and Hansen [29] and represents a significantly different proposal compared to other metaheuristics. The fundamental idea of the VNS algorithm rests on a basic principle: to explore the solution space by systematic changes of neighbourhood structures during the search process. Thus, every transition through the search space of the problem is accomplished with an improvement of the objective function; a transition to a solution of worse quality, as occurs in most metaheuristics, is not allowed [29].

The VNS algorithm has been used successfully in the optimization of several operational research problems [26, 27, 29, 30], but its use is still scarce in problems related to the operation and planning of electric power systems. VNS was applied to TNEP considering the transportation model in [31, 32].

This chapter is organized as follows: first, the mathematical model of the TNEP problem and the VNS metaheuristic are presented; next, the VNS algorithm developed to solve the TNEP problem is described; then the results obtained are presented and discussed; finally, conclusions are drawn.

#### 2. Mathematical model of TNEP


The mathematical formulation of the TNEP for the DC model is given by Eqs. (1)–(8) and takes the form of a nonlinear mixed-integer programming problem [3]:

$$\text{Min } v = \sum\_{i,j \in \Omega} c\_{ij}.n\_{ij} \tag{1}$$

$$AF + G = D \tag{2}$$

$$f\_{ij} - \gamma\_{ij}\left(n\_{ij}^0 + n\_{ij}\right)\left(\theta\_i - \theta\_j\right) = 0 \tag{3}$$

$$\left|f\_{ij}\right| \le \left(n\_{ij}^0 + n\_{ij}\right)\overline{f\_{ij}} \tag{4}$$

$$0 \le g \le \overline{g} \tag{5}$$

$$0 \le n\_{ij} \le \overline{n}\_{ij} \tag{6}$$

$$n\_{ij} \ge 0 \text{ and integer, } \forall (i,j) \in \Omega \tag{7}$$

$$f\_{ij},\ \theta\_j \text{ unbounded, } \forall (i,j) \in \Omega \tag{8}$$

where $v$ is the total investment value for a predefined horizon; $c\_{ij}$ is the cost of a circuit or facility that can be added in the branch $(i,j)$; $n\_{ij}$ is the number of circuits added during the optimization process; $n\_{ij}^0$ is the number of existing circuits in the initial topology; $\gamma\_{ij}$ is the susceptance of the branch $(i,j)$; $\theta\_i$ is the phase angle at bus $i$; $F$ is the vector of power flows with components $f\_{ij}$; $\overline{f\_{ij}}$ is the transmission capacity of a circuit in the branch $(i,j)$; $A$ is the transposed branch-node incidence matrix of the power system; $G$ is the vector with elements $g\_k$ (power generation at bus $k$) with maximum values $\overline{g\_k}$; $\overline{n}\_{ij}$ is the maximum number of circuits that can be added to the branch $(i,j)$; and $\Omega$ is the set of all branches where it is possible to add new circuits.

Eq. (1), which contains the sum of the investment costs, is the objective function. The KCL is framed in Eq. (2), and Ohm's law is expressed in Eq. (3), which implicitly takes Kirchhoff's voltage law (KVL) into consideration. The inequalities in Eq. (4) represent the capacity constraints of the transmission lines, where the absolute value is necessary since power can flow in both directions. The remaining constraints, Eqs. (5)–(7), represent the operational limits of the generators, the maximum number of circuit additions per branch and the integrality of the variables $n\_{ij}$, respectively.

The model of Eqs. (1)–(8) cannot be solved using traditional algorithms, and there is no efficient method for solving these kinds of problems directly. Therefore, metaheuristics become suitable optimization tools for finding optimal and suboptimal solutions of the TNEP problem when complex power systems (large instances) are considered.
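To make the formulation concrete, the sketch below enumerates all addition plans for a tiny hypothetical 3-bus system and keeps the cheapest plan whose DC operating point respects the capacity limits of Eq. (4). All network data (loads, susceptances, capacities, costs) are invented for illustration; this brute-force search only works for toy instances and is exactly what becomes impossible for real systems, motivating metaheuristics.

```python
import itertools

# Hypothetical 3-bus system: all data below are invented for this sketch.
# Bus 0 holds the single generator; buses 1 and 2 carry the loads (MW).
LOADS = {1: 60.0, 2: 70.0}
BRANCHES = {            # (i, j): per-circuit susceptance b, capacity cap,
    (0, 1): dict(b=1.0, cap=50.0, n0=1, nmax=2, cost=30.0),   # existing
    (0, 2): dict(b=1.0, cap=50.0, n0=1, nmax=2, cost=30.0),   # circuits n0,
    (1, 2): dict(b=1.0, cap=50.0, n0=0, nmax=2, cost=20.0),   # cost c_ij
}

def dc_flows(n_added):
    """DC power flow, Eqs. (2)-(3): bus 0 is the angle reference; the
    reduced 2x2 nodal system is solved by Cramer's rule."""
    b = {k: d["b"] * (d["n0"] + n_added[k]) for k, d in BRANCHES.items()}
    B11 = b[(0, 1)] + b[(1, 2)]          # nodal susceptances of buses 1, 2
    B22 = b[(0, 2)] + b[(1, 2)]
    B12 = -b[(1, 2)]
    det = B11 * B22 - B12 * B12
    if abs(det) < 1e-9:                  # degenerate / islanded network
        return None
    p1, p2 = -LOADS[1], -LOADS[2]        # net injections at buses 1 and 2
    t1 = (p1 * B22 - B12 * p2) / det     # phase angles theta_1, theta_2
    t2 = (B11 * p2 - B12 * p1) / det
    ang = {0: 0.0, 1: t1, 2: t2}
    return {k: b[k] * (ang[k[0]] - ang[k[1]]) for k in BRANCHES}

def solve_tnep():
    """Exhaustively minimize the investment of Eq. (1) subject to the DC
    operation constraints; returns (cost, additions per branch)."""
    best = None
    keys = list(BRANCHES)
    for plan in itertools.product(*[range(BRANCHES[k]["nmax"] + 1) for k in keys]):
        n_added = dict(zip(keys, plan))
        cost = sum(BRANCHES[k]["cost"] * n_added[k] for k in keys)
        if best is not None and cost >= best[0]:
            continue                     # cannot improve on the incumbent
        flows = dc_flows(n_added)
        if flows is None:
            continue
        # Capacity constraints, Eq. (4), with a small numerical tolerance.
        if all(abs(f) <= (BRANCHES[k]["n0"] + n_added[k]) * BRANCHES[k]["cap"] + 1e-6
               for k, f in flows.items()):
            best = (cost, n_added)
    return best
```

On these toy data the base network violates the 50 MW per-circuit limits, and the search selects one added circuit in branch (0, 2) and one in (1, 2), at a cost of 50. With $\overline{n}\_{ij} = 2$ on three branches the search space has only $3^3 = 27$ plans; for a real system the same space grows exponentially with the number of candidate branches.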

A more simplified model called the transport model can be considered, which contemplates only Kirchhoff's current law and could be obtained by relaxing the nonlinear constraint Eq. (3) of the DC model described above [3]. In this case, the resulting model is an integer linear programming problem. Even though it is linear, it is still very difficult to find the optimal solution for large and complex systems. The transport model was the first systematic proposal of mathematical modelling used with great success in the problem of planning of transmission systems. The model was proposed by Garver [2] and has represented the beginning of systematic research in the area of transmission system planning.

Another model that has been considered for the TNEP is the linear hybrid model (LHM), which combines characteristics of the DC model and the transport model. In its simpler formulation, this model preserves the linear properties of the transport model, considering Kirchhoff's current law at all nodes of the network and the KVL only in the circuits of the base network (not necessarily in the circuits that will be added) [3, 10]. The LHM is framed by Eqs. (9)–(17):

$$\text{Min } v = \sum\_{i,j \in \Omega} c\_{ij}.n\_{ij} \tag{9}$$

$$AF + A^0F^0 + G = D \tag{10}$$

$$f\_{ij}^0 - \gamma\_{ij}\,n\_{ij}^0\left(\theta\_i - \theta\_j\right) = 0,\ \forall (i,j) \in \Omega\_0 \tag{11}$$

$$\left|f\_{ij}^0\right| \le n\_{ij}^0\,\overline{f\_{ij}},\ \forall (i,j) \in \Omega\_0 \tag{12}$$

$$\left|f\_{ij}\right| \le \left(n\_{ij}^0 + n\_{ij}\right)\overline{f\_{ij}},\ \forall (i,j) \in \Omega \tag{13}$$

$$0 \le g \le \overline{g} \tag{14}$$

$$0 \le n\_{ij} \le \overline{n}\_{ij},\ \forall (i,j) \in \Omega \tag{15}$$

$$f\_{ij}^0 \text{ unbounded},\ \forall (i,j) \in \Omega\_0 \tag{16}$$

$$f\_{ij},\ \theta\_j \text{ unbounded},\ \forall (i,j) \in \Omega \tag{17}$$

The LHM of Eqs. (9)–(17) will be considered as a sensitivity indicator for the proposed heuristic algorithm.

#### 3. Metaheuristic VNS

A metaheuristic is a search strategy that orchestrates an interaction between local improvement procedures and higher-level strategies to create a process capable of escaping from local optima and performing a robust optimization of complex problems. This search is performed by means of transitions in the search space from an initial solution or a set of initial solutions. In this context, the main difference among the diverse metaheuristic techniques is the strategy used to carry out the transitions within the search space. VNS is a metaheuristic that systematically exploits the idea of neighbourhood change both to find local-optimal solutions and to leave those local optima. In that fundamental aspect, VNS is significantly different from other metaheuristics: most metaheuristics accept the degradation of the current solution as a strategy to leave a local-optimal solution, whereas the VNS algorithm does not accept this possibility [26].

The VNS algorithm changes the neighbourhood as a way of leaving local-optimal solutions. During this process, the current solution is also the incumbent, which does not happen with other metaheuristics. Thus, it is possible to state that the VNS algorithm performs a set of transitions in the search space of a problem and that at each step this transition is performed towards the new incumbent. If the process finds a local optimum, then the VNS algorithm changes the neighbourhood in order to leave that local optimum and to reach a new incumbent. As a consequence of this strategy, if the VNS algorithm finds the global optimum, the search stops at that point, eliminating any chance of leaving it. This behaviour does not occur with other metaheuristics.

The strategy of the VNS algorithm is inspired by three important facts [29]:

Fact 1—A local minimum with regard to one neighbourhood structure is not necessarily a minimum with regard to another.

Fact 2—A global minimum is a local minimum with regard to all possible neighbourhood structures.

Fact 3—For many problems, local minima with regard to one or several neighbourhoods are relatively close to each other.

The latter is particularly important in the formulation of the VNS algorithm. This empirical fact implies that a local-optimal solution often provides important information about the global one, especially if the local-optimal solution is of excellent quality. It is also an empirical fact that local-optimal solutions are generally concentrated in specific regions of the search space. If local-optimal solutions were uniformly distributed in the search space, all metaheuristics would become inefficient. Consequently, if a local optimum is found in the same region as the global optimum, then the VNS metaheuristic has a better chance of finding this global optimum. On the other hand, if the global optimum lies in another region, then the only possibility of finding it is to implement a diversification process. For this reason, equilibrium between intensification and diversification during the search process can be important in a metaheuristic.

There is another important aspect, related to the quality of the local optimum, that should be part of the implementation logic of a VNS algorithm. A local optimum with a better objective function value is not necessarily more suitable for trying to find the global optimum. Let $x\_a$ and $x\_b$ be two local-optimal solutions with $f(x\_a) < f(x\_b)$ for a minimization problem. Following the traditional analysis, it can be concluded that $x\_a$ is a local optimum of better quality than $x\_b$.

where A0 is the transposed incidence branch-node matrix of the base topology in previews iterations of the algorithm system; F0 is the vector of base power flow with components f<sup>0</sup> ij; n0 ij is the circuits added during the iterative process to the base case; Ω<sup>0</sup> is the set of all the circuits added during the iterative process and all of the prime circuits of the base case.

The LHM was originally proposed in [10] whose authors present a mathematical modelling Eqs. (9)–(17) which specifies that the portion of the electric system corresponding to the circuits existing in the base configuration must satisfy the two Kirchhoff's laws and the other corresponding part from new circuits must satisfy only Kirchhoff's current law.

Electric Transmission Network Expansion Planning with the Metaheuristic Variable… DOI: http://dx.doi.org/10.5772/intechopen.87071

The LHM Eqs. (9)–(17) will be considered as a sensitivity indicator to the proposed heuristic algorithm.

#### 3. Metaheuristic VNS

implicitly takes into consideration Kirchhoff's voltage law (KVL). Inequalities Eq. (4) represent capacity constraints for transmission lines, whereas the absolute value is necessary since power can flow in both directions. Other constraints Eqs. (6)–(8) represent operational limits of the generators, maximum limit for the

Recent Trends in Artificial Neural Networks - From Training to Prediction

addition of circuits per branch and integrality demand of the variables nij,

metaheuristics become suitable optimization tools for finding optimal and

research in the area of transmission system planning.

be added) [3, 10]. The LHM is framed by Eqs. (9)–(17):

f 0 ij � <sup>γ</sup>ijn<sup>0</sup>

> f 0 ij <sup>≤</sup> n0 ij

fij <sup>≤</sup> <sup>n</sup><sup>0</sup>

> f 0

The model Eqs. (1)–(8) cannot be solved by using traditional algorithms, and there is no efficient method for solving these kinds of problems directly. Therefore,

suboptimal solutions for the TNEP problem when it is considered complex power

Another model that has been considered for the PPEST is the linear hybrid model (LHM) which combines characteristics of the DC model and the transport model. This model, in a simpler formulation, preserves the linear properties of the transport model, considering Kirchhoff's current law in all nodes of the network and KVL only in the circuits in the base network (not necessarily in the circuits that will

Min v ¼ ∑

ij θ<sup>i</sup> � θ<sup>j</sup>

ij þ nij 

<sup>i</sup>,<sup>j</sup>∈ Ω

where A0 is the transposed incidence branch-node matrix of the base topology in

previews iterations of the algorithm system; F0 is the vector of base power flow

modelling Eqs. (9)–(17) which specifies that the portion of the electric system corresponding to the circuits existing in the base configuration must satisfy the two Kirchhoff's laws and the other corresponding part from new circuits must satisfy

case; Ω<sup>0</sup> is the set of all the circuits added during the iterative process and all of the

The LHM was originally proposed in [10] whose authors present a mathematical

cij:nij (9)

AF <sup>þ</sup> A0F0 <sup>þ</sup> <sup>G</sup> <sup>¼</sup> <sup>D</sup> (10)

<sup>¼</sup> <sup>0</sup>, <sup>∀</sup>ð Þ <sup>i</sup>; <sup>j</sup> <sup>∈</sup> <sup>Ω</sup><sup>0</sup> (11)

fij, ∀ð Þ i; j ∈ Ω<sup>0</sup> (12)

0≤ g≤g (14)

0≤nij ≤nij, ∀ð Þ i; j ∈ Ω (15)

ij unbouded; ∀ð Þ i; j ∈ Ω<sup>0</sup> (16) fij, θ<sup>j</sup> unbounded; ∀ð Þ i; j ∈ Ω (17)

ij is the circuits added during the iterative process to the base

fij, ∀ð Þ i; j ∈ Ω (13)

A more simplified model called the transport model can be considered, which contemplates only Kirchhoff's current law and could be obtained by relaxing the nonlinear constraint Eq. (3) of the DC model described above [3]. In this case, the resulting model is an integer linear programming problem. Even though it is linear, it is still very difficult to find the optimal solution for large and complex systems. The transport model was the first systematic proposal of mathematical modelling used with great success in the problem of planning of transmission systems. The model was proposed by Garver [2] and has represented the beginning of systematic

respectively.

systems (big instances).

with components f<sup>0</sup>

46

prime circuits of the base case.

only Kirchhoff's current law.

ij; n0
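As a concrete reading of the LHM, the sketch below checks a candidate operating point against Eqs. (10)–(15) for a small system. It is only an illustration under simplifying assumptions (one circuit type per corridor, flows oriented from node i to node j); the function name and the data layout are ours, not from [10]:

```python
def lhm_feasible(branches, f, f0, g, d, gamma, theta, n, n0, fmax, gmax, nmax, tol=1e-6):
    """Check a candidate operating point against the LHM constraints Eqs. (10)-(15).

    branches: list of (i, j) node pairs, one per corridor
    f, f0:    flows on new and base circuits; g, d: nodal generation and demand
    n, n0:    numbers of new and base circuits per corridor; fmax/gmax/nmax: limits
    """
    inj = [0.0] * len(d)                      # net branch injection per node
    for b, (i, j) in enumerate(branches):
        # Eq. (11): Kirchhoff's voltage law is enforced on base circuits only
        if abs(f0[b] - gamma[b] * n0[b] * (theta[i] - theta[j])) > tol:
            return False
        # Eqs. (12)-(13): capacity limits for base circuits and for the whole corridor
        if abs(f0[b]) > n0[b] * fmax[b] + tol or abs(f[b]) > (n0[b] + n[b]) * fmax[b] + tol:
            return False
        total = f[b] + f0[b]                  # flow oriented from node i to node j
        inj[i] -= total
        inj[j] += total
    # Eq. (10): nodal power balance (Kirchhoff's current law) at every node
    if any(abs(inj[i] + g[i] - d[i]) > tol for i in range(len(d))):
        return False
    # Eqs. (14)-(15): generation and circuit-addition limits
    return (all(0 <= gi <= gmi for gi, gmi in zip(g, gmax))
            and all(0 <= ni <= nmi for ni, nmi in zip(n, nmax)))
```

For instance, a two-node system with one base circuit carrying 0.5 p.u. from the generator bus to the load bus satisfies all constraints, while perturbing the angles breaks Eq. (11).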

A metaheuristic is a search strategy that orchestrates an interaction between local improvement procedures and higher-level strategies to create a process capable of escaping from local optima, providing a robust optimization method for complex problems. This search is performed by means of transitions in the search space from an initial solution or a set of initial solutions. In this context, the main difference among the diverse metaheuristic techniques is the strategy used to carry out the transitions within the search space. VNS is a metaheuristic that systematically exploits the idea of neighbourhood change both to find local-optimal solutions and to leave those local optima. In that fundamental aspect, VNS is significantly different from other metaheuristics: most metaheuristics accept the degradation of the current solution as a strategy for leaving a local-optimal solution, whereas the VNS algorithm does not accept this possibility [26].

The VNS algorithm changes the neighbourhood as a way of leaving local-optimal solutions. During this process, the current solution is also the incumbent, which does not happen with other metaheuristics. Thus, it is possible to state that the VNS algorithm performs a set of transitions in the search space of a problem and at each step this transition is performed for the new incumbent. If the process finds a local optimum, then the VNS algorithm changes the neighbourhood in order to leave from that local optimum and to achieve the new incumbent. As a consequence of this strategy, if the VNS algorithm finds the global optimum, the search stops at that point, eliminating any chance of leaving it. This behaviour does not occur with other metaheuristics.

The strategy of the VNS algorithm is inspired by three important facts [29]:

Fact 1—A minimum with regard to one neighbourhood structure is not necessarily a minimum with regard to another.

Fact 2—A global minimum is a local minimum with regard to all possible neighbourhood structures.

Fact 3—For many problems, local minima with regard to one or several neighbourhoods are relatively close to each other.

The latter is particularly important in the formulation of the VNS algorithm. This empirical fact implies that a local-optimal solution often provides important information regarding the global one, especially if the local-optimal solution presents excellent quality. It is also an empirical fact that local-optimal solutions are generally concentrated in specific regions of the search space. If local-optimal solutions were to be uniformly distributed in the search space, all metaheuristics would become inefficient. Consequently, if a local optimum is found in the same region where the global optimum is, then the VNS metaheuristic has better chances of finding this global optimum. On the other hand, if the global optimum pertains to another region, then the only possibility to find it is to implement a diversification process. For this reason, equilibrium between intensification and diversification during the search process can be important in a metaheuristic.

There is another important aspect related to the quality of the local optimum that should be part of the implementation logic of a VNS algorithm. A local optimum with a better-quality objective function is not necessarily more suitable for trying to find the global optimum. Let xa and xb be two local-optimal solutions with f(xa) < f(xb) for a minimization problem. Considering the traditional analysis, it can be concluded that xa is a local optimum with better quality than xb.

If these solutions are to be used for initiating (or reinitiating) the search process, then the solution presenting internal characteristics closer to those of the global optimum is the most suitable for initiating (or reinitiating) the search; consequently, the solution with the better objective function value should not necessarily be chosen.

Thus, for instance, considering the TNEP problem, the local-optimal solution with the largest number of nij elements equal to the optimal solution is the most appropriate for initiating (or reinitiating) the search. It is evident that in normal conditions, the optimal solution is unknown. However, there are some problems where the optimal solution is known, and there are also various heuristic algorithms to find local-optimal solutions for this problem.

In this way, the previous observation can be used to identify the heuristic algorithm that produces best-quality local-optimal solutions for initiating the search using the VNS algorithm. This type of behaviour occurs in the TNEP problem where for some instances (power systems) optimal solutions are known and various constructive heuristic algorithms used to find excellent local-optimal solutions are available. Thus, the best constructive heuristic algorithm to be incorporated into the solution structure of a VNS algorithm can be identified.

Let Nk, k = 1, …, kmax, be a finite set of preselected neighbourhood structures, and let Nk(x) be the set of solutions (neighbours) in the kth neighbourhood of x.

An optimal solution xopt (or global minimum) is a solution at which the minimum of Eqs. (9)–(17) is achieved.


A solution x′ is a local minimum of Eqs. (1)–(8) with regard to Nk(x) if there is no solution x″ ∈ Nk(x) ⊆ X such that f(x″) < f(x′).

Thus, the idea is to define a set of neighbourhood structures that can be used in a deterministic, random or both deterministic and random manners. These different forms of using the neighbourhood structure lead to VNS algorithms with different performances.
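The two minimality definitions above translate directly into code; a minimal sketch for a generic objective f, a finite neighbourhood and a finite search space (all names are ours):

```python
def is_local_minimum(x, N_k, f):
    """x is a local minimum w.r.t. the structure N_k if no neighbour improves f."""
    return all(f(y) >= f(x) for y in N_k(x))

def is_global_minimum(x, X, f):
    """x is a global minimum if no solution in the whole (finite) space X improves f."""
    return all(f(y) >= f(x) for y in X)
```

For example, with f(x) = (x - 3)^2 and N_k(x) = {x - 1, x + 1}, the point x = 3 is a local minimum, and it is also global over the integers 0..6.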

There are various proposals of VNS algorithms that can be used independently or in an integrated manner, forming more complex VNS structures. The simplest form of a VNS algorithm is the variable neighbourhood descent (VND). The VND algorithm is based on the previously mentioned Fact 1, i.e. the local minimum for a given move is not necessarily the local minimum for another type of move [29]. In this way, the local optimum x′ in the neighbourhood N1(x) is not necessarily equal to the local optimum x″ of x′ in the neighbourhood N2(x).

The VND algorithm takes on the form shown in Figure 1.

This algorithm can be integrated into a more complex structure of the VNS algorithm.
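The VND scheme of Figure 1 can be sketched as follows; a toy illustration using best-improvement inside each structure (the function names and the toy objective are ours, not from [33]):

```python
def vnd(x, neighbourhoods, f):
    """Variable neighbourhood descent: scan the structures N1, N2, ... in order;
    on any improvement, recentre at the better neighbour and restart at N1."""
    k = 0
    while k < len(neighbourhoods):
        candidates = neighbourhoods[k](x)          # finite list of neighbours in N_{k+1}(x)
        best = min(candidates, key=f) if candidates else None
        if best is not None and f(best) < f(x):
            x, k = best, 0                         # improvement: recentre and restart at N1
        else:
            k += 1                                 # no gain here: try the next structure
    return x                                       # local minimum w.r.t. every structure

# toy run: minimize (x - 7)^2 over the integers with moves of size 1 and 2
f = lambda v: (v - 7) ** 2
steps = [lambda v: [v - 1, v + 1], lambda v: [v - 2, v + 2]]
```

On this toy objective, `vnd` walks from any integer start to the minimizer 7 and stops only after no structure improves the solution.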

For example, step (a) in Figure 1 could be replaced by randomly generating a neighbour solution x′ of x (x′ ∈ Nk(x)); the resulting algorithm is called the reduced variable neighbourhood search (RVNS). In the RVNS, usually, the


Figure 1. VND algorithm [33].

Figure 2. BVNS framework [33].

neighbourhoods will be nested, i.e. each one contains the previous. Then a point is chosen at random in the first neighbourhood. If its value is better than that of the incumbent (i.e. f(x′) < f(x)), the search is recentred there (x ← x′). Otherwise, one proceeds to the next neighbourhood. After all neighbourhoods have been considered, one begins again with the first, until a stopping condition is met.

The RVNS algorithm chooses neighbours more dynamically by selecting those from all neighbourhood structures (diversification) and prioritizing the first neighbourhood structure (intensification) during the initial stages of the search. Nevertheless, an important component of the RVNS structure is its capacity for finding new promising regions from a local optimum. The RVNS algorithm can also be used independently or be integrated into a more complex structure of the VNS algorithm.
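A minimal RVNS sketch in the same spirit (random shaking only, no local search; the evaluation budget used as the stopping condition is our choice, not prescribed by the chapter):

```python
import random

def rvns(x, neighbourhoods, f, max_evals=500, rng=random):
    """Reduced VNS: draw one random neighbour per structure; accept only
    improvements, restarting at N1 after each acceptance."""
    evals = 0
    while evals < max_evals:
        k = 0
        while k < len(neighbourhoods) and evals < max_evals:
            x1 = rng.choice(neighbourhoods[k](x))   # random point in N_{k+1}(x)
            evals += 1
            if f(x1) < f(x):
                x, k = x1, 0     # improvement: recentre and return to N1
            else:
                k += 1           # otherwise widen the neighbourhood
    return x
```

Because only improving moves are accepted, the returned solution is never worse than the starting point, while the random draws across all structures provide the diversification discussed above.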

More efficient VNS algorithms can be formulated by integrating those characteristics of the VND algorithm that allow local quality optima to be found and those of the RVNS algorithm that allow new promising regions from a local optimum to be found. Thus, by merging those characteristics, two types of VNS algorithms that generally exhibit excellent performance can be formulated. These algorithms are called the basic variable neighbourhood search (BVNS) and the general variable neighbourhood search (GVNS).

The BVNS algorithm combines a local search with systematic changes of neighbourhood around the local optimum that is found [33]. The structure of the BVNS algorithm is presented in Figure 2.

The logical procedure adopted by the BVNS is very interesting. Firstly, k neighbourhood structures should be chosen. The optimization process is initiated from a solution x and the corresponding neighbourhood N1(x). Then, a neighbour x′ of x in N1(x) is randomly selected. From x′, a local search process to find the local optimum x″ is started.

In this context, three cases may occur:

1. If x″ is equal to x′, one already was at the local optimum of the valley and, consequently, a change of neighbourhood level should be performed (N2(x) in this case).

2. If x″ is worse than x′, then a local optimum with less quality than the incumbent x was found, and a change of neighbourhood should also be carried out.

3. If x″ is better than x′, it means that a better solution than the incumbent was found and, consequently, the incumbent should be updated; the search should be reinitiated from the new incumbent while remaining in the neighbourhood N1.

Whenever the local search finds a new incumbent, at any iteration of the process, the neighbourhood N1(x) should be considered again. Also, whenever the local search finds a solution of equal or worse quality than the incumbent, a change towards a more complex neighbourhood should be performed. This strategy, together with the random choice of a neighbour x′ of the incumbent, avoids cycling and allows local optima which are distant from the current incumbent to be found.
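The shake/local-search/move logic of Figure 2 and the three cases above can be sketched as below (toy objective and names ours; in a GVNS the `local_search` argument would itself be a VND):

```python
import random

def bvns(x, neighbourhoods, f, local_search, max_rounds=20, rng=random):
    """Basic VNS: shake in N_k (random neighbour), descend with a local search,
    then either move to the new incumbent (back to N1) or grow k."""
    for _ in range(max_rounds):
        k = 0
        while k < len(neighbourhoods):
            x1 = rng.choice(neighbourhoods[k](x))  # shaking step: random x' in N_k(x)
            x2 = local_search(x1)                  # descend from x' to a local optimum x''
            if f(x2) < f(x):
                x, k = x2, 0   # case 3: better than the incumbent -> move, reset to N1
            else:
                k += 1         # cases 1-2: equal or worse -> change neighbourhood
    return x

# toy pieces: convex objective and a +/-1 descent playing the local-search role
f = lambda v: (v - 7) ** 2
def descend(v):
    while min(f(v - 1), f(v + 1)) < f(v):
        v = min(v - 1, v + 1, key=lambda u: f(u))
    return v
```

On this convex toy, any shake followed by `descend` lands on the minimizer, so `bvns` reaches it in the first round and then only cycles through neighbourhoods without leaving it, matching the behaviour described for the global optimum.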

The local search of the BVNS algorithm can be any heuristic strategy. Nonetheless, the local search can also use a strategy of the VNS algorithm. Therefore, the BVNS algorithm can be transformed into a more general algorithm called the general variable neighbourhood search (GVNS). The GVNS algorithm is obtained from the BVNS algorithm simply by using a VND algorithm as the local search and an RVNS algorithm to improve the initial solution required to begin the search.

All observations made for the BVNS algorithm remain valid for the GVNS algorithm. As mentioned previously, the fundamental change corresponds to the improvement stage of the initial solution using an RVNS algorithm and a VND algorithm for the local search stage.

Since the VNS algorithm can be implemented in various ways, a family of VNS algorithms can also be implemented. In [26, 30, 33] diverse types of VNS algorithms are analysed. In this work, only one of these algorithms is presented. There are other more complex algorithms or structures based on the logic of the VNS algorithm that are out of the scope of this work. Those algorithms can be found in [30, 33].


#### 4. Modified VNS for TNEP

In this section, the application of our proposed VNS to the TNEP will be described. The GVNS described in Figure 3 will be used, considering the following steps, which will be explained in detail in sequence:

Step 1—Initial solution: Considering a heuristic algorithm to determine an initial solution.

Step 2—Definition of neighbourhoods: Characterization of each neighbourhood and determination of their elements.

Step 3—Improvement: Improve the initial solution by using an RVNS algorithm.

Step 4—Local search: Apply some local search to determine the best configuration for each current solution neighbourhood.

Step 1: Initial solution
To determine a DC initial solution to the TNEP, the constructive heuristic algorithm (CHA) presented by Villasana-Garver-Salon (VGS) [10] is considered. This algorithm iteratively chooses a new circuit to be added to the system through a step-by-step procedure that uses a sensitivity index (given in Eq. (18)) that plays a key role in the CHA. The iterative process continues until a feasible solution is achieved, which means that there is no need for new circuit additions:


Figure 3. GVNS framework [33].

$$IS = \max\left\{ s_{ij} = n_{ij}\, \overline{f_{ij}} : n_{ij} \neq 0 \right\} \tag{18}$$

Generally, for large and complex systems, the derived solutions are local-optimal [10]. The VGS can be summarized by the following steps:

• VGS1: Take a base topology as the current solution, and solve the HLM Eqs. (10)–(17) considering that all of the circuits of the current solution must follow both Kirchhoff's laws.

• VGS2: Solve the LP for the HLM using the current solution. If the LP solution indicates that the system is adequately operating with the new additions and v = 0, then stop: a new solution for the DC model was found. Go to step 4.

• VGS3: Identify the most attractive circuit considering the sensitivity in Eq. (18). Update the current solution with the chosen circuit, update n0ij and Ω0, and go to step 2.
All of the added circuits represent the solution of the CHA. It can be noted that although the VGS uses a hybrid linear model to identify the best circuit for addition in an iterative process, it complies with both of Kirchhoff's laws after adding a new circuit; thus, the final solution is also feasible in DC.
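The VGS loop can be sketched as below. The LP solve is abstracted behind a user-supplied callback (a hypothetical interface, not the actual solver of [10]), which returns the infeasibility measure v and the fractional circuit additions suggested by the relaxed HLM; per-circuit capacities stand in for the f̄ij of Eq. (18):

```python
def vgs(base, fbar, solve_hlm_lp):
    """Sketch of the VGS constructive heuristic (steps VGS1-VGS3).

    base:         dict corridor -> existing circuits (current topology)
    fbar:         dict corridor -> capacity per circuit, the f-bar of Eq. (18)
    solve_hlm_lp: callback solving the relaxed HLM for a topology; returns
                  (v, n) with v the infeasibility measure and n the fractional
                  additions per corridor (hypothetical interface)."""
    n0 = dict(base)
    added = []
    while True:
        v, n = solve_hlm_lp(n0)                 # VGS1/VGS2: solve the relaxed LP
        if v == 0:                              # adequate operation: stop
            return n0, added
        # VGS3 / Eq. (18): pick the corridor maximizing s_ij = n_ij * fbar_ij
        best = max((ij for ij in n if n[ij] != 0), key=lambda ij: n[ij] * fbar[ij])
        n0[best] = n0.get(best, 0) + 1          # add one circuit there
        added.append(best)

# stub standing in for the real HLM solve: corridor (1, 2) needs two circuits
def stub_lp(n0):
    missing = max(0, 2 - n0.get((1, 2), 0))
    return missing, {(1, 2): float(missing)}
```

With the stub, the loop adds circuits one at a time on the deficient corridor until v reaches zero, mirroring the iterative additions of the CHA.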

Example 1: Consider Garver's system [34], which includes six transmission lines and six buses with a 760-MW demand in the base topology, shown in Figure 4a. After applying the VGS, the topology in Figure 4b is obtained, with v = 130,000 m.u.

Step 2: Definition of neighbourhoods

Figure 4. Base topology and VGS solution for Garver's system. (a) Base topology and (b) initial solution by VGS.

Given solution x, the structures of neighbourhood within the solution space can be defined by Eq. (19):

$$N_k(x) = \{ x' \in S : d(x, x') = k \}, \quad k = 1, \dots, k_{\max} \tag{19}$$




Figure 4. Base topology and VGS solution for Garver's system. (a) Base topology and (b) initial solution by VGS.

Given solution x, the structures of neighbourhood within the solution space can be defined by Eq. (19):

Nk(x) = {x′ ∈ S : d(x, x′) = k},  k = 1, …, kmax  (19)

where d(x, x′) = k is the quantity of branches with a different number of added circuits in the solutions x and x′.

For example, given solutions x, x′ and x″ from Figure 5a–c, respectively, which are coded in Figure 6, d(x, x′) = 1 and d(x, x″) = 2. So, solution x′ is a neighbour of x in N1(x), and solution x″ is a neighbour of x in N2(x). Neighbour x′ is obtained from x by adding a circuit in branch 8 (buses 3–6), whereas neighbour x″ is obtained from x by adding one circuit in branch 7 (buses 3–5) and removing one circuit in branch 9 (buses 4–6). In the same way, the neighbours in the other k neighbourhoods can be obtained.

Figure 5. Neighbourhood characterization.

Figure 6. x, x′ and x″ neighbours codification.
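The distance d and the membership test for Nk(x) can be sketched as follows. This is a minimal illustration, not the chapter's code: solutions are coded as vectors of added circuits per candidate branch (as in Figure 6), and the 10-branch vectors below are invented for the example.

```python
def branch_distance(x, x_prime):
    """d(x, x') = number of branches whose count of added circuits differs."""
    return sum(1 for a, b in zip(x, x_prime) if a != b)

def in_neighbourhood(x, x_prime, k):
    """True if x' lies in N_k(x), i.e. d(x, x') == k."""
    return branch_distance(x, x_prime) == k

# Illustrative coding: one entry per candidate branch (branches 1..10).
x   = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]   # base solution, one circuit in branch 9
x_1 = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]   # adds a circuit in branch 8 -> d = 1
x_2 = [0, 0, 0, 0, 0, 0, 1, 0, 0, 0]   # adds branch 7, removes branch 9 -> d = 2
```

With these vectors, x_1 belongs to N1(x) and x_2 belongs to N2(x), matching the example in the text.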

Step 3: Improvement of the initial solution

Considering kmax = 5 and the initial solution obtained in Step 1, a local improvement search using the GVNS described in Figure 3 is applied, considering the HLM of Eqs. (9)–(17).

In N1(x), sort all added circuits in cost-decreasing order, remove the circuit with the maximum cost, and verify operation by solving the HLM. If the removal keeps the solution feasible, indicating that the system remains in an adequate operating condition (i.e. v = 0 after the HLM is solved), remove that circuit; otherwise, keep it. Repeat this simulated removal until all of the added circuits have been tested.
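The removal procedure can be sketched as follows. This is an illustrative Python sketch, not the authors' implementation: `is_feasible` stands in for solving the HLM (returning True corresponds to v = 0), and the toy feasibility rule in the usage example is invented.

```python
def improve_by_removal(added_circuits, is_feasible):
    """N1(x) improvement: try to drop added circuits in cost-decreasing order.

    `added_circuits` is a list of (branch, cost) pairs; `is_feasible` is a
    stand-in for solving the HLM on the trial configuration.
    """
    kept = sorted(added_circuits, key=lambda c: c[1], reverse=True)
    i = 0
    while i < len(kept):
        trial = kept[:i] + kept[i + 1:]   # simulate removing circuit i
        if is_feasible(trial):
            kept = trial                  # removal keeps the system feasible
        else:
            i += 1                        # keep this circuit, test the next one
    return kept

# Toy usage: feasibility is an invented rule (total "capacity" >= demand),
# reusing the cost figure as capacity purely for illustration.
demand = 40
feasible = lambda circuits: sum(c for _, c in circuits) >= demand
result = improve_by_removal([("3-5", 20), ("4-6", 30), ("2-6", 30)], feasible)
```

The most expensive circuit that can be dropped without losing feasibility is removed first, mirroring the cost-decreasing order described above.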

At the end of the process, the added circuits that were not removed constitute the improved solution.

As for the remaining neighbourhoods, the cost variation of each change (the cost difference between entering and leaving circuits) is calculated, and only the changes that exhibit a negative variation are simulated (the HLM is solved). If the simulation indicates a feasible configuration, it becomes a candidate for updating the current configuration; if the new configuration is infeasible, the simulation is cancelled.
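This screening can be sketched generically: only moves with a negative cost variation are simulated, infeasible candidates are discarded, and a candidate is accepted only if it improves on the incumbent. `solve_hlm` is a placeholder for the HLM feasibility check, not the chapter's actual solver.

```python
def screen_moves(incumbent, incumbent_cost, moves, solve_hlm):
    """Evaluate exchange moves for one neighbourhood.

    `moves` yields (delta_cost, candidate) pairs, where delta_cost is the
    cost difference between entering and leaving circuits.
    """
    best, best_cost = incumbent, incumbent_cost
    for delta, candidate in moves:
        if delta >= 0:
            continue                      # only negative variations are simulated
        if not solve_hlm(candidate):
            continue                      # infeasible configuration: cancel it
        if incumbent_cost + delta < best_cost:
            best, best_cost = candidate, incumbent_cost + delta
    return best, best_cost

# Toy usage with invented labels: only "C" is both cheaper and feasible.
best, cost = screen_moves("A", 100,
                          [(5, "B"), (-10, "C"), (-20, "D")],
                          solve_hlm=lambda s: s != "D")
```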

It is important to clarify that a movement is only carried out if the new configuration is better than the incumbent and that, in this step, the procedure only accepts movements that lead to feasible solutions.

The stopping criterion is a maximum number of solved HLMs.

Step 4: Local search

The local search is based on the VND described in Figure 1.
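The interplay between the VND local search and the GVNS shaking step can be sketched generically. This follows the standard VND/GVNS template of Hansen and Mladenović [30, 33], not the chapter's exact pseudocode; the neighbourhoods and objective in the usage example are toy stand-ins.

```python
import random

def vnd(x, neighbourhoods, cost):
    """Variable Neighbourhood Descent: after any improvement, return to the
    first neighbourhood; stop when no structure improves x."""
    k = 0
    while k < len(neighbourhoods):
        best = min(neighbourhoods[k](x), key=cost, default=x)
        if cost(best) < cost(x):
            x, k = best, 0
        else:
            k += 1
    return x

def gvns(x, shake, neighbourhoods, cost, max_iter=20):
    """General VNS: shake to escape a local optimum, then run VND."""
    for _ in range(max_iter):
        k = 1
        while k <= len(neighbourhoods):
            x1 = shake(x, k)                     # random point in N_k(x)
            x2 = vnd(x1, neighbourhoods, cost)   # local search by VND
            if cost(x2) < cost(x):
                x, k = x2, 1                     # improvement: restart at N_1
            else:
                k += 1                           # otherwise widen the shake
    return x

# Toy usage: minimise v^2 over the integers with two simple neighbourhoods.
neighbourhoods = [lambda v: [v - 1, v + 1], lambda v: [v - 2, v + 2]]
shake = lambda v, k: v + random.choice([-k, k])
best = gvns(7, shake, neighbourhoods, lambda v: v * v)
```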

#### 5. Results

To illustrate the effectiveness of the proposed method, three problems are considered: the Garver 6-bus, the IEEE 24-bus and the Brazilian Southern 46-bus systems.

Full data can be found in [34–36], respectively. Planning can be done with (r) or without (w) generation rescheduling, resulting in the following cases, which have been widely used to validate new methods [2, 10, 15, 16, 20–24, 31, 32, 34–36]:

• Case 1w: Garver 6-bus system without rescheduling

• Case 1r: Garver 6-bus system with rescheduling

• Case 2w: IEEE 24-bus system without rescheduling

• Case 2r: IEEE 24-bus system with rescheduling

• Case 3w: Brazilian Southern 46-bus system without rescheduling

• Case 3r: Brazilian Southern 46-bus system with rescheduling




The Brazilian Southern system is a real system originally formed by 46 buses and 66 circuits in the base topology, with 79 candidate paths and an expected demand of 6,880 MW [35].

To reduce the size of the considered neighbourhoods, only those added circuits operating below 70% of their capacity were considered candidates for removal.

Table 1 shows the results. The proposed method was more efficient than the methods reported in [15, 20], since it requires fewer linear programming (LP) solutions.

#### 6. Conclusions

In this chapter, an efficient new method based on variable neighbourhood search has been proposed for the transmission network expansion planning problem, considering the DC model, whose mathematical formulation is nonlinear and mixed-integer. The TNEP is a multimodal problem of high complexity for medium and large systems and cannot be solved by exact algorithms in reasonable computational time.

The proposed method systematically exploits the idea of neighbourhood change both to find local-optimal solutions and to escape from those local optima. It was observed that the definition of the neighbourhood structures plays an important role in the convergence of the VNS algorithm applied to the TNEP.

The proposed method was tested on the Garver 6-bus, the IEEE 24-bus and the Brazilian Southern 46-bus systems, and the results show a greater chance of finding better solutions than mathematical optimization techniques, reaching local-optimal solutions while requiring fewer solved linear problems.

As further research directions, new strategies for reducing the size of the neighbourhoods, such as using adjacency lists to avoid adding new lines to isolated circuits, as well as different kinds of neighbourhood structures, could be developed.

Table 1. Obtained results. For each case, the table reports kmax and, for both the initial VGS solution and the final GVNS solution, the added circuits, the total investment cost (×1,000) and the number of LPs required. (The tabular layout was lost in extraction and is not reproduced here.)


### Author details

Silvia Lopes de Sena Taglialenha<sup>1</sup> \* and Rubén Augusto Romero Lázaro<sup>2</sup>

1 Federal University of Santa Catarina, Technological Center of Joinville, Joinville, Brazil

2 Electrical Engineering at FEIS-UNESP-Ilha Solteira, Solteira, Brazil

\*Address all correspondence to: s.taglialenha@ufsc.br

© 2019 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/ by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Electric Transmission Network Expansion Planning with the Metaheuristic Variable… DOI: http://dx.doi.org/10.5772/intechopen.87071

#### References

[1] Sullivan RL. Power System Planning. New York: McGraw-Hill; 1977

[2] Garver LL. Transmission network estimation using linear programming. IEEE Transactions on Power Apparatus and Systems. 1970;89:1688-1697

[3] Romero R, Monticelli A, Garcia A, Haffner S. Test systems and mathematical models for transmission network expansion planning. IEE Proceedings Generation, Transmission and Distribution. 2002;149(1):27-36

[4] Latorre G, Cruz RD, Areiza JM, Villegas A. Classification of publications and models on transmission expansion planning. IEEE Transactions on Power Systems. 2003;18(2):938-946

[5] Hemmati R, Hooshmand RA, Khodabakhshian A. State-of-the-art of transmission expansion planning: Comprehensive review. Renewable and Sustainable Energy Reviews. 2013;23: 312-319. DOI: 10.1016/j.rser.2013.03.015

[6] Lumbreras S, Ramos A. The new challenges to transmission expansion planning. Survey of recent practice and literature review. Electric Power Systems Research. 2016;134:19-29

[7] Lee CW, Ng SKK, Zhong J, Wu FF. Transmission Expansion Planning From Past to Future. In: IEEE PES Power Systems Conference and Exposition; Atlanta, GA; 2006. pp. 257-265

[8] Dusonchet YP, El-Abiad A. Transmission planning using discrete dynamic optimizing. IEEE Transactions on Power Apparatus and Systems. 1973; PAS-92(4):1358-1137. DOI: 10.1109/ TPAS.1973.293543

[9] Al-Hamouz ZM, Al-Faraj AS. Transmission expansion planning using nonlinear programming. In: IEEE/PES Transmission and Distribution

Conference and Exhibition; Vol. 1. Yokohama, Japan: IEEE; 2002. pp. 50-55. DOI: 10.1109/TDC.2002. 1178259

[10] Villasana R, Garver LL, Salon SJ. Transmission network planning using linear programming. IEEE Transactions on Power Systems. 1985;104:349-356

[11] Haffner S, Monticelli A, Garcia A, Mantovani J, Romero R. Branch and bound algorithm for transmission system expansion planning using a transportation model. IEE Proceedings - Generation, Transmission and Distribution, V. 2000;147(3):149-156

[12] Romero R, Monticelli A. A hierarchical decomposition approach for transmission network expansion planning. IEEE Transactions on Power Systems. 1994;9:373-380

[13] Binato S, Pereira MVF, Granville S. A new benders decomposition approach to solve power transmission network design problems. IEEE Transactions on Power Apparatus and Systems. 2001;16: 235-240

[14] Romero R, Rocha C, Mantovani JRS, Sanchez IG. Constructive heuristic algorithm for the DC model in network transmission expansion planning. IEE Proceedings-Generation, Transmission and Distribution. 2005;152(2):277-282

[15] Romero R, Rider M, Silva I. A metaheuristic to solve the transmission expansion planning. IEEE Transactions on Power Systems. 2007;22:2289-2291

[16] Da Silva EL, Gil HA, Areiza JM. Transmission network expansion planning under an improved genetic algorithm. IEEE Transactions on Power Systems. 2000;15(4):1168-1175

[17] Seifi H, Sepasian MS, Haghighat H, Foroud AA, Yousefi GR, Rae S.


Multi-voltage approach to long-term network expansion planning. IET Generation Transmission and Distribution. 2007;1:9. DOI: 10.1049/iet-gtd:20070092

[18] Gallego RA, Monticelli A, Romero R. Transmission expansion planning by extended genetic algorithm. IEE Proceedings - Generation, Transmission and Distribution. 1998:145(3):329-335

[19] da Silva EL, Gil HA, Areiza JM. Transmission network expansion planning under an improved genetic algorithm. IEEE Transactions on Power Systems. 2000;15:1168-1175

[20] Gallego RA, Monticelli A, Romero R. Comparative studies of non-convex optimization methods for transmission network expansion planning. IEEE Transactions on Power Systems. 1998; 13(3):822-828

[21] Da Silva EL, Areiza JM, Oliveira GC, Binato S. Transmission network expansion planning under a tabu search approach. IEEE Transactions on Power Systems. 2001;16(1):62-68

[22] Gallego RA, Alves AB, Monticelli A, Romero R. Parallel simulated annealing applied to long term transmission expansion planning. IEEE Transactions on Power Systems. 1997;1(12):181-187

[23] Faria HJ, Binato S, Resende MGC, Falcão DM. Power transmission network design by greedy randomized adaptive path relinking. IEEE Transactions on Power Systems. 2005; 20(1):43-49

[24] Mori H, Shimomugi K. Network expansion planning with scatter search. In: IEEE International Conference on Systems, Man and Cybernetics. ISIC; 2007. pp. 3749-3754

[25] Khandelwal A, Bhargava A, Sharma A, et al. Modified grey wolf optimization algorithm for transmission network expansion planning problem.

Arabian Journal for Science and Engineering. 2018;43:2899. DOI: 10.1007/s13369-017-2967-3

[26] Glover F, Kochenberger GA. Handbook of Metaheuristics. Kluwer Academic Publishers; 2003


[27] Yang XS. Review of meta-heuristics and generalized evolutionary walk algorithm. International Journal of Bio-Inspired Computation. 2011;3(2):77-84

[28] Li Y, Gong G, Li N. Recent advances in modelling and optimizing complex systems based on intelligent algorithms. International Journal of Industrial Engineering: Theory, Applications and Practice. 2018;25(6):779-799

[29] Mladenovic N, Hansen P. Variable neighborhood search. Computers and Operations Research. 1997;24(11): 1097-1100

[30] Hansen P, Mladenovic N. Variable neighborhood search: Principles and applications. European Journal of Operational Research. 2001;130: 449-467

[31] Taglialenha SLS. Novas Aplicações de Meta heurísticas na Solução do Problema de Planejamento da Expansão do Sistema de Transmissão de Energia Elétrica [thesis]. 2008. DEE-FEIS-UNESP, Ilha Solteira

[32] Taglialenha SLS, Fernandes CWN, Silva VMD. Variable neighborhood search for transmission network expansion planning problem. In: Borsato M et al, editors. Transdisciplinary Engineering: Crossing Boundaries. ISPE TE. 2016;2016:543-552. DOI: 10.3233/ 978-1-61499-703-0-543

[33] Hansen P, Mladenovic N. A tutorial on variable neighbourhood search. Les Cahiers du GERAD, G-2003-46; 2003

[34] Haffner S, Monticelli A, Garcia A, Mantovani J, Romero R. Branch and bound algorithm for transmission system expansion planning using a transportation model. IEE Proceedings - Generation, Transmission and Distribution. 2000;147(3):149-156


[35] Oliveira GC, Costa APC, Binato S. Large scale transmission network planning using optimization and heuristic techniques. IEEE Transactions on Power Systems. 1995;10:1828-1834

[36] Risheng F, Hill DJ. A new strategy for transmission expansion in competitive electricity markets. IEEE Transactions on Power Systems. 2003; 18(1):374-380


#### Chapter 4

## An Improved Algorithm for Optimising the Production of Biochemical Systems

Mohd Arfian Ismail, Vitaliy Mezhuyev, Mohd Saberi Mohamad, Shahreen Kasim and Ashraf Osman Ibrahim

#### Abstract

This chapter presents an improved method for constrained optimisation of biochemical systems production. The aim of the proposed method is to maximise its production and, at the same time, to minimise the total amount of chemical concentrations involved in producing the best production. The proposed method models biochemical systems with ordinary differential equations. The optimisation process became complex for the large size of biochemical systems that contain many chemicals. In addition, several constraints as the steady-state constraint and the constraint of chemical concentrations also contributed to the computational complexity and difficulty in the optimisation process. This chapter considers the biochemical systems as a nonlinear equations system. To solve the nonlinear equations system, the Newton method was applied. Then, both genetic algorithm and cooperative co-evolutionary algorithm were applied to fine-tune the components in the biochemical systems to maximise the production and minimise the total amount of chemical concentrations involved. Two biochemical systems were used, namely the ethanol production in the Saccharomyces cerevisiae pathway and the tryptophan production in the Escherichia coli pathway. In evaluating the performance of the proposed method, several comparisons with other works were performed, and the proposed method demonstrated its effectiveness in maximising the production and minimising the total amount of chemical concentrations involved.

Keywords: biochemical systems production, constrained optimisation, computational intelligence, cooperative co-evolutionary algorithm, genetic algorithm, Newton method

#### 1. Introduction

Computational systems biology is a field of biological study that combines the knowledge of science and engineering. The objective of this field is to model the behaviour of biochemical reactions through a computational approach. Within this field, the structures and complexity of biological processes can be investigated as a system [1]. Therefore, computational systems biology enables the scientist to represent the biological process as a system. This allows the biochemical process in a living cell to be manipulated as a real factory and gives scientists a way to improve cell production (microbial production).

found that the Newton method is suitable for the nonlinear equations system due to

Using the Newton method with the GA in optimising the biochemical systems production is a good choice because the Newton method deals with the biochemical

This section describes the proposed ANCGA in detail. The ANCGA is proposed

Step 1—randomly generate the initial n sub-chromosomes in m sub-populations and create an empty external population. The number of sub-populations (m) must be the same to the number of variables in the nonlinear equations system. The subchromosomes represent the variables in the nonlinear equations system. The sub-

in order to improve the performance of the previous method [17] in terms of computational time. In addition, the ANCGA is hope to improve the performance of the previous method [17] in maximising the production and minimising the total amount of chemical concentrations involved. Figure 1 shows the flowchart of ANCGA. The ANCGA operates by treating the biochemical systems as a system of nonlinear equations and then uses the Newton method in solving the nonlinear equations system. Then, the GA and CCA were used in the optimisation process.

The detailed operation of the ANCGA is described in the following steps:


An Improved Algorithm for Optimising the Production of Biochemical Systems

DOI: http://dx.doi.org/10.5772/intechopen.83611

Integrating the knowledge of microbial production with genomic techniques and biotechnology processes creates the ability to manipulate a living cell to act like a real cell factory, thus opening new doors for researchers seeking to improve microbial production [2]. One example is the optimisation of biochemical system production. Generally, a biochemical system can be defined as a series of chemical reactions found in a microorganism cell. With the knowledge of microbial production and genomic techniques, biochemical systems can be represented by dynamic mathematical models such as the Michaelis-Menten type [3], the stoichiometric approach [4], flux-balance analysis [5], metabolic control analysis [6] and biochemical systems theory (BST) [7]. Among these choices, this work uses the BST representation to model the biochemical system. An advantage of the BST is that prior knowledge of the mechanism of each reaction is not required to build the equations: the mathematical models can be designed by identifying the reactants and their interconnections [7]. For that reason, a canonical form that uses an ordinary differential equation (ODE) representation is suitable for modelling biochemical systems [1].
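To make the power-law ODE representation concrete, here is a minimal sketch (illustrative only, not the chapter's code) of how such a model can be encoded: each flux is a rate constant with per-variable kinetic orders, and each ODE right-hand side is a signed sum of fluxes. The `Flux` class and the toy two-variable system are invented for illustration.

```python
# Illustrative sketch of a BST-style power-law model: a flux is a rate
# constant times a product of variables raised to kinetic orders, and
# dX_i/dt is a signed sum of fluxes.
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class Flux:
    rate: float                 # rate constant
    orders: Dict[int, float]    # variable index -> kinetic order

    def value(self, x: List[float]) -> float:
        v = self.rate
        for i, f in self.orders.items():
            v *= x[i] ** f
        return v

def gma_rhs(x: List[float],
            equations: List[List[Tuple[int, Flux]]]) -> List[float]:
    """dX_i/dt = sum of (sign * flux) terms for equation i."""
    return [sum(sign * flux.value(x) for sign, flux in eq) for eq in equations]

# Toy two-variable system: dX0/dt = 2*X1^0.5 - X0, dX1/dt = X0 - X1.
f_in  = Flux(2.0, {1: 0.5})
f_mid = Flux(1.0, {0: 1.0})
f_out = Flux(1.0, {1: 1.0})
eqs = [[(+1, f_in), (-1, f_mid)], [(+1, f_mid), (-1, f_out)]]
print(gma_rhs([4.0, 4.0], eqs))  # both derivatives vanish at this point
```

At `x = [4, 4]` every flux equals 4, so the right-hand side is zero: the point is a steady state of the toy system, which is exactly the condition exploited later when the model is solved as a nonlinear equations system.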

The optimisation of biochemical system production is a biotechnological process that aims to improve production by fine-tuning the chemical reactions. Besides that, the total amount of chemical concentrations involved also needs to be taken into account [8, 9]. To date, many studies have been carried out to develop methods for optimising biochemical system production. Researchers tend to use computational methods because the flexibility of the mathematical models reduces the required costs and time. Popular methods are the linear programming method (Vera et al. 2010; Xu 2012) and the geometric programming method [10, 11]. These methods depend on the definitions of the decision variables and of the equality and inequality constraints, which can cause convergence problems if the definition process is not performed well [12]. To overcome this problem, the present study uses a stochastic method. A stochastic method operates on an evolving set of candidate solutions: the candidate solutions are modified by stochastic operators to produce the next generation, so the search direction is determined randomly, which makes the search more efficient and robust [13]. In addition, a stochastic method does not rely on the manipulation of the objective function and constraints or on the initialisation of a feasible point [14]. Among the many stochastic methods that can be adopted for the optimisation process, the genetic algorithm (GA) has been widely found to be the most suitable [15–17]. The GA works by representing the chemical reactions in the biochemical system as a chromosome, which is then evolved and modified by crossover and mutation with the intention of improving the solution.
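The chromosome evolution described above can be sketched with the two standard GA operators; this is a generic illustration, not the chapter's implementation:

```python
# Generic GA operators on binary chromosomes: one-point crossover and
# bit-flip mutation (illustrative sketch, not the chapter's code).
import random

def crossover(a, b, rng):
    point = rng.randrange(1, len(a))               # one-point crossover
    return a[:point] + b[point:], b[:point] + a[point:]

def mutate(bits, rate, rng):
    # flip each bit independently with the given mutation rate
    return [bit ^ 1 if rng.random() < rate else bit for bit in bits]

rng = random.Random(0)
p1, p2 = [0] * 8, [1] * 8
c1, c2 = crossover(p1, p2, rng)
child = mutate(c1, rate=0.2, rng=rng)
```

One-point crossover conserves the parents' genes between the two children, and mutation with rate 0 leaves a chromosome unchanged, which the checks below exploit.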

As mentioned above, this chapter uses the BST method to model biochemical systems. Within the BST, two representations are typically used, namely, the S-system and the generalised mass action (GMA) form. This study employs the GMA representation due to its ability to represent the nonlinearity of biochemical systems and its superior performance in optimisation [10]. The GMA uses the power-law function, an ODE form, to model the biochemical systems. Applying only the GA to the optimisation of biochemical systems is not sufficient, as the GA only fine-tunes the chemical concentrations; a method is also needed to deal with the biochemical system itself. Implementing the Newton method for the biochemical systems is a good choice because the GMA model that represents the biochemical systems can be viewed as a nonlinear equations system [8, 18–22]. It has also been found that the Newton method is suitable for nonlinear equations systems due to its convergence speed, simplicity and ease of use [23, 24].
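The Newton iteration for a square nonlinear system can be sketched as follows; this is a minimal illustration specialised to two variables, with a finite-difference Jacobian and a Cramer's-rule solve. The test system is a made-up example, not one of the chapter's GMA models.

```python
# Minimal Newton iteration for a 2x2 nonlinear system F(x, y) = 0,
# using a forward-difference Jacobian (illustrative sketch only).
def newton2(F, x, y, tol=1e-10, max_iter=50, h=1e-7):
    for _ in range(max_iter):
        f1, f2 = F(x, y)
        if abs(f1) < tol and abs(f2) < tol:
            break
        # forward-difference Jacobian J[i][j] = dF_i / dx_j
        g1, g2 = F(x + h, y)
        k1, k2 = F(x, y + h)
        J00, J01 = (g1 - f1) / h, (k1 - f1) / h
        J10, J11 = (g2 - f2) / h, (k2 - f2) / h
        det = J00 * J11 - J01 * J10
        # Newton step: solve J * d = -F by Cramer's rule
        dx = (-f1 * J11 + f2 * J01) / det
        dy = (-f2 * J00 + f1 * J10) / det
        x, y = x + dx, y + dy
    return x, y

# Made-up system with a known root at (1, 2).
F = lambda x, y: (x ** 2 + y - 3.0, x + y ** 2 - 5.0)
root = newton2(F, 1.2, 1.8)
```

Starting close to the root, the iteration converges in a handful of steps, which is the convergence-speed property the text attributes to the Newton method.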

Using the Newton method with the GA to optimise biochemical system production is a good choice because the Newton method deals with the biochemical system, while the GA fine-tunes the chemical concentrations by representing them as a chromosome. However, problems occur when dealing with large biochemical systems that contain many chemicals and have complex structures, which makes the representation of the solution complex and difficult to evaluate. Hence, a method is needed to simplify the representation of the solution. The cooperative co-evolutionary algorithm (CCA) is a good choice because it can simplify the representation of the candidate solution by decomposing a single chromosome into multiple sub-chromosomes [17, 25, 26].
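The CCA decomposition idea can be sketched as follows (illustrative, assuming equal-length sub-chromosomes, one per variable):

```python
# Sketch of the CCA representation: one long chromosome is split into
# per-variable sub-chromosomes, and a complete solution is reassembled
# from one representative per sub-population (illustrative only).
def decompose(chromosome, n_vars):
    size = len(chromosome) // n_vars
    return [chromosome[i * size:(i + 1) * size] for i in range(n_vars)]

def cooperate(representatives):
    # concatenate one sub-chromosome per variable into a full solution
    return [gene for sub in representatives for gene in sub]

full = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
subs = decompose(full, 3)
```

Decompose and cooperate are inverses of each other, so nothing is lost by working on the shorter sub-chromosomes.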

In this chapter, a hybrid method known as the advanced Newton cooperative genetic algorithm (ANCGA), which combines the Newton optimisation method, the GA and the CCA, is presented. This method models biochemical systems as a system of nonlinear equations and applies the Newton method to solve the system. In the optimisation process, the GA and the CCA are used to represent the variables of the nonlinear system in order to search for the best solution: the GA is used to maximise the production, while the CCA is used to minimise the total amount of chemical concentrations involved. The ANCGA proposed in this study is an improvement of an existing method [17], whose optimisation process takes a long time and whose performance can be improved in terms of maximising the production and minimising the total amount of chemical concentrations involved. To do that, this work introduces the concept of an external population, which stores the best solution found in every generation so that it is not lost during the reproduction process. The remainder of this chapter is organised as follows. Firstly, the proposed method is explained in detail. Case studies of the fermentation pathway in Saccharomyces cerevisiae (S. cerevisiae) and tryptophan (trp) biosynthesis in Escherichia coli (E. coli) are then presented. Following that, the results are discussed, and a brief conclusion is made.

#### 2. The proposed method

This section describes the proposed ANCGA in detail. The ANCGA is proposed in order to improve the computational time of the previous method [17]. In addition, the ANCGA is expected to improve the performance of the previous method [17] in maximising the production and minimising the total amount of chemical concentrations involved. Figure 1 shows the flowchart of the ANCGA. The ANCGA operates by treating the biochemical system as a system of nonlinear equations and using the Newton method to solve it; the GA and the CCA are then used in the optimisation process. The detailed operation of the ANCGA is described in the following steps:

Step 1—randomly generate the initial n sub-chromosomes in m sub-populations and create an empty external population. The number of sub-populations (m) must equal the number of variables in the nonlinear equations system. The sub-chromosomes represent the variables in the nonlinear equations system and are encoded in binary format.
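Step 1 can be sketched as follows (an illustrative Python sketch; the chapter's actual implementation is a Java program):

```python
# Sketch of Step 1: m sub-populations, one per variable in the nonlinear
# equations system, each holding n random binary sub-chromosomes.
import random

def init_subpopulations(m, n, bits, rng):
    return [[[rng.randint(0, 1) for _ in range(bits)] for _ in range(n)]
            for _ in range(m)]

rng = random.Random(42)
pops = init_subpopulations(m=5, n=10, bits=16, rng=rng)
external_population = []   # empty archive, filled later in Step 8
```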

Recent Trends in Artificial Neural Networks - From Training to Prediction


Figure 1. The flow chart of ANCGA.

Step 2—evaluate the sub-chromosomes. The evaluation process starts when a representative from every sub-population is selected to produce a complete solution known as a cooperative chromosome. The selection of representatives is based on their fitness values, where the lowest values are selected first. This process is known as the sub-chromosome evaluation. Its objective is to minimise the total amount of chemical concentrations involved by combining the representatives with the lowest fitness values from every sub-population.

Step 3—produce the cooperative chromosome. The cooperative chromosome is produced after all the selected representatives are combined together. The cooperative chromosome is the complete solution. The formation of the cooperative chromosome is depicted in Figure 2.

Figure 2.

The formation of the cooperative chromosome.

Step 4—evaluate the cooperative chromosome. In this step, the cooperative chromosome is tested. The evaluation process starts with a decoding of the cooperative chromosome into the variables of the nonlinear equations system. Then, the Newton method is used to solve the nonlinear equations system. In the evaluation process, a condition is checked, depending on whether or not the cooperative chromosome satisfies the set of constraints. If the cooperative chromosome satisfies the constraints, the procedure goes ahead to Step 8; if not, it goes to Step 5.
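The feasibility test of Step 4 reduces to a bounds check on the decoded variables; a hedged sketch, with hypothetical bounds:

```python
# Sketch of the Step 4 feasibility test: a candidate solution is accepted
# only if every decoded variable lies inside its constraint interval
# (illustrative; the bounds below are hypothetical).
def satisfies_constraints(x, bounds):
    return all(lo <= v <= hi for v, (lo, hi) in zip(x, bounds))

bounds = [(0.8, 1.2)] * 3   # e.g. metabolites kept within 20% of steady state
```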

Step 5—decompose the cooperative chromosome into sub-chromosomes. After solving the nonlinear equations system using the Newton method, the variables in the nonlinear equations system are encoded back into the cooperative chromosome form. Then, the cooperative chromosome is decomposed into multiple sub-chromosomes. After that, all the sub-chromosomes are sent back to their own sub-populations in order to perform selection and reproduction.

Step 6—select a pair of sub-chromosomes for the reproduction process. The selection is based on their fitness values, where the lowest fitness value is selected first.

Step 7—produce new generations. In this step, the genetic operators of crossover and mutation are applied to the selected sub-chromosomes in order to produce new generations. This process is repeated up to the last sub-chromosome. Then, the new generation is processed again, starting from Step 2.

Step 8—copy the cooperative chromosome into the external population. Selected cooperative chromosomes that fulfil the constraints are copied into the external population. This is intended to keep the best solution found in every generation and prevent it from being lost in the reproduction process (Step 7). At this stage, two conditions are checked: whether the maximum number of generations has been reached and whether the external population has reached its maximum size. If the maximum number of generations has been reached, the procedure jumps to Step 10; otherwise, it continues to the next step. If the external population reaches its maximum size before the maximum number of generations, the cooperative chromosome with the lowest fitness value is deleted and replaced by the newly copied cooperative chromosome.
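The bounded external population of Step 8 can be sketched as a small archive that, once full, replaces its worst member; this assumes higher fitness is better and stores (fitness, solution) pairs, both illustrative choices:

```python
# Sketch of the Step 8 external population: feasible solutions are
# archived, and once the archive is full the lowest-fitness entry is
# replaced (illustrative only; assumes higher fitness is better).
def archive_add(archive, solution, fitness, capacity):
    archive.append((fitness, solution))
    if len(archive) > capacity:
        archive.remove(min(archive))   # drop the lowest-fitness entry
    return archive

arch = []
for f in [3.0, 1.0, 5.0, 2.0]:
    archive_add(arch, [f], f, capacity=3)
```

After the four insertions the entry with fitness 1.0 has been evicted, so only the three best solutions survive.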

Step 9—select some of the cooperative chromosomes from the external population. This step implements the elitism of the external population: a fraction y of the cooperative chromosomes in the external population is selected and combined with the current sub-chromosomes. The selection is based on fitness, where the cooperative chromosomes with the highest fitness values are selected first. Then, the process goes back to Step 5.
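The elitism of Step 9 can be sketched as a deterministic top-fraction selection; this simplifies the probabilistic wording of the text ("with y probability") into picking the best fraction y of the archive:

```python
# Sketch of the Step 9 elitism: the best fraction y of the archived
# (fitness, solution) pairs is re-injected into the search
# (illustrative simplification of the probabilistic selection).
def elite_selection(archive, y):
    k = max(1, round(y * len(archive)))
    return sorted(archive, key=lambda entry: entry[0], reverse=True)[:k]

arch = [(2.0, "b"), (5.0, "a"), (1.0, "c"), (4.0, "d")]
elites = elite_selection(arch, y=0.5)
```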


Step 10—choose the best solution. The best solution is chosen among all the cooperative chromosomes in the external population. The selection is based on the fitness values of the cooperative chromosomes, where the cooperative chromosome with the highest fitness value is chosen.


Step 11—return the best solution. This step decodes the selected cooperative chromosome into its real values (the variables in the nonlinear equations system) and gives the best solution set.

#### 3. Case studies

In this section, the effectiveness and efficiency of the ANCGA are demonstrated. The effectiveness of the proposed method refers to the ability of the ANCGA to obtain the best result, while the efficiency refers to its ability to maintain this performance across several case studies. Two case studies were used, namely, the S. cerevisiae pathway and the E. coli pathway. In order to test the performance of the ANCGA, a Java program based on two Java libraries, jMetal [27] and JAMA version 1.0.3, was developed. The jMetal library can be downloaded from http://jmetal.sourceforge.net/index.html, while JAMA version 1.0.3 can be accessed at http://math.nist.gov/javanumerics/jama/.

#### 3.1 Case study 1: the ethanol production in S. cerevisiae pathway

In this case study, the ANCGA was used to optimise ethanol production in the S. cerevisiae pathway. The GA was used to represent the chemical species of the S. cerevisiae pathway, namely its metabolites and enzymes. Details of the metabolites and enzymes, including their initial steady-state values, are presented in Table 1. The pathway was suspended in a cell culture at pH 4.5 and had the following ODE models [28].




$$\begin{aligned} \frac{dX_1}{dt} &= V_{in} - V_{HK} \\ \frac{dX_2}{dt} &= V_{HK} - V_{PFK} - V_{Carb} \\ \frac{dX_3}{dt} &= V_{PFK} - V_{GAPD} - 0.5V_{Gro} \\ \frac{dX_4}{dt} &= 2V_{GAPD} - V_{PK} \\ \frac{dX_5}{dt} &= 2V_{GAPD} + V_{PK} - V_{HK} - V_{Carb} - V_{PFK} - V_{ATPase} \end{aligned} \tag{1}$$

Eq. (2) shows the fluxes at the steady-state condition.

$$\begin{aligned} V_{in} &= 0.8122\,X_2^{-0.2344} Y_1 \\ V_{HK} &= 2.8632\,X_1^{0.7464} X_5^{0.0243} Y_2 \\ V_{PFK} &= 0.5232\,X_2^{0.7318} X_5^{-0.3941} Y_3 \\ V_{Carb} &= 8.904 \times 10^{-4}\,X_2^{8.6107} Y_6 \\ V_{GAPD} &= 7.6092 \times 10^{-2}\,X_3^{0.6159} X_5^{0.1308} Y_4 \\ V_{Gro} &= 9.272 \times 10^{-2}\,X_3^{0.05} X_4^{0.533} X_5^{-0.0822} Y_7 \\ V_{PK} &= 9.471 \times 10^{-2}\,X_3^{0.05} X_4^{0.533} X_5^{-0.0822} Y_5 \\ V_{ATPase} &= X_5 Y_8 \end{aligned} \tag{2}$$
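As a quick numeric illustration (not from the chapter), two of the Eq. (2) fluxes can be evaluated directly at the Table 1 steady-state values; here V_ATPase is taken as the product of the ATP level X5 and the ATPase enzyme level Y8:

```python
# Evaluating two steady-state fluxes of Eq. (2) at the Table 1 values
# (illustrative numeric check, not part of the chapter's program).
X2, X5 = 1.0110, 1.1278   # G6P and ATP steady-state concentrations
Y1, Y8 = 19.70, 25.10     # Vin and VATPase enzyme levels

V_in = 0.8122 * X2 ** -0.2344 * Y1   # power-law flux
V_ATPase = X5 * Y8                   # 1.1278 * 25.10 = 28.30778
```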

For the total amount of chemical concentration involved, it can be formulated as follows:

$$\min F_2 = \sum_{j=1}^{5} X_j + \sum_{k \in \{1,2,3,4,5,8\}} Y_k \tag{3}$$

In the optimisation process, the GMA model was treated as a nonlinear equations system, where all the GMA models were set to be equal to 0. This gave all the ODE models in Eq. (1) the following forms:

$$\begin{aligned} V_{in} - V_{HK} &= 0 \\ V_{HK} - V_{PFK} - V_{Carb} &= 0 \\ V_{PFK} - V_{GAPD} - 0.5V_{Gro} &= 0 \\ 2V_{GAPD} - V_{PK} &= 0 \\ 2V_{GAPD} + V_{PK} - V_{HK} - V_{Carb} - V_{PFK} - V_{ATPase} &= 0 \end{aligned} \tag{4}$$

For the metabolite concentration constraint, each metabolite was allowed to deviate by 20% from its steady-state value, i.e. to lie between 0.8 and 1.2 times that value [8, 29]. Thus, the constraint for this case study became as follows:

$$0.8\,X_k^{0} \le X_k \le 1.2\,X_k^{0} \qquad k = 1, 2, 3, 4, 5 \tag{5}$$

where $X_k^{0}$ denotes the steady-state value of $X_k$.

Meanwhile, the enzyme concentrations were constrained between 0 and 50 times their steady-state values [8, 29]. The enzyme concentration constraint can be formulated as follows:

$$0 \le Y_k \le 50\,Y_k^{0} \qquad k = 1, 2, 3, 4, 5, 8 \tag{6}$$


#### Table 1.

Details of metabolites and enzymes in case study 1.

| Metabolite/enzyme | Symbol | Initial steady-state value |
| --- | --- | --- |
| Glcin | X1 | 0.0345 |
| G6P | X2 | 1.0110 |
| FDP | X3 | 9.1440 |
| PEP | X4 | 0.0095 |
| ATP | X5 | 1.1278 |
| Vin | Y1 | 19.70 |
| VHK | Y2 | 68.50 |
| VPFK | Y3 | 31.70 |
| VGAPD | Y4 | 49.90 |
| VPK | Y5 | 3440.00 |
| VCarb | Y6 | 14.31 |
| VGro | Y7 | 203.00 |
| VATPase | Y8 | 25.10 |


#### 3.2 Case study 2: the tryptophan biosynthesis in E. coli pathway

For case study 2, the ANCGA was used to optimise the end product of this pathway, which was trp production. The complete description of this pathway was provided by Xiu and colleagues [30]. The GMA models of this pathway are given as follows:

$$\begin{aligned} \frac{dX_1}{dt} &= V_{11} - V_{12} \\ \frac{dX_2}{dt} &= V_{21} - V_{22} \\ \frac{dX_3}{dt} &= V_{31} - V_{32} - V_{33} - V_{34} \end{aligned} \tag{7}$$

X<sup>0</sup>:<sup>8</sup>

Summary of reaction concentrations in case study 2.

DOI: http://dx.doi.org/10.5772/intechopen.83611

<sup>k</sup> ≤ Xk ≤ X<sup>1</sup>:<sup>2</sup>

Reaction Initial steady-state value

X1 0.0345 X2 1.0110 X3 9.1440 X4 0.0095 X5 1.1278 X6 19.70 X8 25.10

An Improved Algorithm for Optimising the Production of Biochemical Systems

0≤ X<sup>4</sup> ≤ 0:00624

4≤ X<sup>5</sup> ≤ 10 500≤ X<sup>6</sup> ≤ 5000 X<sup>7</sup> ¼ 0:0022X<sup>5</sup> 0≤ X<sup>8</sup> ≤ 1000

X<sup>9</sup> ¼ 7:5 X<sup>10</sup> ¼ 0:005 X<sup>11</sup> ¼ 0:9 X<sup>12</sup> ¼ 0:02 X<sup>13</sup> ¼ 0

#### Table 3.
List of all parameter settings used.

| Parameter | Rate |
| --- | --- |
| Number of sub-populations | Depends on the number of variables in the nonlinear equations system |
| Number of sub-chromosomes in sub-population | [100, 110, 120, 130, 140, 150] |
| Number of chromosomes in external population P | [100, 110, 120, 130, 140, 150] |
| Maximum number of generations | [100, 110, 120, 130, 140, 150] |
| Crossover rate | [0.1, 0.2, 0.3, 0.4, 0.5] |
| Mutation rate | [0.1, 0.2, 0.3, 0.4, 0.5] |
| Elitism rate | [0.1, 0.2, 0.3, 0.4, 0.5] |

where X1 is the mRNA concentration, X2 is the enzyme concentration and X3 is the trp concentration. The rates of all reactions in this pathway at steady state are given as follows:

$$\begin{aligned} V\_{11} &= 0.6403\, X\_3^{-5.87 \times 10^{-4}} X\_5^{-0.8332} \\ V\_{12} &= 1.0233\, X\_1 X\_4^{0.0035} X\_{11}^{0.9965} \\ V\_{21} &= X\_1 \\ V\_{22} &= 1.4854\, X\_2 X\_4^{-0.1349} X\_{12}^{0.8651} \\ V\_{31} &= 0.5534\, X\_2 X\_3^{-0.5573} X\_6^{0.5573} \\ V\_{32} &= X\_3 X\_4 \\ V\_{33} &= 0.9942\, X\_3^{7.0426 \times 10^{-4}} X\_7 \\ V\_{34} &= 0.8925\, X\_3^{3.5 \times 10^{-6}} X\_4^{0.9760} X\_8 X\_9^{-0.0240} X\_{10}^{-3.5 \times 10^{-6}} \end{aligned} \tag{8}$$
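For readers who want to experiment with the model, the rate laws of Eq. (8) and the steady-state residuals of Eq. (11) can be sketched in Python with NumPy. This is an illustration, not part of the chapter; rate constants and kinetic orders are transcribed from the text, and the fixed values of X7 and X9–X12 follow Eq. (12):

```python
import numpy as np

def rates(X1, X2, X3, X4, X5, X6, X8):
    """Power-law (GMA) rate laws of Eq. (8) for the trp pathway.

    X7 is tied to X5 and X9..X12 are held fixed, following Eq. (12).
    """
    X7 = 0.0022 * X5
    X9, X10, X11, X12 = 7.5, 0.005, 0.9, 0.02
    V11 = 0.6403 * X3 ** (-5.87e-4) * X5 ** (-0.8332)
    V12 = 1.0233 * X1 * X4 ** 0.0035 * X11 ** 0.9965
    V21 = X1
    V22 = 1.4854 * X2 * X4 ** (-0.1349) * X12 ** 0.8651
    V31 = 0.5534 * X2 * X3 ** (-0.5573) * X6 ** 0.5573
    V32 = X3 * X4
    V33 = 0.9942 * X3 ** (7.0426e-4) * X7
    V34 = (0.8925 * X3 ** (3.5e-6) * X4 ** 0.9760 * X8
           * X9 ** (-0.0240) * X10 ** (-3.5e-6))
    return V11, V12, V21, V22, V31, V32, V33, V34

def residuals(x):
    """Left-hand sides of Eq. (11); all three vanish at steady state."""
    V11, V12, V21, V22, V31, V32, V33, V34 = rates(*x)
    return np.array([V11 - V12, V21 - V22, V31 - V32 - V33 - V34])

# evaluate at the initial steady state of Table 2 (X1..X6, X8)
r = residuals([0.0345, 1.0110, 9.1440, 0.0095, 1.1278, 19.70, 25.10])
```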

The trp production in this case study is given by the reaction V34 [31]. This leads to an optimisation problem that can be formulated as follows:

$$\max F\_1 = V\_{34} \tag{9}$$

The total amount of chemical concentrations involved is to be minimised, formulated as follows:

$$\min F\_2 = \sum\_{k=1}^{5} X\_k + X\_8 \tag{10}$$

Similar to case study 1, the GMA model was set equal to 0; the steady-state conditions of Eq. (7) then become:

$$\begin{aligned} V\_{11} - V\_{12} &= 0\\ V\_{21} - V\_{22} &= 0\\ V\_{31} - V\_{32} - V\_{33} - V\_{34} &= 0 \end{aligned} \tag{11}$$
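Eq. (11) is a system of nonlinear algebraic equations, which the ANCGA hands to the Newton method. A minimal Newton iteration with a finite-difference Jacobian might look like the sketch below. It is illustrative only (shown on a small toy system, not the chapter's implementation), but it uses the fixed settings stated later in the text: at most 50 iterations and a 10<sup>-6</sup> tolerance.

```python
import numpy as np

def newton(f, x0, max_iter=50, tol=1e-6):
    """Newton iteration for f(x) = 0 with a forward-difference Jacobian."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        fx = f(x)
        if np.linalg.norm(fx) < tol:        # converged
            break
        n = x.size
        J = np.empty((n, n))
        h = 1e-8                            # finite-difference step
        for j in range(n):
            xp = x.copy()
            xp[j] += h
            J[:, j] = (f(xp) - fx) / h
        x = x - np.linalg.solve(J, fx)      # Newton update
    return x

# toy nonlinear system standing in for Eq. (11): x^2 + y^2 = 4, x*y = 1
sol = newton(lambda v: np.array([v[0] ** 2 + v[1] ** 2 - 4,
                                 v[0] * v[1] - 1]),
             [2.0, 0.5])
```

In the ANCGA, the same loop would be applied to the three residuals of Eq. (11) with the tuned concentrations as unknowns.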

In this case study, the GA and CCA represent only some of the chemical concentrations, because not all of them were tuned [1, 10, 11]. The concentrations that were tuned were X1 to X6 and X8; these, together with their initial steady states, are summarised in Table 2. For the remaining concentrations, X7 and X9 to X13, fixed values were used [1, 10, 11]. Eq. (12) lists the ranges of these chemicals.

An Improved Algorithm for Optimising the Production of Biochemical Systems DOI: http://dx.doi.org/10.5772/intechopen.83611



Recent Trends in Artificial Neural Networks - From Training to Prediction


$$\begin{aligned} &X\_k^{0.8} \le X\_k \le X\_k^{1.2} & k = 1, 2, 3\\ &0 \le X\_4 \le 0.00624\\ &4 \le X\_5 \le 10\\ &500 \le X\_6 \le 5000\\ &X\_7 = 0.0022X\_5\\ &0 \le X\_8 \le 1000\\ &X\_9 = 7.5\\ &X\_{10} = 0.005\\ &X\_{11} = 0.9\\ &X\_{12} = 0.02\\ &X\_{13} = 0 \end{aligned} \tag{12}$$
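The box constraints of Eq. (12) for the tuned variables can be checked programmatically. One assumption is made explicit here: the bound on X1–X3 is implemented as the 0.8–1.2 range that Section 4 reports for the optimised metabolites, which is also the range the reported best solution respects.

```python
def feasible(X):
    """Constraint check for the tuned variables of Eq. (12).

    Assumption: X1-X3 are treated as ratios to their steady states,
    so their bound becomes 0.8 <= Xk <= 1.2 (matching the 0.8-1.2
    range cited for the optimised solutions in the results section).
    """
    bounds = {
        "X1": (0.8, 1.2), "X2": (0.8, 1.2), "X3": (0.8, 1.2),
        "X4": (0.0, 0.00624), "X5": (4.0, 10.0),
        "X6": (500.0, 5000.0), "X8": (0.0, 1000.0),
    }
    return all(lo <= X[k] <= hi for k, (lo, hi) in bounds.items())

# best solution reported for case study 2 (Table 7)
best = {"X1": 0.8064, "X2": 0.8046, "X3": 0.8000, "X4": 0.0054,
        "X5": 4.0116, "X6": 5000.0, "X8": 1000.0}
```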

#### 4. Results and discussion

In the experiments, many parameter settings were tried. All parameter settings used in this study are listed in Table 3, whereas Table 4 presents the settings that produced the best result. Binary coding was used to represent the chemical concentrations.








For the Newton method, fixed parameters were used, namely a maximum of 50 iterations and a tolerance of 10<sup>-6</sup>.
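The binary coding of a chemical concentration can be sketched as a map between a bit string (a sub-chromosome) and a bounded real value. The 16-bit length below is an assumption, since the chapter does not state chromosome lengths:

```python
def decode(bits, lo, hi):
    """Map a binary sub-chromosome to a concentration in [lo, hi]."""
    level = int("".join(map(str, bits)), 2)
    return lo + (hi - lo) * level / (2 ** len(bits) - 1)

def encode(x, lo, hi, n_bits=16):
    """Quantise a concentration in [lo, hi] to an n_bits binary string."""
    level = round((x - lo) / (hi - lo) * (2 ** n_bits - 1))
    return [int(b) for b in format(level, f"0{n_bits}b")]

# round-trip the best X5 of Table 7 within its Eq. (12) bounds [4, 10]
bits = encode(4.0116, 4.0, 10.0)
x5 = decode(bits, 4.0, 10.0)
```

The quantisation error is at most half a step, (hi − lo)/(2·(2^16 − 1)), which is negligible at the precision reported in the tables.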


#### Table 8.
Comparison with other works for case study 2.

| Work by | F1 | F2 |
| --- | --- | --- |
| Marin-Sanguino et al. [10] | 3.062 | 6006.1412 |
| Vera et al. [1] | 3.05 | 6007.1314 |
| Xu [11] | 3.946 | 6007.7814 |
| Previous method [17] | 3.9759 | 6006.5581 |
| ANCGA | 3.9774 | 6006.4280 |

#### Table 6.
Comparison with other works for case study 1.

| Work by | F1 | F2 |
| --- | --- | --- |
| Xu [11] | 52.38 | 297.664 |
| Rodriguez-Acosta et al. [29] | 52.31 | 295.270 |
| Previous method [17] | 52.91 | 294.800 |
| ANCGA | 53.02 | 293.5249 |


#### Table 7.
The full result of case study 2.

| Variables | Best solution | Average |
| --- | --- | --- |
| X1 | 0.8064 | 1.0742 |
| X2 | 0.8046 | 1.1085 |
| X3 | 0.8000 | 0.8000 |
| X4 | 0.0054 | 0.0054 |
| X5 | 4.0116 | 4.4694 |
| X6 | 5000 | 5000 |
| X8 | 1000 | 1000 |
| F1 | 3.9774 | 3.9616 |
| F2 | 6006.4280 | 6007.4575 |


The full results obtained by the ANCGA when applied to the S. cerevisiae pathway are given in Table 5. At the best solution, the ANCGA was able to increase F1 (ethanol production) to 53.02, larger than its initial steady-state value, and to reduce F2 (the total amount of chemical concentrations involved) to 293.5249. All metabolites and enzymes fulfilled their constraints: the metabolites stayed in the range of 0.8–1.2, while the enzymes were in the range of 0–50. The performance of the ANCGA was assessed by comparing its results with other works; the comparison is listed in Table 6. As shown in the table, the ANCGA produced higher results than the other methods. In addition, to verify the results achieved by the ANCGA, the average of 100 independent runs was recorded. These results are also summarised in Table 5: the average values of the metabolites and enzymes fulfilled their constraints and stayed in the optimum range, leading to the conclusion that the ANCGA produces reliable results and a higher production of ethanol than the methods used in other studies.


#### Table 4.
Parameter settings in producing optimum solution.

| Parameter | Case study 1 | Case study 2 |
| --- | --- | --- |
| Number of sub-populations | 11 | 7 |
| Number of sub-chromosomes in sub-population | 150 | 140 |
| Number of chromosomes in external population P | 100 | 100 |
| Maximum number of generations | 150 | 130 |
| Crossover rate | 0.3 | 0.4 |
| Mutation rate | 0.1 | 0.1 |
| Elitism rate | 0.2 | 0.2 |

Recent Trends in Artificial Neural Networks - From Training to Prediction

#### Table 5.
The full result of case study 1.

| Variables | Best solution | Average |
| --- | --- | --- |
| X1 | 1.1240 | 0.9951 |
| X2 | 1.0322 | 1.0018 |
| X3 | 0.9900 | 1.0053 |
| X4 | 1.1407 | 1.1297 |
| X5 | 1.0001 | 0.9831 |
| Y1 | 49.8103 | 49.9793 |
| Y2 | 45.3702 | 45.0767 |
| Y3 | 45.3452 | 49.8103 |
| Y4 | 48.5112 | 47.4064 |
| Y5 | 49.4448 | 49.3426 |
| Y8 | 49.7563 | 49.7876 |
| F1 | 53.0200 | 52.7499 |
| F2 | 293.5249 | 294.5178 |


The full results for the E. coli pathway are presented in Table 7. The ANCGA was able to improve F1 (trp production) to 3.9774 from its initial steady state, and to reduce F2 (the total amount of chemical concentrations involved) to 6006.4280. All variables representing the chemical reactions followed their constraints and were in the optimum range. To assess the performance of the ANCGA, its results were compared with those of other methods; the details of the comparison are shown in Table 8. As presented in the table, the F1 of the ANCGA was higher than that of the methods employed in other works. As in the previous case study, 100 experiments were conducted and the average result was calculated to validate the ANCGA results; Table 7 presents the averages. Since the averages of all the variables follow their constraints, the ANCGA can be considered reliable in optimising this pathway. From the observations in Tables 7 and 8, it can be concluded that the ANCGA is effective in optimising trp production as well as producing reliable results.

The external population concept used by the ANCGA can be validated by comparing it with the previous method proposed in [17]. The aim of the external population is to reduce the computational time and the number of generations. To study its effect, several experiments were conducted. To investigate the decrease in the number of generations, F1 was set to 52.5 for case study 1 and 3.90 for case study 2; once F1 was achieved, the process was stopped. This showed which method required more generations to reach the target production. Figures 3 and 4 illustrate the comparisons for both case studies. In both figures, the method with the external population required fewer generations than the previous method to achieve F1. The reason is that the best solutions found during the iteration process are maintained in the external population, which reduces the number of generations required. In addition, this concept tended to converge faster than the previous method, allowing a faster search for the best solution. In conclusion, the external population concept reduced the number of generations and sped up convergence compared with previous methods. To determine the statistical significance of the difference between the proposed and previous methods, the paired t-test and the Wilcoxon signed-rank test were used; all p-values were <0.05, confirming that the proposed method significantly improved on the previous method.
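The external-population mechanism described above — copy the elite into an archive so crossover and mutation cannot destroy them, then mix some archived solutions back into the next generation — can be sketched as follows. Names and proportions are illustrative, not the chapter's exact procedure:

```python
def step_with_archive(population, fitness, archive, archive_size=10, mix=0.5):
    """One generation step using an external population (archive).

    `population`: list of candidate solutions; `fitness`: higher is better.
    The archive keeps the best solutions seen so far; a fraction `mix` of
    the next generation is drawn from it.
    """
    # update the archive with the current elite, so variation operators
    # (crossover, mutation) can never lose the best solutions found so far
    ranked = sorted(population + archive, key=fitness, reverse=True)
    archive[:] = ranked[:archive_size]
    # mix archived solutions back into the next generation
    n_mix = int(mix * len(population))
    next_gen = archive[:n_mix] + population[: len(population) - n_mix]
    return next_gen, archive

# tiny demo: integer candidates, target value 5
next_gen, archive = step_with_archive([1, 2, 3], lambda x: -abs(x - 5), [])
```

Because the archive is re-ranked every generation, the best-so-far solution is guaranteed to survive until the optimisation stops, which is the property the text credits for the reduced generation counts.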


#### Table 9.
The computation times obtained (in seconds).

| Method | Case study 1 | Case study 2 |
| --- | --- | --- |
| ANCGA | 75.56 | 38.07 |
| Previous method [17] | 80.40 | 40.45 |


Figure 3. The comparison of results of elitism concept and non-elitism concept for case study 1.

Figure 4. The comparison of results of elitism concept and non-elitism concept for case study 2.


Meanwhile, to investigate the decrease in computational time, the maximum number of generations was not set, but F1 was set to 52.5 for case study 1 and 3.9 for case study 2; once F1 was achieved, the process was terminated. Table 9 lists the computational time results: the ANCGA required less time than the method in [17]. This occurred because the high-quality solutions were stored in the external population and then combined with the current solutions during the optimisation process. Copying the high-quality solutions into the external population prevented them from being lost (the crossover and mutation operations could otherwise lose them) and kept the best solution until the optimisation process stopped. To determine the significance of the improvement of the proposed method over the previous method, the paired t-test and the Wilcoxon signed-rank test were used. The p-values from both tests were <0.05; hence the proposed and previous methods are statistically different from each other, and the improvement of the proposed method can be accepted.
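The paired t-test used above can be reproduced with only the Python standard library. The per-run values below are hypothetical, since the chapter reports only the aggregate outcome (p < 0.05); 2.365 is the standard two-sided critical value for 7 degrees of freedom at the 0.05 level:

```python
import math
from statistics import mean, stdev

# hypothetical matched F1 values over eight paired runs (not the chapter's data)
ancga = [53.02, 52.91, 52.88, 52.95, 53.00, 52.84, 52.97, 52.90]
previous = [52.91, 52.80, 52.75, 52.83, 52.88, 52.70, 52.85, 52.78]

# paired t-test works on the per-run differences
diffs = [a - b for a, b in zip(ancga, previous)]
n = len(diffs)
t_stat = mean(diffs) / (stdev(diffs) / math.sqrt(n))

# two-sided critical value for df = n - 1 = 7 at alpha = 0.05
significant = abs(t_stat) > 2.365
```

With real data, `scipy.stats.ttest_rel` and `scipy.stats.wilcoxon` would give exact p-values for both tests mentioned in the text.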

#### 5. Conclusion

Improving production has become an important issue in the optimisation of biochemical systems, and many factors need to be considered to ensure optimal production. In this work, a hybrid method for the constrained optimisation of biochemical system production, the ANCGA, was presented. The ANCGA was developed from a previous method [17] and combines the Newton method, GA and CCA. This study introduced the concept of an external population, aimed at reducing computational time. The biochemical system was modelled by a system of nonlinear equations, which the Newton method was employed to solve. The GA and CCA were then applied to fine-tune the chemical concentration values in the nonlinear system in order to search for the best solution. During the optimisation process, high-quality solutions were copied into the external population so that they could not be lost, and some solutions from the external population were mixed into the next generation of solutions. By doing this, the computational time and the number of generations were reduced. The proposed method was applied to two case studies and obtained better results than the methods presented in other works. In addition, the results were validated and demonstrated that the constraints of all components of the biochemical system were fulfilled. Thus, it can be concluded that the ANCGA is effective and reliable in producing the best result.

#### Acknowledgements

Special appreciation to Universiti Malaysia Pahang for the sponsorship of this study by approving the RDU Grant Vot No. RDU180307. Special thanks to the reviewers and editor who reviewed this manuscript.


### Author details

Mohd Arfian Ismail<sup>1</sup>\*, Vitaliy Mezhuyev<sup>1</sup>, Mohd Saberi Mohamad<sup>2,3</sup>, Shahreen Kasim<sup>4</sup> and Ashraf Osman Ibrahim<sup>5,6</sup>

1 Faculty of Computer Systems and Software Engineering, Universiti Malaysia Pahang, Gambang, Pahang, Malaysia

2 Institute For Artificial Intelligence and Big Data, Universiti Malaysia Kelantan, City Campus, Kota Bharu, Kelantan, Malaysia

3 Faculty of Bioengineering and Technology, Universiti Malaysia Kelantan, Jeli Campus, Jeli, Kelantan, Malaysia

4 Soft Computing and Data Mining Centre, Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn, Johor, Malaysia

5 Faculty of Computer Science and Information Technology, Alzaiem Alazhari University, Khartoum North, Sudan

6 Arab Open University, Khartoum, Sudan

\*Address all correspondence to: arfian@ump.edu.my

© 2019 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/ by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


#### References


[1] Vera J, Gonzalez-Alcon C, Marin-Sanguino A, Torres N. Optimization of biochemical systems through mathematical programming: Methods and applications. Computers & Operations Research. 2010;37(8): 1427-1438

[2] Sowa SW, Baldea M, Contreras LM. Optimizing metabolite production using periodic oscillations. PLoS Computational Biology. 2014;10(6): e1003658

[3] Sakamoto N. Characterization of the transit and transition times for a pathway unit of Michaelis–Menten mechanism. Biochimica et Biophysica Acta (BBA) - General Subjects. 2003; 1623(1):6-12

[4] Planes FJ, Beasley JE. A critical examination of stoichiometric and pathfinding approaches to metabolic pathways. Briefings in Bioinformatics. 2008;9(5):422-436

[5] Salleh A, Mohamad M, Deris S, Illias R. Identifying minimal genomes and essential genes in metabolic model using flux balance analysis. In: Selamat A, Nguyen N, Haron H, editors. Intelligent Information and Database Systems SE - 43. Vol. 7802. Berlin, Heidelberg: Springer; 2013. pp. 414-423

[6] Fell D. Metabolic control analysis. In: Alberghina L, Westerhoff HV, editors. Systems Biology SE - 80. Vol. 13. Berlin, Heidelberg: Springer; 2005. pp. 69-80

[7] Voit EO. Biochemical systems theory: A review. ISRN Biomathematics. 2013; 2013:1-15

[8] Link H, Vera J, Weuster-Botz D, Darias NT, Franco-Lara E. Multiobjective steady state optimization of biochemical reaction networks using a constrained genetic algorithm. Computers and Chemical Engineering. 2008;32(8):1707-1713

[9] Xu G. Bi-objective optimization of biochemical systems by linear programming. Applied Mathematics and Computation. 2012;218(14): 7562-7572

[10] Marin-Sanguino A, Voit EO, Gonzalez-Alcon C, Torres NV. Optimization of biotechnological systems through geometric programming. Theoretical Biology and Medical Modelling. 2007;4:38-54

[11] Xu G. Steady-state optimization of biochemical systems through geometric programming. European Journal of Operational Research. 2013;225(1): 12-20

[12] Mariano AP et al. Optimization strategies based on sequential quadratic programming applied for a fermentation process for butanol production. Applied Biochemistry and Biotechnology. 2009;159(2):366-381

[13] Balsa-Canto E, Banga JR, Egea JA, Fernandez-Villaverde A, Hijas-Liste GM. Global optimization in systems biology: Stochastic methods and their applications. In: Goryanin II, Goryachev AB, editors. Advances in Systems Biology. Vol. 736. New York: Springer; 2012. pp. 409-424

[14] Mariano AP et al. Genetic algorithms (binary and real codes) for the optimisation of a fermentation process for butanol production. International Journal of Chemical Reactor Engineering. 2010;8. DOI: 10.2202/1542-6580.2333

[15] Elsayed SM, Sarker RA, Essam DL. A new genetic algorithm for solving optimization problems. Engineering Applications of Artificial Intelligence. 2014;27:57-69

[16] Deng H et al. The application of multiobjective genetic algorithm to the parameter optimization of single-well potential stochastic resonance algorithm aimed at simultaneous determination of multiple weak chromatographic peaks. The Scientific World Journal. 2014;2014

[17] Ismail MA, Deris S, Mohamad MS, Abdullah A. A newton cooperative genetic algorithm method for in silico optimization of metabolic pathway production. PLoS One. 2015;10(5): e0126199

[18] Grosan C, Abraham A. A new approach for solving nonlinear equations systems. IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans. 2008;38(3): 698-714

[19] Luo Y-Z, Tang G-J, Zhou L-N. Hybrid approach for solving systems of nonlinear equations using chaos optimization and quasi-Newton method. Applied Soft Computing. 2008; 8(2):1068-1073

[20] Babaei M. A general approach to approximate solutions of nonlinear differential equations using particle swarm optimization. Applied Soft Computing. 2013;13(7):3354-3365

[21] Ramos H, Monteiro MTT. A new approach based on the Newton's method to solve systems of nonlinear equations. Journal of Computational and Applied Mathematics. 2017;318:3-13

[22] Ahmad F, Tohidi E, Carrasco JA. A parameterized multi-step Newton method for solving systems of nonlinear equations. Numerical Algorithms. 2016; 71(3):631-653

[23] Liu C-S, Atluri SN. A novel time integration method for solving a large system of non-linear algebraic equations. Computer Modeling in Engineering and Sciences. 2008;31(2): 71-83

An Improved Algorithm for Optimising the Production of Biochemical Systems. DOI: http://dx.doi.org/10.5772/intechopen.83611

[24] Taheri S, Mammadov M. Solving systems of nonlinear equations using a globally convergent optimization algorithm. Global Journal of Technology & Optimization. 2013;3:132-138

[25] Gu J, Gu M, Cao C, Gu X. A novel competitive co-evolutionary quantum genetic algorithm for stochastic job shop scheduling problem. Computers and Operations Research. 2010;37(5): 927-937

[26] Ismail MA, Asmuni H, Othman MR. The fuzzy cooperative genetic algorithm (FCoGA): The optimisation of a fuzzy model through incorporation of a cooperative coevolutionary method. Journal of Computing. 2011;3(11):81-90

[27] Durillo JJ, Nebro AJ. jMetal: A Java framework for multi-objective optimization. Advances in Engineering Software. 2011;42(10):760-771

[28] Galazzo JL, Bailey JE. Fermentation pathway kinetics and metabolic flux control in suspended and immobilized Saccharomyces cerevisiae. Enzyme and Microbial Technology. 1990;12(3): 162-172

[29] Rodriguez-Acosta F, Regalado CM, Torres NV. Non-linear optimization of biotechnological processes by stochastic algorithms: Application to the maximization of the production rate of ethanol, glycerol and carbohydrates by Saccharomyces cerevisiae. Journal of Biotechnology. 1999;65(1):15-28

[30] Xiu Z-L, Zeng A-P, Deckwer W-D. Model analysis concerning the effects of growth rate and intracellular tryptophan level on the stability and dynamics of tryptophan biosynthesis in bacteria. Journal of Biotechnology. 1997;58(2): 125-140


[31] Marin-Sanguino A, Torres NV. Optimization of tryptophan production in bacteria. Design of a strategy for genetic manipulation of the tryptophan operon for tryptophan flux maximization. Biotechnology Progress. 2000;16(2):133-145

Recent Trends in Artificial Neural Networks - From Training to Prediction

Section 3

Applications of Artificial Neural Networks