**6. Modelling Process and Performance Evaluation Methods**

The three models were supplied with datasets for training and knowledge elicitation. Statutory holidays and adjacent days were excluded from both the training and testing processes of the model application. The testing dataset was not included in the modelling process and was kept entirely separate from the training sets. The weather data consisted of actual hourly-averaged weather recordings rather than forecast data, so as to minimize error attributable to weather forecasts.

Energy and peak load forecasting was performed for weather-induced demand and profile conformance. Aggregate and individual load forecasting were evaluated and contrasted. Load and weather trends were identified and amalgamated into the load forecasting methods for further optimization.

Databases of weather and electric loads were constructed and backfilled to January 1, 2005. Load calculations were created, monitored, and evaluated for integrity. A total of 12 conforming load centres were analyzed. Real-time and historical weather reports were stored and updated for 10 weather stations.

Load variables were assessed according to their weather-sensitivity and profile-conformance. Sensitivity analysis combined with statistical methods was used to identify weather-induced demand variables. Load forecasting was evaluated with an expert-based Aggregate Similar Day model, an aggregate artificial neural network (ANN) model, and a multi-region artificial neural network model.

For the purposes of this research, performance was assessed using the following statistical methods: correlation, Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), Relative Absolute Error (RAE), and Root Relative Squared Error (RRSE). These methods assessed the forecasting models by their overall prediction accuracy and consistency.

Correlation performance was calculated by computing the correlation coefficient, which is a measurement of the statistical similarity of the predictor to the prediction. The coefficient ranges from 1, indicating perfectly correlated results, through 0, indicating no correlation, to -1, indicating perfectly negatively correlated results. Correlation assesses errors differently from the other methods used in this research for benchmarking: its scale is independent of the output scale, and its assessment tracks the behaviour of the model rather than its error [11]. Thus a high correlation value is desirable, just as a low error value is desirable.

The ANN models developed during this research were subject to the aforementioned limitations; however, efforts were taken to mitigate these impediments. The training set met or exceeded the size of those used in similar STLF ANN models [1, 3, 7, 9, 10] and contained a number of scenarios, both common and diverse with respect to weather conditions and load response. The benchmarking process considered a case study of the 2011 year across all hours and weekdays, which exceeded the evaluation events used in similar STLF ANN models [1, 3, 7, 9], and utilized the five statistical measures for model benchmarking described in this section. Finally, a systematic analysis of model optimization was enacted: parameters were changed methodically and performance was noted. Ultimately, the best modelling parameters were chosen based upon an analytical review of the model configurations.

Correlation is defined as:


264 Decision Support Systems


$$\text{Correlation} = \frac{S\_{PA}}{\sqrt{S\_P S\_A}}, \quad \text{where } S\_{PA} = \frac{\sum\_{i=1}^{n} (p\_i - \overline{p})(a\_i - \overline{a})}{n - 1}, \quad S\_P = \frac{\sum\_{i=1}^{n} (p\_i - \overline{p})^2}{n - 1}, \quad S\_A = \frac{\sum\_{i=1}^{n} (a\_i - \overline{a})^2}{n - 1} \tag{2}$$

Where *ai* is the actual value; *pi* is the predicted value; *p̄* and *ā* are the mean values of the predicted and actual values, respectively; and n is the total number of values predicted.
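As a concrete illustration, the correlation measure of eq. (2) can be sketched in a few lines of plain Python. The function and variable names are illustrative, not part of the chapter's implementation.

```python
# Sketch of the correlation measure in eq. (2); plain Python, no external deps.
# Names (predicted, actual) are illustrative assumptions.

def correlation(predicted, actual):
    n = len(actual)
    p_bar = sum(predicted) / n
    a_bar = sum(actual) / n
    # sample covariance and variances, as in eq. (2)
    s_pa = sum((p - p_bar) * (a - a_bar) for p, a in zip(predicted, actual)) / (n - 1)
    s_p = sum((p - p_bar) ** 2 for p in predicted) / (n - 1)
    s_a = sum((a - a_bar) ** 2 for a in actual) / (n - 1)
    return s_pa / (s_p * s_a) ** 0.5

# Predictions that are a perfect positive linear function of the actuals score 1.0:
print(correlation([2.0, 4.0, 6.0], [1.0, 2.0, 3.0]))  # 1.0
```

Note that the result is unchanged if every prediction is rescaled by a positive constant, which is the scale-independence property described above.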

Accuracy performance of the three load forecasting systems was established by comparing MAE and RMSE results.

MAE is defined as:

$$MAE = \sum\_{i=1}^{n} (|\ p\_i - a\_i|) / n \tag{3}$$

Where *ai* is the actual value; *pi* is the predicted value; and n is the total number of values predicted. MAE is the average magnitude of the individual errors, irrespective of their sign. MAE does not exaggerate the effect of outliers, treating all errors equally according to their magnitude. MAE does, however, mask any tendency of a model to over- or under-predict values.
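A minimal sketch of eq. (3), with illustrative names, makes the bias-masking property easy to see:

```python
# Sketch of MAE from eq. (3): mean magnitude of errors, sign ignored.
def mae(predicted, actual):
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

# An over-prediction and an under-prediction of equal size contribute equally,
# so the direction of the model's bias is masked, as noted above:
print(mae([110.0, 90.0], [100.0, 100.0]))  # 10.0
```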

RMSE is defined as:

$$RMSE = \sqrt{\frac{\sum\_{i=1}^{n} (p\_i - a\_i)^2}{n}} \tag{4}$$

Where *ai* is the actual value; *pi* is the predicted value; and n is the total number of values predicted.

RMSE, unlike MAE, squares each error before averaging and therefore gives greater weight to large errors, although less dramatically than unrooted squared error measurements. By computing the square root in RMSE, the dimensionality of the prediction is reduced to that of the predictor [11]. Both methods consider every prediction error.
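The difference in outlier weighting between the two accuracy measures can be shown with a short sketch of eq. (4); names are illustrative.

```python
# Sketch of RMSE from eq. (4); the square root restores the units (MWhr)
# of the predictor.
def rmse(predicted, actual):
    n = len(actual)
    return (sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n) ** 0.5

# Errors of 1, 1, and 10: squaring weights the single large error more
# heavily than MAE (which would report 4.0) does:
print(rmse([101.0, 101.0, 110.0], [100.0, 100.0, 100.0]))  # ≈ 5.83
```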

In order to evaluate the consistency of the predictions, RAE and RRSE are utilized. RAE normalizes the total absolute error of the predictor against the total absolute error of a simple mean predictor, providing a relative, scale-independent result.

RAE is defined as:

$$RAE = \frac{\sum\_{i=1}^{n} |\ p\_i - a\_i|}{\sum\_{i=1}^{n} |\ a\_i - \overline{a}|} \tag{5}$$
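Eq. (5) can be sketched directly; here *ai* are the actual values, *pi* the predictions, and *ā* the mean of the actuals. Names are illustrative.

```python
# Sketch of RAE from eq. (5): total absolute error of the predictor,
# normalized by the total absolute error of a naive mean predictor.
def rae(predicted, actual):
    a_bar = sum(actual) / len(actual)
    num = sum(abs(p - a) for p, a in zip(predicted, actual))
    den = sum(abs(a - a_bar) for a in actual)
    return num / den

# A model that does no better than predicting the historical mean scores
# exactly 1.0 (i.e. 100%); useful models score below 1.0:
actual = [90.0, 100.0, 110.0]
print(rae([100.0, 100.0, 100.0], actual))  # 1.0
```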


Towards Developing a Decision Support System for Electricity Load Forecast

http://dx.doi.org/10.5772/51306


Where *ai* is the actual value; *pi* is the predicted value; *ā* is the mean value of the actual; and n is the total number of values predicted.

The RRSE, like RAE, evaluates the relative distance of magnitude errors. Outliers are emphasized and, like RMSE, the dimensionality of the prediction equals that of the predictor.

RRSE is defined as:

$$RRSE = \sqrt{\frac{\sum\_{i=1}^{n} (p\_i - a\_i)^2}{\sum\_{i=1}^{n} (a\_i - \overline{a})^2}} \tag{6}$$

Where *ai* is the actual value; *pi* is the predicted value; *ā* is the mean value of the actual; and n is the total number of values predicted.

The best model is the one with the highest correlation and the lowest error rates. The success rate must be evaluated according to each of the aforementioned benchmarking methods. Consistency is equally important to accuracy; a highly variable model may be correct sometimes, but carries considerable uncertainty for future planning efforts. In the next section, the case study and the performance results are discussed with reference to the statistical performance indicators: correlation, MAE, RMSE, RAE, and RRSE.

#### **7. Case Study**

Each of the models was evaluated against the same testing dataset, consisting of non-holiday loads from January 2nd, 2011 to December 30th, 2011. Each model had access to a training dataset (see Table 5 for a summary of model inputs) using the same hourly and weekday groups for modelling. The aggregate models were restricted to historical aggregate loads rather than regional loads, and to the weather from the two largest load centres, whereas the multi-region model had full access to the training dataset.

This section presents the case study of predicting the hourly energy consumption throughout the 2011 year and an analysis of the prediction results generated by each of the three models. For the purposes of evaluation, historically recorded weather variables were used, rather than the predicted weather variables a forecaster would use in practice. A summary of the prediction results, assessed with the benchmarking methods identified in section 6 and grouped by classification period (next day and next hour), is presented in Tables 7 and 8.

| | Similar Day | Aggregate ANN | Multi-Region ANN |
|---|---|---|---|
| **Correlation** | 0.7282 | 0.7819 | 0.8131 |
| **MAE (MWhr)** | 33.1111 | 31.3465 | 32.2332 |
| **RMSE (MWhr)** | 43.8135 | 41.0459 | 41.2891 |
| **RAE (%)** | 49.15% | 46.98% | 47.91% |
| **RRSE (%)** | 46.36% | 49.61% | 49.44% |

**Table 7.** Average Model Performance – Next Day.

| | Similar Day | Aggregate ANN | Multi-Region ANN |
|---|---|---|---|
| **Correlation** | 0.7697 | 0.9359 | 0.9469 |
| **MAE (MWhr)** | 24.6104 | 16.9821 | 15.8962 |
| **RMSE (MWhr)** | 31.4624 | 21.7404 | 20.5349 |
| **RAE (%)** | 38.11% | 26.30% | 24.49% |
| **RRSE (%)** | 39.45% | 27.22% | 25.50% |

**Table 8.** Average Model Performance – Next Hour.




It can be seen from Table 7 that for next day predictions, the performance of the aggregate models closely approximated the multi-region model. Of the two aggregate models, the Aggregate ANN model outperformed the Similar Day model in all categories except RRSE. The Aggregate ANN model demonstrated a greater ability to track the behaviour of the load, produced more accurate predictions, and had greater consistency than the Similar Day model. However, the Similar Day model produced a better RRSE, which indicates it is slightly better at modelling behaviours than the Aggregate ANN model.



It can be seen from Table 8 that for next hour predictions the performance of the Multi-Region ANN model, as compared to the aggregate models, was superior across all metrics. According to all the metrics, the Multi-Region ANN model was the most accurate and consistent for next hour predictions, and the Similar Day model was the least accurate and consistent.

The Multi-Region ANN model performed best overall for next hour intervals, but was second to the Aggregate ANN model for next day intervals. This was because the perceptron generalizes the data it receives into a single model; if this generalization is not achieved, the model becomes over-trained and relies too heavily on the training set. Since weather is much more dynamic from one day to the next than from hour to hour, the Multi-Region ANN model was able to generalize weather/load responses for next hour conditions. However, for next day conditions, the Multi-Region ANN model was unable to sufficiently generalize the impact from all its weather inputs and, consequently, was over-trained. The Aggregate ANN model was best able to generalize relationships for next day forecasts, as its reduced input set enabled it to better reflect changes in the major load centres, which significantly affected system demand. The Multi-Region ANN model was more dynamic in its response to varying weather conditions, but this only applies to same-day forecasts.

The Similar Day model performed worst overall for both next day and next hour intervals. Since the Similar Day model operates by finding comparable days for inclusion in a weighted average, its performance will deteriorate during abnormal load/weather days. Its RRSE performance during next day predictions was nevertheless better than that of the Aggregate ANN model. As the Similar Day model is predicated upon the assumption that the past may be used to predict the future, the model relies significantly on direct load modelling; that is, the simple predictor of aggregate system load has a greater influence on the model's calculations than in the ANN models, which model weather and load equally.
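The chapter does not give the Similar Day model's internals, but the find-comparable-days-then-weighted-average idea can be sketched generically. The temperature-distance similarity, inverse-distance weights, and all names below are assumptions for illustration only, not the chapter's actual expert model.

```python
# Illustrative Similar Day sketch: pick the k historical days with the most
# similar temperature and average their loads, weighted by closeness.
# history: list of (temperature, load) pairs; names and weights are assumptions.

def similar_day_forecast(history, temp_tomorrow, k=3):
    # rank historical days by temperature distance to the target day
    ranked = sorted(history, key=lambda day: abs(day[0] - temp_tomorrow))[:k]
    # inverse-distance weights; epsilon avoids division by zero on exact matches
    weights = [1.0 / (abs(t - temp_tomorrow) + 1e-6) for t, _ in ranked]
    total = sum(weights)
    return sum(w * load for w, (_, load) in zip(weights, ranked)) / total

history = [(-20.0, 950.0), (-18.0, 930.0), (5.0, 700.0), (20.0, 820.0)]
# A cold target day draws its forecast almost entirely from the cold
# historical days:
print(similar_day_forecast(history, temp_tomorrow=-19.0))
```

A sketch like this also makes the failure mode above visible: on an abnormal weather day there are no genuinely comparable days in the history, so the weighted average is formed from poor matches.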

Comparing next day and next hour performance showed that model performance across all benchmark metrics improved when the time interval was shortened. This was expected, as the previous hour's energy demand has a high correlation with the next hour's energy demand. Next hour predictions require a strong ability to adapt to weather and load changes, and the ANN models performed better than the Similar Day method across all metrics. Next day predictions require greater generalization of behaviour, as the load value of the previous day does not correlate as strongly with the target as the load value of the previous hour.
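The claim that recent hours carry more predictive information than the previous day can be checked with a simple lag autocorrelation of a load series. The series below is synthetic (a daily cycle plus an AR(1) weather-like component); it is an illustration of the effect, not the chapter's data.

```python
# Lag autocorrelation sketch: correlate the load series with a shifted copy
# of itself. Hourly load is typically far more autocorrelated at lag 1
# (next hour) than at lag 24 (next day). Data below is synthetic.
import math
import random

def autocorr(series, lag):
    x, y = series[:-lag], series[lag:]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

# Synthetic hourly load: daily sinusoidal cycle plus an AR(1)
# weather-like disturbance (seeded for reproducibility).
random.seed(42)
weather, load = 0.0, []
for h in range(24 * 60):
    weather = 0.9 * weather + random.gauss(0.0, 30.0)
    load.append(700.0 + 120.0 * math.sin(2 * math.pi * h / 24) + weather)

# The lag-1 value is markedly larger than the lag-24 value:
print(f"lag-1: {autocorr(load, 1):.2f}  lag-24: {autocorr(load, 24):.2f}")
```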

When considering the performance of individual hour groups, the situation becomes more complicated. The multi-region model, in general, resulted in the lowest MAE and highest correlation; however, during next day predictions, the aggregate models often had better MAE and RMSE performance. Figures 8 and 9 illustrate the behaviour of the models for next day and next hour predictions within specific hourly groupings.

**Figure 8.** STLF Model MAE Performance in Next Hour Predictions.

**Figure 9.** STLF Model MAE Performance in Next Day Predictions.

It can be seen from Figure 8 that the prediction accuracies of the Aggregate ANN model and the Multi-Region ANN model are very similar, likely because their topologies are similar. In addition, these models are consistent in their errors across all hours. The similar profile of accuracy across the three models indicates certain hour groups are more difficult to forecast than others. The performance of the Similar Day model is best during off-peak periods, as the greatest error associated with the model is found during the hour group of 17–21. This observation may be generalized for all the models, as peak error was often found during the morning or evening peak periods, which suggests the impact of temperature on electrical demand is weakest during peak periods. When these results were shown to the experts, they noted the demands during the peak periods are often attributed to the business cycle, with temperature likely exerting a less significant influence. In general, the experts identified the results of the hour group of 22–23 to be the most accurate. They suggested this hour group should be extended to include hours 22–23 and 0–3, as these periods typically have high baseload and temperature-dependency. A comparison of the models' abilities in describing the behaviours of loads is shown in Figures 10 and 11.






The Aggregate ANN model and the Multi-Region ANN model are similar in both their correlation coefficients and predictive accuracy. The Similar Day model has the lowest correlation across all hours, and demonstrates low correlation at both peak periods.

In conclusion, the multi-region model proved to be the best overall model in terms of predictive accuracy and consistency. The Similar Day model was the easiest to build and offered the operators an intuitive explanation of load behaviour; however, it also performed the worst among the three models analyzed. The performances of the Aggregate ANN and Multi-Region ANN models were similar due to their topological similarities. This suggests that forecast environments with considerable weather and load diversity should adopt a multi-region model for load prediction instead of grouping the regions into a single ANN model. It can be observed that peak periods were the most difficult for the models to predict, and the forecast results there had lower accuracy.

**8. Conclusions and Future Work**

Load forecasting continues to grow in importance within the electric utility industry. To date, no known study has been published which examines load forecasting within the province of Saskatchewan and/or within the control area examined. The increased importance of energy and environmental concerns, coupled with an enhanced regulatory presence, has renewed interest in developing an accurate and easy-to-use load forecasting system within the control area.

The general objective of this research is to conduct load forecasting for a large geographic area which has considerable weather and load diversity. The specific research objective is to develop data-driven hourly prediction models for multi-region short term load forecasting (STLF) for twelve conforming load centres within the control area in the province of Saskatchewan, Canada. Since the load centres experience considerable diversity in terms of both weather and load, a multi-region based approach is needed and the ANN modelling approach was adopted for developing the models.

Due to their simplicity, ease of analysis, and long adoption history, many load forecasting systems currently in use are based on a similar day methodology. However, the research results show that the multi-region ANN model improved prediction performance over the aggregate-based short term load forecasting ANN model and the similar day aggregate model in forecasting short term aggregate loads, in next hour forecasts as well as next day forecasts. All models examined were weather-driven forecasting systems. The performance of the models was evaluated using the dataset from the 2011 year. Based on the measurements of correlation, MAE, RMSE, RAE, and RRSE, it can be concluded that the ANN-based models provide superior prediction performance over existing similar-day forecasting systems. The developed models are able to reduce STLF inaccuracies and may be applicable for modelling other system concerns, such as system reliability.

Operational staff of grid control centres often adopt similar day models due to their simplicity and intuitive development, while paying less attention to the impacts of weather changes on electricity demand. This chapter has demonstrated the superior performance of the ANN-based models over the similar day models. This finding suggests that artificial-intelligence-based methods can potentially be used to enhance the performance of load forecasting in the operational environment. Future efforts in developing artificial-intelligence-based forecasting systems can include building more intuitive user interfaces, so as to promote greater user-adoption.

We believe that merging the ANN models with other methods, such as fuzzy logic, support vector regression, and time series considerations, can provide enhanced consistency for modelling reduced load interval datasets. Further analysis of heat wave theory and other weather-trend electricity demand drivers is necessary for these methods to become applicable for conducting both short and medium term load forecasts. The results and methods of this work will be compared against other artificial intelligence models and statistical methods to identify further areas of improvement. Future work in this field is required to decrease forecast time intervals in order to provide a real-time operating model for intelligent automated

**Figure 10.** STLF Model Correlation Performance in Next Day Predictions.

**Figure 11.** STLF Model Correlation Performance in Next Hour Predictions.
