**Electricity Load Forecasting Using Data Mining Technique**

Intan Azmira binti Wan Abdul Razak, Shah bin Majid, Mohd Shahrieel bin Mohd. Aras and Arfah binti Ahmad

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/48657

## **1. Introduction**

234 Advances in Data Mining Knowledge Discovery and Applications


Accurate load forecasting has become crucial in power system operation and planning [1-3], for both deregulated and regulated electricity markets. Electric load forecasting can be divided into three categories: short-term, medium-term and long-term load forecasting. Short-term load forecasting predicts the load demand from one day to several weeks ahead. It helps to estimate load flows and so prevent overloading, which leads to a more economic and secure power system. Medium-term load forecasting predicts the load demand from a month to several years ahead and provides information for power system planning and operations. Long-term load forecasting predicts the load demand from a year up to twenty years ahead and is used mainly for power system planning [1].

A variety of methods, including neural networks [2], time series [1], hybrid methods [3,4] and fuzzy logic [5], have been developed for load forecasting. Time series techniques have been widely used because load behaviour can be analyzed as a time series signal with hourly, daily, weekly and seasonal periodicities. In addition, they can deal with non-stationary data and so reflect the variation of the variables [4].

However, for a huge power system covering a large geographical area such as Peninsular Malaysia, a single forecasting model for the whole of Peninsular Malaysia would not achieve satisfactory forecasting accuracy because of load and weather diversity [6]. This research therefore addresses these conditions by developing five SARIMA (Seasonal ARIMA) time series models [7,8], one for each of five day types.

## **2. Problem statement**

Electric load forecasting is very important in power system operation, for example in scheduling the start-up and shut-down of generating units, as well as for overhaul planning [2] and spot market energy pricing [4]. Under normal working conditions, the system generating capacity should meet the load requirement to avoid adding generating units and importing power from the neighbouring network [9].

© 2012 Razak et al., licensee InTech. This is an open access chapter distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


**Figure 1.** Load Plot for Monday (Week 1 to Week 6; load in MW against half-hourly time)

**Figure 2.** Load Plot for Tuesday, Wednesday and Thursday (Week 1 to Week 6; load in MW against half-hourly time)

**Figure 3.** Load Plot for Friday (Week 1 to Week 6; load in MW against half-hourly time)

This research applied the ARIMA time series approach to forecast future load in Peninsular Malaysia. A time series is a sequence of data points measured at successive times, typically at uniform time intervals; the ARIMA method for modelling such series was introduced by Box and Jenkins [10].

## **3. Data mining with SARIMA time series**

Before proceeding with the forecasting process, the load data need to be analyzed. Table 1 shows the average maximum and minimum demand, the average energy and the peak hour per day within a week. The analysis shows that the load characteristic differs among the days of the week. The average energy for Monday is slightly lower than for Tuesday, Wednesday and Thursday. The average energy for those three days is close to 255 MWh, so they can be clustered into one category. The average energy for Friday is the lowest among the weekdays, while the energy used at the weekend is much lower than the consumption on weekdays. Comparing the two weekend days, more energy is consumed on Saturday than on Sunday. Hence, the forecast is conducted for five day types:

Type 1: Monday

Type 2: Tuesday, Wednesday, Thursday

Type 3: Friday

Type 4: Saturday

Type 5: Sunday

| Day | Average Maximum Demand (MW) | Average Minimum Demand (MW) | Average Energy (MWh) | Peak Hour |
|---|---|---|---|---|
| Monday | 12 442 | 7 842 | 249.06 | 3.00 – 4.30 pm |
| Tuesday | 12 484 | 8 526 | 254.89 | 3.00 – 4.30 pm |
| Wednesday | 12 508 | 8 565 | 255.95 | 3.00 – 4.30 pm |
| Thursday | 12 436 | 8 543 | 255.03 | 3.00 – 4.30 pm |
| Friday | 11 884 | 8 463 | 246.23 | 3.00 – 4.30 pm |
| Saturday | 10 718 | 8 122 | 227.26 | 11.30 am – 12.00 pm |
| Sunday | 10 116 | 7 605 | 211.01 | 8.00 – 9.00 pm |

**Table 1.** Load data analysis within a week
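The clustering of days into five types can be expressed as a simple lookup. The following is a minimal Python sketch; the function name `day_type` and the use of `datetime.date` are assumptions made for illustration and are not part of the original study.

```python
from datetime import date

# Day types as defined in the text: Type 1 = Monday,
# Type 2 = Tuesday to Thursday, Type 3 = Friday,
# Type 4 = Saturday, Type 5 = Sunday.
# date.weekday() returns 0 for Monday through 6 for Sunday.
_WEEKDAY_TO_TYPE = {0: 1, 1: 2, 2: 2, 3: 2, 4: 3, 5: 4, 6: 5}


def day_type(d: date) -> int:
    """Return the day type (1-5) for a calendar date (hypothetical helper)."""
    return _WEEKDAY_TO_TYPE[d.weekday()]
```

A forecasting pipeline built this way would route each day's half-hourly load to the model for its day type, so that Tuesday, Wednesday and Thursday share one model.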

**Figure 4.** Load Plot for Saturday (Week 1 to Week 6; load in MW against half-hourly time)

**Figure 5.** Load Plot for Sunday (Week 1 to Week 6; load in MW against half-hourly time)

In addition, the load plots for each day type are shown in Figures 1-5, and their characteristics over certain time intervals are summarized in Table 2(a) and (b). Referring to Table 2(a) for weekdays, load consumption decreases from 20.00 to 00.00 and from 00.30 to 04.00, when people are resting or sleeping at night. From 04.00 to 17.00, load consumption increases as people start using home appliances and go to work. Load consumption decreases slightly from 17.00 to 19.00 as people return home. Over the next hour it increases again as people spend time watching television or having dinner. At the weekend, however, people's activities differ somewhat, which affects load consumption.

| Time | Monday | Tuesday – Thursday | Friday |
|---|---|---|---|
| 00.30 – 04.00 | 9 100 – 8 000 | 9 800 – 8 800 | 9 300 – 8 500 |
| 04.00 – 17.00 | 8 000 – 12 500 | 8 800 – 12 600 | 8 500 – 12 300 |
| 17.00 – 19.00 | 12 500 – 11 100 | 12 600 – 11 100 | 12 300 – 10 800 |
| 19.00 – 20.00 | 11 000 – 11 700 | 11 100 – 11 700 | 10 800 – 11 600 |
| 20.00 – 00.00 | 11 700 – 10 100 | 11 700 – 10 200 | 11 600 – 10 000 |

(a) Weekday

| Time | Saturday | Sunday |
|---|---|---|
| 00.30 – 08.00 | 9 800 – 8 400 | 9 000 – 7 900 |
| 08.00 – 12.00 | 8 400 – 11 300 | 08.00 – 16.00: 7 900 – 9 400 |
| 12.00 – 18.00 | 11 300 – 9 800 | 16.00 – 18.00: 9 400 – 8 900 |
| 18.00 – 21.00 | 9 800 – 10 600 | 8 900 – 10 400 |
| 21.00 – 00.00 | 10 600 – 9 500 | 10 400 – 9 400 |

(b) Weekend

**Table 2.** Load consumption per day (MW)

Five SARIMA models, one for each of the five day types, were developed in Minitab. Constructing an ARIMA (Autoregressive Integrated Moving Average) model involves successive filtering steps until only random noise remains. An ARIMA model can be classified as seasonal or non-seasonal. A series with a repeating seasonal pattern is modelled by a seasonal ARIMA (SARIMA) model, while a series with no repeating seasonal pattern is modelled by a non-seasonal model. At least four or five seasons of data are needed to fit a SARIMA model. ARIMA modelling identifies an acceptable model through several steps, namely differencing and examination of the autocorrelation and partial autocorrelation functions. A non-seasonal ARIMA model is written as ARIMA (p, d, q), while a seasonal ARIMA model adds the seasonal orders (P, D, Q), where p or P is the number of autoregressive (AR) terms, d or D is the number of non-seasonal or seasonal differences respectively, and q or Q is the number of lagged forecast errors in the prediction equation (MA terms). The appropriate ARIMA model is determined by identifying the p, d, q and P, D, Q parameters [10].
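For reference, the multiplicative seasonal model implied by this notation is usually written as shown below. This is the standard Box-Jenkins form rather than an equation taken from the chapter; the symbols $\phi_p$, $\Phi_P$, $\theta_q$, $\Theta_Q$ and $a_t$ are introduced here for illustration only.

$$\phi_p(B)\,\Phi_P(B^S)\,(1-B)^d\,(1-B^S)^D z_t = \theta_q(B)\,\Theta_Q(B^S)\,a_t$$

Here $B$ is the backshift operator ($B z_t = z_{t-1}$), $S$ is the seasonal period, $z_t$ is the (possibly transformed) series, $a_t$ is the random noise term, $\phi_p$ and $\Phi_P$ are the non-seasonal and seasonal AR polynomials of orders p and P, and $\theta_q$ and $\Theta_Q$ are the non-seasonal and seasonal MA polynomials of orders q and Q.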

In modelling an ARIMA, the first step is to determine whether the series has a trend. Trend analysis reveals the seasonality and stationarity of the series. The second step is to determine the period of the seasonal model, by plotting a spectral plot in MATLAB or the ACF in Minitab. Usually the period is already known and can be seen from the ACF, but the spectral plot confirms that assumption. The third step is data transformation (if needed) using a Box-Cox plot, depending on the value of λ as shown in Table 3.
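The third step, choosing a transformation from a rounded λ, can be sketched in a few lines. This is a minimal illustration in Python rather than the Minitab procedure used in the study; the helper names `round_lambda` and `box_cox_transform` are hypothetical, and only the three λ values 0, 0.5 and 1.0 are assumed to be available.

```python
import math

# Rounded lambda values and the transformation each one selects.
_TRANSFORMS = {
    0.0: math.log,     # lambda = 0   -> ln X_t
    0.5: math.sqrt,    # lambda = 0.5 -> square root of X_t
    1.0: lambda x: x,  # lambda = 1   -> X_t (no transformation)
}


def round_lambda(lam):
    """Round an estimated lambda to the nearest supported value."""
    return min(_TRANSFORMS, key=lambda candidate: abs(candidate - lam))


def box_cox_transform(load, lam):
    """Transform a load series using the rounded lambda (hypothetical helper)."""
    transform = _TRANSFORMS[round_lambda(lam)]
    return [transform(float(x)) for x in load]
```

For example, an estimated λ of 0.562 would be rounded to 0.5 under this scheme, so the square-root transformation would be applied.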


| Value of λ | Transformation |
|---|---|
| 0 | ln *X*t |
| 0.5 | √*X*t |
| 1.0 | *X*t |

**Table 3.** Box-Cox Transformation

The last step is to identify the p, d, q and P, D, Q parameters. It starts by determining the order of differencing needed to make the series stationary [10]. Normally, the correct order of differencing is the lowest order at which the time series fluctuates around a well-defined mean value and the spikes of the ACF and PACF decay fairly rapidly to zero. Once the appropriate order of differencing has been chosen, AR and MA terms are identified as needed to correct any autocorrelation that remains in the differenced series.

In addition, the best-fit model must meet these specifications:

a. t-value ≤ -1.96 or t-value ≥ 1.96

b. The lowest standard deviation

c. Chi-Square at Lag-12 is acceptable

d. -1 ≤ parameter's coefficient ≤ 1

Some equations related to the ARIMA model are shown in (1) to (4).

The differencing operator of order d can be expressed in terms of the backshift operator B as:

$$\nabla^d = (1 - B)^d \tag{1}$$

The seasonal backshift operator:

$$B^S z_t = z_{t-S} \tag{2}$$

where S = seasonal period and $z_t$ = transformed data at time t.

The seasonal difference operator:

$$\nabla_S^D = (1 - B^S)^D \tag{3}$$

Combining (1) and (3) yields:

$$Y_t = (1 - B)^d (1 - B^S)^D z_t \tag{4}$$

where $Y_t$ = differenced data at time t.

## **4. SARIMA modelling**

#### **4.1. ARIMA model for Monday**

The Monday load data for six weeks were plotted for trend analysis. Figure 6 shows that the data are seasonal and non-stationary, so the period of the data must be identified. This can be done with a spectral plot in MATLAB, as shown in Figure 7.

**Figure 6.** Trend analysis for Monday

**Figure 7.** Spectral plot
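The spectral approach to finding the period can be sketched as follows. The study used MATLAB; this is a plain Python illustration under stated assumptions: the function name `dominant_period` is hypothetical, the periodogram is computed with a direct discrete Fourier transform, and the input is a synthetic sine wave rather than the study's load data.

```python
import cmath
import math


def dominant_period(series):
    """Estimate the dominant period (in samples) from a simple periodogram."""
    n = len(series)
    mean = sum(series) / n
    # Remove the mean so that the zero-frequency term does not dominate.
    centred = [x - mean for x in series]
    best_k, best_power = 1, -1.0
    # Evaluate the discrete Fourier transform at frequencies k/n, k = 1 .. n//2.
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centred))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    # The peak frequency is f = best_k / n cycles per sample, so T = 1 / f.
    return n / best_k


# Synthetic half-hourly series: six days with one cycle every 48 samples.
synthetic_load = [10000 + 2000 * math.sin(2 * math.pi * t / 48) for t in range(288)]
period = dominant_period(synthetic_load)
```

With a peak frequency close to 0.0208 cycles per sample, the relation T = 1/f gives a period of about 48 samples, which corresponds to one day of half-hourly readings.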

Figure 7 shows no aliasing or crossing on the x-axis, meaning that the data are suitable for analysis. The period is determined by:

$$T = 1/f \tag{5}$$

where T = period and f = frequency.

From Figure 8, the frequency was 0.0208, so the period was approximately 48. This value reflects the half-hourly load (48 samples per day) and is valid for all day types. Since the data were not stationary, the actual data had to be transformed, depending on the value of λ.

**Figure 8.** Enlarged view of spectral plot

Figure 9 shows the Box-Cox plot for Monday, where λ = 0.562, so the rounded value is 0.5. Hence the actual data were transformed to √*X*t.

**Figure 9.** Box-Cox plot for Monday

Figure 10 shows a poor ACF (a sine-cosine pattern) and PACF when all parameters are zero. For the model to be stationary, all the spikes should lie within the confidence limits. The ARIMA parameters were then identified, and the selected model was ARIMA (2,1,1)(0,1,1)48.

**Figure 10.** ACF and PACF for Monday
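The differencing implied by d = 1, D = 1 and a seasonal period of 48 follows equation (4). A minimal Python sketch is given below; the helper names `difference` and `sarima_difference` are hypothetical, and the code illustrates only the differencing step, not the estimation of the AR and MA terms carried out in Minitab.

```python
def difference(series, lag=1, order=1):
    """Apply (1 - B^lag)^order to a sequence of numbers."""
    out = list(series)
    for _ in range(order):
        # Each pass subtracts the value `lag` steps earlier and shortens the series by `lag`.
        out = [out[t] - out[t - lag] for t in range(lag, len(out))]
    return out


def sarima_difference(z, d=1, D=1, S=48):
    """Compute Y_t = (1 - B)^d (1 - B^S)^D z_t (hypothetical helper)."""
    seasonal = difference(z, lag=S, order=D)
    return difference(seasonal, lag=1, order=d)
```

Because the two operators commute, the seasonal and non-seasonal differences can be applied in either order; the result is shorter than the input by d + D·S points.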


Figures 11 and 12 show a good ACF and PACF, with the spikes decaying fairly rapidly to zero. There was strong autocorrelation at lag 48, which reflects the period of the data. All these steps were repeated for the other day types.

**Figure 11.** ACF Plot for ARIMA (2,1,1)(0,1,1)48 on Monday

**Figure 12.** PACF Plot for ARIMA (2,1,1)(0,1,1)48 on Monday

#### **4.2. ARIMA model for Tuesday, Wednesday and Thursday**

The modelling steps for this second model were the same as for Monday. The trend analysis for Tuesday, Wednesday and Thursday was plotted, followed by the Box-Cox plot. The value of λ is 0.45, so the rounded value is 0.5. After the data had been transformed to √*X*t, the fitted model was ARIMA (1,1,1)(0,1,1)48.

Figures 13 and 14 show a good ACF and PACF for the selected ARIMA model, with few spikes outside the confidence limits.

**Figure 13.** ACF Plot for Tuesday, Wednesday and Thursday

**Figure 14.** PACF Plot for Tuesday, Wednesday and Thursday

#### **4.3. ARIMA model for Friday**

The modelling steps for this third model were the same as for the two previous models. The trend analysis for Friday was plotted, followed by the Box-Cox plot. The trend analysis showed that the data are seasonal and non-stationary, so they must be transformed. The Box-Cox plot showed that λ is -0.112, so the rounded value is 0. The data were transformed to *ln X*t and the selected model is ARIMA (0,1,1)(0,1,1)48.

Figures 15 and 16 show a good ACF and PACF for the Friday model, with no spikes outside the confidence limits.

**Figure 15.** ACF Plot for Friday

**Figure 16.** PACF Plot for Friday

#### **4.4. ARIMA model for Saturday**

The modelling steps for this fourth model were the same as for the three previous models. The trend analysis for Saturday was plotted, followed by the Box-Cox plot. The trend analysis showed that the data are seasonal and non-stationary, so they must be transformed. The Box-Cox plot showed that λ is 0.113, so the rounded value is 0. After the actual data had been transformed to *ln X*t, the selected model was ARIMA (2,1,1)(0,1,1)48.

Figures 17 and 18 show a good ACF and PACF for the selected Saturday ARIMA model.

**Figure 17.** ACF Plot for Saturday


**Figure 18.** PACF Plot for Saturday

#### **4.5. ARIMA model for Sunday**

The modelling steps for this fifth model were the same as for the four previous models. The trend analysis for Sunday was plotted, followed by the Box-Cox plot. The trend analysis showed that the data are seasonal and non-stationary, so they must be transformed. The Box-Cox plot showed that λ is 0.225, so the rounded value is 0. After the actual data had been transformed to *ln X*t, the selected model was ARIMA (0,1,1)(0,1,1)48.

Figures 19 and 20 show a good ACF and PACF for the fitted model. The plots show few spikes outside the confidence limits after differencing and a good selection of p, P, q and Q.

**Figure 19.** ACF Plot for Sunday

**Figure 20.** PACF Plot for Sunday

## **5. Result and analysis**

The forecast was made for 48 points, representing one day ahead, for each day type. Tables 4-8 show the model specifications for all day types. The t-values for all models satisfy the condition t-value ≤ -1.96 or t-value ≥ 1.96. The standard deviations are good for all models, and the Chi-Square values at Lag-12 are also acceptable. The parameters' coefficients also fall within the range of -1 to 1.

| Parameter | Coefficient | t-value | Standard Deviation | Chi-Square at Lag-12 | DF |
|---|---|---|---|---|---|
| AR 1 | -0.3879 | | 82.9448 | 10.9 | 8 |
| AR 2 | -0.2675 | | | | |
| MA 1 | 0.3717 | | | | |
| SMA 48 | 0.8382 | | | | |

**Table 4.** Model Specification for Monday

| Parameter | Coefficient | t-value | Standard Deviation | Chi-Square at Lag-12 | DF |
|---|---|---|---|---|---|
| AR 1 | 0.1962 | 3.23 | 226.319 | 9.8 | 9 |
| MA 1 | 0.6848 | 15.20 | | | |
| SMA 48 | 0.9120 | 41.80 | | | |

**Table 5.** Model Specification for Tuesday, Wednesday and Thursday

| Parameter | Coefficient | t-value | Standard Deviation | Chi-Square at Lag-12 | DF |
|---|---|---|---|---|---|
| MA 1 | 0.4878 | 8.73 | 0.0355914 | 14.4 | 10 |
| SMA 48 | 0.6243 | 11.01 | | | |

**Table 6.** Model Specification for Friday

| Parameter | Coefficient | t-value | Standard Deviation | Chi-Square at Lag-12 | DF |
|---|---|---|---|---|---|
| AR 1 | 0.4083 | 2.33 | 0.0632711 | 10.1 | 8 |
| AR 2 | 0.3241 | 5.16 | | | |
| MA 1 | 0.6491 | 3.59 | | | |
| SMA 48 | 0.7511 | 13.48 | | | |

**Table 7.** Model Specification for Saturday

| Parameter | Coefficient | t-value | Standard Deviation | Chi-Square at Lag-12 | DF |
|---|---|---|---|---|---|
| MA 1 | 0.5746 | 10.81 | 0.0596651 | 11.9 | 10 |
| SMA 48 | 0.6823 | 11.77 | | | |

**Table 8.** Model Specification for Sunday
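The t-value and coefficient conditions can be checked mechanically once a model has been fitted. The sketch below is a hypothetical Python helper, not part of the Minitab workflow used in the study; the coefficient and t-value lists are the Table 5 entries (Tuesday, Wednesday and Thursday) as read here, and the criteria on standard deviation and Chi-Square are left out because they require judgement across candidate models.

```python
def meets_specification(coefficients, t_values, t_critical=1.96):
    """Check that every |t-value| >= t_critical and every coefficient lies in [-1, 1]."""
    t_ok = all(abs(t) >= t_critical for t in t_values)
    coef_ok = all(-1.0 <= c <= 1.0 for c in coefficients)
    return t_ok and coef_ok


# Values taken to be those of Table 5 (AR 1, MA 1, SMA 48).
table5_coefficients = [0.1962, 0.6848, 0.9120]
table5_t_values = [3.23, 15.20, 41.80]
table5_ok = meets_specification(table5_coefficients, table5_t_values)
```

Using the absolute value of the t-statistic covers both tails of the condition, so a significant negative coefficient passes the check as well.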

**Figure 21.** Actual load vs. forecasted load on Monday (load in MW)

**Figure 22.** Actual load vs. forecasted load on Tuesday, Wednesday and Thursday (load in MW)

**Figure 23.** Actual load vs. forecasted load on Friday (load in MW)

**Figure 24.** Actual load vs. forecasted load on Saturday (load in MW)

**Figure 25.** Actual load vs. forecasted load on Sunday (load in MW)

Figure 21-25 show the plots of forecasted load vs. actual load. The forecasted load plot are seems to be close as actual load plot. Mean Absolute Percentage Error (MAPE) for all day types were calculated as in (7):

$$\text{MAPE (\%)} = \frac{1}{N} \sum_{t=1}^{N} \frac{\left| Z'_t - x_t \right|}{x_t} \times 100\% \tag{6}$$
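As a rough illustration of (6), the sketch below computes the MAPE for a pair of equal-length load series. The function name `mape`, the argument names and the sample values are assumptions made for this example only; they are not data from the study, and the actual loads are assumed to be strictly positive.

```python
def mape(forecast, actual):
    """Mean absolute percentage error in percent, following (6).

    forecast: forecasted loads; actual: actual loads (assumed > 0).
    """
    if len(forecast) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    # Sum |forecast - actual| / actual over all N points.
    total = sum(abs(z - x) / x for z, x in zip(forecast, actual))
    # Average over N and express as a percentage.
    return 100.0 * total / len(actual)


# Made-up loads in MW, for illustration only.
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 400.0]
print(f"MAPE = {mape(forecast, actual):.2f}%")
```

For a day-ahead forecast as described here, both lists would hold the 48 points of the forecast day.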

In (6), $Z'_t$ is the forecasted load, $x_t$ is the actual load and $N$ is the number of forecast points.

Table 9 shows the ARIMA models and their MAPEs for all day types. The differencing orders (d and D) are 1 for all models, which is the lowest order and the best selection. A result is considered accurate when the MAPE is lower than 1.5%, as for the Friday and Sunday models; the Tuesday-Thursday model is slightly above this threshold at 1.62%. The higher MAPEs for the Monday and Saturday models may be caused by load or weather fluctuation.

| Day | ARIMA Model | MAPE |
|---|---|---|
| Monday | $(2,1,1)(0,1,1)_{48}$ | 3.26064% |
| Tuesday-Thursday | $(1,1,1)(0,1,1)_{48}$ | 1.62094% |
| Friday | $(0,1,1)(0,1,1)_{48}$ | 1.11833% |
| Saturday | $(2,1,1)(0,1,1)_{48}$ | 2.41944% |
| Sunday | $(0,1,1)(0,1,1)_{48}$ | 1.07158% |

**Table 9.** Forecasting result for all day types

## **6. Conclusion**

From the data analysis, the load data were clustered into five day types, and hence five SARIMA models were designed. A forecasting model was developed for each day, except for Tuesday, Wednesday and Thursday, which were clustered into a single model. Forecasting was carried out with the time series SARIMA method, a data mining method that requires experience in determining its parameters (p,d,q,P,D,Q). Identifying the parameters sometimes requires trial and error. Nevertheless, the MAPEs obtained for the day types ranged from about 1.1% to 3.3%. This approach improved the forecasting accuracy compared with the traditional ARIMA approach, which uses a single model for all days of the week.

## **7. Further research**

Additional input variables, such as weather data, customer classes and event days, can be included in the forecasting process instead of the load data alone. Other methods, such as Neural Network, Fuzzy Logic and hybrid methods, may also be implemented [11].

## **Author details**

Intan Azmira binti Wan Abdul Razak\*, Mohd Shahrieel bin Mohd. Aras and Arfah binti Ahmad *Faculty of Electrical Engineering, UTeM, Malacca, Malaysia*

Shah bin Majid *Faculty of Electrical Engineering, UTM, Johor, Malaysia*

<sup>\*</sup> Corresponding Author

## **Acknowledgement**

I wish to express my gratitude to **Universiti Teknikal Malaysia Melaka (UTeM)**, and especially to the Faculty of Electrical Engineering, for the financial and moral support. My special thanks also go to Mr. Fuad Jamaluddin from Utility of Malaysia for his valuable advice and help during the completion of this research, and to all my research members for their full commitment and cooperation.

## **8. References**

[1] I. Azmira, A. Razak, S. Majid, and H.A. Rahman, "Short Term Load Forecasting Using Data Mining Technique," *Energy Conversion and Management*, 2008, pp. 139-142.

[2] "Application of Pattern Recognition and Artificial Neural Network to Load Forecasting in Electric Power System," *Pattern Recognition*, 2007.

[3] P. Qingle and Z. Min, "Very Short-Term Load Forecasting Based on Neural Network and Rough Set," *Network*, 2010, pp. 1132-1135.

[4] J.C. Hwang and C.S. Chen, "Customer Short Term Load Forecasting by Using Arima Transfer Function Model," *Electrical Engineering*, pp. 317-322.

**Chapter 12**

**Mining and Adaptivity in Automated Teller Machines**

Ghulam Mujtaba Shaikh and Tariq Mahmood

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/48317

©2012 Mujtaba and Mahmood, licensee InTech. This is an open access chapter distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

## **1. Introduction**

**Figure 1.** A typical Automated Teller Machine (ATM)

Over the past few years, the banking sector has applied a considerable range of technologies to its daily operations. The most significant of these has been the introduction of Automated Teller Machines (ATMs). A typical ATM is shown in Figure 1. Initially, ATMs were used only for dispensing cash, but they now offer round-the-clock services for a diverse number of operations, e.g., electronic transfer of funds, paying bills, viewing past transactions of bank accounts and changing the ATM sign-in credentials [9]. To use ATM services, the customer is issued an ATM card and a PIN code by the bank. The customer inserts the card into the ATM terminal and enters the PIN code. If the bank authenticates the PIN code, the customer can use the ATM services. The first ATM was installed in 1967 by Barclays Bank in the UK, and now there is hardly any bank in the world which operates without an ATM. As of March 2012, the number of ATMs is estimated around

