82 Renewable Energy – Trends and Applications

COP also contains comprehensive summary information. Summary information could be rollups from raw data at a lower level (e.g., resource level) according to some pre-defined system structures.

Adaptive Model Management, as shown in Figure 4, consists of two parts: Advanced Constraint Modeling (ACM) and Adaptive Generator Modeling (AGM). ACM uses intelligent methods to preprocess transmission constraints based on historical and current network conditions, load forecasts, and other key parameters. It should also be able to achieve smoother transmission constraint binding over time. AGM provides other GCA components with information related to specific generator operational characteristics and performance. The resource "profiles" may contain parameters such as ramp rate, operating bands, predicted response per MW of requested change, and high and low operating limits. Another major core function of Smart Dispatch is After-the-Fact Analysis (AFA). AFA aims at providing a framework for forensic analysis. AFA is a decision-support tool to:

a. Identify root-cause impacts and support process re-engineering.

b. Systematically analyze dispatch results based on comparison of actual dispatches with idealized scenarios.

c. Provide quantitative and qualitative measures of the financial, physical or security impacts on system dispatch due to system events and/or conditions.

One special use case of AFA is the so-called "Perfect Dispatch" (PD). The idea of PD originated at PJM (Gisin et al., 2010). PD calculates the hypothetical least-bid-production-cost commitment and dispatch, achievable only if all system conditions were known and controllable. PD can then be used to establish an objective measure of an RTO/TSO's performance (mean of % savings, variance of % savings) in dispatching the system in the most efficient manner possible, by evaluating the potential production cost savings derived from the PD solutions.

#### **4. Demand forecast**

Demand forecast is a crucial input to GCA. Its accuracy strongly impacts market efficiency and system reliability. The following is devoted to discussing some recent advances in techniques of demand forecasting.

Demand or load forecasting is essential for reliable power system operations and market system operations. It determines the amount of system load against which real-time dispatch and day-ahead scheduling functions need to balance over different time horizons.

Demand forecasting typically provides forecasts for three different time frames:

1. Short-Term (STLF): Next 60-120 minutes in 5-minute increments.

2. Mid-Term (MTLF): Next n days (n can be any value from 3-31), in intervals of one hour or less (e.g., 60, 30, 20, 15 minute intervals).

3. Long-Term (LTLF): Next n years (n can be any value from 2-10), broken into one-month increments. The LTLF forecast is provided for three scenarios (pessimistic growth, expected growth, and optimistic growth).

Demand forecasting plays an increasingly important role in the restructured electricity market and smart grid environment due to its impacts on market prices and market participants' bidding behavior. In general, demand forecasting is a challenging subject in view of the complicated features of load and the need for effective data gathering. With Demand Response being one of the few near-term options for large-scale reduction of greenhouse gases, one that fits strategically with the drive toward clean energy technologies such as wind and solar, advanced demand forecasting is more important than ever.

#### **4.1 The uncertainty of demand forecast**

The uncertainty of demand forecast is one of the most critical factors influencing the uncertainty of generation requirements for system balancing (DOE, 2010). It is important to note that wind generation has a fairly strong positive correlation with electrical load, in many ways more so than traditional dispatchable generation. As a result, it is viable to treat wind generation as a negative load and incorporate its uncertainty analysis as part of the uncertainty of demand forecast, assuming transmission congestion is not an issue. Hence, the concept of net demand has been employed in wind integration studies to assess the impact of load and wind generation variability on power system operations. Typically, net demand is defined as follows:

Net demand = Total electrical load – Renewable generation + Net interchange
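The definition above can be coded directly; the helper below is a trivial sketch, and the inputs and MW units are illustrative, not values from the chapter.

```python
def net_demand(total_load, renewable_gen, net_interchange):
    """Net demand = total electrical load - renewable generation + net interchange.

    All arguments are in MW; net_interchange follows the sign convention of
    the formula in the text (imports positive).
    """
    return total_load - renewable_gen + net_interchange

# Illustrative numbers only: 1000 MW load, 150 MW wind/solar, 50 MW net import
example = net_demand(1000.0, 150.0, 50.0)
```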

One practical approach that can be used for uncertainty modeling of demand forecast is distribution fitting. Basically, probability distributions are based on assumptions about a specific standard form of random variables. Based on a standard distribution (e.g., normal) and a selected set of its parameters (e.g., mean *μ* and standard deviation *σ*), they assign probability to the event that the random variable *x* takes on a specific discrete value, or falls within a specified range of continuous values. An example of the probability density function *PDF*(*x*) (Meyer, 1970) of demand forecast is presented in Figure 5a. The cumulative distribution function *CDF*(*x*) can then be defined as:

$$CDF(x) = \int\_{-\infty}^{x} PDF(s)\, ds \tag{1}$$

A confidence interval (*CI*) is a particular kind of interval estimate of a population parameter, such that the parameter is expected to lie within the interval with a specific level of confidence. A confidence interval is generally used to indicate the reliability of an estimate; how likely the interval is to contain the parameter is determined by the confidence level (*CL*). The *CL* of the confidence interval [*Dl*, *Dh*] for demand forecast can be defined as:

$$\text{CL}(Dl \le x \le Dh) = \{\text{CDF}(Dh) - \text{CDF}(Dl)\} \times 100\% \tag{2}$$

Increasing the desired confidence level widens the confidence interval, which is controlled by the parameters *k1* and *k2* as shown in Figure 5. The size of the uncertainty range also depends on the look-ahead time: in general, for longer look-ahead periods the uncertainty range becomes larger. Figure 6 illustrates this time-dependent nature of confidence intervals, the "cone of uncertainty" for demand forecast.
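The distribution-fitting approach can be sketched in a few lines, assuming a normal distribution; `normal_cdf` corresponds to Eq. (1) for the normal case and `confidence_level` to Eq. (2). The demand samples are made up for illustration.

```python
import math
import statistics

def normal_cdf(x, mu, sigma):
    """CDF(x) of N(mu, sigma^2), cf. Eq. (1) specialized to a normal PDF."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def confidence_level(Dl, Dh, mu, sigma):
    """CL that demand falls within [Dl, Dh], Eq. (2), in percent."""
    return (normal_cdf(Dh, mu, sigma) - normal_cdf(Dl, mu, sigma)) * 100.0

# Fit the normal parameters from hypothetical demand forecast samples (MW)
samples = [980.0, 1010.0, 995.0, 1005.0, 990.0, 1020.0, 1000.0, 985.0]
mu = statistics.mean(samples)
sigma = statistics.stdev(samples)

# Widening the interval [mu - k*sigma, mu + k*sigma] raises the CL
cl_1sigma = confidence_level(mu - sigma, mu + sigma, mu, sigma)          # ~68%
cl_2sigma = confidence_level(mu - 2 * sigma, mu + 2 * sigma, mu, sigma)  # ~95%
```

For real use, the fitted distribution would come from historical forecast errors at each look-ahead horizon, which is what produces the widening "cone" in Figure 6.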

#### **4.2 Artificial neural network with wavelet transform**

In the era of the smart grid, generation and load patterns, and more importantly the way people use electricity, will fundamentally change. With intermittent renewable generation, advanced metering infrastructure, dynamic pricing, intelligent appliances and HVAC equipment, microgrids, hybrid plug-in vehicles, etc., load forecasting under these uncertain factors will be quite different from today. Therefore, effective STLF methods that consider the effects of the smart grid are highly needed.

Fig. 5. Probabilistic Uncertainty Model and Desired Confidence Interval for Demand Forecast

Fig. 6. Confidence Intervals for Demand Forecast

Based on frequency domain analysis, the 5-minute load data have multiple frequency components. They can be illustrated via power spectrum magnitude. Figure 7 shows a typical power spectrum of actual load of a regional transmission organization. Note that the power density spectrum can be divided into multiple frequency ranges.
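A power spectrum like the one in Figure 7 can be computed with an FFT. The sketch below uses a synthetic 5-minute load series (one day, 288 samples) as a stand-in for the actual RTO load data, which is not reproduced here.

```python
import numpy as np

dt_min = 5.0                               # 5-minute sampling interval
n = 288                                    # one day of 5-minute samples
t = np.arange(n) * dt_min                  # time in minutes

# Synthetic load: base level + daily cycle + noise (illustrative only)
load = (1000.0
        + 200.0 * np.sin(2 * np.pi * t / (24 * 60))
        + 20.0 * np.random.default_rng(0).standard_normal(n))

# Power spectrum magnitude of the demeaned series
spectrum = np.fft.rfft(load - load.mean())
psd = (np.abs(spectrum) ** 2) / n
freqs = np.fft.rfftfreq(n, d=dt_min * 60)  # Hz

# The daily cycle appears as the dominant low-frequency peak
peak_freq = freqs[np.argmax(psd[1:]) + 1]
```

Dividing `freqs` into low-, medium- and high-frequency bands is what motivates the per-band neural networks discussed next.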

Fig. 7. A power spectrum density for 5-minute actual load

Neural networks have been widely used for load forecasting, including in the era of the smart grid (Amjady et al., 2010; Zhang et al., 2010). In particular, Chen et al. have presented a similar-day-based wavelet neural network approach (Chen et al., 2010). The key idea is to select a "similar day load" as the input load, use wavelet decomposition to decompose the load into multiple components at different frequencies, apply separate neural networks to capture the features of the forecast load at the individual frequencies, and then combine the results of the multiple neural networks to form the final forecast (see Figure 9). In general, these methods used neural networks that adopted the multilayer perceptron with back-propagation training. There are many wavelet decomposition techniques. Some recent techniques applied to load forecasting are:


The Daubechies 4 (D4) wavelet is part of the family of orthogonal wavelets defining a discrete wavelet transform that decomposes a series into a high-frequency series and a low-frequency series. Multiple-level wavelet decomposition repeatedly applies the D4 decomposition to the low-frequency component of the previous level, as shown in Figure 8. Unlike the D4 wavelet, the dual-tree M-band wavelet can selectively decompose a series into specified frequency ranges, which can be key design parameters for more effective decomposition.

Fig. 8. Multiple-level Wavelet Neural Network
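The repeated D4 decomposition of Figure 8 can be sketched directly. The filter coefficients below are the standard D4 values; the periodic boundary handling and the test signal are assumptions of this sketch, not the authors' implementation.

```python
import numpy as np

# Daubechies 4 (D4) analysis filters
s3 = np.sqrt(3.0)
h = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))  # low-pass
g = np.array([h[3], -h[2], h[1], -h[0]])                             # high-pass

def d4_step(x):
    """One D4 decomposition level with periodic extension.
    Returns (low, high), each half the length of x."""
    n = len(x)
    low = np.empty(n // 2)
    high = np.empty(n // 2)
    for i in range(n // 2):
        idx = [(2 * i + k) % n for k in range(4)]
        low[i] = np.dot(h, x[idx])
        high[i] = np.dot(g, x[idx])
    return low, high

def multilevel_d4(x, levels):
    """Repeatedly decompose the low-frequency part, as in Figure 8.
    Returns [high_1, high_2, ..., high_levels, low_levels]."""
    comps = []
    cur = np.asarray(x, dtype=float)
    for _ in range(levels):
        cur, hi = d4_step(cur)
        comps.append(hi)
    comps.append(cur)
    return comps
```

Because the transform is orthogonal, the components preserve the energy of the input series, which is also why the per-component forecasts can later be summed back together.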

In general, each neural network (NN) shown in Figure 8 could be implemented as a feed-forward neural network described by the following equation:

$$L\_{t+l} = f\left(t, L\_t, L\_{t-1}, \dots, L\_{t-n}\right) + \varepsilon\_{t+l} \,. \tag{3}$$

where *t* is the time of day, *l* is the time lead of the forecast, *Lt* is the load component or relative increment of the load component at time *t*, and $\varepsilon\_{t+l}$ represents a random load component. The nonlinear function *f* represents the nonlinear characteristics of a given neural network.

#### **4.3 Neural networks trained by hybrid Kalman filters**

Since the back-propagation algorithm is a first-order gradient-based learning algorithm, neural networks trained by it cannot produce the covariance matrix needed to construct dynamic confidence intervals for load forecasting. Replacing back-propagation learning, wavelet neural networks trained by hybrid Kalman filters are developed to forecast the load of the next hour in five-minute steps with small estimated confidence intervals.

If the NN input-output function is nearly linear, NNs can be trained with the extended Kalman filter (EKFNN) through linearization, by treating the weights as the state (Singhal & Wu, 1989). To speed up the computation, the EKF was extended to the decoupled EKF by ignoring the interdependence of mutually exclusive groups of weights (Puskorius & Feldkamp, 1991). The numerical stability and accuracy of the decoupled EKF were further improved by U-D factorization (Zhang & Luh, 2005). If the NN input-output function is highly nonlinear, EKFNN may not perform well, since the mean and covariance are propagated via linearization of the underlying nonlinear model. The unscented Kalman filter (Julier et al., 1995) is a potential alternative, and NNs trained by the unscented Kalman filter (UKFNN) have shown superior performance. EKFNN is used to capture the features of the low-frequency component, and UKFNNs those of the higher frequencies. The results are combined to form the final forecast.

To capture the near-linear relation between the input and output of the NN for the low-frequency component, a neural network trained by EKF is developed by treating the NN weights as the state and the desired output as the observation. The input-output observations for the model can be represented by the set {*u(t), z(t+1)*}, where *u(t)* = {*u1, …, unu*}*T* is an *nu×1* input vector, and *z(t+1)* = *z(t+1|t)* = {*z1, …, znz*}*T* is an *nz×1* output vector. Correspondingly, $\hat{z}(t+1) \equiv \hat{z}(t+1\mid t)$ represents the estimate of the measurement *z(t+1)*. The formulation of training an NN through the EKF (Zhang and Luh, 2005; Guan et al., 2010) can be described by state and measurement functions:

$$w(t+1) = w(t) + \varepsilon \left(t\right),\tag{4a}$$

$$z(t+1) = h\left(u(t), w(t+1)\right) + v\left(t+1\right),\tag{4b}$$

where *h(•)* is the input-output function of the network, and *ε(t)* and *ν(t)* are the process and measurement noises. The former is assumed to be white Gaussian noise with zero mean and covariance matrix *Q(t)*, whereas the latter is assumed to have a Student's t-distribution with covariance matrix *R(t)*. The weight vector *w(t)* has dimension *nw×1*, where *nw* is determined by the numbers of inputs, hidden neurons and outputs:

$$n\_w = (\mathbf{n}\_x + 1) \times \mathbf{n}\_h + (\mathbf{n}\_h + 1) \times n\_z \,. \tag{5}$$

Using the input vector *u(t)*, weight vector *w(t)* and output vector $\hat{z}(t+1)$, the EKFNN is derived. The key steps of the EKF derivation (Bar-Shalom et al., 2001) are summarized:

$$
\hat{w}(t+1\mid t) = \hat{w}(t\mid t) \,, \tag{6}
$$

$$P(t+1\mid t) = P(t\mid t) + Q(t),\tag{7}$$

$$\hat{z}(t+1\mid t) = h\left(u(t), \hat{w}(t+1\mid t)\right),\tag{8}$$

$$S(t+1) = H(t+1) \cdot P(t+1 \mid t) \cdot H(t+1)^T + R(t+1) \,, \tag{9}$$

$$\text{where}\quad H(t+1) = \left.\frac{\partial h(u,w)}{\partial w}\right|\_{\substack{u=u(t)\\w=\hat{w}(t+1|t)}} \,, \tag{10}$$

$$K(t+1) = P(t+1 \mid t) \cdot H(t+1)^T \cdot S(t+1)^{-1} \tag{11}$$

$$
\hat{w}(t+1\mid t+1) = \hat{w}(t+1\mid t) + K(t+1) \cdot \left( z(t+1) - \hat{z}(t+1\mid t) \right),
\tag{12}
$$

$$P(t+1 \mid t+1) = P(t+1 \mid t) - K(t+1) \cdot H(t+1) \cdot P(t+1 \mid t) \,. \tag{13}$$

where *H(t+1)* is the partial derivative of *h(•)* with respect to *w*, with dimension *nz×nw*; *K(t+1)* is the Kalman gain; *P(t+1|t)* is the prior weight covariance matrix, updated to the posterior weight covariance matrix *P(t+1|t+1)* via the Bayesian formula; and *S(t+1)* is the innovation covariance matrix.
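As a toy illustration of Eqs. (5)-(13), the sketch below trains a tiny feed-forward network with an EKF, using a numerical Jacobian in place of the analytic *H(t+1)*. The network sizes, noise levels and training data are made-up assumptions, not values from the chapter.

```python
import numpy as np

# Toy sizes: 1 input, 3 hidden neurons, 1 output
n_u, n_h, n_z = 1, 3, 1
n_w = (n_u + 1) * n_h + (n_h + 1) * n_z        # Eq. (5)

def h_fn(u, w):
    """Input-output function h(u, w) of the network (tanh hidden layer)."""
    W1 = w[:n_h * (n_u + 1)].reshape(n_h, n_u + 1)
    W2 = w[n_h * (n_u + 1):].reshape(n_z, n_h + 1)
    hid = np.tanh(W1 @ np.append(u, 1.0))       # hidden layer with bias
    return W2 @ np.append(hid, 1.0)             # linear output with bias

def jacobian(u, w, eps=1e-6):
    """Numerical H(t+1) = dh/dw, Eq. (10)."""
    H = np.zeros((n_z, n_w))
    for i in range(n_w):
        dw = np.zeros(n_w)
        dw[i] = eps
        H[:, i] = (h_fn(u, w + dw) - h_fn(u, w - dw)) / (2 * eps)
    return H

def ekf_train(us, zs, Q=1e-5, R=1e-2, seed=0):
    """Treat the weights as the state, Eqs. (6)-(13)."""
    rng = np.random.default_rng(seed)
    w = 0.1 * rng.standard_normal(n_w)          # initial weight estimate
    P = np.eye(n_w)                             # initial weight covariance
    for u, z in zip(us, zs):
        P = P + Q * np.eye(n_w)                 # Eq. (7): prior covariance
        z_hat = h_fn(u, w)                      # Eq. (8): predicted output
        H = jacobian(u, w)
        S = H @ P @ H.T + R * np.eye(n_z)       # Eq. (9): innovation cov.
        K = P @ H.T @ np.linalg.inv(S)          # Eq. (11): Kalman gain
        w = w + K @ (z - z_hat)                 # Eq. (12): weight update
        P = P - K @ H @ P                       # Eq. (13): posterior cov.
    return w, P
```

The decoupled-EKF and U-D-factorized variants cited above replace the full *P* update with block or factored forms for speed and numerical stability; this sketch keeps the plain full-covariance recursion.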

Let us denote $\hat{z}\_L(t+1\mid t) \equiv \hat{z}(t+1\mid t)$ and $\hat{\sigma}\_L^2(t+1) \equiv \big(S(t+1)\circ I\_{n\_y}\big)\mathbf{1}$, where $I\_{n\_y}$ is the unit matrix, $\mathbf{1}$ is a vector of ones of length $n\_y$, and $\hat{\sigma}\_L^2(t+1)$ is the variance vector consisting of the diagonal elements of *S(t+1)*. $\hat{z}\_L(t+1\mid t)$ and $\hat{\sigma}\_L^2(t+1)$, representing the low-frequency component of the prediction and its variance respectively, will be used for the final load prediction and confidence interval estimation. The corresponding medium-frequency components $\hat{z}\_M(t+1\mid t)$ and $\hat{\sigma}\_M^2(t+1)$ and high-frequency components $\hat{z}\_H(t+1\mid t)$ and $\hat{\sigma}\_H^2(t+1)$ can be obtained via UKFNNs (Guan et al., 2010).

#### **4.4 Overall load forecasting and confidence interval estimation**

To quantify forecasting accuracy, the confidence interval was obtained by using the neural networks trained by hybrid Kalman filters. Within the wavelet neural network framework, the covariance matrices of Kalman filters for individual frequency components contained forecasting quality information of individual load components. When load components were combined to form the overall forecast, the corresponding covariance matrices would also be appropriately combined to provide accurate confidence intervals for the overall prediction (Guan et al., 2010).

The overall load prediction is the sum of the low-component prediction $\hat{z}\_L$, medium-component prediction $\hat{z}\_M$ and high-component prediction $\hat{z}\_H$, because these components are orthogonal by the wavelet decomposition property:

$$
\hat{z}(t+1\mid t) = \hat{z}\_L(t+1\mid t) + \hat{z}\_M(t+1\mid t) + \hat{z}\_H(t+1\mid t),\tag{14}
$$

By the same token, the overall standard deviation $\hat{\sigma}(t+1\mid t)$ for STLF is the sum of the standard deviations of the low, medium and high components:

$$
\hat{\sigma}(t+1\mid t) = \hat{\sigma}\_L(t+1\mid t) + \hat{\sigma}\_M(t+1\mid t) + \hat{\sigma}\_H(t+1\mid t),\tag{15}
$$

Hence, the one sigma confidence interval for STLF can be constructed by:

$$
\left[ \hat{z}(t+1\mid t) - \hat{\sigma}(t+1\mid t), \hat{z}(t+1\mid t) + \hat{\sigma}(t+1\mid t) \right].\tag{16}
$$
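Taken together, Eqs. (14)-(16) amount to simple sums of the per-component outputs. A minimal sketch, with illustrative numbers standing in for the actual component forecasts:

```python
def combine_components(z_L, z_M, z_H, s_L, s_M, s_H):
    """Combine per-frequency predictions and standard deviations."""
    z = z_L + z_M + z_H               # Eq. (14): overall load prediction
    s = s_L + s_M + s_H               # Eq. (15): overall standard deviation
    return z, (z - s, z + s)          # Eq. (16): one-sigma confidence interval

# Illustrative component outputs (MW): low, medium, high
forecast, ci = combine_components(950.0, 40.0, 10.0, 8.0, 3.0, 1.0)
```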

The overall scheme of training, forecasting and confidence interval estimation is depicted and summarized in Figure 9.

Fig. 9. Structure of a general wavelet neural network trained by hybrid Kalman filters

RI: Relative Increment; WD: Wavelet Decomposition; EKFNN: Neural Network Trained by Extended Kalman Filter; UKFNN: Neural Network Trained by Unscented Kalman Filter; SD: Standard Deviation Derivation

#### **4.5 Composite demand forecasting**

To generate better forecasting results, a composite forecast is developed that mixes multiple STLF methods with CI estimation. The concept is based on the statistical model of ensemble forecasting, producing an optimal forecast by compositing forecasts from a number of different techniques. The method is depicted schematically in Figure 10.

Fig. 10. Ensemble forecasting

As illustrated in Figure 11, the method runs three sample models (Forecast 1, Forecast 2 and Forecast 3) in parallel. The weights of the combination are theoretically derived based on the "interactive multiple model" approach (Bar-Shalom et al, 2001). For methods which are based on Kalman filters and have dynamic covariance matrices on the forecast load, these dynamic covariance matrices are used for the combination. Otherwise, static covariance matrices derived from historic forecasting accuracy are used instead.

CLFCC: Composite load forecasting and covariance combination MPC: Mixing probability calculation

Fig. 11. Structure of composite forecasting with confidence interval estimation

The relative increment (RI) in load is used to help capture the load features, since it removes a first-order trend and anchors the prediction to the latest load (Shamsollahi et al., 2001). After normalization, the *RI* in load of the last time period *z(t)* is used as the input to the NN, where *t* is the time index. The mixing weight *µ(t)* can be calculated through the likelihood functions $\Lambda\_j(t)$, with subscript *j* = 1, 2, 3 representing Forecasts 1, 2 and 3, respectively:

$$
\Lambda\_j(t) = N \left\{ z(t); \hat{z}\_j(t \mid t-1), S\_j(t) \right\}, \tag{17}
$$

$$
\mu\_j(t) = \Lambda\_j(t) \cdot \overline{c}\_j \Big/ \sum\_{i=1}^{3} \Lambda\_i(t) \cdot \overline{c}\_i \,, \quad \text{where} \quad \overline{c}\_j = \sum\_{i=1}^{3} p\_{ij} \cdot \mu\_i(t-1), \quad j = 1, 2, 3, \tag{18}
$$

where *pij* is the transition probability, configured manually. *S1*, *S2* and *S3* are sample covariance matrices for Forecasts 1, 2 and 3 derived from historic forecasting accuracy. Without loss of generality, we assume that dynamic covariance matrices *S2D* for Forecast 2 and *S3D* for Forecast 3 are available. To make the combination stable, the dynamic innovation matrices *S2D* and *S3D* are not used to calculate the likelihood functions $\Lambda\_2$ and $\Lambda\_3$, since they may largely affect the mixing weights. The predictions from the individual models can then be combined to form the forecast:

$$
\hat{z}(t+1\mid t) = \sum\_{j=1}^{3} \mu\_{j}(t) \cdot \hat{z}\_{j}(t+1\mid t) \,. \tag{19}
$$

The output $\hat{z}(t+1\mid t)$ from the NNs has to be transformed back because of the RI transformation applied to the load input. Similar to the prediction combination, the static covariance matrix *S1* derived from historic forecasting accuracy and the dynamic covariance matrices *S2D* and *S3D* are also combined. Here, *S1*, *S2D* and *S3D* are the covariance matrices for the NN outputs (estimated RI in load). Since RI is a nonlinear transformation, the covariance matrix has to be transformed. If *S1*, *S2D* and *S3D* can be obtained directly from the individual models, they can be combined first:

$$S(t+1) = \mu\_1(t) \cdot S\_1(t+1) + \mu\_2\left(t\right) \cdot S\_2^D(t+1) + \mu\_3\left(t\right) \cdot S\_3^D(t+1) \tag{20}$$

Then, *S(t+1)* will be used to further derive CIs with respect to the *RI* transformation (Guan et al., 2010).
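A minimal scalar sketch of the combination in Eqs. (17)-(20), assuming Gaussian likelihoods. The transition matrix, forecasts and variances below are illustrative placeholders; note the sketch also does not distinguish the static *S1* from the dynamic *S2D*, *S3D* as the text does.

```python
import math

def gaussian_pdf(x, mean, var):
    """Scalar Gaussian likelihood N{x; mean, var}, cf. Eq. (17)."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def composite_step(z, z_hat, S, p, mu_prev):
    """One combination step for three forecast models.
    z       : latest observation
    z_hat   : three one-step-ahead predictions z_j(t|t-1)
    S       : three innovation variances S_j(t)
    p       : 3x3 transition probabilities p[i][j]
    mu_prev : previous mixing weights mu_i(t-1)
    """
    lam = [gaussian_pdf(z, z_hat[j], S[j]) for j in range(3)]           # Eq. (17)
    c_bar = [sum(p[i][j] * mu_prev[i] for i in range(3)) for j in range(3)]
    norm = sum(lam[j] * c_bar[j] for j in range(3))
    mu = [lam[j] * c_bar[j] / norm for j in range(3)]                   # Eq. (18)
    z_comb = sum(mu[j] * z_hat[j] for j in range(3))                    # Eq. (19)
    S_comb = sum(mu[j] * S[j] for j in range(3))                        # Eq. (20)
    return mu, z_comb, S_comb
```

As expected from Eq. (18), the model whose prediction lies closest to the latest observation receives the largest mixing weight.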

Demand forecast and its corresponding confidence intervals are crucial inputs to the Generation Control Application, which robustly dispatches the power system using a series of coordinated scheduling functions.

#### **5. Generation control application**

Generation Control Application (GCA) is an application designed to provide dispatchers in large power grid control centers with the capability to manage changes in load, generation, interchange and transmission security constraints simultaneously, on an intra-day and near-real-time operational basis. GCA uses least-cost security-constrained economic scheduling and dispatch algorithms with resource commitment capability to perform analysis of the desired generation dispatch. With the latest State Estimator (SE) solution as the starting point and transmission constraint data from the Energy Management System (EMS), the GCA Optimization Engines (aka Scheduler or SKED) look ahead over different time frames to forecast system conditions and alter generation patterns within those time frames.

The rest of this section will focus on the functionality of the SKED engines and their coordination with COP.
