…supply expose generators to delivery risk [1]. Delivery risk considerably modifies the probability distribution of profits, shifting the optimal trading strategy toward a portfolio mixing forward contracts and power sold in the spot market. Because of the size of the probability state space and the limited computing capabilities, the problem of the optimal trading strategy does not have a closed-form solution and thus its determination is a matter of current study. The increase in computing power and recent developments in Operational Research have brought new insights into the solution of such problems.

In the past decade, and by virtue of the ever increasing computational power, many methods emerged in different scientific fields under several different names: Reinforcement Learning, Q-Learning, Neuro-Dynamic Programming, etc. All these methods were later brought together in what is currently known as Approximate Dynamic Programming (ADP) [2],[3]. These algorithms forgo the exhaustive enumeration and calculation of the state space typically performed by conventional DP. Instead, they iteratively approximate a function of the state space through stochastic simulation and statistical regression techniques, circumventing the dimensionality problem of DP.

Although ADP algorithms are being used in several other fields of science, their application to design optimal trading strategies in power markets has not been proposed so far. In this chapter, ADP techniques are exploited to optimize the selling strategy of a power generator trading in a frictional market with transaction costs. Three available products are considered: selling in the spot market, and/or entering into quarterly and one-year forward contracts. The objective of the generator is to maximize the expected profit while limiting financial risk. Decisions can be made only at the beginning of each month. At each decision stage, the current trading position can be changed at a cost in order to rebalance the portfolio.

**2. Approximate dynamic programming**

In the field of decision making, it is often useful to assess what could be expected from each possible decision given the information available. After evaluating the outcome of each alternative decision, a simple comparison is enough to take the optimal course of action. This approach is straightforward but also naive. Real problems often present simply too many possible options to evaluate. Moreover, if the problem involves sequential decision stages, the number of possible solution paths scales up exponentially. Finally, outcomes are frequently subject to uncertainty, so several outcomes could present themselves for each decision, further augmenting the size of the problem.

Therefore, in order to keep the size of the problem within reach, shortcuts and simplifications are often necessary. Dynamic Programming (DP) is a clever way to reduce the number of options based on Bellman's Principle of Optimality: "An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision" [4]. It follows that from any given state there is an optimal decision trajectory that, once solved, can be reused for every other path containing that state. Therefore, for each state it is only necessary to hold the information of the best solution until the end, ruling out suboptimal options. This rule prevents the exponential growth in the decision sequence path, scaling the problem only linearly.

Yet other sources of explosive dimensionality growth remain, namely the sizes of the state, decision and outcome spaces. For small financial decision problems, conventional DP algorithms are able to find the optimal policy. For real-scale problems, gross simplifications are often necessary to keep tractability. Sometimes these simplifications render the model unrealistic, turning the results meaningless.

Finding an appropriate combination of financial instruments in a portfolio can be portrayed as a Markov Decision Process (MDP). An MDP is defined as a succession of states that are reached through decisions (actions) followed by an outcome (reaction) of the system. After each decision, the system evolves into another state, according to probabilities defined by the previous state and the decision taken. Each transition between states has a cost, usually dependent on the action taken, and a reward produced by the reaction of the system, as depicted in Figure 1. In these processes, a decision maker should choose a sequence of actions so that the expected sum of rewards minus the sum of incurred costs is maximized over a certain period of time.

**Figure 1.** Markov Decision Process depiction
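To make these ingredients concrete, the minimal sketch below encodes states, actions, transition probabilities and net rewards of an MDP as a Python dictionary; the two states, two actions and all numbers are invented for illustration only and have nothing to do with the trading problem treated later.

```python
# A toy MDP, purely illustrative: each (state, action) pair maps to a list of
# (probability, next_state, net_reward) transitions, where net_reward is the
# reward produced by the system's reaction minus the cost of the action.
toy_mdp = {
    ("low", "hold"):  [(0.7, "low", 1.0), (0.3, "high", 1.0)],
    ("low", "sell"):  [(1.0, "low", 0.5)],
    ("high", "hold"): [(0.5, "high", 2.0), (0.5, "low", 2.0)],
    ("high", "sell"): [(1.0, "low", 3.0)],
}

def expected_net_reward(state, action):
    """Expected one-step reward minus cost of taking `action` in `state`."""
    return sum(p * r for p, _, r in toy_mdp[(state, action)])

print(expected_net_reward("high", "sell"))   # 3.0
```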

According to Bellman's Principle of Optimality, for every MDP a series of Value Functions can be determined, which represent the continuation value of each state. The continuation value associated with a given state is the expected sum of rewards that the optimal strategy would yield from that state until the end of the process (or the expected average reward if the MDP is infinite).


It is easy to see how the value functions of an MDP can be found using a classic backwards DP algorithm. Starting from the final states, a DP algorithm exhaustively calculates the continuation value for a discrete number of states. All these continuation values, collected in a lookup table, constitute the Value Functions, which are accurate but not very compact or easy to calculate. After acquiring the Value Functions, it is simple to find the optimal decision for each state as the one that maximizes the sum of the expected reward and the expected continuation value of the next state or states. However, the problems that a DP algorithm can address by this procedure are restricted by the size of their state spaces.
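A minimal sketch of this backward recursion follows, reusing the dictionary format of the toy MDP above; the horizon length and the zero continuation value at the final stage are illustrative assumptions.

```python
def backward_dp(mdp, states, actions, horizon):
    """Exhaustive backward DP over a finite horizon.

    `mdp` maps (state, action) to a list of (probability, next_state, reward)
    tuples, as in the toy dictionary above.  Returns the lookup tables
    V[t][state] and the corresponding optimal policy.
    """
    V = [{s: 0.0 for s in states} for _ in range(horizon + 1)]   # zero value at the end
    policy = [dict() for _ in range(horizon)]
    for t in range(horizon - 1, -1, -1):          # work backwards from the final stage
        for s in states:
            best_value, best_action = float("-inf"), None
            for a in actions:
                # expected reward plus expected continuation value of the next state
                q = sum(p * (r + V[t + 1][s2]) for p, s2, r in mdp[(s, a)])
                if q > best_value:
                    best_value, best_action = q, a
            V[t][s], policy[t][s] = best_value, best_action
    return V, policy

# Example usage with the toy MDP defined earlier:
# V, policy = backward_dp(toy_mdp, ["low", "high"], ["hold", "sell"], horizon=12)
```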

Other forms to represent and/or approximate the Value Functions can then be proposed. These approaches should not require exhaustive calculation of every state: the Value Functions can be interpolated between computed states. Approximate Dynamic Programming algorithms are built on this cornerstone: approximation of the Value Functions in the state space domain. The estimation methods can be linear regressions, artificial neural networks, etc. Several authors provide detailed analyses of MDPs and of the use of ADP and DP algorithms to solve them [2],[3].

For approximating and updating the Value Functions, the proposed algorithm uses linear regression on Gaussian radial basis functions jointly with Monte Carlo simulations to consider randomness. An interior-point optimization algorithm is implemented to make decisions.

The ADP algorithm starts with a series of approximations of the value functions, usually constant. Then, taking a Monte Carlo sample, a simulation of the system is conducted. At each decision stage, the algorithm makes a decision that is optimal with respect to the current state and the available approximations. Finally, after each Monte Carlo simulation, decisions and outcomes are used to refine the estimation of the Value Functions and of complementary Risk Functions, denoted by *Vt* and *Rt* respectively. The process continues iteratively until a certain termination criterion is fulfilled. A simple diagram of this approach is illustrated in Figure 2.

**Figure 2.** Simulation and estimation update as performed by the proposed ADP algorithm
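The simulate-and-update cycle of Figure 2 can be sketched as below; the helper callables (sample_scenario, make_decision, simulate_transition, update_estimates) are hypothetical placeholders for problem-specific components, so only the loop structure reflects the description above.

```python
# Skeleton of the ADP iteration sketched in Figure 2; helper callables are
# problem-specific and must be supplied by the user.
def adp_optimize(v_hat, r_hat, sample_scenario, make_decision,
                 simulate_transition, update_estimates,
                 initial_state, horizon, n_iterations):
    for _ in range(n_iterations):                 # one Monte Carlo sample per iteration
        omega = sample_scenario(horizon)          # random outcomes for the whole horizon
        state, trajectory = initial_state, []
        for t in range(horizon):
            # decision that is optimal with respect to the current state and the
            # currently available approximations of V_t and R_t
            decision = make_decision(t, state, v_hat, r_hat)
            reward, next_state = simulate_transition(t, state, decision, omega[t])
            trajectory.append((t, state, decision, reward))
            state = next_state
        # refine the estimates of the Value and Risk Functions with the new data
        v_hat, r_hat = update_estimates(v_hat, r_hat, trajectory)
    return v_hat, r_hat
```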

## **2.1. Regression technique**

An approximation of a system involves finding a function that can predict the output values for a given set of inputs. In this case, the inputs are the state variables and the decision, and the outputs are the continuation value and risk.

To approximate the value and risk functions one could use several methods such as linear regressions, artificial neural networks, splines, etc. In the context of electricity trading, it will be discussed how to use linear regression with radial basis functions. The same principles and techniques apply to any regression based on linear parameters.

The main feature of the algorithm is that the approximation is used to make decisions while collecting new data to further improve the currently available approximation. Thus, it is necessary that the regressed function fitting the data can be readily updated as new data is simulated, making the improved approximations immediately available for the next simulation. To fulfill this requirement, a recursive regression technique can be used.

Let **X** be a set of inputs and **Y** a set of collected output data:


$$\mathbf{X} = \begin{bmatrix} \mathbf{x}\_{1,1} & \mathbf{x}\_{1,2} & \dots & \mathbf{x}\_{1,g} \\ \mathbf{x}\_{2,1} & \mathbf{x}\_{2,2} & \dots & \mathbf{x}\_{2,g} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{x}\_{n,1} & \mathbf{x}\_{n,2} & \dots & \mathbf{x}\_{n,g} \end{bmatrix} = \begin{bmatrix} \mathbf{x}\_{1} \\ \mathbf{x}\_{2} \\ \vdots \\ \mathbf{x}\_{n} \end{bmatrix}, \mathbf{Y} = \begin{bmatrix} \mathcal{Y}\_{1,1} & \mathcal{Y}\_{1,2} & \dots & \mathcal{Y}\_{1,h} \\ \mathcal{Y}\_{2,1} & \mathcal{Y}\_{2,2} & \dots & \mathcal{Y}\_{2,h} \\ \vdots & \vdots & \ddots & \vdots \\ \mathcal{Y}\_{n,1} & \mathcal{Y}\_{n,2} & \dots & \mathcal{Y}\_{n,h} \end{bmatrix} = \begin{bmatrix} \mathbf{y}\_{1} \\ \mathbf{y}\_{2} \\ \vdots \\ \mathbf{y}\_{n} \end{bmatrix} \tag{1}$$

The matrix of inputs *X* has *n* data points and *g* dimensions and the matrix of outputs *Y* has *n* data points and *h* dimensions. Then, the approximation function is:

$$\hat{\mathbf{Y}} = \sum\_{j=1}^{k} \boldsymbol{\varphi}\_{j}(\mathbf{X}) \cdot \boldsymbol{\theta}^{j} = \begin{bmatrix} \hat{y}\_{1,1} & \hat{y}\_{1,2} & \dots & \hat{y}\_{1,h} \\ \hat{y}\_{2,1} & \hat{y}\_{2,2} & \dots & \hat{y}\_{2,h} \\ \vdots & \vdots & \ddots & \vdots \\ \hat{y}\_{n,1} & \hat{y}\_{n,2} & \dots & \hat{y}\_{n,h} \end{bmatrix} = \begin{bmatrix} \hat{\mathbf{y}}\_{1} \\ \hat{\mathbf{y}}\_{2} \\ \vdots \\ \hat{\mathbf{y}}\_{n} \end{bmatrix} = \boldsymbol{\varphi}(\mathbf{X}) \cdot \boldsymbol{\theta} \tag{2}$$

$$\boldsymbol{\Phi}(\mathbf{X}) = \begin{bmatrix} \boldsymbol{\phi}\_{1}(\mathbf{x}\_{1}) & \boldsymbol{\phi}\_{2}(\mathbf{x}\_{1}) & \dots & \boldsymbol{\phi}\_{k}(\mathbf{x}\_{1})\\ \boldsymbol{\phi}\_{1}(\mathbf{x}\_{2}) & \boldsymbol{\phi}\_{2}(\mathbf{x}\_{2}) & \dots & \boldsymbol{\phi}\_{k}(\mathbf{x}\_{2})\\ \vdots & \vdots & \ddots & \vdots\\ \boldsymbol{\phi}\_{1}(\mathbf{x}\_{n}) & \boldsymbol{\phi}\_{2}(\mathbf{x}\_{n}) & \dots & \boldsymbol{\phi}\_{k}(\mathbf{x}\_{n}) \end{bmatrix} = \begin{bmatrix} \mathbf{z}\_{1} \\ \mathbf{z}\_{2} \\ \vdots \\ \mathbf{z}\_{n} \end{bmatrix} = \mathbf{Z}, \ \boldsymbol{\theta} = \begin{bmatrix} \boldsymbol{\theta}^{1} \\ \boldsymbol{\theta}^{2} \\ \vdots \\ \boldsymbol{\theta}^{k} \end{bmatrix} \tag{3}$$

Where the kernel functions *φ*(*X*) transform the input variables *X* into a *k*-dimensional space and are in general non-linear. Here, it is important to notice that despite the fact that the *φ* functions are nonlinear, the approximation is still linear with respect to the parameters *θ*. Therefore, the regression parameters can easily be found by solving a linear system of equations.

The parameter vector *θ* that minimizes the mean quadratic error *MQE* of the estimated outputs is:

$$\boldsymbol{\theta} = \left[\mathbf{Z}^{\mathrm{T}}\mathbf{Z}\right]^{-1}\mathbf{Z}^{\mathrm{T}}\mathbf{Y} = \mathbf{A}\mathbf{Z}^{\mathrm{T}}\mathbf{Y} \tag{4}$$

$$\mathbf{A} = \left[\mathbf{Z}^{\mathrm{T}} \mathbf{Z}\right]^{-1} \tag{5}$$
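As a concrete illustration of equations (1)-(5), the snippet below fits a linear-in-parameters model by ordinary least squares; the data, the polynomial basis standing in for the kernel functions, and the dimensions are arbitrary choices made only for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# n data points with g input dimensions and h output dimensions, as in eq. (1)
n, g, h = 200, 2, 1
X = rng.uniform(-1.0, 1.0, size=(n, g))
Y = np.sin(X[:, :1]) + 0.5 * X[:, 1:2] ** 2 + 0.01 * rng.standard_normal((n, h))

def phi(X):
    """Map the inputs into a k-dimensional feature space as in eq. (3); a plain
    polynomial basis stands in here for any set of nonlinear kernel functions."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([np.ones(len(X)), x1, x2, x1 * x2, x1 ** 2, x2 ** 2])

Z = phi(X)                             # Z = phi(X), shape (n, k)
A = np.linalg.inv(Z.T @ Z)             # eq. (5)
theta = A @ Z.T @ Y                    # eq. (4): minimizes the mean quadratic error
Y_hat = Z @ theta                      # eq. (2)
print("MQE:", np.mean((Y - Y_hat) ** 2))
```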


The dimensions of the matrices *X*, *Y* and *Z* increase as new points are simulated, and thus the estimated parameters change as new data is collected. On the other hand, the algorithm needs the approximation to get new data. One approach could be calculating the new parameters after a number of simulations with equation (4), but this soon proves impractical, as an increasing amount of data needs to be stored. Instead, by storing only the matrix *A* and the parameter vector *θ*, the matrix itself can be recursively updated jointly with the parameters, following the process described in [2]:

In the *i*-th iteration, a new data point $(\mathbf{x}\_i, \mathbf{y}\_i)$ is simulated and the matrices $\mathbf{X}$ and $\mathbf{Y}$ become:

$$\mathbf{X}\_{i} = \begin{bmatrix} \mathbf{X}\_{i-1} \\ \mathbf{x}\_{i} \end{bmatrix}, \mathbf{Y}\_{i} = \begin{bmatrix} \mathbf{Y}\_{i-1} \\ \mathbf{y}\_{i} \end{bmatrix} \tag{6}$$

Also the matrix *Z* becomes:

$$\mathbf{Z}\_{i} = \begin{bmatrix} \boldsymbol{\Phi} \left( \mathbf{X}\_{i-1} \right) \\ \boldsymbol{\Phi} \left( \mathbf{x}\_{i} \right) \end{bmatrix} = \begin{bmatrix} \mathbf{Z}\_{i-1} \\ \mathbf{z}\_{i} \end{bmatrix} \tag{7}$$

Replacing these values in (4):

$$\boldsymbol{\theta}\_{i} = \left[\mathbf{Z}\_{i}^{\mathrm{T}} \mathbf{Z}\_{i}\right]^{-1} \mathbf{Z}\_{i}^{\mathrm{T}} \mathbf{Y}\_{i} = \mathbf{A}\_{i} \, \mathbf{Z}\_{i}^{\mathrm{T}} \mathbf{Y}\_{i} \tag{8}$$

$$\boldsymbol{\theta}\_{i} = \left[ \begin{bmatrix} \mathbf{Z}\_{i-1}^{\mathrm{T}} & \mathbf{z}\_{i}^{\mathrm{T}} \end{bmatrix} \begin{bmatrix} \mathbf{Z}\_{i-1} \\ \mathbf{z}\_{i} \end{bmatrix} \right]^{-1} \begin{bmatrix} \mathbf{Z}\_{i-1}^{\mathrm{T}} & \mathbf{z}\_{i}^{\mathrm{T}} \end{bmatrix} \begin{bmatrix} \mathbf{Y}\_{i-1} \\ \mathbf{y}\_{i} \end{bmatrix} \tag{9}$$

$$\boldsymbol{\theta}\_{i} = \left[\mathbf{Z}\_{i-1}^{\mathrm{T}}\mathbf{Z}\_{i-1} + \mathbf{z}\_{i}^{\mathrm{T}}\mathbf{z}\_{i}\right]^{-1} \left[\mathbf{Z}\_{i-1}^{\mathrm{T}}\mathbf{Y}\_{i-1} + \mathbf{z}\_{i}^{\mathrm{T}}\mathbf{y}\_{i}\right] \tag{10}$$

Using the Sherman-Morrison formula:

$$\left[\mathbf{L} + \mathbf{u}^{\mathrm{T}}\mathbf{u}\right]^{-1} = \mathbf{L}^{-1} - \frac{\mathbf{L}^{-1}\mathbf{u}^{\mathrm{T}}\mathbf{u}\,\mathbf{L}^{-1}}{1 + \mathbf{u}\,\mathbf{L}^{-1}\mathbf{u}^{\mathrm{T}}} \tag{11}$$

Where $\mathbf{L}$ is a $k \times k$ invertible matrix and $\mathbf{u}$ is a $k$-dimensional row vector. Applying (11) to (10):


$$\boldsymbol{\theta}\_{i} = \left[ \left[ \mathbf{Z}\_{i-1}^{\mathrm{T}} \mathbf{Z}\_{i-1} \right]^{-1} - \frac{\left[ \mathbf{Z}\_{i-1}^{\mathrm{T}} \mathbf{Z}\_{i-1} \right]^{-1} \mathbf{z}\_{i}^{\mathrm{T}} \mathbf{z}\_{i} \left[ \mathbf{Z}\_{i-1}^{\mathrm{T}} \mathbf{Z}\_{i-1} \right]^{-1}}{1 + \mathbf{z}\_{i} \left[ \mathbf{Z}\_{i-1}^{\mathrm{T}} \mathbf{Z}\_{i-1} \right]^{-1} \mathbf{z}\_{i}^{\mathrm{T}}} \right] \left( \mathbf{Z}\_{i-1}^{\mathrm{T}} \mathbf{Y}\_{i-1} + \mathbf{z}\_{i}^{\mathrm{T}} \mathbf{y}\_{i} \right) \tag{12}$$

Replacing $\left[ \mathbf{Z}\_{i-1}^{\mathrm{T}} \mathbf{Z}\_{i-1} \right]^{-1} = \mathbf{A}\_{i-1}$:


$$\boldsymbol{\theta}\_{i} = \left(\mathbf{A}\_{i-1} - \frac{\mathbf{A}\_{i-1}\mathbf{z}\_{i}^{\mathrm{T}}\mathbf{z}\_{i}\,\mathbf{A}\_{i-1}}{1 + \mathbf{z}\_{i}\,\mathbf{A}\_{i-1}\,\mathbf{z}\_{i}^{\mathrm{T}}}\right) \left(\mathbf{Z}\_{i-1}^{\mathrm{T}}\mathbf{Y}\_{i-1} + \mathbf{z}\_{i}^{\mathrm{T}}\mathbf{y}\_{i}\right) \tag{13}$$

Naming $\rho\_i = \frac{1}{1 + \mathbf{z}\_i \mathbf{A}\_{i-1} \mathbf{z}\_i^{\mathrm{T}}}$ and distributing:

$$\boldsymbol{\theta}\_{i} = \mathbf{A}\_{i-1} \mathbf{Z}\_{i-1}^{\mathrm{T}} \mathbf{Y}\_{i-1} + \mathbf{A}\_{i-1} \mathbf{z}\_{i}^{\mathrm{T}} \mathbf{y}\_{i} - \rho\_{i} \mathbf{A}\_{i-1} \mathbf{z}\_{i}^{\mathrm{T}} \mathbf{z}\_{i} \mathbf{A}\_{i-1} \mathbf{Z}\_{i-1}^{\mathrm{T}} \mathbf{Y}\_{i-1} - \rho\_{i} \mathbf{A}\_{i-1} \mathbf{z}\_{i}^{\mathrm{T}} \mathbf{z}\_{i} \mathbf{A}\_{i-1} \mathbf{z}\_{i}^{\mathrm{T}} \mathbf{y}\_{i} \tag{14}$$

Now, simplifying $\mathbf{A}\_{i-1}\mathbf{Z}\_{i-1}^{\mathrm{T}}\mathbf{Y}\_{i-1} = \boldsymbol{\theta}\_{i-1}$:

$$\boldsymbol{\theta}\_{i} = \boldsymbol{\theta}\_{i-1} - \rho\_{i} \mathbf{A}\_{i-1} \mathbf{z}\_{i}^{\mathrm{T}} \mathbf{z}\_{i} \boldsymbol{\theta}\_{i-1} - \rho\_{i} \mathbf{A}\_{i-1} \mathbf{z}\_{i}^{\mathrm{T}} \mathbf{z}\_{i} \mathbf{A}\_{i-1} \mathbf{z}\_{i}^{\mathrm{T}} \mathbf{y}\_{i} + \mathbf{A}\_{i-1} \mathbf{z}\_{i}^{\mathrm{T}} \mathbf{y}\_{i} \tag{15}$$

Taking the common factor $-\rho\_i \mathbf{A}\_{i-1}\mathbf{z}\_i^{\mathrm{T}}$ out of the last three terms:

$$\boldsymbol{\theta}\_{i} = \boldsymbol{\theta}\_{i-1} - \rho\_{i} \mathbf{A}\_{i-1} \mathbf{z}\_{i}^{\mathrm{T}} \left[ \mathbf{z}\_{i} \boldsymbol{\theta}\_{i-1} + \mathbf{z}\_{i} \mathbf{A}\_{i-1} \mathbf{z}\_{i}^{\mathrm{T}} \mathbf{y}\_{i} - \frac{1}{\rho\_{i}} \mathbf{y}\_{i} \right] \tag{16}$$

Replacing $\frac{1}{\rho\_i} = 1 + \mathbf{z}\_i \mathbf{A}\_{i-1} \mathbf{z}\_i^{\mathrm{T}}$:

$$\boldsymbol{\theta}\_{i} = \boldsymbol{\theta}\_{i-1} - \rho\_{i} \mathbf{A}\_{i-1} \mathbf{z}\_{i}^{\mathrm{T}} \left[ \mathbf{z}\_{i} \boldsymbol{\theta}\_{i-1} + \mathbf{z}\_{i} \mathbf{A}\_{i-1} \mathbf{z}\_{i}^{\mathrm{T}} \mathbf{y}\_{i} - \left( 1 + \mathbf{z}\_{i} \mathbf{A}\_{i-1} \mathbf{z}\_{i}^{\mathrm{T}} \right) \mathbf{y}\_{i} \right] \tag{17}$$

$$\boldsymbol{\theta}\_{i} = \boldsymbol{\theta}\_{i-1} - \rho\_{i} \mathbf{A}\_{i-1} \mathbf{z}\_{i}^{\mathrm{T}} \left[ \mathbf{z}\_{i} \boldsymbol{\theta}\_{i-1} + \mathbf{z}\_{i} \mathbf{A}\_{i-1} \mathbf{z}\_{i}^{\mathrm{T}} \mathbf{y}\_{i} - \mathbf{y}\_{i} - \mathbf{z}\_{i} \mathbf{A}\_{i-1} \mathbf{z}\_{i}^{\mathrm{T}} \mathbf{y}\_{i} \right] \tag{18}$$

Finally, canceling out we get a formula to update the parameter vector:

$$\boldsymbol{\theta}\_{i} = \boldsymbol{\theta}\_{i-1} - \rho\_{i} \mathbf{A}\_{i-1} \mathbf{z}\_{i}^{\mathrm{T}} \left[ \mathbf{z}\_{i} \boldsymbol{\theta}\_{i-1} - \mathbf{y}\_{i} \right] \tag{19}$$

Note that in order to update the approximations, only the matrix *A* and the parameter vector *θ* need to be stored and updated. The complete set of update formulas is then:

$$\rho\_i = \frac{1}{1 + \mathbf{z}\_i \mathbf{A}\_{i-1} \mathbf{z}\_i^T} \tag{20}$$

$$\boldsymbol{\theta}\_{i} = \boldsymbol{\theta}\_{i-1} - \rho\_{i} \mathbf{A}\_{i-1} \mathbf{z}\_{i}^{\mathrm{T}} \left[ \mathbf{z}\_{i} \boldsymbol{\theta}\_{i-1} - \mathbf{y}\_{i} \right] \tag{21}$$

$$\mathbf{A}\_{i} = \mathbf{A}\_{i-1} - \rho\_{i} \mathbf{A}\_{i-1} \mathbf{z}\_{i}^{\top} \mathbf{z}\_{i} \ \mathbf{A}\_{i-1} \tag{22}$$
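The recursion can be checked numerically: the sketch below applies equations (20)-(22) sample by sample and compares the result with the batch solution of equation (4); the synthetic data, the value of δ behind the starting matrix and the dimensions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
k, h, n = 5, 1, 500
Z = rng.standard_normal((n, k))                       # transformed inputs, one row per sample
Y = Z @ rng.standard_normal((k, h)) + 0.01 * rng.standard_normal((n, h))

delta = 100.0
A = delta * np.eye(k)                                 # diagonal starting matrix, cf. eq. (23)
theta = np.zeros((k, h))                              # constant starting parameter vector
for i in range(n):
    z = Z[i:i + 1, :]                                 # row vector z_i, shape (1, k)
    y = Y[i:i + 1, :]                                 # row vector y_i, shape (1, h)
    rho = 1.0 / (1.0 + z @ A @ z.T)                   # eq. (20)
    theta = theta - rho * A @ z.T @ (z @ theta - y)   # eq. (21)
    A = A - rho * A @ z.T @ z @ A                     # eq. (22)

theta_batch = np.linalg.solve(Z.T @ Z, Z.T @ Y)       # batch solution, eq. (4)
print(np.max(np.abs(theta - theta_batch)))            # small difference, due only to A_0
```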

By using the equations (20), (21) and (22), it is possible to update a regression while using it to take decisions within the ADP algorithm, without explicitly storing the entire dataset and performing the matrix inversion after each Monte Carlo simulation. However, an important issue arises at the start of the recursive process, namely the starting values for both the parameter vector *θ* and the matrix *A*. One possibility is to first carry out a number of simulations with random decisions, and then use the data collected to calculate the starting values for *θ* and *A*. Another option is to use a diagonal matrix:

$$\mathbf{A}\_{0} = \begin{bmatrix} a\_{1,1} & 0 & \dots & 0 \\ 0 & a\_{2,2} & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & a\_{k,k} \end{bmatrix} \tag{23}$$

This option is usually simpler despite the fact that setting the absolute values of the elements requires care and it is not always clear in the literature. To set suitable values, a closer look at the regression formulas for matrix A is needed:

$$\mathbf{A}\_{i} = \left[\mathbf{Z}\_{i}^{\mathrm{T}} \mathbf{Z}\_{i}\right]^{-1} = \left[\mathbf{Z}\_{i-1}^{\mathrm{T}} \mathbf{Z}\_{i-1} + \mathbf{z}\_{i}^{\mathrm{T}} \mathbf{z}\_{i}\right]^{-1} \tag{24}$$

$$\mathbf{A}\_{i} = \left[\mathbf{Z}\_{i}^{\mathrm{T}} \mathbf{Z}\_{i}\right]^{-1} = \left[\mathbf{Z}\_{i-2}^{\mathrm{T}}\mathbf{Z}\_{i-2} + \mathbf{z}\_{i-1}^{\mathrm{T}}\mathbf{z}\_{i-1} + \mathbf{z}\_{i}^{\mathrm{T}}\mathbf{z}\_{i}\right]^{-1} \tag{25}$$

$$\mathbf{A}\_{i} = \left[\mathbf{Z}\_{i}^{\mathrm{T}}\mathbf{Z}\_{i}\right]^{-1} = \left[\sum\_{p=1}^{i} \mathbf{z}\_{p}^{\mathrm{T}}\mathbf{z}\_{p}\right]^{-1} \tag{26}$$


$$\mathbf{A}\_{i} = \left[ \sum\_{p=1}^{i} \begin{bmatrix} \left(z\_{p,1}\right)^{2} & z\_{p,1}z\_{p,2} & \dots & z\_{p,1}z\_{p,k} \\ z\_{p,1}z\_{p,2} & \left(z\_{p,2}\right)^{2} & \dots & z\_{p,2}z\_{p,k} \\ \vdots & \vdots & \ddots & \vdots \\ z\_{p,1}z\_{p,k} & z\_{p,2}z\_{p,k} & \dots & \left(z\_{p,k}\right)^{2} \end{bmatrix} \right]^{-1} \tag{27}$$

The matrix $\mathbf{A}\_i$ is the inverse of a matrix sum whose diagonal elements are positive and increasing with *i*. That means that the absolute value of the diagonal elements of $\mathbf{A}$ would in general decrease as *i* increases. The relation factor between the starting value of all the elements $a\_{j,j}$ and the corresponding expected value of $\left(1/z\_{p,j}\right)^2$ controls how new values modify the parameter vector $\boldsymbol{\theta}$.

$$a\_{j,j} = \delta \cdot \mathrm{E}\left[\left(1/z\_{p,j}\right)^{2}\right] \tag{28}$$

Setting the starting diagonal elements with a small *δ* value is equivalent to having already processed a large number of iterations *i*. This would mean that the starting parameter vector already represents a large amount of "fictional" data. In that case, any new real data sample will have little impact on the parameter vector, and the convergence rate of the approximation might be significantly slowed. On the other hand, setting initial diagonal values calculated with a large *δ* allows the algorithm to make large modifications to the parameter vector as new data is collected, which could cause instability of the parameter vector and of the computed decisions, again reducing the overall convergence rate. In practice, the factor *δ* depends on the problem and must be assessed accordingly. From extensive experimentation, values of *δ* between 1 and 1000 tend to work most of the time.

For initializing the parameter vector $\boldsymbol{\theta}\_0$, it is usual to set a constant vector. Values obtained from the formula $\hat{\mathbf{Y}}\_s = \varphi(\mathrm{E}(\mathbf{X})) \cdot \boldsymbol{\theta}\_0$ produce coherent results. It is important to notice that the starting values of the parameter vector are important in cases where *δ* is small, while for a large *δ* the starting vector value is lost in the first iterations.
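One possible implementation of this initialization, computing the diagonal of $\mathbf{A}\_0$ from equation (28) and a constant $\boldsymbol{\theta}\_0$ consistent with $\hat{\mathbf{Y}}\_s = \varphi(\mathrm{E}(\mathbf{X})) \cdot \boldsymbol{\theta}\_0$, is sketched below; the pilot sample used to estimate the expectation, the default value of δ and the function names are assumptions of this example, and the transformed features are assumed non-zero (as with Gaussian radial basis functions).

```python
import numpy as np

def initialize_regression(phi, X_pilot, y_start, delta=10.0):
    """Starting values A_0 and theta_0 for the recursive regression.

    phi     : callable mapping an (n, g) input array to an (n, k) feature array
    X_pilot : small pilot sample of inputs, used only to estimate E[(1/z_{p,j})^2]
    y_start : desired starting output level Y_hat_s (scalar or length-h array)
    delta   : scaling factor of eq. (28); the text suggests values between 1 and 1000
    """
    Z_pilot = phi(X_pilot)                                   # features assumed non-zero
    a_diag = delta * np.mean(1.0 / Z_pilot ** 2, axis=0)     # eq. (28)
    A0 = np.diag(a_diag)                                     # diagonal matrix of eq. (23)
    # constant starting vector chosen so that phi(E(X)) . theta_0 equals y_start
    z_mean = phi(np.mean(X_pilot, axis=0, keepdims=True))    # phi(E(X)), shape (1, k)
    y_start = np.atleast_1d(np.asarray(y_start, dtype=float))
    theta0 = np.outer(np.ones(Z_pilot.shape[1]), y_start) / np.sum(z_mean)
    return A0, theta0
```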

## **2.2. Kernel functions**


The approximations are highly influenced by the type and parameterization of the kernel functions. These functions transform the original inputs $x\_m$, $1 \le m \le g$, into a *k*-dimensional space where a linear correlation between outputs and transformed inputs is more accurate.

There are several kernel functions, such as polynomial, trigonometric, logarithmic, radial basis, etc. The algorithm proposed in this work uses Gaussian radial basis functions, whose general formula is:

$$\varphi\_j(\mathbf{x}) = e^{-\left(\alpha \left\| \mathbf{x} - \mathbf{c}\_j \right\|\right)^{2}} \tag{29}$$

where $\mathbf{c}\_j$ is a $1 \times g$ vector called a center or centroid, with *g* being the number of input variables.

The functions measure the distance of each variable in the input state space to *k* centers and then transform each distance using a Gaussian function. The Euclidean norm is typically used to measure the distance. Nevertheless, any other norm can be considered.

The number and relative position of the centers $\mathbf{c}\_j$ are important. The number of centers determines the dimension of the matrices in the linear regression, and also the smoothing and overfitting properties of the approximation. There must be enough centers and they must be placed to "cover" the entire state space, avoiding blanks in regions of interest while keeping the number of centers to a minimum. A random setting of the centers in the state space can serve in some cases. However, it is usually more appropriate to place the centers using techniques such as Latin Hypercube Sampling, as is done in the implemented algorithm.
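A compact sketch of the Gaussian radial basis transform of equation (29), with centers placed by a simple Latin Hypercube scheme, is given below; the number of centers, the bounding box and the scale parameter α are illustrative choices, not values taken from the chapter.

```python
import numpy as np

def latin_hypercube_centers(k, lower, upper, rng):
    """Place k centers inside the box [lower, upper] with Latin Hypercube Sampling."""
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    perms = np.argsort(rng.random((k, lower.size)), axis=0)   # one stratum order per dimension
    u = (perms + rng.random((k, lower.size))) / k             # one sample in each stratum
    return lower + u * (upper - lower)

def gaussian_rbf_features(X, centers, alpha):
    """Eq. (29): phi_j(x) = exp(-(alpha * ||x - c_j||)^2), one column per center."""
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return np.exp(-(alpha * dists) ** 2)

rng = np.random.default_rng(2)
centers = latin_hypercube_centers(k=25, lower=[0.0, 0.0], upper=[1.0, 1.0], rng=rng)
Z = gaussian_rbf_features(rng.random((10, 2)), centers, alpha=2.0)   # shape (10, 25)
```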

## **2.3. ADP and linear regressions in the context of distributed computing**

In the context of ADP, a large amount of the calculation is attributed to the Monte Carlo simulations. Consequently, distributed computing techniques can be exploited to dramatically improve and speed up the optimization. The optimal point between thread opening and result gathering basically depends on the hardware system, network latency and simulation times. With time-consuming simulations, it is often useful to parallelize the program after each approximation update, letting several threads gather more data in parallel before any update. But if the simulation time is short, the overhead times to distribute the calculation are usually larger than the speed-up achieved through parallel simulation. In such cases, another approach should be used. A practical approach is to run parallel ADP algorithms for the same problem in several threads in order to combine and synchronize all approximations at regular time intervals. The combination of the results of two or more independent threads is based on the same mathematical basis as the recursive update, provided that the non-linear parametric functions are identical for all threads.

As shown in equation (4), the optimal parameter vector is:

$$\boldsymbol{\theta} = \left[\mathbf{Z}^{\mathrm{T}}\mathbf{Z}\right]^{-1}\mathbf{Z}^{\mathrm{T}}\mathbf{Y} = \left[\sum\_{i=1}^{n}\mathbf{z}\_{i}^{\mathrm{T}}\mathbf{z}\_{i}\right]^{-1}\cdot\left[\sum\_{i=1}^{n}\mathbf{z}\_{i}^{\mathrm{T}}\mathbf{y}\_{i}\right] \tag{30}$$

$$\boldsymbol{\theta} = \left[ \mathbf{B} \right]^{-1} \mathbf{C}, \quad \mathbf{B} = \sum\_{i=1}^{n} \mathbf{z}\_i^{\mathrm{T}} \mathbf{z}\_i, \quad \mathbf{C} = \sum\_{i=1}^{n} \mathbf{z}\_i^{\mathrm{T}} \mathbf{y}\_i \tag{31}$$

where each $\mathbf{z}\_i$ and $\mathbf{y}\_i$ represent the transformed inputs and the outputs, respectively, of the *i*-th simulation of a total of *n* data points. As *i* increases, the matrices *B* and *C* summarize all the regression data collected. The data simulated by several threads of ADP can be easily combined using these matrices. As parallel ADP threads *a* and *b* gather different sets of data, the parameter vectors that they approximate and update are different:

$$\boldsymbol{\theta}^{a} = \left[\mathbf{B}^{a}\right]^{-1}\mathbf{C}^{a} = \left[\sum\_{i=1}^{v} \mathbf{z}\_{i}^{\mathrm{T}}\mathbf{z}\_{i}\right]^{-1} \cdot \left[\sum\_{i=1}^{v} \mathbf{z}\_{i}^{\mathrm{T}}\mathbf{y}\_{i}\right] \tag{32}$$

$$\boldsymbol{\theta}^{b} = \left[\mathbf{B}^{b}\right]^{-1}\mathbf{C}^{b} = \left[\sum\_{i=v}^{n} \mathbf{z}\_{i}^{\mathrm{T}}\mathbf{z}\_{i}\right]^{-1} \cdot \left[\sum\_{i=v}^{n} \mathbf{z}\_{i}^{\mathrm{T}}\mathbf{y}\_{i}\right] \tag{33}$$

However, saving the matrices *B* and *C* allows all the data to be recombined later as:

$$\boldsymbol{\theta} = \left[\sum\_{i=1}^{v} \mathbf{z}\_i^{\mathrm{T}} \mathbf{z}\_i + \sum\_{i=v}^{n} \mathbf{z}\_i^{\mathrm{T}} \mathbf{z}\_i\right]^{-1} \cdot \left[\sum\_{i=1}^{v} \mathbf{z}\_i^{\mathrm{T}} \mathbf{y}\_i + \sum\_{i=v}^{n} \mathbf{z}\_i^{\mathrm{T}} \mathbf{y}\_i\right] \tag{34}$$

$$\boldsymbol{\theta} = \left[\mathbf{B}^a + \mathbf{B}^b\right]^{-1} \cdot \left[\mathbf{C}^a + \mathbf{C}^b\right] \tag{35}$$

This result can be generalized to any number of parallel threads.

*φj* (*x*)=*e*

to measure the distance. Nevertheless, any other norm can be considered.

such as Latin Hyper Cube Sampling, as it is done in the implemented algorithm.

**2.3. ADP and linear regressions in the context of distributed computing**

The number and relative position of centers *c <sup>j</sup>*

100 Dynamic Programming and Bayesian Inference, Concepts and Applications

functions are identical for all threads.

where each *z<sup>i</sup>*

As shown in equation (4), the optimal parameter vector is:


where *c <sup>j</sup>*

<sup>−</sup>(*<sup>α</sup> <sup>x</sup>*−*<sup>c</sup> <sup>j</sup>* )<sup>2</sup>

The functions measure the distance of each variable in the input state space to *k* centers and then transform each distance using a Gaussian function. The Euclidean norm is typically used

mines the dimension of the matrices in the linear regression, and also the smoothing and overfitting properties of the approximation. There must be enough centers and they must be placed to "cover" the entire state space, avoiding blanks in regions of interest while keeping the number of centers to a minimum. A random setting of the centers in the state space can serve in some cases. However, it is usually more appropriate to place the centers using techniques

In the context of ADP, a large amount of the calculation is attributed to the Monte Carlo simulations. Consequently, distributed computing techniques can be exploited to dramatically improve and speed up the optimization. The optimal point between thread opening and result gathering basically depends on the hardware system, network latency and simulation times. With time-consuming simulations, it is often useful to parallelize the program after each approximation update, leaving several threads gather more data in parallel before any update. But if the simulation time is short, the overhead times to distribute the calculation are usually larger than the speed up achieved through parallel simulation. In such cases, another approach should be used. A practical approach is to run parallel ADP algorithms for the same problem in several threads in order to combine and synchronize all approximations at regular time intervals. The combination of the results of two or more independent threads is based on the same mathematical basis as the recursive update, provided that the non-linear parametric

1

é ùé ù

*ii ii*

and *y<sup>i</sup>* represent the transformed inputs and the outputs respectively of the *i*-th

*ii ii*

**θ ZZ ZY z z z y** (30)

**θ B CB z z C z y** (31)


1 1

= =

*i i*

1 1

= =

*i i*

simulation of a total of *n* data points. As *i* increases, the matrices *B* and *C* summarize all the

*n n*

å å *n n*

<sup>1</sup> TT T T

1 T T

, , -

== = é ù ë û å å

<sup>=</sup> é ù <sup>=</sup> ê úê ú <sup>×</sup> ë û ë ûë û

is 1x*g* vector called a center or centroid, with *g* being the number of input variables.

are important. The number of centers deter‐

(29)

In general, data gathered from several parallel algorithms can be recombined by updating and saving the matrices *B* and *C* after each simulation *i* of each thread *j*. The update formulas are:

$$\mathbf{B}\_{i}^{j} = \mathbf{B}\_{i-1}^{j} + \mathbf{z}\_{i}^{\mathrm{T}} \mathbf{z}\_{i} \tag{36}$$

$$\mathbf{C}\_{i}^{j} = \mathbf{C}\_{i-1}^{j} + \mathbf{z}\_{i}^{\mathrm{T}} \mathbf{y}\_{i} \tag{37}$$

After a fixed simulation time, they can be synchronized and recombined, so all the threads share the same data gathered:

$$\mathbf{B} = \sum\_{j=1}^{th} \mathbf{B}^j \tag{38}$$

$$\mathbf{C} = \sum\_{j=1}^{th} \mathbf{C}^j \tag{39}$$

Finally, each thread can restart the simulation process using as starting values for *θ* and *A*:

$$\mathbf{A} = \begin{bmatrix} \mathbf{B} \end{bmatrix}^{-1} \tag{40}$$

$$\boldsymbol{\theta} = \left[ \mathbf{B} \right]^{-1} \mathbf{C} \tag{41}$$

The final approximation is not as good as the one carried out by a single thread for the same amount of simulations. This is due to the fact that the decisions taken by the single-thread algorithm always have all the information gathered up to the decision point, while the multi-thread algorithm lacks the information gathered by other threads since the last synchronization. Nevertheless, with the correct choice of synchronization and simulation cycles, the overall optimization process can run much faster.

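The recombination of equations (30) to (41) amounts to summing the per-thread matrices; a minimal sketch, assuming the data gathered by each thread is available as (Z, Y) arrays, is given below.

```python
import numpy as np

def combine_threads(per_thread_data):
    """Recombine the data of independent ADP threads, eqs. (36)-(41).

    per_thread_data: list of (Z_j, Y_j) arrays gathered by each thread.
    """
    k = per_thread_data[0][0].shape[1]
    h = per_thread_data[0][1].shape[1]
    B, C = np.zeros((k, k)), np.zeros((k, h))
    for Z_j, Y_j in per_thread_data:
        B += Z_j.T @ Z_j                     # eqs. (31), (36), (38)
        C += Z_j.T @ Y_j                     # eqs. (31), (37), (39)
    A = np.linalg.inv(B)                     # eq. (40): restart value for A
    theta = A @ C                            # eq. (41): shared parameter vector
    return theta, A

# Example with two synthetic "threads" of simulated data:
rng = np.random.default_rng(3)
data = [(rng.standard_normal((50, 4)), rng.standard_normal((50, 1))) for _ in range(2)]
theta, A = combine_threads(data)
```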
…years (mostly OTC) to several months ahead. Shorter-term forward markets negotiating electricity with a delivery horizon of weeks, or even one day in advance in the so-called day-ahead markets, are quite common. Market liquidity is typically higher as the contracting horizon shortens.

Because of the technical requirements of real-time system balance, spot power markets are always centralized and run by entities in charge of the physical operation of the system to keep reliability and security. Equilibrium spot prices are computed every 5 min, 30 min or on an hourly basis, according to the realized power demand and the last bid accepted for keeping the system balance in real time. Rigorously, locational marginal prices are set in order to account for transmission constraints. Therefore, spot prices reflect the actual conditions of the system at the time of delivery.

Though spot prices are subject to high uncertainty, liquidity of this market is warranted, as the generator can always sell its production at real-time prices. For this reason, the spot market is regarded as a last-resort market. One advantage of participating only in this market is that unavailability of the generating unit does not have financial consequences for the generator other than the opportunity cost of the lost production.

Let $p\_S(t)$ be the prevailing price at the *t*-th time interval in the spot market, $P\_S(t)$ the power delivered by the generating unit and $P\_{max}$ its generating capacity. Under the hypothesis that the generator is a price-taker and that its marginal costs of generation, denoted by $MC(t)$, are constant with the rate of production, the optimal operating policy is:

$$P\_S(t) = \begin{cases} 0 & \text{if } p\_S(t) < MC(t) \\ P\_{max} & \text{if } p\_S(t) > MC(t) \end{cases} \tag{42}$$

The operating profit $B\_S(t)$ the generator obtains in the spot market by implementing this optimal production policy can be written as:

$$B\_S(t) = \max\left[ p\_S(t) P\_{max} - MC(t) P\_{max},\, 0 \right] = \max\left[ p\_S(t) - MC(t),\, 0 \right] P\_{max} \tag{43}$$

This equation shows the operating flexibility of the generator to alter its output in response to the spot price in order to avoid operating losses if prevailing spot prices drop below marginal costs.

Figure 3 depicts the discontinuous nature of the profit function $B\_S(t)$ when participating in the spot market, i.e. $B\_S(t) \ge 0$ for all prices. Indeed, this profit function can be assimilated to a call option with strike price $MC$. Figure 3 also schematically illustrates the probability density function (pdf) of the spot price of electricity $f(p\_S)$. This function is typically highly right-skewed and presents strong leptokurtosis.
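Equations (42) and (43) translate directly into a one-line profit rule; the price path, marginal cost and capacity below are arbitrary numbers used only to illustrate the call-option shape of the payoff.

```python
import numpy as np

def spot_profit(p_spot, marginal_cost, p_max):
    """Eq. (43): operating profit of a price-taking unit following the policy of eq. (42)."""
    return np.maximum(p_spot - marginal_cost, 0.0) * p_max

# Illustrative numbers only: hourly spot prices around a 40 $/MWh marginal cost
prices = np.array([25.0, 38.0, 41.0, 55.0, 120.0])
print(spot_profit(prices, marginal_cost=40.0, p_max=300.0))
```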
