114 Dynamic Programming and Bayesian Inference, Concepts and Applications

is below the prevailing spot prices, following the decision model of equation (42). Variable costs of generation are assumed linear with power output, i.e. marginal costs are constant.

The operation-failure cycles of the generating unit are obtained from a chronological Markovian stochastic simulation. For each spot price sample, a time series of power output is synthesized for every generation unit. The hourly power output is simulated in three steps:

**1.** Based on failure and repair rates defined by the state the unit resided in the previous hour, a random failure is simulated [12]. If a failure is in place, the output power is set to zero for this particular hour.

**2.** The dispatch of the unit is simulated, taking into account the marginal cost of generation and the prevailing sample spot price at that time interval. Here perfect foresight of the spot price is assumed in order to decide the dispatch and fulfill the minimal generation times.

**3.** If dispatched, the unit's other technical restrictions are fulfilled, e.g. ramping capabilities.

This chronological stochastic model accurately reproduces the dynamics involved in failure and repair cycles of generators, giving the possibility to select different failure rates depending on whether the unit is generating or is in stand-by.

**4.4. Risk constraint formulation**

As mentioned before, the financial decision process can be modeled as a Markov decision process (MDP). Denoting by *B<sup>w</sup>* the profit for each sample, i.e. the sum of income, cost and transaction cost over all instruments, the objective function (56) becomes:

$$\max\_{\mathbf{x}} \left[ \mathbb{E} \left[ \sum\_{t} \mathcal{B}\_t^{w} \left( \mathbf{x}\_{t}, \mathbf{x}\_{t-1} \right) \right] \right] \tag{70}$$

Exchanging the order of the expectation operator and the summation, and expanding the summation:

$$\max\_{\mathbf{x}} \left[ \mathbb{E} \left[ \mathcal{B}\_1^{w} \left( \mathbf{x}\_1, \mathbf{x}\_0 \right) \right] + \sum\_{t=2}^n \mathbb{E} \left[ \mathcal{B}\_t^{w} \left( \mathbf{x}\_t, \mathbf{x}\_{t-1} \right) \right] \right] \tag{71}$$

where *n* is the total number of time periods considered.

Since the first term depends only on the initial market position *x*<sub>0</sub> and the first strategy decision *x*<sub>1</sub>, the maximization can be decomposed using Bellman's principle of optimality as follows:

$$\max\_{\mathbf{x}\_{1}} \left[ \mathbb{E} \left[ \mathcal{B}\_{1}^{w} \left( \mathbf{x}\_{1}, \mathbf{x}\_{0} \right) \right] + \max\_{\mathbf{x}\_{t=2 \ldots n}} \sum\_{t=2}^{n} \mathbb{E} \left[ \mathcal{B}\_{t}^{w} \left( \mathbf{x}\_{t}, \mathbf{x}\_{t-1} \right) \right] \right] \tag{72}$$

Risk-Constrained Forward Trading Optimization by Stochastic Approximate Dynamic Programming http://dx.doi.org/10.5772/57466 115

If *x* is independent of the Monte Carlo sample, the terms inside the summation over all future periods (*t* = 2…*n*) are simply the expected profit for period *t* after a decision *x*<sub>*t*−1</sub> → *x*<sub>*t*</sub>. However, the model should take into account that future strategy decisions may differ for each Monte Carlo sample, reflecting the adjustments the decision maker would almost certainly make when facing specific scenarios. The expected profit *B̄*<sub>*t*</sub> and the decision itself at future decision stages then depend on a set of variables *s*<sub>*t*</sub>, which represent the information the decision maker considers in order to adapt the strategy to a particular situation, and equation (72) becomes:

$$\max\_{\mathbf{x}\_{1}} \left[ \overline{B\_{1}} \left( \mathbf{x}\_{1}, \mathbf{x}\_{0}, \mathbf{s}\_{1} \right) + \max\_{\mathbf{x}\_{t=2 \ldots n}} \sum\_{t=2}^{n} \overline{B\_{t}} \left( \mathbf{x}\_{t}, \mathbf{x}\_{t-1}, \mathbf{s}\_{t} \right) \right] \tag{73}$$

Defining the continuation value functions *V*<sub>*t*</sub> as:

$$V\_t\left(\mathbf{x}\_t, \mathbf{x}\_{t-1}, \mathbf{s}\_t\right) = \overline{B\_t}\left(\mathbf{x}\_t, \mathbf{x}\_{t-1}, \mathbf{s}\_t\right) + \max\_{\mathbf{x}\_{t+1}} \left[V\_{t+1}\left(\mathbf{x}\_{t+1}, \mathbf{x}\_t, \mathbf{s}\_{t+1}\right)\right] \tag{74}$$

The maximization can be solved by a set of recursive maximizations, each one solving only one decision stage:

$$\max\_{\mathbf{x}\_t} \left[ V\_t \left( \mathbf{x}\_t, \mathbf{x}\_{t-1}, \mathbf{s}\_t \right) \right] \tag{75}$$
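The recursion of equations (74) and (75) can be illustrated with a minimal sketch, under strong simplifying assumptions that are not part of the original model: the trading position takes values on a small discrete grid, the state variables *s*<sub>*t*</sub> are omitted so that the expected profit is deterministic, and the value beyond the last stage is zero. The names `solve_backward`, `best_first_decision`, `toy_profit`, `premium` and `fee` are hypothetical. This is plain backward dynamic programming on a grid, not the ADP algorithm itself.

```python
import itertools

def solve_backward(levels, n_stages, b_bar):
    """Tabulate V[t][(x_t, x_prev)] for t = n_stages..1 following eq. (74)."""
    # Assumption: no profit is earned after the last stage.
    V = {n_stages + 1: {(x, xp): 0.0 for x in levels for xp in levels}}
    for t in range(n_stages, 0, -1):
        V[t] = {}
        for x, xp in itertools.product(levels, levels):
            # Best continuation value reachable from position x (inner max of eq. 74).
            continuation = max(V[t + 1][(x_next, x)] for x_next in levels)
            V[t][(x, xp)] = b_bar(t, x, xp) + continuation
    return V

def best_first_decision(V, levels, x0):
    """One-stage maximization of eq. (75) at t = 1, given the initial position x0."""
    return max(levels, key=lambda x: V[1][(x, x0)])

def toy_profit(t, x, x_prev, premium=1.0, fee=0.4):
    """Hypothetical expected profit: a premium on the position minus a rebalancing fee."""
    return premium * x - fee * abs(x - x_prev)

levels = [0.0, 0.5, 1.0]  # hypothetical grid of forward positions
V = solve_backward(levels, n_stages=3, b_bar=toy_profit)
x1 = best_first_decision(V, levels, x0=0.0)
print("first-stage decision:", x1, "value:", V[1][(x1, 0.0)])
```

Only the first-stage decision `x1` would be executed; the later stages are tabulated solely to value it.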

With this model, the optimization can be decomposed into steps and the dynamic nature of a strategy can be accurately replicated. Although future trading decisions *x*<sub>*t*=2…*n*</sub> are considered and optimized, the practical product of this procedure is the new optimal rebalanced state *x*<sub>*t*=1</sub>, starting from the previous trading position *x*<sub>*t*=0</sub>. The further trading positions are optimal only given the information currently available and should be reconsidered later. Therefore, each new trading position (*x*<sub>*t*=2</sub>, *x*<sub>*t*=3</sub>, …, *x*<sub>*t*=*n*</sub>) should be the product of a similar optimization that incorporates the additional market information available immediately before the decision.

The value functions provide the expected continuation value within a state space defined by the state variables *x*<sub>*t*</sub>, *x*<sub>*t*−1</sub> and *s*<sub>*t*</sub><sup>3</sup>. However, the continuation functions, which are essential to solve the optimization problem, are unknown beforehand. It is here that the ADP approach is introduced to approximate the value and risk functions over the state space.

<sup>3</sup> There are several other decomposition methods, some of which exclude the decision as state space variable defining the value functions in a post-decision state space. These approaches make the step maximization sometimes harder but present advantages such as a state space of fewer dimensions. See for example [2].

The set of constraints remains the same as already defined for each period *t*. Nevertheless, special considerations are needed to calculate the risk and to fulfill the risk-constrained optimization. As *V*<sub>*t*</sub> only accounts for the expected profit, risk functions *R*<sub>*t*</sub> should similarly be calculated for the same state space in order to enforce the risk constraint.

**5. Numerical case study**

**5.1. Algorithm validation**

With the objective of validating the results of the proposed ADP algorithm, a first simple example case is considered in which a thermal generator sells energy in the spot market and in a future contract. The results of the ADP algorithm were compared with the results of a conventional DP algorithm for which the state space was discretized appropriately.

The fractions of energy sold in the spot market and in a quarter future contract were optimized considering that the future can be traded during the delivery period. The optimization determines three decision stages during this period, one at the beginning of each month, each consisting of selling or buying energy in the future market based on the previous state. The state space prior to each decision is defined only by the level of future already sold, in order to keep the DP problem tractable.

**5.2. Validation results**

The results obtained by solving the problem by means of the ADP and DP algorithms are presented in Figure 7. The plots represent the expected profit and the downside risk measured by the CVaR of the optimal strategy as a function of the initial state, i.e. the energy already committed in the future contract at the initial stage. An excellent agreement between the optimal strategies obtained by ADP and DP is evidenced, validating the proposed approach.

It can be noticed that the expected profit rises as the amount of energy sold forward increases. This is caused by the risk premium paid to the generator in the future market, i.e. the mean future prices are higher than the mean spot prices. Additionally, the transaction costs are not compensated by the risk premium; therefore the best trading strategy is to maintain an involvement in the future market similar to the initial level, without rebalancing the portfolio. This is illustrated in Figure 8, where the optimal decisions for the first month are practically the same when solving with a conventional DP and an ADP approach.

It is noteworthy in Figure 7 that financial risk lessens when forward contracting in the future market increases. This means that for the conventional generator considered, which presents a high availability, the delivery risk is lower than the risk of not being dispatched in the spot market. The behavior of the risk curve is closely related to the unit's failure and repair rates and to the marginal production costs. Generators with low marginal costs are in the first places of the dispatch merit order, and hence the risk of not being dispatched is low. Moreover, high failure rates also imply a higher delivery risk. These relations give rise to a broad range of risk curves that differ from one generator to another and suggest that considerable risk mitigation is possible by aggregating different generators in a portfolio.

By definition, the linear regression delivers an approximation that minimizes the mean square error over the entire dataset. In a stochastic setting where the same inputs lead to several different outputs, the regression accurately estimates the expected value for a given set of inputs, provided the sample size and the approximation order are appropriate. This suits the value function well, but approximating the risk function raises some problems.

Suppose that the CVaR is chosen as the risk metric. As the algorithm progresses, new data points are collected, i.e. a set of state variables and the corresponding simulated profits for the period. To approximate the CVaR associated with a particular point in the state space, one approach is to select a subset of the dataset whose input variables are "close" to the point. The CVaR is then calculated by sorting the profit values and taking the mean of those below the specified α-quantile. Now suppose that a new data point is simulated and the CVaR approximation must be updated. The process described must be repeated, now including the new data point. This simple approach has serious disadvantages: all the data points must be stored, and the mean is not easily updated because old data points may move into or out of the region below the α-quantile. These drawbacks are caused by the fact that the CVaR is quantile-based.

To overcome these difficulties, a different solution is adopted. Instead of approximating the CVaR directly, another risk measure is used to approximate the CVaR within the state space. This risk measure, called Relative Lower Semideviation (RLS), is moment-based. Hence, it can be updated more easily and without storing the entire dataset. Risk measures of this type are described in detail in [13-16] and, for a stochastic profit *B*<sub>*t*</sub>, have the form:

$$RLS\left(B\_t\right) = -\mathbb{E}\left[B\_t\right] + a \cdot \sigma\_p^{-}\left[B\_t\right] \tag{76}$$

$$\sigma\_p^{-}\left[B\_t\right] = \left( \mathbb{E} \left[ \max \left( \mathbb{E} \left[ B\_t \right] - B\_t, 0 \right)^p \right] \right)^{1/p} \tag{77}$$

where equation (77) is the negative semideviation of degree *p* of the stochastic profit *B*<sub>*t*</sub>.

It can be proven that these moment-based risk measures are coherent if 0 ≤ *a* ≤ 1 and *p* ≥ 1. To approximate a CVaR with a 5%-quantile, the parameters used in this work are *a* = 1 and *p* = 9.5. To compute the approximation, two linear regressions were used to calculate the expectations of the profit and of the negative deviations, which can be updated using the same method proposed for the value functions. In a Monte Carlo scheme, a large amount of data is needed to compute reliable risk estimates. Therefore, the risk constraint should not be enforced at the early iterations of the ADP algorithm, simply because the dataset is still too small to yield a consistent and statistically converged risk value.
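As an illustration of the two kinds of risk measure discussed above, the following minimal sketch computes a quantile-based tail mean and the moment-based measure of equations (76) and (77) from a plain list of simulated profits. It is a batch estimate on a synthetic sample, not the regression-based update used in this work. The function names and the normally distributed sample are assumptions made for the example, and no claim is made here about how closely the two numbers agree. Note the sign conventions: `empirical_cvar` returns a mean profit, whereas `rls` is expressed as a loss.

```python
import math
import random

def empirical_cvar(profits, alpha=0.05):
    """Mean profit over the worst alpha-fraction of the sample (quantile based)."""
    ordered = sorted(profits)  # requires the whole sample to be stored
    k = max(1, math.ceil(alpha * len(ordered)))
    return sum(ordered[:k]) / k

def lower_semideviation(profits, p):
    """Sample estimate of the negative semideviation of degree p, eq. (77)."""
    mean = sum(profits) / len(profits)
    shortfall = [max(mean - b, 0.0) ** p for b in profits]
    return (sum(shortfall) / len(profits)) ** (1.0 / p)

def rls(profits, a=1.0, p=9.5):
    """Sample estimate of eq. (76); defaults are the a and p reported in the text."""
    mean = sum(profits) / len(profits)
    return -mean + a * lower_semideviation(profits, p)

random.seed(0)
# Synthetic profits, assumed normal purely for illustration.
sample = [random.gauss(10.0, 3.0) for _ in range(5000)]
print("tail mean profit (5%):", empirical_cvar(sample))
print("RLS (a=1, p=9.5):", rls(sample))
```

Both ingredients of `rls` are ordinary expectations, which is what allows them to be estimated by regression, whereas `empirical_cvar` has to re-sort the stored sample whenever a point is added.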
