**1. Introduction**


Since the mid-twentieth century, Dynamic Programming (DP) has proved to be a flexible and powerful approach to optimal decision problems. Nevertheless, a decisive drawback of conventional DP is the need to explore the whole state space in order to find the optimal solution. The immense number of mathematical operations required to solve real-scale problems has constrained the application of DP to small or highly simplified cases. Indeed, the state space grows exponentially with the number of variables when considering multivariate optimization. This curse of dimensionality is a well-known limitation of conventional DP algorithms for tackling the large-scale problems ubiquitous in real science and engineering applications.

In recent decades, many new algorithms have emerged in different branches of science to overcome the inherent limitations of conventional DP. Unlike conventional DP, these algorithms avoid enumerating and evaluating every possible state of a system during the optimization process. Instead, they estimate relevant features of the state space. This approach circumvents the dimensionality limitations of conventional DP while retaining many of its advantages.

This chapter considers the application of advanced stochastic dynamic programming techniques to the optimization of the forward selling strategy of a power generator subject to delivery risk. The proposed approach allows rebalancing the portfolio during the period of analysis. In electricity markets, a power generator can sell in advance part or all of its future energy production at a fixed price, hedging against the high price volatility of the spot market. The strategy of eliminating price risk by selling the entire production in advance in the forward market at a fixed price is often thought of as the minimum-risk trading policy. Nonetheless, it can be proven that this is not the case for most generators. Outages of generation units and transmission lines, as well as unforeseen limitations in the primary energy supply, expose generators to delivery risk [1]. Delivery risk considerably modifies the probability distribution of profits, shifting the optimal trading strategy toward a portfolio mixing forward contracts and power sold in the spot market. Because of the size of the probability state space and limited computing capabilities, the problem of the optimal trading strategy has no closed-form solution, and its determination is therefore a matter of current study. The increase in computing power and recent developments in Operational Research have brought new insights into the solution of such problems.

Conventional DP algorithms hold the information of the best solution until the end, ruling out suboptimal options. This rule prevents exponential growth along the decision sequence paths, scaling the problem only linearly.

Yet, other sources of explosive dimensionality growth remain, namely the sizes of the state, decision, and outcome spaces. For small financial decision problems, conventional DP algorithms are able to find the optimal policy. For real-scale problems, gross simplifications are often necessary to keep the model tractable. Sometimes, these simplifications render the model unrealistic, turning the results meaningless.

Finding an appropriate combination of financial instruments in a portfolio can be portrayed as a Markov Decision Process (MDP). An MDP is defined as a succession of states that are reached through decisions (actions) followed by an outcome (reaction) of the system. After each decision, the system evolves into another state, according to probabilities determined by the previous state and the decision taken. Each transition between states has a cost, usually dependent on the action taken, and a reward produced by the reaction of the system, as depicted in Figure 1. In these processes, a decision maker must choose a sequence of actions so that the expected sum of rewards minus the sum of incurred costs is maximized over a certain period of time.

**Figure 1.** Markov Decision Process depiction (state and value at stage *t*, an action with its cost and reward, and the resulting state and value at stage *t+1*)
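As a concrete illustration of the elements just described, the sketch below encodes a finite MDP with explicit transition probabilities, action costs, and rewards. The class layout and all names are illustrative assumptions for this sketch, not the chapter's actual model.

```python
import random

class MDP:
    """Minimal finite MDP: states, actions, stochastic transitions,
    a cost for acting, and a reward from the system's reaction."""

    def __init__(self, transitions, costs, rewards):
        # transitions[state][action] -> list of (next_state, probability)
        self.transitions = transitions
        self.costs = costs      # costs[state][action] -> cost of taking the action
        self.rewards = rewards  # rewards[state][action] -> reward of the reaction

    def step(self, state, action):
        """Sample the system's reaction and return (next_state, reward minus cost)."""
        outcomes = self.transitions[state][action]
        next_state = random.choices(
            [s for s, _ in outcomes],
            weights=[p for _, p in outcomes],
        )[0]
        return next_state, self.rewards[state][action] - self.costs[state][action]

# Toy two-state example: from state "low", action "hold" keeps the system in
# "low" with probability 0.7 and moves it to "high" with probability 0.3.
mdp = MDP(
    transitions={"low": {"hold": [("low", 0.7), ("high", 0.3)]}},
    costs={"low": {"hold": 0.1}},
    rewards={"low": {"hold": 1.0}},
)
print(mdp.step("low", "hold"))
```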


According to Bellman's Principle of Optimality, a series of Value Functions can be determined for every MDP, representing the continuation value of each state. The continuation value associated with a given state is the expected sum of rewards that the optimal strategy would yield from that state until the end of the process (or the expected average reward if the MDP is infinite).
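In symbols, with the net gain of an action written as its reward minus its cost as above, the continuation values satisfy the standard Bellman recursion (stated here for a finite horizon $T$, with a zero terminal value as a simplifying assumption):

$$
V_t(s) \;=\; \max_{a \in \mathcal{A}(s)} \Big\{ r(s,a) - c(s,a) + \mathbb{E}\big[\, V_{t+1}(s_{t+1}) \mid s_t = s,\; a_t = a \,\big] \Big\},
\qquad V_T(s) = 0.
$$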

It is easy to see how the Value Functions of an MDP can be found using a classic backwards DP algorithm. Starting from the final states, a DP algorithm exhaustively calculates the continuation value for a discrete number of states. All these continuation values, collected in a lookup table, later constitute the Value Functions, which are accurate but neither compact nor easy to calculate. After acquiring the Value Functions, it is simple to find the optimal decision for each state as the one that maximizes the sum of the expected reward and the expected continuation value of the next state or states. However, the problems that a DP algorithm can address by this procedure are restricted by the size of their state spaces.
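A minimal sketch of this backwards sweep, with the lookup table stored as one dictionary per stage, might look as follows; the function signatures are assumptions for illustration, not the chapter's code.

```python
def backward_dp(states, actions, horizon, transition, reward, cost):
    """Tabulate continuation values V[t][s] by exhaustive backwards sweeps.

    Assumed signatures (illustrative): actions(s) -> iterable of actions;
    transition(t, s, a) -> list of (next_state, probability);
    reward(t, s, a) and cost(t, s, a) -> floats.
    """
    V = [{s: 0.0 for s in states} for _ in range(horizon + 1)]  # V[horizon] = 0
    policy = [{} for _ in range(horizon)]
    for t in reversed(range(horizon)):       # start from the final stage
        for s in states:                     # exhaustive sweep over the state space
            best_value, best_action = float("-inf"), None
            for a in actions(s):
                # expected continuation value over the possible next states
                continuation = sum(p * V[t + 1][s2] for s2, p in transition(t, s, a))
                value = reward(t, s, a) - cost(t, s, a) + continuation
                if value > best_value:
                    best_value, best_action = value, a
            V[t][s] = best_value
            policy[t][s] = best_action
    return V, policy  # the lookup tables described above
```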

Other forms to represent and/or approximate the Value Functions can then be proposed, approaches that do not require the exhaustive calculation of every state. The Value Functions can instead be interpolated between computed states. Approximate Dynamic Programming algorithms are built on this cornerstone: approximation of the Value Functions in the state space domain. The estimation methods can be linear regressions, artificial neural networks, etc. Several authors provide detailed analyses of MDPs and of the use of ADP and DP algorithms to solve them [2],[3].


For approximating and updating the Value Functions, the proposed algorithm uses linear regression on Gaussian radial basis functions, jointly with Monte Carlo simulations to account for randomness. An interior-point optimization algorithm is implemented to make the decisions.
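The following sketch shows the kind of regression this implies, in one dimension: Gaussian radial basis functions as features, fitted by least squares to continuation values observed in Monte Carlo simulations. The centers, width, and toy data are illustrative assumptions, not the chapter's parameters.

```python
import numpy as np

def rbf_features(x, centers, width):
    """Gaussian radial basis functions evaluated at a scalar state x."""
    return np.exp(-((x - centers) ** 2) / (2.0 * width ** 2))

def fit_value_function(sampled_states, observed_values, centers, width):
    """Least-squares fit of V(x) ~ sum_k w_k * phi_k(x) from simulated observations."""
    Phi = np.array([rbf_features(x, centers, width) for x in sampled_states])
    weights, *_ = np.linalg.lstsq(Phi, observed_values, rcond=None)
    return lambda x: rbf_features(x, centers, width) @ weights

# Toy usage: recover a smooth value surface from noisy Monte Carlo observations.
centers = np.linspace(0.0, 1.0, 15)
xs = np.random.uniform(0.0, 1.0, 200)                 # sampled states
vs = np.sin(3.0 * xs) + 0.05 * np.random.randn(200)   # noisy continuation values
V_hat = fit_value_function(xs, vs, centers, width=0.1)
print(V_hat(0.5))
```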

The ADP algorithm starts with a series of approximations of the Value Functions, usually constant. Then, taking a Monte Carlo sample, a simulation of the system is conducted. At each decision stage, the algorithm makes the decision that is optimal with respect to the current state and the available approximations. Finally, after each Monte Carlo simulation, decisions and outcomes are used to refine the estimates of the Value Functions and of complementary Risk Functions, denoted by V_t and R_t respectively. The process continues iteratively until a certain termination criterion is fulfilled. A simple diagram of this approach is illustrated in Figure 2.
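A skeleton of this iterative scheme is sketched below; `decide`, `simulate`, and `update` stand in for the interior-point decision step, the Monte Carlo system simulation, and the regression update, and all names and signatures are illustrative assumptions rather than the chapter's implementation.

```python
def adp(initial_state, horizon, n_iterations, decide, simulate, update, V, R):
    """Iterative ADP scheme: simulate, then refine the approximations.

    Assumed signatures (illustrative):
      decide(t, state, V, R)     -> action optimal w.r.t. current estimates
      simulate(t, state, action) -> (next_state, profit) for one Monte Carlo draw
      update(trajectory, V, R)   -> refined (V, R) approximations
    """
    for _ in range(n_iterations):              # one Monte Carlo simulation per pass
        state, trajectory = initial_state, []
        for t in range(horizon):
            action = decide(t, state, V, R)    # e.g. via an interior-point solver
            state_next, profit = simulate(t, state, action)
            trajectory.append((t, state, action, profit))
            state = state_next
        V, R = update(trajectory, V, R)        # refine Value and Risk Functions
        # a full implementation would also check a termination criterion here
    return V, R
```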


In the past decade, and by virtue of ever-increasing computational power, many methods have emerged in different scientific fields under several different names: Reinforcement Learning, Q-Learning, Neuro-Dynamic Programming, etc. All these methods were later brought together in what is currently known as Approximate Dynamic Programming (ADP) [2],[3]. These algorithms forgo the exhaustive enumeration and evaluation of the state space typically performed by conventional DP. Instead, they iteratively approximate a function of the state space through stochastic simulation and statistical regression techniques, circumventing the dimensionality problem of DP.

Although ADP algorithms are being used in several other fields of science, their application to designing optimal trading strategies in power markets has not been proposed so far. In this chapter, ADP techniques are exploited to optimize the selling strategy of a power generator trading in a frictional market with transaction costs. Three products are considered: selling in the spot market, and entering quarterly and one-year forward contracts. The objective of the generator is to maximize the expected profit while limiting financial risk. Decisions can be made only at the beginning of each month. At each decision stage, the current trading position can be changed, at a cost, in order to rebalance the portfolio.
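A generic mean-risk statement of this objective, in notation assumed here rather than taken from the chapter, is

$$
\max_{\pi} \; \mathbb{E}\!\left[\sum_{t=1}^{T}\big(r_t^{\pi} - c_t^{\pi}\big)\right]
\quad \text{subject to} \quad
\rho\!\left(\sum_{t=1}^{T}\big(r_t^{\pi} - c_t^{\pi}\big)\right) \le \bar{\rho},
$$

where $\pi$ ranges over the admissible monthly rebalancing policies, $r_t^{\pi}$ and $c_t^{\pi}$ are the stage revenues and transaction costs it induces, $\rho$ is a risk measure on the profit distribution, and $\bar{\rho}$ is the generator's risk limit.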
