2. High-voltage power grid optimization models

### 2.1. Overview

operational challenges, including uncertainty of supply availability, distributed storage management, real-time coordination of distributed energy resources, and changing directions of flow in distribution networks. These challenges demand a shift of the traditional centralized power system operations paradigm toward the smart grid paradigm [1], where distributed computing and control stand out as a promising technology with the potential of achieving operations with optimal performance.

The academic literature includes various applications of distributed computing in power system operations, including long- and mid-term planning, short-term scheduling, state estimation and monitoring, real-time control, and simulation [2–5]. Early studies pointed out several challenges related to communications and the heterogeneous characteristics of distributed computing systems, which needed to be addressed before distributed computing applications could be implemented. Nowadays, standard communication protocols are a mature technology and most current distributed computing resources can perform a broad range of operations. Such advances in distributed computing technology have paved the way for developing and implementing scalable distributed algorithms for power systems.

The prevailing industry view is that the future smart grid will entail (i) broadcasting of dynamic prices or other information and (ii) telemetry backhaul to market participants. Under this model, distributed energy resource aggregators are often regarded as transaction brokers between end customers and various upstream market participants. The "failure-free market" design for a pure market-driven solution under this paradigm has been elusive, despite decades of research and development. In this chapter, we analyze the deployment of distributed computing as an enabling tool for managing the short-term operations of smart grids at two levels:

• At the level of the high-voltage grid, we centrally optimize operations using a stochastic unit commitment (SUC) model, which endogenously allocates reserve capacity by explicitly modeling uncertainty. Specifically, we present an asynchronous distributed algorithm for solving SUC, which extends the asynchronous algorithm proposed in Ref. [6] in three aspects: (i) we propose a hybrid approach for modeling quarterly dispatch decisions alongside hourly commitment decisions; (ii) we introduce a stepsize scaling on the iterative method to diminish the error due to asynchronous execution; and (iii) we propose two methods for a faster initialization of the algorithm. The asynchronous algorithm is implemented in a high-performance computing (HPC) cluster and benchmarked against a deterministic unit commitment model with exogenous reserve targets (DUCR). We find that distributed computing allows solving SUC within the same time frame required for solving DUCR.

• At the level of the distribution grid, we rely on stochastic distributed control to manage consumer devices using the ColorPower architecture [7–9], which enables harnessing flexible residential demand response while respecting the desire of consumers for control, privacy, and simplicity. The ColorPower control approach is inspired by the automatic cooperative protocols that govern Internet communications. These protocols represent a distributed and federated control paradigm, in which information and decision-making authority remain local, yet global system stability is ensured.

22 Recent Progress in Parallel and Distributed Computing

Operations of the high-voltage power grid are typically scheduled in two stages: (i) day-ahead scheduling, where operations are planned based on forecast conditions for the system and the on/off status of slow generators is fixed and (ii) real-time scheduling, where system operators balance the system for the actual conditions using the available flexibility in the system. Models for short-term scheduling are solved on a daily basis, and they occupy a central role in clearing power markets and operating power systems.

Until recently, power system operators have relied on deterministic short-term scheduling models with reserve margins to secure the system against load forecast errors and outages [11–14]. The integration of renewable energy sources has placed these practices under question because they ignore the inherent uncertainty of renewable energy supply, thereby motivating system operators and researchers to look for systematic methods to address uncertainty in real-time operations. A consistent methodology for mitigating the impacts of renewable energy uncertainty—and operational uncertainty in general—is stochastic programming. Stochastic models for short-term scheduling (i.e., SUC models) were originally considered in the seminal work of Takriti et al. [15] and Carpentier et al. [16], as an approach for mitigating demand uncertainty and generator outages. Subsequently, numerous variants of the SUC model have been proposed, which differ in the number of stages, the source of uncertainty, the representation of uncertainty, and the solution methods that are used. See Ref. [17] and references therein for a recent survey.

In the present work, we use the deterministic and stochastic unit commitment models for day-ahead scheduling presented in Sections 2.2 and 2.3. The proposed models differ from previously proposed models in the literature in that they use a hybrid time resolution: hourly commitment decisions ($u$, $v$, $w$, and $z$) and 15-min dispatch decisions ($p$, $r$, and $f$). This formulation allows modeling subhourly phenomena, which have been shown to be important for the operation of systems with significant levels of renewable energy integration [18].
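As an illustration of the hybrid resolution, the mapping from 15-min dispatch periods to hourly commitment periods can be written as a small helper; the function name and indexing convention below are ours, not the chapter's notation:

```python
# Sketch: mapping between hourly commitment periods and 15-min dispatch
# periods in a hybrid-resolution formulation. Illustrative only.

def tau(t: int) -> int:
    """Hourly commitment period tau(t) covering 15-min dispatch period t.

    Periods are 1-indexed: t = 1..96 over a day, tau = 1..24.
    """
    return (t - 1) // 4 + 1

# Each commitment hour covers four consecutive quarter-hours.
assert [tau(t) for t in (1, 4, 5, 96)] == [1, 1, 2, 24]
```

Every dispatch-level constraint indexed by $t$ can then reference the commitment variable of its covering hour, e.g. $u_{g,\tau(t)}$.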

#### 2.2. Deterministic unit commitment with exogenous reserve targets

Using the notation provided in the beginning of the section, we model deterministic unit commitment with reserves (DUCR) as the minimization problem Eqs. (1)–(9).

$$\min_{p,r,u,v,f} \; \sum_{g \in G} \left( \sum_{\tau \in T_{60}} \left( K_g u_{g,\tau} + S_g v_{g,\tau} \right) + \sum_{t \in T_{15}} C_g(p_{g,t}) \right) \tag{1}$$

$$\text{s.t.} \; \sum_{g \in G(n)} p_{g,t} + \sum_{l \in L(\cdot,n)} f_{l,t} + \xi_{n,t} \ge D_{n,t} + \sum_{l \in L(n,\cdot)} f_{l,t} \;\; \forall n \in N, t \in T_{15} \tag{2}$$

$$\sum_{g \in G(a)} r^2_{g,t} \ge R^2_a, \quad \sum_{g \in G(N(a))} \left( r^2_{g,t} + r^3_{g,t} \right) \ge R^2_a + R^3_a \;\; \forall a \in A, t \in T_{15} \tag{3}$$

$$f_{l,t} = B_l \left( \theta_{n(l),t} - \theta_{m(l),t} \right), \quad -F^-_l \le f_{l,t} \le F^+_l \;\; \forall l \in L, t \in T_{15} \tag{4}$$

$$P^-_g u_{g,\tau(t)} \le p_{g,t}, \quad p_{g,t} + r^2_{g,t} + r^3_{g,t} \le P^+_g u_{g,\tau(t)} \;\; \forall g \in G_{\text{SLOW}}, t \in T_{15} \tag{5}$$

$$P^-_g u_{g,\tau(t)} \le p_{g,t}, \quad p_{g,t} + r^2_{g,t} \le P^+_g u_{g,\tau(t)}, \quad p_{g,t} + r^2_{g,t} + r^3_{g,t} \le P^+_g \;\; \forall g \in G \setminus G_{\text{SLOW}}, t \in T_{15} \tag{6}$$

$$-TL_g + \left( TL_g - R^-_g \right) u_{g,\tau(t)} \le p_{g,t} - p_{g,t-1} \;\; \forall g \in G, t \in T_{15} \tag{7}$$

$$\begin{aligned} p_{g,t} + \frac{15}{\Delta T^2} r^2_{g,t} - p_{g,t-1} &\le TL_g - \left( TL_g - R^+_g \right) u_{g,\tau(t-1)}, \\ p_{g,t} + \frac{15}{\Delta T^3} \left( r^2_{g,t} + r^3_{g,t} \right) - p_{g,t-1} &\le TL_g - \left( TL_g - R^+_g \right) u_{g,\tau(t-1)} \;\; \forall g \in G, t \in T_{15} \end{aligned} \tag{8}$$

$$v_{g,\tau} \ge u_{g,\tau} - u_{g,\tau-1}, \quad \sum_{\sigma=\tau-UT_g+1}^{\tau} v_{g,\sigma} \le u_{g,\tau}, \quad \sum_{\sigma=\tau-DT_g+1}^{\tau} v_{g,\sigma} \le 1 - u_{g,\tau-DT_g}, \quad u_{g,\tau}, v_{g,\tau} \in \{0,1\} \;\; \forall g \in G, \tau \in T_{60} \tag{9}$$

The objective function Eq. (1) corresponds to the total operating cost, composed of the no-load cost, the startup cost, and the production cost. Constraints Eq. (2) enforce nodal power balance, while allowing for production shedding. Demand shedding can be included in the present formulation by modeling a very expensive generator connected to each bus. Eq. (3) enforces the reserve margins on each area of the system, allowing for reserve cascading (secondary reserve capacity can be used to provide tertiary reserve). Eq. (4) models DC power flow constraints in terms of bus angles and thermal limits of transmission lines.

The feasible production set of thermal generators is described by Eqs. (5)–(9). Production and reserve provision limits are expressed as Eq. (5) for slow generators, which can provide reserves only when they are online, and as Eq. (6) for the remaining set of generators, which can provide secondary reserves when they are online and tertiary reserves both when they are online and offline. Ramp rate constraints Eqs. (7)–(8) are based on the formulation provided by Frangioni et al. [19]. Ramp-up rate constraints Eq. (8) enforce, in addition to the ramp-up rate limit on production, that there is enough ramping capability between periods $t-1$ and $t$ to ramp up $r^2_{g,t}$ MW within $\Delta T^2$ minutes (which can be used to provide secondary reserve), and to ramp up $r^2_{g,t} + r^3_{g,t}$ MW within $\Delta T^3$ minutes (which can be used to provide tertiary reserve). Constraints Eq. (9) enforce minimum up and down times, as proposed by Rajan and Takriti [20].
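The coupling between ramping and reserves in Eq. (8) can be checked numerically; the helper below and all parameter values in it are hypothetical, chosen only to make the two inequalities concrete:

```python
# Numeric sanity check of ramp-up constraint Eq. (8) for one generator and
# one 15-min period. All parameter values are hypothetical.

def ramp_up_ok(p_t, p_prev, r2, r3, u_prev, TL, R_up, dT2=5.0, dT3=15.0):
    """Return whether both ramp-up inequalities of Eq. (8) hold.

    dT2/dT3: minutes within which secondary/tertiary reserve must be
    deliverable; the factor 15/dT scales reserve into a per-quarter ramp.
    """
    rhs = TL - (TL - R_up) * u_prev  # TL if starting up, R_up if online
    ok2 = p_t + (15.0 / dT2) * r2 - p_prev <= rhs
    ok3 = p_t + (15.0 / dT3) * (r2 + r3) - p_prev <= rhs
    return ok2 and ok3

# Online generator (u_prev = 1) limited to R_up = 30 MW per quarter:
# moving 10 MW while holding 5 MW of secondary reserve fits; holding
# 10 MW of secondary reserve does not (10 + 3*10 = 40 > 30).
assert ramp_up_ok(p_t=110, p_prev=100, r2=5, r3=10, u_prev=1, TL=200, R_up=30)
assert not ramp_up_ok(p_t=110, p_prev=100, r2=10, r3=10, u_prev=1, TL=200, R_up=30)
```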

Boundary conditions of the problem are modeled by allowing the time indices to cycle within the horizon; in other words, for any commitment variable $x_{\cdot,\tau}$ with $\tau < 1$, we define $x_{\cdot,\tau} := x_{\cdot,\,((\tau-1) \bmod |T_{60}|)+1}$. Similarly, for any dispatch variable $x_{\cdot,t}$ with $t < 1$ or $t > |T_{15}|$, we define $x_{\cdot,t} := x_{\cdot,\,((t-1) \bmod |T_{15}|)+1}$. In this fashion, we model initial conditions ($\tau < 1$, $t < 1$) and restrain end effects of the model ($\tau = |T_{60}|$, $t = |T_{15}|$) simultaneously. In practical cases, initial conditions are given by the current operating conditions and end effects are dealt with by using an extended look-ahead horizon.
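The cyclic indexing above amounts to a one-line helper; the horizon lengths (24 hours, 96 quarters) and names below are illustrative:

```python
# Sketch of the cyclic time indexing used for boundary conditions: any
# out-of-range period wraps back into the horizon, so tau < 1 reads the
# end of the horizon (initial conditions) and t > |T15| wraps to the start.

def wrap(idx: int, horizon: int) -> int:
    """Map an arbitrary integer period onto 1..horizon cyclically."""
    return (idx - 1) % horizon + 1

T60, T15 = 24, 96
assert wrap(0, T60) == 24    # tau = 0 refers to the last hour
assert wrap(-1, T15) == 95   # t = -1 refers to the second-to-last quarter
assert wrap(97, T15) == 1    # t beyond the horizon wraps to the start
```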

#### 2.3. Two-stage stochastic unit commitment and scenario decomposition


Following Papavasiliou et al. [21], we formulate SUC as the two-stage stochastic program of Eqs. (10)–(17).

$$\min_{p,u,v,w,z,f} \; \sum_{s \in S} \pi_s \sum_{g \in G} \left( \sum_{\tau \in T_{60}} \left( K_g u_{g,s,\tau} + S_g v_{g,s,\tau} \right) + \sum_{t \in T_{15}} C_g(p_{g,s,t}) \right) \tag{10}$$

$$\text{s.t.} \; \sum_{g \in G(n)} p_{g,s,t} + \sum_{l \in L(\cdot,n)} f_{l,s,t} + \xi_{n,s,t} \ge D_{n,t} + \sum_{l \in L(n,\cdot)} f_{l,s,t} \;\; \forall n \in N, s \in S, t \in T_{15} \tag{11}$$

$$f_{l,s,t} = B_l \left( \theta_{n(l),s,t} - \theta_{m(l),s,t} \right), \quad -F^-_l \le f_{l,s,t} \le F^+_l \;\; \forall l \in L, s \in S, t \in T_{15} \tag{12}$$

$$P^-_g u_{g,s,\tau(t)} \le p_{g,s,t} \le P^+_g u_{g,s,\tau(t)} \;\; \forall g \in G, s \in S, t \in T_{15} \tag{13}$$

$$-TL_g + \left( TL_g - R^-_g \right) u_{g,s,\tau(t)} \le p_{g,s,t} - p_{g,s,t-1} \le TL_g - \left( TL_g - R^+_g \right) u_{g,s,\tau(t-1)} \;\; \forall g \in G, s \in S, t \in T_{15} \tag{14}$$

$$v_{g,s,\tau} \ge u_{g,s,\tau} - u_{g,s,\tau-1}, \quad \sum_{\sigma=\tau-UT_g+1}^{\tau} v_{g,s,\sigma} \le u_{g,s,\tau}, \quad \sum_{\sigma=\tau-DT_g+1}^{\tau} v_{g,s,\sigma} \le 1 - u_{g,s,\tau-DT_g}, \quad u_{g,s,\tau}, v_{g,s,\tau} \in \{0,1\} \;\; \forall g \in G, s \in S, \tau \in T_{60} \tag{15}$$

$$\pi_s u_{g,s,\tau} = \pi_s w_{g,\tau} \; (\mu_{g,s,\tau}), \quad \pi_s v_{g,s,\tau} = \pi_s z_{g,\tau} \; (\nu_{g,s,\tau}) \;\; \forall g \in G_{\text{SLOW}}, s \in S, \tau \in T_{60} \tag{16}$$

$$z_{g,\tau} \ge w_{g,\tau} - w_{g,\tau-1}, \quad \sum_{\sigma=\tau-UT_g+1}^{\tau} z_{g,\sigma} \le w_{g,\tau}, \quad \sum_{\sigma=\tau-DT_g+1}^{\tau} z_{g,\sigma} \le 1 - w_{g,\tau-DT_g}, \quad w_{g,\tau}, z_{g,\tau} \in \{0,1\} \;\; \forall g \in G, \tau \in T_{60} \tag{17}$$

The objective function in Eq. (10) corresponds to the expected cost over the set of scenarios S, with associated probabilities πs. Constraints in Eqs. (11)–(12) are analogous to Eqs. (2) and (4). No explicit reserve requirements are enforced in the stochastic unit commitment model, since reserves are endogenously determined by the explicit modeling of uncertainty. Consequently, generator constraints of the deterministic problem, Eqs. (5)–(9), become identical for all thermal generators and can be expressed as Eqs. (13)–(15). Nonanticipativity constraints Eq. (16) are formulated using state variables w and z for the commitment and startup decisions of slow thermal generators (first-stage decisions). We associate Lagrange multipliers μ and ν with the nonanticipativity constraints. Constraints in Eq. (17) enforce minimum up and down times on unit commitment variables.

#### 3. An asynchronous distributed algorithm for stochastic unit commitment

#### 3.1. Scenario decomposition of the SUC problem

The SUC problem in Eqs. (10)–(17) grows linearly in size with the number of scenarios. Hence, SUC problems are in general of large scale, even for small system models. This motivated Takriti et al. [15] and Carpentier et al. [16] to rely on Lagrangian decomposition methods for solving the problem.

Recent SUC studies have focused on designing decomposition algorithms, capable of solving the problem in operationally acceptable time frames. Papavasiliou et al. [21] proposed a dual scenario decomposition scheme where the dual is solved using the subgradient method, and where the dual function is evaluated in parallel. Kim and Zavala [22] also used a dual scenario decomposition scheme, but solved the dual problem using a bundle method. Cheung et al. [23] present a parallel implementation of the progressive hedging algorithm of Rockafellar and Wets [24].

All previously mentioned parallel algorithms for SUC are synchronous algorithms, i.e., scenario subproblems are solved in parallel at each iteration of the decomposition method; however, it is necessary to solve all scenario subproblems before advancing to the next iteration. In cases where the solution times of subproblems differ significantly, synchronous algorithms lead to an underutilization of the parallel computing infrastructure and a loss of parallel efficiency. We have found instances where the time required to evaluate subproblems for difficult scenarios is 75 times longer than the solution time for easy scenarios.
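To see this efficiency loss concretely, the sketch below computes processor utilization for one synchronous round under a hypothetical 75:1 spread of scenario solve times (the numbers are illustrative, not measurements):

```python
# Illustration of the parallel-efficiency loss of a synchronous scheme:
# with one subproblem per processor, every iteration lasts as long as the
# slowest scenario, and the faster processors idle until it finishes.

def sync_utilization(solve_times):
    """Fraction of processor time doing useful work in one synchronous round."""
    return sum(solve_times) / (len(solve_times) * max(solve_times))

# Nine easy scenarios and one hard scenario 75x slower, as in the text:
times = [1.0] * 9 + [75.0]
u = sync_utilization(times)
assert round(u, 3) == 0.112  # ~11% useful work; ~89% of processor time idle
```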

Aiming at overcoming the difficulties faced by synchronous algorithms, we propose an asynchronous distributed algorithm for solving SUC. The algorithm is based on the scenario decomposition scheme for SUC proposed in Ref. [21], where the authors relax the nonanticipativity constraints Eq. (16) and form the following Lagrangian dual problem

$$\max_{\boldsymbol{\mu}, \boldsymbol{\nu}} \; h_0(\boldsymbol{\mu}, \boldsymbol{\nu}) + \sum_{s \in S} h_s(\boldsymbol{\mu}_s, \boldsymbol{\nu}_s), \tag{18}$$

where $h_0(\boldsymbol{\mu}, \boldsymbol{\nu})$ and $h_s(\boldsymbol{\mu}_s, \boldsymbol{\nu}_s)$ are defined according to Eqs. (19) and (20), respectively. We use boldface to denote vectors and partial indexation of dual variables with respect to scenarios, so that $\boldsymbol{\mu}_s := [\mu_{g_1,s,1} \, \ldots \, \mu_{g_{|G|},s,1}]^T$. The constraints within the infimum in Eq. (20) refer to constraints Eqs. (11)–(15) for scenario $s$ (dropping the scenario indexation of variables).

$$h_0(\boldsymbol{\mu}, \boldsymbol{\nu}) := \inf_{w,z} \left\{ \sum_{g \in G_{\text{SLOW}}} \sum_{\tau \in T_{60}} \left( -\Big( \sum_{s \in S} \pi_s \mu_{g,s,\tau} \Big) w_{g,\tau} - \Big( \sum_{s \in S} \pi_s \nu_{g,s,\tau} \Big) z_{g,\tau} \right) : \text{(17)} \right\} \tag{19}$$

A Distributed Computing Architecture for the Large-Scale Integration of Renewable Energy and Distributed… http://dx.doi.org/10.5772/67791 27

$$h_s(\boldsymbol{\mu}_s, \boldsymbol{\nu}_s) := \pi_s \inf_{p,u,v,f} \left\{ \sum_{g \in G} \sum_{t \in T_{15}} C_g(p_{g,t}) + \sum_{g \in G \setminus G_{\text{SLOW}}} \sum_{\tau \in T_{60}} \left( K_g u_{g,\tau} + S_g v_{g,\tau} \right) + \sum_{g \in G_{\text{SLOW}}} \sum_{\tau \in T_{60}} \left( \left( K_g + \mu_{g,s,\tau} \right) u_{g,\tau} + \left( S_g + \nu_{g,s,\tau} \right) v_{g,\tau} \right) : \text{(11)–(15)} \right\} \tag{20}$$

Both $h_0(\boldsymbol{\mu}, \boldsymbol{\nu})$ and $h_s(\boldsymbol{\mu}_s, \boldsymbol{\nu}_s)$ for all $s \in S$ are nondifferentiable concave functions. Evaluating $h_0(\boldsymbol{\mu}, \boldsymbol{\nu})$ amounts to solving a small integer programming problem, for the constraints of which we have a linear-size convex hull description [20]. Evaluating $h_s(\boldsymbol{\mu}_s, \boldsymbol{\nu}_s)$ amounts to solving a deterministic unit commitment (DUC) problem without reserve requirements, which is a mixed-integer linear program of potentially large scale for realistic system models. In practice, the run time for evaluating $h_s(\boldsymbol{\mu}_s, \boldsymbol{\nu}_s)$ for any $s$ and any dual multipliers is at least two orders of magnitude greater than the run time for evaluating $h_0(\boldsymbol{\mu}, \boldsymbol{\nu})$.

The proposed distributed algorithm exploits the characteristics of h0ðμ, νÞ and hsðμs, νsÞ in order to maximize Eq. (18) and compute lower bounds on the optimal SUC solution, while recovering feasible nonanticipative commitment schedules with associated expected costs (upper bounds to the optimal SUC solution). The dual maximization algorithm is inspired by the work of Nedić et al. on asynchronous incremental subgradient methods [25].

#### 3.2. Dual maximization and primal recovery


X τ∈ T<sup>60</sup> � � �X s∈S For simplicity, assume that we have 1 þ DP þ PP available parallel processors which can all access a shared memory space. We allocate one processor to coordinate the parallel execution and manage the shared memory space, DP ≤ jSj processors to solve the dual problem in Eq. (18) and PP processors to recover complete solutions to the SUC problem in Eqs. (10)–(17). Interactions between different processors are presented in Figure 1.

Figure 1. Asynchronous algorithm layout. Information within square brackets is read or written at a single step of the algorithm.
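The processor allocation just described can be sketched as a small shared-memory structure; the class and field names below are illustrative, not taken from the chapter's implementation:

```python
# Sketch of the shared memory space: one coordinator, DP dual workers and
# PP primal workers sharing two queues and the multiplier registers.
from dataclasses import dataclass, field
from queue import Queue

@dataclass
class SharedMemory:
    dual_queue: Queue            # Q^D: scenarios awaiting a dual (BCD) update
    primal_queue: Queue          # Q^P: candidate schedules awaiting evaluation
    multipliers: dict = field(default_factory=dict)  # s -> (mu_s, nu_s)
    lower_bound: float = float("-inf")               # best LB found so far
    update_count: int = 0                            # global update count k

def allocate(num_scenarios: int, dp: int, pp: int) -> SharedMemory:
    assert dp <= num_scenarios, "need DP <= |S| dual processors"
    shm = SharedMemory(Queue(), Queue())
    for s in range(num_scenarios):     # initially every block is queued
        shm.dual_queue.put(s)
        shm.multipliers[s] = (0.0, 0.0)  # flat-zero initial multipliers
    return shm

shm = allocate(num_scenarios=10, dp=4, pp=2)
assert shm.dual_queue.qsize() == 10
```

`queue.Queue` is thread-safe, so the same structure would serve an actual multithreaded coordinator without extra locking around the queues.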

We maximize the dual function in Eq. (18) using a block coordinate descent (BCD) method, in which each update is performed over the block of dual variables associated with a scenario, $(\boldsymbol{\mu}_s, \boldsymbol{\nu}_s)$ for some $s \in S$, following the direction of the subgradient of the dual function in the block of variables $(\boldsymbol{\mu}_s, \boldsymbol{\nu}_s)$. The BCD method is implemented in parallel and asynchronously by having each dual processor perform updates on the dual variables associated with a certain scenario, which are not being updated by any other dual processor at the same time. Scenarios whose dual variables are not currently being updated by any processor are held in the dual queue $Q^D$, to be updated later.

We maintain shared memory registers of $Q^D$; of the current multipliers $\boldsymbol{\mu}_s^{k(s)}, \boldsymbol{\nu}_s^{k(s)}$ for all $s \in S$, where $k(s)$ is the number of updates to the block of scenario $s$; of the previous-to-current dual multipliers $\boldsymbol{\mu}_s^{k(s)-1}, \boldsymbol{\nu}_s^{k(s)-1}$ and their associated lower bound $\check{h}_s^{k(s)-1}$ on $h_s\big(\boldsymbol{\mu}_s^{k(s)-1}, \boldsymbol{\nu}_s^{k(s)-1}\big)$, for all $s \in S$; of the global update count $k$; and of the best lower bound found for Eqs. (10)–(17), denoted $LB$. Additionally, a shared memory register of the primal queue $Q^P$ is required for recovering primal solutions. Each dual processor then performs the following operations:

1. Read and remove the first scenario $s$ from $Q^D$.

2. Read $\big(\boldsymbol{\mu}_s^{k(s)}, \boldsymbol{\nu}_s^{k(s)}\big)$ and evaluate $h_s\big(\boldsymbol{\mu}_s^{k(s)}, \boldsymbol{\nu}_s^{k(s)}\big)$ approximately.


$$\begin{aligned} \overline{\mu} &:= \left( \mu\_{s\_1}^{k(s\_1)-1}, \dots, \mu\_{s}^{k(s)}, \dots, \mu\_{s\_M}^{k(s\_M)-1} \right), \\ \overline{\nu} &:= \left( \nu\_{s\_1}^{k(s\_1)-1}, \dots, \nu\_{s}^{k(s)}, \dots, \nu\_{s\_M}^{k(s\_M)-1} \right), \end{aligned}$$

and evaluate h0ðμ, νÞ approximately.

5. Read the current global iteration count k and perform a BCD update on the dual multipliers

$$\begin{aligned} \boldsymbol{\mu}_s^{k(s)+1} &:= \boldsymbol{\mu}_s^{k(s)} + \frac{\alpha_k}{\beta_s} \cdot \pi_s \left( \mathbf{u}^*_{\text{SLOW}} - \mathbf{w}^* \right), \\ \boldsymbol{\nu}_s^{k(s)+1} &:= \boldsymbol{\nu}_s^{k(s)} + \frac{\alpha_k}{\beta_s} \cdot \pi_s \left( \mathbf{v}^*_{\text{SLOW}} - \mathbf{z}^* \right), \end{aligned}$$

where $(\mathbf{w}^*, \mathbf{z}^*)$ is an approximate minimizer of Eq. (19) for $(\overline{\boldsymbol{\mu}}, \overline{\boldsymbol{\nu}})$, $(\mathbf{p}^*, \mathbf{u}^*, \mathbf{v}^*, \mathbf{f}^*)$ is an approximate minimizer of Eq. (20) for $\big(\boldsymbol{\mu}_s^{k(s)}, \boldsymbol{\nu}_s^{k(s)}\big)$, and $(\mathbf{u}^*_{\text{SLOW}}, \mathbf{v}^*_{\text{SLOW}})$ corresponds to the commitment and startup decisions for slow generators in $(\mathbf{p}^*, \mathbf{u}^*, \mathbf{v}^*, \mathbf{f}^*)$.

6. Compute a new lower bound as

$$LB^{\text{new}} := \check{h}_0(\overline{\boldsymbol{\mu}}, \overline{\boldsymbol{\nu}}) + \check{h}_s\left( \boldsymbol{\mu}_s^{k(s)}, \boldsymbol{\nu}_s^{k(s)} \right) + \sum_{\omega \in S \setminus \{s\}} \check{h}_\omega^{k(\omega)-1},$$

where $\check{h}_0(\overline{\boldsymbol{\mu}}, \overline{\boldsymbol{\nu}}) \le h_0(\overline{\boldsymbol{\mu}}, \overline{\boldsymbol{\nu}})$ and $\check{h}_s\big(\boldsymbol{\mu}_s^{k(s)}, \boldsymbol{\nu}_s^{k(s)}\big) \le h_s\big(\boldsymbol{\mu}_s^{k(s)}, \boldsymbol{\nu}_s^{k(s)}\big)$ are the lower bounds of the MILP solutions of Eqs. (19) and (20).

	- a. kþ ¼ 1.

We maximize the dual function in Eq. (18) using a block coordinate descent (BCD) method, in which each update is performed over a block of dual variables associated with a scenario, ðμs, νsÞ for certain s ∈S, following the direction of the subgradient of the dual function in the block of variables ðμs, νsÞ. The BCD method is implemented in parallel and asynchronously by having each dual processor perform updates on the dual variables associated with a certain scenario, which are not being updated by any other dual processor at the same time. Scenarios whose dual variables are not currently being updated by any processor are held in the dual

∀s ∈S, where kðsÞ is the number of updates to the block of scenario s; the previous-to-current

<sup>s</sup> , ∀s∈ S; the global update count as k; and the best lower bound found in Eqs. (10)–(17) as LB. Additionally, a shared memory register of the primal queue Q<sup>P</sup> is required for recovering

<sup>ω</sup> for all ω ∈S∖fsg.

<sup>k</sup>ðs1Þ�<sup>1</sup> <sup>s</sup><sup>1</sup> ,…, <sup>ν</sup>

<sup>μ</sup><sup>k</sup>ðs1Þ�<sup>1</sup> <sup>s</sup><sup>1</sup> , …, <sup>μ</sup><sup>k</sup>ðs<sup>Þ</sup> <sup>s</sup> ,…, <sup>μ</sup><sup>k</sup>ðsMÞ�<sup>1</sup> sM

5. Read the current global iteration count k and perform a BCD update on the dual multi-

αk βs

<sup>k</sup>ðs<sup>Þ</sup> <sup>s</sup> <sup>þ</sup> αk βs

where ðw�, z�Þ is an approximate minimizer of Eq. (19) for ðμ, νÞ, ðp�, u�, v�,f

 μ<sup>k</sup>ðs<sup>Þ</sup> <sup>s</sup> , ν kðsÞ s 

<sup>k</sup>ðs<sup>Þ</sup> <sup>s</sup> , …, ν

� πsðu�

� πsðv�

and their associated lower bound on hs

approximately.

<sup>k</sup>ðsMÞ�<sup>1</sup> sM ,

SLOW � w�

SLOW � z�

and ðu�

Þ

Þ,

SLOW , v�

� Þ.  μ<sup>k</sup>ðs<sup>Þ</sup> <sup>s</sup> , ν kðsÞ s 

μ<sup>k</sup>ðsÞ�<sup>1</sup> <sup>s</sup> , ν

<sup>k</sup>ðsÞ�<sup>1</sup> <sup>s</sup> as

� Þ is an

SLOW Þ corresponds to the

We maintain shared memory registers of Q<sup>D</sup>. We denote the current multipliers as

primal solutions. Then, each dual processor performs the following operations:

 μ<sup>k</sup>ðs<sup>Þ</sup> <sup>s</sup> , ν kðsÞ s 

kðωÞ�1

<sup>μ</sup><sup>k</sup>ðsÞþ<sup>1</sup> <sup>s</sup> :<sup>¼</sup> <sup>μ</sup><sup>k</sup>ðs<sup>Þ</sup> <sup>s</sup> <sup>þ</sup>

<sup>k</sup>ðsÞþ<sup>1</sup> <sup>s</sup> :<sup>¼</sup> <sup>ν</sup>

commitment and startup for slow generators in ðp�, u�, v�, f

queue Q<sup>D</sup>, to be updated later.

28 Recent Progress in Parallel and Distributed Computing

μ<sup>k</sup>ðsÞ�<sup>1</sup> <sup>s</sup> , ν

1. Read and remove the first scenario s from Q<sup>D</sup>.

<sup>k</sup>ðωÞ�<sup>1</sup> <sup>ω</sup> and �h

4. Construct the delayed multiplier vectors,

and evaluate h0ðμ, νÞ approximately.

approximate minimizer of Eq. (20) for

<sup>k</sup>ðsÞ�<sup>1</sup> <sup>s</sup> 

and evaluate hs

μ :¼ 

ν :¼ ν

ν

dual multipliers as

 μ<sup>k</sup>ðs<sup>Þ</sup> <sup>s</sup> , ν kðsÞ s 

μ<sup>k</sup>ðωÞ�<sup>1</sup> <sup>ω</sup> , ν

�h kðsÞ�1

2. Read

3. Read

pliers


Steps 1–3 of the dual processor algorithm are self-explanatory. Step 4 constructs a combination of the current block iterate and the delayed iterates of the remaining blocks, which is useful for computing lower bounds.
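As a concrete illustration, one pass of the dual-processor loop can be sketched as follows. This is a simplified, single-threaded sketch: the MILP evaluations of Eqs. (19) and (20) are abstracted behind a caller-supplied `update_dir` function that returns the block subgradient components, and the names (`dual_pass`, `make_stepsize`, `k_global`) are ours, not the chapter's.

```python
from collections import deque

def make_stepsize(q=0.6):
    """Diminishing stepsize alpha_k = 1/k**q: nonsummable and
    square-summable for q in (0.5, 1], as the convergence argument requires."""
    return lambda k: 1.0 / (k ** q)

def dual_pass(dual_queue, mu, nu, k_global, alpha, beta, pi, update_dir):
    """One pass of steps 1-8 for a single dual processor.

    update_dir(s) stands in for the two MILP solves: it should return the
    block subgradient components (u_SLOW - w, v_SLOW - z) for scenario s.
    """
    s = dual_queue.popleft()            # step 1: claim scenario s
    # steps 2-4 (evaluating h_s and a delayed h_0) are abstracted away here
    k = k_global[0]                     # step 5: read the global update count
    step = alpha(k) / beta[s]           # stepsize scaled by update frequency
    g_mu, g_nu = update_dir(s)
    mu[s] += step * pi[s] * g_mu        # BCD update on the block of scenario s
    nu[s] += step * pi[s] * g_nu
    # step 6 (combining cached bounds into LB_new) is also omitted
    k_global[0] += 1                    # step 7.a: bump the global count
    dual_queue.append(s)                # step 8: release s for a later update
    return s
```

With two equally weighted scenarios, one pass updates the block of the scenario at the head of the queue and re-enqueues it at the tail, so no scenario is ever updated by two processors at once.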

During the execution of the algorithm, step 5 will perform updates to the blocks of dual variables associated with all scenarios. As $h_s(\mu_s, \nu_s)$ is easier to evaluate for certain scenarios than for others, the blocks of dual variables associated with easier scenarios will be updated more frequently than those of harder scenarios. We model this process, in a simplified fashion, as if every update were performed on a randomly selected scenario from a nonuniform distribution, where the probability of selecting scenario $s$ corresponds to

$$\beta\_s := \frac{T\_s^{\text{between}}}{\sum\_{\omega \in S} T\_{\omega}^{\text{between}}},$$

where $T_s^{\text{between}}$ is the average time between two updates on scenario $s$ ($T_s^{\text{between}}$ is estimated during execution). The asynchronous BCD method can then be understood as a stochastic approximate subgradient method [26, 27]. This is an approximate method for three reasons: (i) as the objective function contains a nonseparable nondifferentiable function $h_0(\mu, \nu)$, there is no guarantee that the expected update direction coincides with a subgradient of the objective of Eq. (8) at the current iterate; (ii) $h_0(\mu, \nu)$ is evaluated for a delayed version of the multipliers $(\mu, \nu)$; and (iii) $h_0(\mu, \nu)$ and $h_s(\mu_s, \nu_s)$ are evaluated only approximately, up to a certain MILP gap. Provided that we use a diminishing, nonsummable, and square-summable stepsize $\alpha_k$ of the type $1/k^q$, and that the error in the subgradient is bounded, the method will converge to an approximate solution of the dual problem in Eq. (8) [26, 27].
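A minimal sketch of this selection model follows; the function names (`betas`, `draw_scenario`, `update_t_between`) are ours, and a plain running mean stands in for however $T_s^{\text{between}}$ is actually estimated during execution.

```python
import random

def betas(t_between):
    """beta_s := T_s^between / sum_w T_w^between, exactly as in the text."""
    total = sum(t_between.values())
    return {s: t / total for s, t in t_between.items()}

def draw_scenario(t_between, rng=None):
    """Draw one scenario from the nonuniform distribution given by beta_s."""
    rng = rng or random.Random()
    b = betas(t_between)
    scenarios = sorted(b)
    return rng.choices(scenarios, weights=[b[s] for s in scenarios], k=1)[0]

def update_t_between(t_between, counts, s, elapsed):
    """Running-mean estimate of the time between two updates on scenario s."""
    counts[s] = counts.get(s, 0) + 1
    t_between[s] += (elapsed - t_between[s]) / counts[s]
```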

In step 6, we compute a lower bound on the primal problem Eqs. (10)–(17) using previous evaluations of $h_s(\mu_s, \nu_s)$ recorded in memory, as proposed in Ref. [6]. Step 7 updates the shared memory registers for future iterations and step 8 closes the internal loop of the dual processor.

We recover primal solutions by taking advantage of the fact that $(u^*_{SLOW}, v^*_{SLOW})$ is a feasible solution for $(w, z)$ in Eqs. (10)–(17). Therefore, in order to compute complete primal solutions and obtain upper bounds for the problem in Eqs. (10)–(17), we can fix $w := u^*_{SLOW}$ and $z := v^*_{SLOW}$ and solve the remaining problem, as proposed in Ref. [28]. After fixing $(w, z)$, the remaining problem becomes separable by scenario; hence, in order to solve it, we need to solve a restricted DUC for each scenario in $S$. These primal evaluation jobs, i.e., solving the restricted DUC for $\{u^*_{SLOW}\} \times S$, are appended at the end of the primal queue $Q^P$ by dual processors after each update (step 7.e). Note that we do not require storing $v^*_{SLOW}$ because its value is implied by $u^*_{SLOW}$.
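The job bookkeeping described above can be sketched as follows; `enqueue_primal_jobs`, `next_job`, and the tuple layout are our own illustrative choices, and the restricted-DUC solve itself is left abstract.

```python
def enqueue_primal_jobs(primal_queue, candidate_id, u_slow, scenarios):
    """Step 7.e: after a dual update, append one restricted-DUC evaluation job
    per scenario for the commitment candidate u_SLOW. v_SLOW is not stored,
    since its value is implied by u_SLOW."""
    for s in scenarios:
        primal_queue.append((candidate_id, s, u_slow))

def next_job(primal_queue):
    """Coordinator side: hand the oldest pending job to a free primal processor."""
    return primal_queue.pop(0) if primal_queue else None
```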

The primal queue is managed by the coordinator process, which assigns primal jobs to primal processors as they become available. The computation of primal solutions is therefore also asynchronous, in the sense that it runs independently of dual iterations and that the evaluation of candidate solutions $u^*_{SLOW}$ does not require that the previous candidates have already been evaluated for all scenarios. Once a certain candidate $u^l$ has been evaluated for all scenarios, the coordinator can compute a new upper bound for Eqs. (10)–(17) as

$$UB := \min \left\{ UB, \; \sum_{s \in S} UB_s^l \right\}, \tag{25}$$

where $UB_s^l$ is the upper bound associated with $u^l$ on the restricted DUC problem of scenario $s$. The coordinator process keeps track of the candidate associated with the smallest upper bound throughout the execution.

Finally, the coordinator process will terminate the algorithm when $1 - LB/UB \le \epsilon$, where $\epsilon$ is a prescribed tolerance, or when reaching a prescribed maximum solution time. At this point, the algorithm retrieves the best-found solution and the bound on the distance of this solution from the optimal objective function value.
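A sketch of the coordinator's bookkeeping, combining Eq. (25) with the gap test; the container layout (per-candidate dictionaries of scenario bounds) is an assumption of ours.

```python
def try_update_ub(ub, scenario_bounds, n_scenarios):
    """Tighten the incumbent upper bound via Eq. (25): a candidate l
    contributes sum_s UB_s^l once all n_scenarios scenario bounds are in."""
    for per_s in scenario_bounds.values():
        if len(per_s) == n_scenarios:       # candidate fully evaluated
            ub = min(ub, sum(per_s.values()))
    return ub

def should_terminate(lb, ub, eps):
    """Stop when 1 - LB/UB <= eps (the wall-time limit is checked elsewhere)."""
    return ub < float("inf") and 1.0 - lb / ub <= eps
```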

#### 3.3. Dual algorithm initialization

The lower bounds computed by the algorithm presented in the previous section depend on previous evaluations of $h_s(\mu_s, \nu_s)$ for other scenarios. As the evaluation of $h_s(\mu_s, \nu_s)$ can require a substantial amount of time for certain scenarios, the computation of the first lower bound considering nontrivial values of $h_s(\mu_s, \nu_s)$ for all scenarios can be delayed significantly with respect to the advance of dual iterations and primal recovery. In other words, it might be the case that the algorithm finds a very good primal solution but is unable to terminate because it is missing the value of $h_s(\mu_s, \nu_s)$ for a single scenario.

In order to prevent these situations and to obtain nontrivial bounds faster, in the first pass of the dual processors over all scenarios, we can replace $h_s(\mu_s, \nu_s)$ with a surrogate $\eta_s(\mu_s, \nu_s)$ that is easier to compute, such that $\eta_s(\mu_s, \nu_s) \le h_s(\mu_s, \nu_s)$ for any $(\mu_s, \nu_s)$. We propose two alternatives for $\eta_s(\mu_s, \nu_s)$:

A Distributed Computing Architecture for the Large-Scale Integration of Renewable Energy and Distributed… http://dx.doi.org/10.5772/67791 31

1. The linear relaxation of the scenario DUC (LP):

$$\eta_s(\mu_s, \nu_s) := \pi_s \inf_{p, u, v, f} \left\{ \sum_{g \in G, \, t \in T_{15}} C_g(p_{g,t}) + \sum_{g \in G \setminus G_{SLOW}, \, t \in T_{60}} \left( K_g u_{g,t} + S_g v_{g,t} \right) + \sum_{g \in G_{SLOW}, \, t \in T_{60}} \left( \left( K_g + \mu_{g,s,t} \right) u_{g,t} + \left( S_g + \nu_{g,s,t} \right) v_{g,t} \right) : \text{linear relaxation of } (11s)\text{–}(15s) \right\}$$

#### 2. An optimal power flow for each period (OPF):

$$\eta_s(\mu_s, \nu_s) := \pi_s \sum_{t \in T_{15}} \inf_{p, u, v} \left\{ \sum_{g \in G} C_g(p_g) + \frac{1}{4} \sum_{g \in G \setminus G_{SLOW}} \left( K_g u_g + S_g v_g \right) + \frac{1}{4} \sum_{g \in G_{SLOW}} \left( \left( K_g + \mu_{g,s,\tau(t)} \right) u_g + \left( S_g + \nu_{g,s,\tau(t)} \right) v_g \right) : (11st)\text{–}(13st), \; u \in \{0,1\}^{|G|}, \; v \in \{0,1\}^{|G_{SLOW}|} \right\}$$

where (11st) – (13st) correspond to constraints Eqs. (11)–(13) for scenario s and period t.

The LP approach requires solving a linear program of the same size as the original problem in Eq. (20), but it has the advantage that it can be obtained as an intermediate result while evaluating $h_s(\mu_s, \nu_s)$, so it does not add extra computations to the algorithm. The OPF approach, on the other hand, requires solving many small MILP problems, which can be solved faster than the linear relaxation of Eq. (20). However, the OPF approach ignores several constraints and cost components, such as the startup cost of nonslow generators, and it adds extra computations to the algorithm.
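The first-pass switch between the surrogate and the exact evaluation can be sketched as follows, with either surrogate (LP or per-period OPF) behind `eval_eta`; the function names are ours.

```python
def scenario_lower_bound(s, visited, eval_h, eval_eta):
    """Return a valid lower-bound evaluation for scenario s: the cheap
    surrogate eta_s <= h_s on the first visit, the exact (MILP) evaluation
    of h_s on every later visit."""
    if s in visited:
        return eval_h(s)
    visited.add(s)
    return eval_eta(s)
```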

#### 3.4. Implementation and numerical experiments

We implement the DUCR model in Mosel and solve it directly using Xpress. We also implement the proposed asynchronous algorithm for SUC (described in the previous subsections) in Mosel, using the module mmjobs for handling parallel processes and communications, while solving the subproblems with Xpress [29]. We configure Xpress to solve the root node using the barrier algorithm, and we set the termination gap to 1% for both the DUCR and the SUC subproblems and the maximum solution wall time to 10 hours. Numerical experiments were run on the Sierra cluster hosted at Lawrence Livermore National Laboratory. Each node of the Sierra cluster is equipped with two Intel Xeon EP X5660 processors (12 cores per node) and 24 GB of RAM. We use 10 nodes for the proposed distributed algorithm, assigning 5 nodes to dual processors, with 6 dual processors per node (DP = 30), and 5 nodes to primal recovery, with 12 primal processors per node. The coordinator is implemented on a primal node and occupies one primal processor (PP = 59).

We test the proposed algorithm on a detailed model of the Central Western European system, consisting of 656 thermal generators, 679 nodes, and 1037 lines. The model was constructed by using the network model of Hutcheon and Bialek [30], technical generator information provided to the authors by ENGIE, and multiarea demand and renewable energy information collected from national system operators (see [31] for details). We consider eight representative day types, one weekday and one weekend day per season, as being representative of the different conditions faced by the system throughout the year.

We consider 4 day-ahead scheduling models: the DUCR model and the SUC model with 30 (SUC30), 60 (SUC60), and 120 (SUC120) scenarios. The sizes of the different day-ahead scheduling models are presented in Table 1, where the size of the stochastic models refers to the size of the extensive form. While the DUCR model is of the scale of problems that fit in the memory of a single machine and can be solved by a commercial solver, the SUC models in extensive form are beyond current capabilities of commercial solvers.

Table 2 presents the solution time statistics for all day-ahead scheduling policies. In the case of SUC, we report these results for the two dual initialization alternatives proposed in Section 3.3.

The results of Table 2 indicate that the OPF initialization significantly outperforms the LP approach in terms of termination time. This is mainly due to the fact that the OPF approach provides nontrivial lower bounds including information for all scenarios much faster than the LP approach. On the other hand, the solution times of SUC60 and DUCR indicate that, using distributed computing, we can solve SUC at a comparable run time to that required by commercial solvers for DUCR on large-scale systems. Moreover, as shown in Table 3, for a given hard constraint on solution wall time such as 2 h (which is common for day-ahead power system operations), the proposed distributed algorithm provides solutions to SUC with up to 60 scenarios within 2% of optimality, which is acceptable for operational purposes.


Table 1. Problem sizes.


Table 2. Solution time statistics over 8 day types.


Table 3. Worst optimality gap (over 8 day types) vs. solution wall time.
