This algorithm is inspired by Lyapunov control theory, and thus, it is named Lyapunov optimization theory [2]. In this chapter, the basic theory, examples, and discussions of Lyapunov optimization theory are presented. Then, the use of Lyapunov optimization theory for real-time computer vision and deep learning platforms is discussed. Furthermore, the performance evaluation results with real-world deep learning framework computation (e.g., real-world image super-resolution computation results with various models) are presented in various aspects. Finally, the emerging applications will be introduced.

**2. Stabilized control for reliable deep learning platforms**

In this section, the Lyapunov optimization theory for time-average optimization subject to stability is introduced first (refer to Section 2.1); then, an example-based explanation is presented (refer to Section 2.2); finally, related discussions are organized (refer to Section 2.3).

**2.1 Theory**

In this section, we introduce the Lyapunov optimization theory, which aims at time-average penalty function minimization subject to queue stability. Notice that time-average penalty function minimization can be equivalently converted into time-average utility function maximization. Lyapunov optimization theory can be used when a tradeoff exists between utility and stability. For example, such a tradeoff arises when the current decision is optimal in terms of penalty function minimization but executing the decision takes a long time, thereby introducing delays (i.e., the queue backlog in the system increases). The optimal decision can therefore be dynamically time-varying: when the delay in the current system is small or marginal, focusing on utility maximization (i.e., penalty function minimization) is better; when the delay is large, the decision should instead aim at delay reduction while sacrificing a certain amount of utility maximization (or penalty function minimization).

Suppose that our time-average penalty function is denoted by $P(\alpha[t])$ and should be minimized, and that our control action decision-making is denoted by $\alpha[t]$. Then, the queue dynamics in the system, i.e., $Q[t]$, can be formulated as follows:

$$Q[t+1] = \max\left\{ Q[t] + a(\alpha[t]) - b(\alpha[t]),\ 0 \right\} \tag{1}$$

$$Q[0] = 0 \tag{2}$$

where $a(\alpha[t])$ is an arrival process at $Q[t]$ at $t$ when our control action decision-making is $\alpha[t]$. In (1), $b(\alpha[t])$ is a departure/service process at $Q[t]$ when our control action decision-making is $\alpha[t]$ at $t$.
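To make the dynamics in (1) and (2) concrete, the following Python sketch implements the one-step queue update; the numeric arrival and departure values are illustrative assumptions, not values from this chapter.

```python
def queue_update(Q_t: float, arrival: float, departure: float) -> float:
    """One-step queue dynamics, Eq. (1): Q[t+1] = max{Q[t] + a(alpha[t]) - b(alpha[t]), 0}."""
    return max(Q_t + arrival - departure, 0.0)

# Example trace starting from Q[0] = 0, as in Eq. (2); the (arrival, departure)
# pairs below are hypothetical.
Q = 0.0
for arrival, departure in [(3.0, 1.0), (2.0, 4.0), (0.5, 2.0)]:
    Q = queue_update(Q, arrival, departure)
    print(Q)  # 2.0, then 0.0, then 0.0
```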

Control action decision-making should be made in each unit time for time-average penalty function minimization subject to queue stability. Then, the mathematical program for minimizing the time-average penalty function $P(\alpha[t])$, where the control action decision-making at $t$ is $\alpha[t]$, can be presented as follows:

$$\min \; : \; \lim_{t \to \infty} \frac{1}{t} \sum_{\tau=0}^{t-1} P(\alpha[\tau]) \tag{3}$$


subject to queue stability:


$$\lim\_{t \to \infty} \frac{1}{t} \sum\_{\tau=0}^{t-1} Q[\tau] < \infty. \tag{4}$$

In (3), $P(\alpha[t])$ stands for the penalty function when the control action decision-making is $\alpha[t]$ at $t$.
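The objective (3) and the stability constraint (4) can be estimated empirically on a simulated trace. The sketch below assumes a hypothetical two-action control space with penalty, arrival, and departure tables `P`, `a`, and `b`; none of these values come from the chapter, and the random policy is only a baseline, not the optimized one.

```python
import random

# Hypothetical action space: alpha = 0 has low penalty but a net-positive
# arrival rate; alpha = 1 drains the queue at a higher penalty.
P = {0: 1.0, 1: 3.0}   # penalty P(alpha)
a = {0: 2.0, 1: 0.5}   # arrival process a(alpha)
b = {0: 1.0, 1: 2.5}   # departure/service process b(alpha)

random.seed(0)
Q, penalty_sum, backlog_sum, T = 0.0, 0.0, 0.0, 10_000
for t in range(T):
    alpha = random.choice([0, 1])          # naive uniform policy
    penalty_sum += P[alpha]
    Q = max(Q + a[alpha] - b[alpha], 0.0)  # queue dynamics, Eq. (1)
    backlog_sum += Q

print("time-average penalty:", penalty_sum / T)  # the objective in Eq. (3)
print("time-average backlog:", backlog_sum / T)  # bounded, so Eq. (4) holds here
```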

As mentioned, Lyapunov optimization theory can be used when a tradeoff between utility maximization (or penalty function minimization) and delays exists. Based on this nature, the drift-plus-penalty (DPP) algorithm [2–4] is designed for maximizing the time-average utility subject to queue stability. Here, the Lyapunov function is defined as $L(Q[t]) = \frac{1}{2}(Q[t])^2$, and let $\Delta(\cdot)$ be the conditional quadratic Lyapunov drift, formulated as $\Delta(Q[t]) = \mathbb{E}\left[L(Q[t+1]) - L(Q[t]) \mid Q[t]\right]$, which is called the drift on $t$. According to [2], this dynamic policy achieves queue stability by minimizing an upper bound of the penalty function on DPP, which is given by

$$\Delta(Q[t]) + V\,\mathbb{E}[P(\alpha[t])], \tag{5}$$

where *V* is a tradeoff coefficient. The upper bound on the drift of the Lyapunov function at *t* is derived as follows:

$$L(Q[t+1]) - L(Q[t]) = \frac{1}{2} \left( Q[t+1]^2 - Q[t]^2 \right) \tag{6}$$

$$\leq \frac{1}{2} \left( a(\alpha[t])^2 + b(\alpha[t])^2 \right) + Q[t]\left(a(\alpha[t]) - b(\alpha[t])\right). \tag{7}$$
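One way to verify (7): squaring the dynamics in (1) and using $(\max\{x, 0\})^2 \le x^2$ together with $(a - b)^2 \le a^2 + b^2$ for nonnegative $a$ and $b$ gives (writing $a$ and $b$ as shorthand for $a(\alpha[t])$ and $b(\alpha[t])$):

$$Q[t+1]^2 \le \left(Q[t] + a - b\right)^2 = Q[t]^2 + (a - b)^2 + 2\,Q[t]\,(a - b) \le Q[t]^2 + a^2 + b^2 + 2\,Q[t]\,(a - b),$$

and halving both sides after subtracting $Q[t]^2$ yields (7).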

Therefore, the upper bound of the conditional Lyapunov drift can be derived as follows:

$$\begin{split} \Delta(Q[t]) &= \mathbb{E}\left[L(Q[t+1]) - L(Q[t]) \mid Q[t]\right] \\ &\leq C + \mathbb{E}\left[Q[t]\left(a(\alpha[t]) - b(\alpha[t])\right) \mid Q[t]\right], \end{split} \tag{8}$$

where *C* is a constant given by

$$\frac{1}{2}\,\mathbb{E}\left[a(\alpha[t])^2 + b(\alpha[t])^2 \mid Q[t]\right] \le C, \tag{9}$$

which holds when the arrival and departure process rates are upper bounded. Because $C$ is a constant, minimizing the upper bound on the DPP expression in (5) amounts to minimizing:

$$V\,\mathbb{E}[P(\alpha[t])] + \mathbb{E}\left[Q[t] \cdot \left(a(\alpha[t]) - b(\alpha[t])\right)\right]. \tag{10}$$
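Minimizing (10) in each unit time yields the per-slot DPP control rule: observe $Q[t]$ and choose the action $\alpha[t]$ that minimizes $V\,P(\alpha) + Q[t]\,(a(\alpha) - b(\alpha))$. The Python sketch below illustrates this rule over a hypothetical two-action space; the tables and the value of $V$ are assumptions for illustration only.

```python
# Hypothetical per-action penalty, arrival, and departure tables (illustrative).
P = {0: 1.0, 1: 3.0}
a = {0: 2.0, 1: 0.5}
b = {0: 1.0, 1: 2.5}

def dpp_action(Q_t: float, V: float, actions=(0, 1)) -> int:
    """Choose alpha[t] minimizing V * P(alpha) + Q[t] * (a(alpha) - b(alpha)), i.e., Eq. (10)."""
    return min(actions, key=lambda alpha: V * P[alpha] + Q_t * (a[alpha] - b[alpha]))

Q, V = 0.0, 10.0
for t in range(20):
    alpha = dpp_action(Q, V)
    Q = max(Q + a[alpha] - b[alpha], 0.0)  # queue dynamics, Eq. (1)
    print(t, alpha, Q)
```

With these illustrative numbers, the rule keeps selecting the low-penalty action until the backlog crosses a threshold (around $Q[t] \approx 6.7$ here), after which it switches to the draining action; raising $V$ shifts the threshold upward, trading a larger queue backlog for a lower time-average penalty, which is exactly the utility-delay tradeoff described above.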
