### 5. Simultaneous stochastic perturbations (SSP)

In [16], Spall proposes the simultaneous perturbation stochastic approximation (SPSA) algorithm for finding optimal unknown parameters by minimizing some objective function. The main feature of the simultaneous perturbation gradient approximation (SPGA) resides in the way the gradient vector is approximated (on average): a sample gradient vector is estimated by simultaneously perturbing all components of the unknown vector in a stochastic way. This method requires only two or three measurements of the objective function, regardless of the dimension of the vector of unknown parameters. In Hoang and Baraille [17], the idea of the SPGA is described in detail, with a wide variety of applications in engineering domains. The application to estimation of the ECM in the filtering problem is given in Hoang and Baraille [18]. In Hoang and Baraille [19], a simple algorithm is proposed for estimating the elements of an unknown matrix, as well as a way to decompose the estimated matrix into a product of two matrices, under the condition that only the matrix-vector product is accessible.

#### 5.1. Theoretical background of SPGA: gradient approximation

The component-wise perturbation is a method for numerical computation of the gradient of the cost function with respect to the vector of unknown parameters. It is based on the idea of perturbing each component of the parameter vector separately. For very HdS, this technique is impossible to implement. An alternative to the component-wise perturbation is the SSP approach.

Let

$$DJ(\theta_0) = \left[\frac{\partial J(\theta_0)}{\partial \theta_1}, \ldots, \frac{\partial J(\theta_0)}{\partial \theta_{n_\theta}}\right]^T$$

denote the gradient of $J(\theta)$ computed at $\theta = \theta_0$. Suppose $\Delta_j$, $j = 1, \ldots, n$, are independent identically distributed (i.i.d.) RVs following the Bernoulli law, which assumes the two values $+1$ or $-1$ with equal probability 1/2. It implies that

On Optimal and Simultaneous Stochastic Perturbations with Application to Estimation of High-Dimensional… http://dx.doi.org/10.5772/intechopen.77273 71

$$E\left(\Delta_j\right) = 0, \quad E\left(\Delta_j^2\right) = 1, \quad E\left(\Delta_j^{-1}\right) = 0, \quad E\left[\left(\Delta_j^{-1}\right)^2\right] = 1, \quad j = 1, 2, \ldots, n \tag{23}$$
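As a quick sanity check, the moment properties (23) are easy to verify numerically; the sample size below is an arbitrary illustrative choice. Note that a Bernoulli $\pm 1$ variable equals its own inverse, which is why $\Delta_j^{-1}$ shares the moments of $\Delta_j$.

```python
import random

# Numerical check of the moment properties in Eq. (23).
# A Bernoulli +/-1 variable is its own inverse: 1/Delta_j == Delta_j.
rng = random.Random(0)
samples = [rng.choice((-1.0, 1.0)) for _ in range(100_000)]

mean = sum(samples) / len(samples)                       # ~ E(Delta_j) = 0
mean_sq = sum(s * s for s in samples) / len(samples)     # E(Delta_j^2) = 1 exactly
inv_mean = sum(1.0 / s for s in samples) / len(samples)  # ~ E(Delta_j^{-1}) = 0
```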

Suppose $J(\theta)$ is infinitely differentiable at $\theta = \theta_0$. Using a Taylor series expansion,

$$\Delta J := J\left(\theta_0 + \overline{\delta\theta}\right) - J(\theta_0) = \overline{\delta\theta}^T DJ(\theta_0) + (1/2)\,\overline{\delta\theta}^T D^2 J(\theta_0)\,\overline{\delta\theta} + \ldots \tag{24}$$

where $D^2 J(\theta_0)$ is the Hessian matrix computed at $\theta = \theta_0$. For the choice

$$\overline{\delta\theta} := \left(\delta\theta_1, \ldots, \delta\theta_{n_\theta}\right)^T = c\overline{\Delta}, \quad \overline{\Delta} = \left(\Delta_1, \Delta_2, \ldots, \Delta_{n_\theta}\right)^T, \tag{25}$$

where $c$ is a small positive value, Eq. (24) gives

70 Perturbation Methods with Applications in Science and Engineering

$$
\Delta J(\theta_0) = c\,\overline{\Delta}^T DJ(\theta_0) + \left(c^2/2\right)\overline{\Delta}^T D^2 J(\theta_0)\,\overline{\Delta} + \ldots
$$

Dividing both sides of the last equality by $\delta\theta_k = c\Delta_k$ implies

$$\begin{aligned} \Delta J(\theta_0)/\delta\theta_k &= DJ(\theta_0)^T \overline{\Delta}^k + (c/2)\,\overline{\Delta}^{k,T} D^2 J(\theta_0)\,\overline{\Delta} + \ldots \\ \overline{\Delta}^k &:= \left(\Delta_1\Delta_k^{-1}, \ldots, 1, \ldots, \Delta_n\Delta_k^{-1}\right)^T \end{aligned} \tag{26}$$

Taking the mathematical expectation of both sides of the last equation yields

$$E\left[\Delta J(\theta_0)\,\delta\theta_k^{-1}\right] = DJ(\theta_0)^T E\left(\overline{\Delta}^k\right) + (c/2)\,E\left[\overline{\Delta}^T D^2 J(\theta_0)\,\overline{\Delta}^k\right] + \ldots \tag{27}$$

From the assumptions on $\overline{\Delta}$, one sees that $E\left(\overline{\Delta}^k\right) = (0, \ldots, 1, \ldots, 0)^T$ (unity in the $k$th position), whence $DJ(\theta_0)^T E\left(\overline{\Delta}^k\right) = \partial J(\theta_0)/\partial\theta_k$. Moreover, as all the moments of the Bernoulli variables $\Delta_i$ and $\Delta_i^{-1}$ are finite and $D^2 J(\theta_0)$ is finite, $E\left[\overline{\Delta}^T D^2 J(\theta_0)\,\overline{\Delta}^k\right] = 0$. One concludes that

$$E\left[\Delta J(\theta_0)\,\delta\theta_k^{-1}\right] = \partial J(\theta_0)/\partial\theta_k + O\left(c^2\right) \tag{28}$$

The result expressed by Eq. (28) constitutes the basis for approximating the gradient vector by simultaneous perturbation. The left-hand side of Eq. (28) can be easily approximated by noticing that, for an ensemble of $L$ i.i.d. samples $\left[\overline{\Delta}^{(1)}, \ldots, \overline{\Delta}^{(L)}\right]$, we can generate the corresponding ensemble of $L$ i.i.d. sample estimates of the gradient vector at $\theta = \theta_0$,

$$DJ^{(l)}(\theta_0) = \left[\frac{\Delta J^{(l)}(\theta_0)}{c\,\Delta_1^{(l)}}, \ldots, \frac{\Delta J^{(l)}(\theta_0)}{c\,\Delta_n^{(l)}}\right]^T, \quad l = 1, 2, \ldots, L \tag{29}$$

where $\Delta_k^{(l)}$ is the $k$th component of the $l$th sample $\overline{\Delta}^{(l)}$. The left-hand side of Eq. (28) is then well approximated by averaging the $L$ sample gradients in Eq. (29),

$$E[\Delta J(\theta_0)/\delta\theta_k] \approx (1/L)\sum_{l=1}^{L}\eta_k^{(l)}, \quad \eta_k^{(l)} := \frac{\Delta J^{(l)}(\theta_0)}{c\,\Delta_k^{(l)}}. \tag{30}$$
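As a sketch of how Eqs. (29)-(30) work in practice, the following estimates the gradient of an illustrative quadratic objective (not from the text); the function `J`, the step `c`, and the ensemble size `L` are arbitrary illustrative choices.

```python
import random

def spga_gradient(J, theta0, c=1e-4, L=4000, rng=None):
    """SPGA estimate of the gradient of J at theta0, Eqs. (29)-(30).

    Each sample needs only two evaluations of J, whatever len(theta0)."""
    rng = rng or random.Random(0)
    n = len(theta0)
    J0 = J(theta0)
    g = [0.0] * n
    for _ in range(L):
        # Simultaneous Bernoulli +/-1 perturbation of ALL components, Eq. (25).
        delta = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        dJ = J([t + c * d for t, d in zip(theta0, delta)]) - J0  # Delta J
        for k in range(n):
            g[k] += dJ / (c * delta[k])  # eta_k^{(l)} of Eq. (30)
    return [gk / L for gk in g]

# Illustrative objective: J(theta) = sum(theta_i^2), true gradient 2*theta0.
J = lambda th: sum(t * t for t in th)
est = spga_gradient(J, [1.0, -2.0, 0.5])
```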

Introduce the notations

$$\overline{m}_k := E[\Delta J(\theta_0)/\delta\theta_k], \quad m_k(L) := (1/L)\sum_{l=1}^{L}\eta_k^{(l)}, \quad e_k := m_k(L) - \overline{m}_k. \tag{31}$$

Theorem 1 (Hoang and Baraille [19]) states that the estimate $m(L) := \left(m_1(L), \ldots, m_{n_\theta}(L)\right)^T$ converges to the gradient vector $DJ(\theta_0)$ as $L \to \infty$ and $c \to 0$, with the order $O(1/L)$, where

$$m(L) \coloneqq (1/L) \sum\_{l=1}^{L} \eta^{(l)}.$$

#### 5.2. Algorithm for estimation of an unknown matrix

Let $\overline{\Delta} := (\Delta_1, \ldots, \Delta_n)^T$, where $\Delta_i$, $i = 1, \ldots, n$, are Bernoulli independent and identically distributed (i.i.d.) variables assuming the two values $\pm 1$ with equal probability 1/2. Introduce $\overline{\Delta}^{-1} := (1/\Delta_1, \ldots, 1/\Delta_n)^T$ and $\overline{\Delta}_c := c\overline{\Delta}$, where $c > 0$ is a small positive value.

Algorithm 5.1. Suppose it is possible to compute the product $\Phi x = b(x)$ for a given $x$. At the beginning, let $l = 1$. Let the value $u$ be assigned to the vector $x$, i.e., $x := u$, and let $L$ be a (large) fixed integer.

Step 1. Generate $\overline{\Delta}^{(l)}$, whose components are the $l$th samples of the Bernoulli i.i.d. variables assuming the two values $\pm 1$ with equal probabilities 1/2;

Step 2. Compute $\delta b^{(l)} = \Phi\left(u + \overline{\Delta}_c^{(l)}\right) - \Phi u$, where $\overline{\Delta}_c^{(l)} = c\overline{\Delta}^{(l)}$.

Step 3. Compute $g_i^{(l)} = \delta b_i^{(l)}\left[\overline{\Delta}_c^{(l)}\right]^{-1}$, where $\delta b_i$ is the $i$th component of $\delta b$ and $g_i^{(l)}$ is the column vector consisting of the derivatives of $b_i(u)$ w.r.t. $u$, $i = 1, \ldots, m$.

Step 4. If $l < L$, put $l := l + 1$ and go to Step 1. Otherwise, go to Step 5.

Step 5. Compute

$$\widehat{g}_i = \frac{1}{L}\sum_{l=1}^{L} g_i^{(l)}, \quad i = 1, \ldots, m; \qquad \widehat{\Phi}(L) := D_x b = \left[\widehat{g}_1, \ldots, \widehat{g}_m\right]^T.$$
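A minimal sketch of Algorithm 5.1, assuming only black-box access to the product $\Phi x$ through a routine `matvec`; the $2 \times 2$ matrix is the one used later in Section 6, and the base point $u = 0$ and ensemble size are illustrative choices.

```python
import random

def estimate_matrix(matvec, n, L=4000, c=1e-3, rng=None):
    """Sketch of Algorithm 5.1: estimate a matrix from the black-box
    product matvec(x) = Phi @ x via simultaneous stochastic perturbations."""
    rng = rng or random.Random(1)
    u = [0.0] * n                 # base point x := u (illustrative choice)
    b0 = matvec(u)
    m = len(b0)
    phi = [[0.0] * n for _ in range(m)]
    for _ in range(L):            # Steps 1-4
        delta = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        bp = matvec([ui + c * d for ui, d in zip(u, delta)])
        db = [x - y for x, y in zip(bp, b0)]            # delta b^{(l)}
        for i in range(m):
            for k in range(n):
                phi[i][k] += db[i] / (c * delta[k])     # g_i^{(l)} entries
    return [[v / L for v in row] for row in phi]        # Step 5 average

# Illustration with the 2x2 matrix of Section 6.
PHI = [[5.0, 7.0], [-2.0, -4.0]]
mv = lambda x: [sum(p * xi for p, xi in zip(row, x)) for row in PHI]
est = estimate_matrix(mv, 2)
```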

#### 5.3. Operations with Φ and its transpose

Algorithm 5.1 allows one to store $\widehat{\Phi}(L)$ in the form of two ensembles of vectors:

$$\begin{aligned} \mathrm{En}_L(\delta x) &:= \left[\delta x^{(1)}, \ldots, \delta x^{(L)}\right], \quad \delta x^{(l)} = c\overline{\Delta}^{(l)}, \\ \mathrm{En}_L(\delta b) &:= \left[\delta b^{(1)}, \ldots, \delta b^{(L)}\right], \quad \delta b^{(l)} = \left(\delta b_1^{(l)}, \ldots, \delta b_m^{(l)}\right)^T, \quad l = 1, \ldots, L. \end{aligned} \tag{32}$$


The product $z = \widehat{\Phi}(L)\,y$, $y \in \mathbb{R}^n$, can be performed as $z_i = \sum_{k=1}^{n}\widehat{\phi}_{ik}\,y_k = \sum_{k=1}^{n}\left[\frac{1}{L}\sum_{l=1}^{L}\frac{\delta b_i^{(l)}}{\delta x_k^{(l)}}\right] y_k$, or in a more compact form

$$z = \frac{1}{L}\sum_{l=1}^{L}\alpha_l\,\delta b^{(l)}, \quad \alpha_l := \sum_{k=1}^{n}\frac{y_k}{\delta x_k^{(l)}}. \tag{33}$$

Eq. (33) allows one to perform $z = \widehat{\Phi}(L)\,y$ with $L(m + 2n + 1)$ elementary operations. Similarly, computation of the components $z_i$ of $z = \widehat{\Phi}^T(L)\,y$, $y \in \mathbb{R}^m$, is performed as

$$z_i = \frac{1}{L}\sum_{l=1}^{L}\frac{1}{\delta x_i^{(l)}}\sum_{k=1}^{m}\delta b_k^{(l)}\,y_k, \quad i = 1, \ldots, n. \tag{34}$$
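The storage scheme (32) and the matrix-vector rule (33) can be sketched as follows; the ensembles are generated here from an assumed $2 \times 2$ matrix (the Section 6 example) purely for illustration, and $z$ is formed from the ensembles alone, without ever assembling $\widehat{\Phi}(L)$.

```python
import random

# Build the ensembles En_L(dx), En_L(db) of Eq. (32) for an assumed
# 2x2 matrix, then apply Eq. (33) to compute z = Phi_hat(L) y.
rng = random.Random(2)
PHI = [[5.0, 7.0], [-2.0, -4.0]]   # illustrative matrix (Section 6)
c, L, n = 1e-3, 4000, 2
En_dx, En_db = [], []
for _ in range(L):
    dx = [c * rng.choice((-1.0, 1.0)) for _ in range(n)]
    En_dx.append(dx)
    En_db.append([sum(p * xi for p, xi in zip(row, dx)) for row in PHI])

def matvec_from_ensembles(y):
    z = [0.0] * len(En_db[0])
    for dx, db in zip(En_dx, En_db):
        alpha = sum(yk / dxk for yk, dxk in zip(y, dx))  # alpha_l, Eq. (33)
        for i, dbi in enumerate(db):
            z[i] += alpha * dbi
    return [zi / L for zi in z]

z = matvec_from_ensembles([1.0, 1.0])  # approximates Phi @ [1, 1] = [12, -6]
```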

#### 5.4. Estimation of decomposition of Φ

Let $\Phi$ be a matrix of dimensions $m \times n$. For definiteness, let $m \leq n$ with $\mathrm{rank}(\Phi) = m$. We want to find the best approximation of $\Phi$ among members of the class of matrices

$$
\Phi_e = AB^T, \quad A \in \mathbb{R}^{m \times r}, \; B \in \mathbb{R}^{n \times r}, \tag{35}
$$

under the constraint

Condition (C). $A$ and $B$ are matrices of dimensions $m \times r$ and $n \times r$, $r \leq m$, with $\mathrm{rank}\left(AB^T\right) = r$.

Under the condition (C), the optimization problem is formulated as

$$J(A,B) = \left\|\Phi - \Phi_e\right\|_F^2 = \left\|\Phi - AB^T\right\|_F^2 \to \min_{(A,B)}, \tag{36}$$

where $\|\cdot\|_F$ denotes the Frobenius matrix norm. Consider $\Phi$ and let $U\Sigma V^T$ be the SVD of $\Phi$ (5). Let $\tilde{\Phi} = \Phi + \Delta\Phi$, $\tilde{\Phi} = \tilde{U}\tilde{\Sigma}\tilde{V}^T$, and $\tilde{\sigma}_1 \geq \tilde{\sigma}_2 \geq \ldots \geq \tilde{\sigma}_m$, where $\tilde{\sigma}_k$ is the $k$th singular value of $\tilde{\Phi}$. Then, we have

$$J(A\_o, B\_o) = \sum\_{k=r+1}^{m} \sigma\_k^2 \tag{37}$$

where $A_o B_o^T$ is a solution to problem (36) subject to Condition (C) (Theorem 3.1 of Hoang and Baraille [19]).

Theorem 3.1 of Hoang and Baraille [19] implies that $\Phi_e^o := A_o B_o^T$ is equal to the matrix formed by truncating the SVD of $\Phi$ to its first $r$ SVs and singular values. This allows one to avoid storing the elements of the estimate $\widehat{\Phi}(L)$ of $\Phi$ (their number is of order $O(10^{m \cdot n})$).
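The content of Eq. (37), that the residual of the best rank-$r$ approximation equals the sum of the discarded squared singular values, can be checked numerically on the $2 \times 2$ matrix of Section 6; power iteration is used here only as a convenient stand-in for a full SVD routine.

```python
import math

# Check Eq. (37) for r = 1 on the 2x2 matrix of Section 6: the Frobenius
# residual of the best rank-1 approximation equals sigma_2^2.
PHI = [[5.0, 7.0], [-2.0, -4.0]]

# Leading right singular vector v1 by power iteration on G = Phi^T Phi.
G = [[sum(PHI[k][i] * PHI[k][j] for k in range(2)) for j in range(2)]
     for i in range(2)]
v = [1.0, 0.0]
for _ in range(200):
    w = [G[0][0] * v[0] + G[0][1] * v[1], G[1][0] * v[0] + G[1][1] * v[1]]
    nrm = math.hypot(w[0], w[1])
    v = [wi / nrm for wi in w]

Pv = [PHI[0][0] * v[0] + PHI[0][1] * v[1],
      PHI[1][0] * v[0] + PHI[1][1] * v[1]]      # Phi v1 = sigma1 u1
phi1 = [[Pv[i] * v[j] for j in range(2)] for i in range(2)]  # rank-1 truncation
resid = sum((PHI[i][j] - phi1[i][j]) ** 2 for i in range(2) for j in range(2))

# sigma_2^2 is the smaller eigenvalue of G (trace 94, determinant 36).
sigma2_sq = (94.0 - math.sqrt(94.0 ** 2 - 4.0 * 36.0)) / 2.0
```

The leading singular value recovered this way also matches the amplification 9.676 listed for $x^r_{sv}(1)$ in Table 2.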

#### 5.4.1. Decomposition algorithms

Let the elements of $\Phi$ (or $\widehat{\Phi}$) be available (possibly in algorithmic form). By stochastically and simultaneously perturbing all the elements of $A$ and $B$, one can write out an iterative algorithm for estimating the elements of $A$ and $B$. For more detail, see Hoang and Baraille [19].

#### 5.4.2. Iterative decomposition algorithm

Another way to decompose the matrix Φ is to solve iteratively the following optimization problems

| Element | $\phi_{ij}$ | $\widehat{\phi}_{ij}$ | $\widehat{\phi}_{ij}^{(1)}$ | $\widehat{\phi}_{ij}^{(2)}$ |
|---|---|---|---|---|
| (1,1) | 5.00 | 4.677 | 4.914 | −0.237 |
| (1,2) | 7.00 | 7.07 | 7.662 | −0.592 |
| (2,1) | −2.00 | −2.041 | −1.08 | −0.962 |
| (2,2) | −4.00 | −4.081 | −1.683 | −2.398 |

Table 1. Estimates of Φ obtained by Algorithm 5.2.

Figure 1. Estimates of elements of the matrix Φ.

| Perturbation | Vector | Predictor | Amplification |
|---|---|---|---|
| $x^r_{sv}(1)$ | $(0.554, 0.833)^T$ | $(8.597, -4.438)^T$ | 9.676 |
| $x^r_{sv}(2)$ | $(0.833, -0.554)^T$ | $(0.284, 0.551)^T$ | 0.62 |
| $x_{ei}(1)$ | $(0.962, -0.275)^T$ | $(2.885, -0.824)^T$ | 3 |
| $x_{ei}(2)$ | $(0.707, -0.707)^T$ | $(-1.414, 1.414)^T$ | 2 |
| $x_{sch}(1)$ | $(0.707, 0.707)^T$ | $(8.485, -4.243)^T$ | 9.487 |
| $x_{sch}(2)$ | $(0.707, -0.707)^T$ | $(-1.414, 1.414)^T$ | 2 |
| $\widehat{x}_{sch}(1)$ | $(0.962, -0.275)^T$ | $(-8.1, 4.4)^T$ | 9.22 |
| $\widehat{x}_{sch}(2)$ | $(-0.275, -0.962)^T$ | $(2.885, -0.824)^T$ | 3 |
| $\widehat{c}^{(1)}_n$ | $(0.54, 0.842)^T$ | $(8.592, -4.447)^T$ | 9.674 |
| $\widehat{c}^{(2)}_n$ | $(0.372, 0.928)^T$ | $(8.358, -4.457)^T$ | 9.472 |

Table 2. Different OPs.

Algorithm 5.2

At the beginning, let $i = 1$.

Step 1. For $i = 1$, solve the minimization problem

$$J_1 = \left\|\Phi^1 - ab^T\right\|_F^2 \to \min_{a,b}, \quad a \in \mathbb{R}^m, \; b \in \mathbb{R}^n, \quad \Phi^1 := \Phi, \quad \mathrm{rank}\left(ab^T\right) = 1.$$

Its solution is denoted as $\widehat{a}(i)$, $\widehat{b}(i)$.

Step 2. For $i < r$, put $i := i + 1$ and solve the problem

$$J_{i+1} = \left\|\Phi^i - ab^T\right\|_F^2 \to \min_{a,b}, \quad a \in \mathbb{R}^m, \; b \in \mathbb{R}^n, \quad \Phi^i := \Phi - \sum_{k=1}^{i-1}\widehat{a}(k)\,\widehat{b}^T(k), \quad \mathrm{rank}\left(ab^T\right) = 1.$$

Step 3. If $i = r$, compute

$$
\widehat{\Phi} = \widehat{A}(r)\,\widehat{B}^T(r), \quad \widehat{A}(r) = \left[\widehat{a}(1), \ldots, \widehat{a}(r)\right], \quad \widehat{B}(r) = \left[\widehat{b}(1), \ldots, \widehat{b}(r)\right],
$$

and stop. Otherwise, go to Step 2.

From Theorem 3.2 of Hoang and Baraille [19], the couple $\widehat{A}(r)$, $\widehat{B}(r)$ is a solution of problem (36) under Condition (C).
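A sketch of the deflation structure of Algorithm 5.2; each rank-1 subproblem is solved here by deterministic power iteration rather than by the stochastic estimation of Hoang and Baraille [19], so only the greedy $\Phi^i$ update is faithful to the text.

```python
import math

def best_rank1(M):
    """min ||M - a b^T||_F^2 via power iteration on M^T M (a stand-in
    for the stochastic minimization used in [19])."""
    m, n = len(M), len(M[0])
    v = [1.0] + [0.0] * (n - 1)
    for _ in range(500):
        u = [sum(M[i][j] * v[j] for j in range(n)) for i in range(m)]   # M v
        w = [sum(M[i][j] * u[i] for i in range(m)) for j in range(n)]   # M^T M v
        nrm = math.sqrt(sum(x * x for x in w)) or 1.0
        v = [x / nrm for x in w]
    a = [sum(M[i][j] * v[j] for j in range(n)) for i in range(m)]       # a = M v
    return a, v                                                         # M ~ a v^T

def decompose(M, r):
    """Greedy deflation of Algorithm 5.2: Phi^{i+1} := Phi^i - a_i b_i^T."""
    R = [row[:] for row in M]
    A, B = [], []
    for _ in range(r):
        a, b = best_rank1(R)
        A.append(a)
        B.append(b)
        R = [[R[i][j] - a[i] * b[j] for j in range(len(b))]
             for i in range(len(a))]
    return A, B  # columns of A_hat(r) and B_hat(r)

PHI = [[5.0, 7.0], [-2.0, -4.0]]
A, B = decompose(PHI, 2)   # with r = m = 2 the product reconstructs Phi
recon = [[sum(A[k][i] * B[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
```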

### 6. Numerical example

Consider the matrix $\Phi \in \mathbb{R}^{2 \times 2}$ with elements

$$
\phi\_{11} = \mathbf{5}, \phi\_{12} = \mathbf{7}, \phi\_{21} = -\mathbf{2}, \phi\_{22} = -4.
$$

The singular values and the right SVs of $\Phi$, obtained by solving the classical eigenvalue equations for $\Phi^T\Phi$, are displayed in Table 1.

First, we apply Algorithm 5.1 to estimate the matrix $\Phi$. Figure 1 shows the estimates produced by Algorithm 5.1. It is seen that the estimates converge quickly to the true elements of $\Phi$.

Next, Algorithm 5.2 (iterative decomposition algorithm) has been applied to estimate the decomposition of the matrix $\Phi$. After each $i$th iteration, the algorithm yields $\widehat{b}^{(i)}$, $\widehat{c}^{(i)}$.

The different OPs are shown in Table 2, where $x^r_{sv}(i)$, $x_{ei}(i)$, and $x_{sch}(i)$ are the theoretical SV-POs, EI-POs, and Sch-POs. The vectors $\widehat{x}_{sch}(i)$ are the components of $X_t$ computed by Algorithm 3.1

(Sampling-P). As to $\widehat{c}^{(i)}_n$, they are the results of normalization (to unit Euclidean norm) of $\widehat{c}^{(i)}$.

Looking at the first OPs, one sees that $x^r_{sv}(1)$, $x_{sch}(1)$, $\widehat{x}_{sch}(1)$, and $\widehat{c}^{(1)}_n$ produce almost the same amplification. The first $x_{ei}(1)$ has an amplification three times smaller than those of $x^r_{sv}(1)$ and $x_{sch}(1)$. The second $x_{sch}(2)$ is much less optimal than $x^r_{sv}(2)$ and $\widehat{c}^{(2)}_n$. By comparing $x^r_{sv}(i)$ with $\widehat{c}^{(i)}_n$ for $i = 1, 2$, one concludes that the obtained results justify the correctness of Theorem 3.1 of Hoang and Baraille [19]. Mention that only $\widehat{x}_{sch}(i)$ and $\widehat{c}^{(i)}_n$ can be calculated for HdS.

In Table 1, we show the results obtained by Algorithm 5.2 after two consecutive iterations (matrix estimation in the $R^1$ subspace). The elements of the true $\Phi = [\phi_{ij}]$ are displayed in the second column, whereas their estimates are in the third column,

$$\widehat{\Phi} = \widehat{\Phi}^{(1)} + \widehat{\Phi}^{(2)} = \sum_{i=1}^{2}\widehat{b}^{(i)}\,\widehat{c}^{(i),T}, \quad \widehat{\Phi}^{(i)} := \left[\widehat{\phi}_{ij}^{(i)}\right] = \widehat{b}^{(i)}\,\widehat{c}^{(i),T}.$$

The estimates resulting from the first iteration are the elements of $\widehat{\Phi}^{(1)}$ (Table 1, column 4). After the first iteration, $\Phi^{(2)} := \Phi - \widehat{\Phi}^{(1)} = \widehat{b}^{(2)}\,\widehat{c}^{(2),T}$, and the optimization yields the estimates $\widehat{\phi}_{ij}^{(2)}$ displayed in column 5. From columns 4 and 5, one sees that the first iteration allows to well estimate the two biggest elements $\phi_{11} = 5$, $\phi_{12} = 7$. In a similar way, the second iteration captures the two biggest elements of $\Phi^{(2)}$.

…En(SCH). Due to rank deficiency, the sample $M_d$(SCH) is considered only as a data matrix. The optimization procedure is applied to minimize the distance between the data matrix $M_d$(SCH) and the structured parametrized ECM $M(\mathrm{SCH}) = M_v \otimes M_h$, which is written in the form of the Schur product of two matrices $M(\mathrm{SCH}) = M_v(\theta) \otimes M_h(\theta)$. Here, $M_v$ is the vertical ECM, $M_h$ is the horizontal ECM [18], and $\theta$ is a vector of unknown parameters. Mention that the hypothesis on separability of the vertical and horizontal variables in the ECM is not new in meteorology [20]. The gain is computed according to Eq. (3) with $R = \alpha I$, where $\alpha > 0$ is a small positive value. The ROF is denoted as PEF (SSP).

#### 7.1.2. Data matrix based on SSP approach

The second data matrix $M_d$(SSP) is obtained by perturbing the system state according to the SSP method. The SSP samples are simulated in a way similar to that described above for generating En(SCH), with the difference that the perturbation components $\delta h^{(l)}(i,j,k)$ are i.i.d. random Bernoulli variables assuming the two values $\pm 1$ with equal probability 1/2. The same optimization procedure has been applied to estimate $M$(SSP). The obtained ROF is denoted as PEF (SSP).

Figure 2 shows the evolution of estimates for the gain coefficients $k(1)$ computed from the estimated coefficients $\widehat{c}_{kl}$ of $M_d$(SSP) on the basis of En(SCH) (curve "Schur") and En(SSP) (curve "random"), during model integration. It is seen that the two coefficients evolve in nearly the same manner, with nearly the same magnitude as that of $k(1)$ in the CHF (Cooper-Haines filter, Cooper and Haines [21]). Mention that the CHF is a filter widely used in oceanic data assimilation, which projects the PE of the surface height data by lifting or lowering of water columns.

Figure 2. Evolution of estimates for the gain coefficients $k(1)$ computed from $\widehat{c}_{kl}$ on the basis of En(SCH) (curve "Schur") and En(SSP) (curve "random"), during model integration. It is seen that the two coefficients evolve in nearly the same manner, of nearly the same magnitude as that of $k(1)$ in the CHF. The same picture is obtained for other $c_k$, $k = 2, 3, 4$.