## **3. Optimal interpolation (OI) and Kalman filter (KF)**

#### **3.1. Optimal interpolation**

The simplest special case in data assimilation is that in which the forecast model is linear and the errors are Gaussian. Among sequential methods, the solution to this case is provided by the Kalman filter. Typically, the Kalman filter applies to the below state-space model:

$$\mathbf{x}_{t+1} = M\mathbf{x}_t + \boldsymbol{\eta}_t,\tag{14}$$

$$\mathbf{y}_t = H\mathbf{x}_t + \boldsymbol{\zeta}_t,\tag{15}$$

where *M* and *H* are linear operators of model and measurement, respectively, *x* is the model state, *y* is the observation, and the subscript denotes the time step. $\boldsymbol{\eta}_t$ and $\boldsymbol{\zeta}_t$ are the model errors and observational errors, respectively, which have covariances $<\boldsymbol{\eta}_t, \boldsymbol{\eta}_t^T> = Q$ and $<\boldsymbol{\zeta}_t, \boldsymbol{\zeta}_t^T> = R$. The objective here is to estimate the model state *x* using *y*, making it as close to the (unknown) true state as possible.

Assume the estimate of the model state at a time step is a linear combination of the model forecast $\mathbf{x}^b$ and the observation $\mathbf{y}^o$, i.e. the filter itself is linear, so

$$\mathbf{x}^a = \mathbf{x}^b + K \left[ \mathbf{y}^o - H\mathbf{x}^b \right]. \tag{16}$$

Eq. (16) is the standard expression of the Kalman filter. *K* is called the Kalman gain, which determines the optimal estimate, and $\mathbf{y}^o - H\mathbf{x}^b$ is called the innovation. An analysis step essentially determines the increment to the forecast by combining the Kalman gain and the innovation. Before deriving *K*, we denote the covariance matrix of the analysis error $\boldsymbol{\epsilon}^a$ by $P^a$, i.e. $P^a = <\boldsymbol{\epsilon}^a, (\boldsymbol{\epsilon}^a)^T>$, where $\boldsymbol{\epsilon}^a = \mathbf{x}^a - \mathbf{x}^{tr}$ and $\mathbf{x}^{tr}$ is the true value of the model state. Similarly, observation errors and forecast errors are defined by $\boldsymbol{\epsilon}^o = \mathbf{y}^o - H\mathbf{x}^{tr}$ and $\boldsymbol{\epsilon}^b = \mathbf{x}^b - \mathbf{x}^{tr}$, respectively. It should be noticed that the forecast error $\boldsymbol{\epsilon}^b$ is different from the model error $\boldsymbol{\eta}$, which is a systematic bias. Also, we denote $B = <\boldsymbol{\epsilon}^b, (\boldsymbol{\epsilon}^b)^T>$ as the background (forecast) error covariance and $R = <\boldsymbol{\epsilon}^o, (\boldsymbol{\epsilon}^o)^T>$ as the observational error covariance. It is also assumed that the observation error is not correlated with the forecast error, so $<\boldsymbol{\epsilon}^b, (\boldsymbol{\epsilon}^o)^T> = <\boldsymbol{\epsilon}^o, (\boldsymbol{\epsilon}^b)^T> = 0$.

Clearly, we are seeking the *K* that minimizes the analysis error. Subtracting $\mathbf{x}^{tr}$ from both sides of Eq. (16) leads to the below equation:

$$\mathbf{x}^a - \mathbf{x}^{tr} = \mathbf{x}^b - \mathbf{x}^{tr} + K\left[\mathbf{y}^o - H\mathbf{x}^{tr} - \left(H\mathbf{x}^b - H\mathbf{x}^{tr}\right)\right].\tag{17}$$

Namely,


$$
\epsilon^a = \epsilon^b + K \left( \epsilon^o - H \epsilon^b \right),
\tag{18}
$$

and

$$\begin{split} P^a &= E\left[\boldsymbol{\epsilon}^b + K\left(\boldsymbol{\epsilon}^o - H\boldsymbol{\epsilon}^b\right)\right]\left[\boldsymbol{\epsilon}^b + K\left(\boldsymbol{\epsilon}^o - H\boldsymbol{\epsilon}^b\right)\right]^T \\ &= E\left[\boldsymbol{\epsilon}^b\left(\boldsymbol{\epsilon}^b\right)^T + \boldsymbol{\epsilon}^b\left(\boldsymbol{\epsilon}^o - H\boldsymbol{\epsilon}^b\right)^TK^T + K\left(\boldsymbol{\epsilon}^o - H\boldsymbol{\epsilon}^b\right)\left(\boldsymbol{\epsilon}^b\right)^T + K\left(\boldsymbol{\epsilon}^o - H\boldsymbol{\epsilon}^b\right)\left(\boldsymbol{\epsilon}^o - H\boldsymbol{\epsilon}^b\right)^TK^T\right] \\ &= B - BH^TK^T - KHB + K\left(R + HBH^T\right)K^T. \end{split}\tag{19}$$

Here, we used $<\boldsymbol{\epsilon}^b, (\boldsymbol{\epsilon}^o)^T> = 0$ and the definitions of *B* and *R*. The optimal estimate requires the trace of $P^a$ to be minimal, namely, $\partial\,\text{trace}(P^a)/\partial K = 0$. It can be computed that

$$K = BH^T \left(HBH^T + R\right)^{-1}.\tag{20}$$

Substituting Eq. (20) into Eq. (19) gives

$$P^a = B - BH^TK^T - KHB + BH^T\left(HBH^T + R\right)^{-1}\left(R + HBH^T\right)K^T = (I - KH)B.\tag{21}$$

Here, we invoked the below properties:

$$\frac{\partial A\mathbf{x}}{\partial \mathbf{x}^{\mathrm{T}}} = \frac{\partial \mathbf{x}^{\mathrm{T}}A}{\partial \mathbf{x}} = A \tag{22}$$

$$\frac{\partial \mathbf{x}^{\top} A \mathbf{x}}{\partial \mathbf{x}} = \left( A + A^{\top} \right) \mathbf{x} \tag{23}$$

$$\frac{\partial A^T\mathbf{x}}{\partial \mathbf{x}^T} = \frac{\partial \mathbf{x}^TA^T}{\partial \mathbf{x}} = A^T \tag{24}$$

$$\frac{\partial\left(\text{trace}\left[XAX^T\right]\right)}{\partial X} = X\left(A + A^T\right) \tag{25}$$

$$\frac{\partial\left(\text{trace}\left[AX^T\right]\right)}{\partial X} = A \tag{26}$$

Thus, we have the optimal estimate filter:

$$\mathbf{x}^a = \mathbf{x}^b + K \left[ \mathbf{y}^o - H\mathbf{x}^b \right],\tag{27}$$

$$K = BH^{\top} \left( HBH^{\top} + R \right)^{-1},\tag{28}$$

$$P^a = (I - KH)B.\tag{29}$$

In the estimate (27)–(29), if the background error covariance *B* is prescribed, the estimate is called optimal interpolation. The OI does not involve state equation (14) and *B* is unchanged during the entire assimilation process.

#### **3.2. Kalman filter**

Now, we consider that *B* in (28) changes with the assimilation cycle. This is more realistic since the model prediction errors should be expected to decrease with the assimilation.

From Eq. (14), we have

$$\mathbf{x}_{t+1}^{tr} = M\mathbf{x}_t^{tr} + \boldsymbol{\eta}_t,\tag{30}$$

$$\mathbf{x}_{t+1}^b = E\left(M\mathbf{x}_t^a + \boldsymbol{\eta}_t\right) = M\mathbf{x}_t^a.\tag{31}$$

Eq. (30) indicates that even if the true value is input at a time step, the model cannot produce the true value for the next step due to the model error $\boldsymbol{\eta}_t$. Eq. (31) shows the standard procedure for the model prediction of the next step starting from the analysis of the previous step.

Subtracting (30) from (31) produces

$$\boldsymbol{\epsilon}_{t+1}^b = M\boldsymbol{\epsilon}_t^a - \boldsymbol{\eta}_t,\tag{32}$$

$$B_{t+1} = E\left(\boldsymbol{\epsilon}_{t+1}^b\left(\boldsymbol{\epsilon}_{t+1}^b\right)^T\right) = E\left[(M\boldsymbol{\epsilon}_t^a - \boldsymbol{\eta}_t)(M\boldsymbol{\epsilon}_t^a - \boldsymbol{\eta}_t)^T\right] = MP_t^aM^T + Q,\tag{33}$$

where $P_t^a = <\boldsymbol{\epsilon}_t^a, (\boldsymbol{\epsilon}_t^a)^T>$ represents the analysis error covariance for time step *t*. The above equation describes the evolution of the background (prediction) error covariance with time, controlled by the dynamical model operator *M*. The above equations constitute the framework of the Kalman filter (**Table 1**), namely

| Step | Equations |
| --- | --- |
| Prediction | $\mathbf{x}_{t+1}^b = M\mathbf{x}_t^a$, $\quad B_{t+1} = MP_t^aM^T + Q$ |
| Analysis | $\mathbf{x}^a = \mathbf{x}^b + K[\mathbf{y}^o - H\mathbf{x}^b]$, $\quad K = BH^T(HBH^T + R)^{-1}$, $\quad P^a = (I - KH)B$ |

**Table 1.** The Kalman filter.


One Kalman filter cycle consists of two parts, namely, one analysis step (Eqs. (27)–(29)) and one prediction step (Eqs. (31) and (33)). The analysis state and covariance are treated as initial conditions for the prediction step, until the next observation is available. Sometimes, *B* is denoted by $P^f$ (the forecast error covariance) in the Kalman filter literature.
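As a concrete illustration, the following is a minimal numpy sketch of one KF cycle; the function name and calling conventions are illustrative, not part of the original formulation. Fixing *B* to a prescribed matrix and skipping the covariance prediction reduces this cycle to OI.

```python
import numpy as np

def kalman_filter_cycle(x_a, P_a, y_o, M, H, Q, R):
    """One Kalman filter cycle: prediction (Eqs. 31 and 33), then analysis (Eqs. 27-29)."""
    # Prediction step: propagate the analysis state and covariance forward.
    x_b = M @ x_a                                   # Eq. (31)
    B = M @ P_a @ M.T + Q                           # Eq. (33)
    # Analysis step: blend the forecast with the observation.
    K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)    # Eq. (28), Kalman gain
    x_a_new = x_b + K @ (y_o - H @ x_b)             # Eq. (27), innovation update
    P_a_new = (np.eye(len(x_b)) - K @ H) @ B        # Eq. (29)
    return x_a_new, P_a_new
```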

#### **3.3. Extended Kalman filter (EKF)**

In deriving the Kalman filter, we assumed the state model *M* and measurement model *H* are both linear. Further, we also assumed the errors have Gaussian distributions. Therefore, the classic KF only works for linear models and Gaussian distributions. If the dynamical model and measurement model are not linear, we cannot directly apply the KF. Instead, linearization must be performed prior to applying the KF. The linearized version of the KF is called the extended KF (EKF), which solves the below state-space estimation problem:

$$\mathbf{x}_{t+1} = f\left(\mathbf{x}_t\right) + \boldsymbol{\eta}_t,\tag{34}$$

$$\mathbf{y}_t = h\left(\mathbf{x}_t\right) + \boldsymbol{\zeta}_t,\tag{35}$$

where *f* and *h* are nonlinear models, and $\boldsymbol{\eta}_t$ and $\boldsymbol{\zeta}_t$ are additive noises.

The filter is still assumed to be 'linear', i.e.

$$\mathbf{x}^a = \mathbf{x}^b + K[\mathbf{y}^o - h(\mathbf{x}^b)] \tag{36}$$

Actually, it is not a linear combination of the forecast and observation if *h* is not linear. However, we simply extend the formulation of Eq. (16) and apply it intuitively in nonlinear cases. Ignoring high-order terms, the following holds approximately:

$$h\left(\mathbf{x} + \delta \mathbf{x}\right) = h\left(\mathbf{x}\right) + \frac{\partial h}{\partial \mathbf{x}} \delta \mathbf{x} = h\left(\mathbf{x}\right) + H \delta \mathbf{x} \tag{37}$$

where *H* is the linearization of *h*, i.e. $H_{i,j} = \partial h_i/\partial x_j$. So,

$$\mathbf{y}^o - h\left(\mathbf{x}^b\right) = \mathbf{y}^o - h\left(\mathbf{x}^{tr} + \mathbf{x}^b - \mathbf{x}^{tr}\right) = \mathbf{y}^o - h(\mathbf{x}^{tr}) - H(\mathbf{x}^b - \mathbf{x}^{tr}) = \boldsymbol{\epsilon}^o - H\boldsymbol{\epsilon}^b \tag{38}$$

$$\mathbf{x}^a = \mathbf{x}^b + K(\boldsymbol{\epsilon}^o - H\boldsymbol{\epsilon}^b) \tag{39}$$

Eq. (39) is identical to Eq. (16). Similarly, subtracting $\mathbf{x}^{tr}$ from both sides of Eq. (36) leads to the below equation:

$$
\epsilon^a = \epsilon^b + K \left( \epsilon^o - H \epsilon^b \right) \tag{40}
$$

which is the same as Eq. (18). Following the same derivation as that for Eq. (18), we can obtain equations similar to (27)–(29). Therefore, if the measurement model *h* is nonlinear, the KF can still be applied with a linearization of *h*.

Similar to Eqs. (30) and (31), the state model is as below:

$$\mathbf{x}_{t+1}^{tr} = f\left(\mathbf{x}_t^{tr}\right) + \boldsymbol{\eta}_t,\tag{41}$$

$$\mathbf{x}_{t+1}^b = E\left(f\left(\mathbf{x}_t^a\right) + \boldsymbol{\eta}_t\right) = f\left(\mathbf{x}_t^a\right).\tag{42}$$

Subtracting Eq. (41) from Eq. (42) produces

$$\begin{split} \boldsymbol{\epsilon}_{t+1}^b &= f\left(\mathbf{x}_t^a\right) - f\left(\mathbf{x}_t^{tr}\right) - \boldsymbol{\eta}_t = f\left(\mathbf{x}_t^a\right) - f\left(\mathbf{x}_t^{tr} - \mathbf{x}_t^a + \mathbf{x}_t^a\right) - \boldsymbol{\eta}_t \\ &= f\left(\mathbf{x}_t^a\right) - f\left(\mathbf{x}_t^a - \boldsymbol{\epsilon}_t^a\right) - \boldsymbol{\eta}_t = M\boldsymbol{\epsilon}_t^a - \boldsymbol{\eta}_t \end{split}\tag{43}$$

where *M* is the linearization of *f*, i.e. $M_{i,j} = \partial f_i/\partial x_j$.


Comparing Eq. (43) with Eq. (32) reveals that Eq. (33) still works, with *M* now the linearized model. Thus, the EKF can be summarized as below (**Table 2**).

The procedure to perform the EKF is similar to that for the KF, as listed above. The disparities and similarities between the EKF and the KF include:

**i.** the analysis and prediction equations take the same form in both filters;

**ii.** in the EKF, *M* and *H* are the Jacobians of the nonlinear operators *f* and *h*, evaluated at the current state, rather than fixed linear operators.

It should be noticed that the EKF is only an approximation of the KF for a nonlinear state model.


| Step | Equations |
| --- | --- |
| Prediction | $\mathbf{x}_{t+1}^b = f(\mathbf{x}_t^a)$, $\quad B_{t+1} = MP_t^aM^T + Q$, with $M = \partial f/\partial \mathbf{x}$ |
| Analysis | $\mathbf{x}^a = \mathbf{x}^b + K[\mathbf{y}^o - h(\mathbf{x}^b)]$, $\quad K = BH^T(HBH^T + R)^{-1}$, $\quad P^a = (I - KH)B$, with $H = \partial h/\partial \mathbf{x}$ |

**Table 2.** The extended Kalman filter.
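To make Table 2 concrete, here is a minimal sketch of one EKF cycle, assuming the user supplies the nonlinear operators `f` and `h` together with helper functions `jac_f` and `jac_h` returning their Jacobians (these helper names are hypothetical):

```python
import numpy as np

def ekf_cycle(x_a, P_a, y_o, f, h, jac_f, jac_h, Q, R):
    """One extended Kalman filter cycle (Table 2)."""
    # Prediction: the state is propagated with the full nonlinear model (Eq. 42);
    # the Jacobian M = df/dx is used only for the covariance (Eq. 33).
    x_b = f(x_a)
    M = jac_f(x_a)
    B = M @ P_a @ M.T + Q
    # Analysis: h is linearized about the forecast (Eq. 37), but the innovation
    # itself uses the nonlinear h (Eq. 36).
    H = jac_h(x_b)
    K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)
    x_a_new = x_b + K @ (y_o - h(x_b))
    P_a_new = (np.eye(len(x_b)) - K @ H) @ B
    return x_a_new, P_a_new
```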

## **4. Ensemble Kalman filter (EnKF)**

#### **4.1. Basics of EnKF**

A challenge in the EKF is updating the background (prediction) error covariance, which requires the linearization of the nonlinear model. The linearization of a nonlinear model is often technically difficult, and even intractable in some cases, e.g. for non-continuous functions in models. Another drawback of the EKF is that it neglects the contributions from higher-order statistical moments in calculating the error covariance.

To avoid the linearization of the nonlinear model, the ensemble Kalman filter (EnKF) was introduced by Evensen [10, 11], in which the prediction error covariance *B* of Eq. (33) is estimated approximately using an ensemble of model forecasts. The main concept behind the formulation of the EnKF is that if the dynamical model is expressed as a stochastic differential equation, the prediction error statistics, which are described by the Fokker-Planck equation, can be estimated using ensemble integrations [10, 12]; thus, the error covariance matrix *B* can be calculated by integrating the ensemble of model states. The EnKF overcomes the EKF drawback of neglecting the contributions from higher-order statistical moments in calculating the error covariance. The major strengths of the EnKF include the following:

**i.** there is no need to calculate the tangent linear model or Jacobian of nonlinear models, which is extremely difficult for ocean (or atmosphere) general circulation models (GCMs);

**ii.** the covariance matrix is propagated in time via fully nonlinear model equations (no linear approximation as in the EKF); and

**iii.** it is well suited to modern parallel computers (cluster computing) [13].
The EnKF has attracted broad attention and has been widely used in atmospheric and oceanic data assimilation.

Simply put, the EnKF avoids the explicit computation and evolution of the error covariance *B* as in Eq. (33), and computes *B* using the formula below as soon as it is required:

$$B = \frac{1}{N-1} \sum\_{i=1}^{N} (\mathbf{x}\_i^b - \overline{\mathbf{x}^b})(\mathbf{x}\_i^b - \overline{\mathbf{x}^b})^T \tag{44}$$

where $\mathbf{x}_i^b$ represents the *i*-th member of the forecast ensemble of the system state vector at step *t*, $\overline{\mathbf{x}^b}$ is the ensemble mean, and *N* is the ensemble size. The use of Eq. (44) avoids processing *M*, the linearized operator of the nonlinear model. However, the measurement function *H* is still linear or linearized while computing the Kalman gain *K*, which causes concern. To avoid the linearization of the nonlinear measurement function, Houtekamer and Mitchell [14] wrote the Kalman gain as

$$K = BH^T \left(HBH^T + R\right)^{-1},\tag{45}$$

$$BH^T \equiv \frac{1}{N-1}\sum_{i=1}^{N}[\mathbf{x}_i^b - \overline{\mathbf{x}^b}][h(\mathbf{x}_i^b) - \overline{h(\mathbf{x}^b)}]^T,\tag{46}$$

$$HBH^T \equiv \frac{1}{N-1} \sum\_{i=1}^{N} [h(\mathbf{x}\_i^b) - \overline{h(\mathbf{x}^b)}] [h(\mathbf{x}\_i^b) - \overline{h(\mathbf{x}^b)}]^T,\tag{47}$$

where $\overline{h(\mathbf{x}^b)} = \frac{1}{N}\sum_{i=1}^{N} h(\mathbf{x}_i^b)$. Eqs. (46) and (47) allow direct evaluation of the nonlinear measurement function *h* in calculating the Kalman gain. However, Eqs. (46) and (47) have not been proven mathematically and were only given intuitively. Tang and Ambadan argued that Eqs. (46) and (47) approximately hold if and only if $h(\overline{\mathbf{x}^b}) = \overline{h(\mathbf{x}^b)}$ and $\mathbf{x}_i^b - \overline{\mathbf{x}^b}$ is small for $i = 1, 2, ..., N$ [15]. Under these conditions, Tang et al. argued that Eqs. (46) and (47) actually linearize the nonlinear measurement function *h* to *H* [16]. Therefore, direct application of the nonlinear measurement function in Eqs. (46) and (47), in fact, imposes an implicit linearization process using ensemble members. In the next section, we will see that Eqs. (46) and (47) can be modified under a rigorous framework.

Thus, the procedures of EnKF are summarized as below (**Table 3**):


**1.** Generate an initial ensemble of *N* members by perturbing the initial state.

**2.** Integrate each ensemble member forward with the nonlinear model to obtain the forecast ensemble $\mathbf{x}_i^b$.

**3.** Compute $BH^T$ and $HBH^T$ from the forecast ensemble using Eqs. (46) and (47), and the Kalman gain *K* using Eq. (45).

**4.** Update each ensemble member with a perturbed observation,
where

$$\mathbf{x}_i^a = \mathbf{x}_i^b + K\left[\mathbf{y}^o + \boldsymbol{\zeta}^i - h\left(\mathbf{x}_i^b\right)\right],\tag{48}$$

after *K* is obtained. It should be noted that each ensemble member produces an analysis; the average of all (*N*) analyses can be obtained.


**Table 3.** The ensemble Kalman filter.

It should be noted that the observation should be treated as a random variable with mean equal to $\mathbf{y}^o$ and covariance equal to *R*. This is why there is $\boldsymbol{\zeta}^i$ in Eq. (48). Simply, $\boldsymbol{\zeta}^i$ is often drawn from a normal distribution $\boldsymbol{\zeta}^i \sim N(0, R)$.

From the above procedure, we find that Eq. (44) is not directly applied here. Instead, we use Eqs. (46) and (47) to calculate *K*. This is because Eqs. (46) and (47) avoid the linearization of the nonlinear model and also avoid the explicit expression of the matrix *B*, which is often very large and cannot be stored with current computer resources in many realistic problems. The measurement function, *h*, projecting model space (dimension) to observation space (dimension), greatly reduces the dimensionality.
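A minimal sketch of the EnKF analysis step, following Eqs. (45)–(48); the array shapes and names here are assumptions of this sketch:

```python
import numpy as np

def enkf_analysis(X_b, y_o, h, R, rng):
    """EnKF analysis step. X_b: (L, N) forecast ensemble; y_o: (m,) observation;
    h: nonlinear measurement function mapping an (L,) state to an (m,) vector."""
    L, N = X_b.shape
    HX = np.column_stack([h(X_b[:, i]) for i in range(N)])   # h of each member
    Xp = X_b - X_b.mean(axis=1, keepdims=True)               # state anomalies
    Yp = HX - HX.mean(axis=1, keepdims=True)                 # measurement anomalies
    BHt = Xp @ Yp.T / (N - 1)                                # Eq. (46)
    HBHt = Yp @ Yp.T / (N - 1)                               # Eq. (47)
    K = BHt @ np.linalg.inv(HBHt + R)                        # Eq. (45)
    X_a = np.empty_like(X_b)
    for i in range(N):
        # Perturbed observation: member i assimilates y_o + zeta_i, zeta_i ~ N(0, R).
        zeta = rng.multivariate_normal(np.zeros(len(y_o)), R)
        X_a[:, i] = X_b[:, i] + K @ (y_o + zeta - HX[:, i])  # Eq. (48)
    return X_a
```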

#### **4.2. Some remarks on EnKF with large dimensional problems**

#### *4.2.1. Initial perturbation*

The success of the EnKF highly depends on the quality of the ensemble members produced by initial perturbations. Because it is impractical to represent all possible types of errors within the ensemble owing to the computational cost, the method of generating initial perturbations must be chosen judiciously.

The first issue is the amplitude of the initial perturbations. Usually, the following two factors are considered when selecting the amplitude of initial perturbations: the amplitude of the observation errors and the amplitude of the model errors induced by model parameters and uncertainty in model physics. If a model is perfect, the amplitude of the perturbations should be the same as the amplitude of the observation errors; otherwise, the model errors are combined with the observation errors, and this combined error is used to determine the amplitude of the perturbations.

When the perturbation amplitude is determined, the practical initial perturbation field generating each ensemble member could be constructed by a normalized pseudorandom field multiplied by the prescribed amplitude. Considering the spatial coherence, the pseudorandom field is red noise as proposed by Evensen [17], summarized as below:

**1.** Calculate the statistical characteristics for the pseudorandom field to meet its variance of 1 and mean of 0 by solving the following nonlinear equation:

$$e^{-1} = \frac{\sum_{l,p} e^{-2(k_l^2 + r_p^2)/\sigma^2}\cos(k_l r_h)}{\sum_{l,p} e^{-2(k_l^2 + r_p^2)/\sigma^2}},\tag{49}$$

where $k_l = 2\pi l/(n_x\Delta x)$, $r_p = 2\pi p/(n_y\Delta y)$, and $n_x$ and $n_y$ are the numbers of grid points in the *x*-axis (lon.) and the *y*-axis (lat.). The *l* and *p* are wavenumbers, changing from 1 to the maximum values of $n_x/2$ and $n_y/2$. $\Delta x$ and $\Delta y$ are the intervals of two adjacent points, often set to 1, and $r_h$ is the decorrelation length. The purpose of Eq. (49) is to derive $\sigma^2$, which is then used for the other parameter:


$$c^2 = \frac{1}{\Delta k\sum_{l,p} e^{-2(k_l^2 + r_p^2)/\sigma^2}}.\tag{50}$$

**2.** After *c* and $\sigma^2$ are obtained, we can construct a two-dimensional pseudorandom field:


$$W\left(\mathbf{x}\_n, \mathbf{y}\_n\right) = \sum\_{l, p} \frac{c}{\sqrt{\Delta k}} e^{-\frac{\left(k\_l^2 + r\_p^2\right)}{\sigma^2}} e^{2\pi i \phi\left(l, p\right)} e^{i\left(k\_l x\_n + r\_p y\_n\right)} \Delta k. \tag{51}$$

**3.** When $x_n, y_n$ cover the whole domain, Eq. (51) produces an $n_x \times n_y$ two-dimensional random field with a spatially coherent structure, a variance of 1 and a mean of 0. If the realistic uncertainty (error) has an amplitude *β*, the perturbation should be *βW*. Similarly, Eq. (51) is often used for the error perturbation used in the fourth step of the EnKF procedure.
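A direct, deliberately naive evaluation of the real part of Eq. (51) might look as follows; $\Delta x = \Delta y = 1$ and $\Delta k = 1$ are assumed, and `sigma2` and `c` are taken from Eqs. (49) and (50):

```python
import numpy as np

def pseudo_random_field(nx, ny, sigma2, c, dk=1.0, rng=None):
    """Real part of Eq. (51): a 2-D pseudorandom field with spatial coherence,
    unit variance and zero mean (given sigma2 and c from Eqs. 49-50)."""
    rng = np.random.default_rng() if rng is None else rng
    kl = 2.0 * np.pi * np.arange(1, nx // 2 + 1) / nx   # wavenumbers k_l, Delta x = 1
    rp = 2.0 * np.pi * np.arange(1, ny // 2 + 1) / ny   # wavenumbers r_p, Delta y = 1
    phi = rng.uniform(0.0, 1.0, (len(kl), len(rp)))     # random phases phi(l, p)
    amp = (c / np.sqrt(dk)) * np.exp(-(kl[:, None]**2 + rp[None, :]**2) / sigma2)
    xs, ys = np.meshgrid(np.arange(nx), np.arange(ny), indexing="ij")
    W = np.zeros((nx, ny))
    for i, k in enumerate(kl):
        for j, r in enumerate(rp):
            W += amp[i, j] * np.cos(k * xs + r * ys + 2.0 * np.pi * phi[i, j]) * dk
    return W
```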

Sometimes, we need to consider the vertical coherence of pseudorandom fields between adjacent levels in oceanic models. A simple method for this purpose is to construct the pseudorandom field at the *k*th level by the following equation:

$$
\varepsilon\_k = \alpha \varepsilon\_{k-1} + \sqrt{1 - \alpha^2} W\_k,\tag{52}
$$

where $W_k$ ($k = 1, 2, ...$) is the pseudorandom field at the *k*th level without considering vertical coherence, constructed using the above method. Initially, for the surface perturbation ($k = 1$), the vertical coherence is not considered, i.e. $\alpha$ equals zero since $\varepsilon_0$ does not exist. Eq. (52) indicates that a practical pseudorandom field at the *k*th level ($\varepsilon_k$) is composed of $W_k$ and $\varepsilon_{k-1}$. As such, $\varepsilon_k$ is correlated with $\varepsilon_{k-1}$, i.e. the practical pseudorandom fields of two adjacent levels ($\varepsilon_{k-1}$ and $\varepsilon_k$) are coherent with each other. Their correlation or coherent structure is determined by the coefficient $\alpha \in [0, 1]$. Eq. (52) generates a sequence that is white in the vertical direction if $\alpha = 0$ (i.e. $\varepsilon_k = W_k$), but a sequence that is perfectly correlated in the vertical if $\alpha = 1$ (i.e. $\varepsilon_k = \varepsilon_{k-1}$). Eq. (52) is also often used to construct random fields that are temporally coherent, for example, continuous random noise that has coherence in time, as used for $\boldsymbol{\eta}_t$ in the forecast model [17]. The random noise $\boldsymbol{\zeta}^i$ in the EnKF procedure can also be replaced by random noise imposed in the model forcing. For example, random noise is continuously added to the wind forcing of oceanic models. Even for some atmospheric models with transition processes, there are inherent random noises making $\boldsymbol{\zeta}^i$ unnecessary. One important criterion for $\alpha$ and the amplitude *β* is to examine the ensemble spread in some sensitivity experiments.
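A sketch of Eq. (52), assuming the horizontally coherent fields $W_k$ for each level have already been generated by Eq. (51):

```python
import numpy as np

def vertically_coherent_fields(W_levels, alpha):
    """Eq. (52): turn independent per-level fields W_k into fields that are
    coherent between adjacent vertical levels, with alpha in [0, 1]."""
    eps = [W_levels[0]]                 # k = 1: no deeper level to correlate with
    for W_k in W_levels[1:]:
        eps.append(alpha * eps[-1] + np.sqrt(1.0 - alpha**2) * W_k)
    return eps
```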

#### *4.2.2. The computational cost of Kalman gain*

The Kalman gain *K* has dimension $L \times m$, where *L* is the number of model variables and *m* is the number of observational variables. In many realistic problems, *L* and *m* are very large numbers ($\gg N$, the ensemble size), making the inversion in Eq. (45) very expensive.

A simple procedure is to rewrite the Kalman gain *K*, as below:

$$K = \tilde{\mathbf{x}}\tilde{\mathbf{x}}^TH^T\left(H\tilde{\mathbf{x}}\tilde{\mathbf{x}}^TH^T + \boldsymbol{\epsilon}\boldsymbol{\epsilon}^T\right)^{-1},\tag{53}$$

where $\tilde{\mathbf{x}}$ denotes the ensemble of model predictions with the ensemble mean removed ($\tilde{\mathbf{x}}_i = \mathbf{x}_i^b - \overline{\mathbf{x}^b}$, for $i = 1, 2, ..., N$), and $R = \frac{1}{N-1}\boldsymbol{\epsilon}\boldsymbol{\epsilon}^T$ was invoked here, with $\boldsymbol{\epsilon}$ the ensemble of observation perturbations. If we assume the ensemble prediction error ($\mathbf{x}^b - \mathbf{x}^{tr} \approx \mathbf{x}^b - \overline{\mathbf{x}^b} = \tilde{\mathbf{x}}$) is not correlated with the observation error, i.e. $H\tilde{\mathbf{x}}\boldsymbol{\epsilon}^T = \boldsymbol{\epsilon}\tilde{\mathbf{x}}^TH^T = 0$, the following is valid [17]:

$$\left(H\tilde{\mathbf{x}}\tilde{\mathbf{x}}^TH^T + \boldsymbol{\epsilon}\boldsymbol{\epsilon}^T\right) = (H\tilde{\mathbf{x}} + \boldsymbol{\epsilon})(H\tilde{\mathbf{x}} + \boldsymbol{\epsilon})^T,\tag{54}$$

where $(H\tilde{\mathbf{x}} + \boldsymbol{\epsilon})$ has dimension $m \times N$. Usually, the ensemble size *N* is much less than *m*. Using the singular-value decomposition (SVD) technique, we have

$$H\tilde{\mathbf{x}} + \boldsymbol{\epsilon} = U\Sigma V^T.\tag{55}$$

Eq. (54) then becomes

$$\left(H\tilde{\mathbf{x}}\tilde{\mathbf{x}}^TH^T + \boldsymbol{\epsilon}\boldsymbol{\epsilon}^T\right) = U\Sigma V^TV\Sigma^TU^T = U\Sigma\Sigma^TU^T = U\Lambda U^T.\tag{56}$$

So,

$$\left(H\tilde{\mathbf{x}}\tilde{\mathbf{x}}^TH^T + \boldsymbol{\epsilon}\boldsymbol{\epsilon}^T\right)^{-1} = U\Lambda^{-1}U^T,\tag{57}$$

where *U* and *Λ* contain the eigenvectors and the squares of the singular values of $(H\tilde{\mathbf{x}} + \boldsymbol{\epsilon})$, respectively. There are at most *N* nonzero eigenvalues for $(H\tilde{\mathbf{x}} + \boldsymbol{\epsilon})(H\tilde{\mathbf{x}} + \boldsymbol{\epsilon})^T$; therefore, the effective dimension is not large, allowing us to efficiently compute the inversion for a global analysis in most practical situations.
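A sketch of the SVD route of Eqs. (53)–(57); `Xp`, `HXp` and `E` stand for the state anomalies $\tilde{\mathbf{x}}$, the measurement anomalies $H\tilde{\mathbf{x}}$ and the observation perturbations $\boldsymbol{\epsilon}$ (names assumed here):

```python
import numpy as np

def kalman_gain_svd(Xp, HXp, E):
    """Kalman gain via Eqs. (53)-(57). Xp: (L, N); HXp, E: (m, N)."""
    # Eq. (55): SVD of (H x~ + eps); at most N singular values are nonzero.
    U, S, Vt = np.linalg.svd(HXp + E, full_matrices=False)
    lam_inv = np.where(S > 1e-12, 1.0 / S**2, 0.0)     # Eq. (57): Lambda^(-1)
    # (H x~ x~^T H^T + eps eps^T)^(-1) = U Lambda^(-1) U^T, of rank at most N.
    inv_applied = U @ (lam_inv[:, None] * U.T)
    return Xp @ (HXp.T @ inv_applied)                  # Eq. (53): the gain K
```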

#### *4.2.3. Stochastic EnKF and deterministic EnKF*

In the EnKF introduced in the previous section, the observation assimilated into the dynamical model should be treated as a stochastic variable, as expressed by $\mathbf{y}^o + \boldsymbol{\zeta}^i$ in Eq. (48). This is a must if the classic EnKF algorithm is used. It has been proven that if the EnKF assimilates deterministic observations (i.e., the observation is not changed for each ensemble member), the analysis error covariance will be systematically underestimated, typically leading to filter divergence, as indicated below [11, 18]:

$$P^{a*} = (I - KH)B(I - KH)^T.\tag{58}$$

Eq. (58) gives the analysis error covariance if the observation is not perturbed. Comparing Eq. (58) with Eq. (29), the theoretically unbiased estimate, $P^{a*}$ is always less than $P^a$.

However, the perturbed observation approach (i.e. $\mathbf{y}^o + \boldsymbol{\zeta}^i$) introduces an additional source of sampling error that reduces the accuracy of the analysis error covariance and increases the probability of underestimating the analysis error covariance [19, 20]. Thus, approaches that use only a single observation realization but avoid the systematic underestimation of the analysis error covariance were pursued. There are several ways to implement this goal, as summarized by Tippett et al. [20]. Below, we introduce an approach developed by Whitaker and Hamill [19], called the ensemble square root filter (EnSRF).

Denoting the deviation of the analysis from the analysis mean by $\tilde{\mathbf{x}}^a = \mathbf{x}^a - \overline{\mathbf{x}^a}$, it is easy to write

$$\tilde{\mathbf{x}}^a = \tilde{\mathbf{x}}^b + \tilde{K}\left[\tilde{\mathbf{y}}^o - H\tilde{\mathbf{x}}^b\right],\tag{59}$$

where $\tilde{\mathbf{y}}^o = \mathbf{y}^o - \overline{\mathbf{y}^o}$. If a single observation realization is assimilated in all ensemble members, $\tilde{\mathbf{y}}^o = 0$ and

$$
\tilde{\mathbf{x}}^a = \tilde{\mathbf{x}}^b - \tilde{K}H\tilde{\mathbf{x}}^b = \left(I - \tilde{K}H\right)\tilde{\mathbf{x}}^b,\tag{60}
$$

$$P^{a*} = \left(I - \tilde{K}H\right)B\left(I - \tilde{K}H\right)^T.\tag{61}$$

We seek a definition for $\tilde{K}$ that will result in an ensemble whose analysis error covariance equals $(I - KH)B$, i.e.

$$(I - \tilde{K}H)B(I - \tilde{K}H)^{\mathbb{T}} = \left(I - KH\right)B. \tag{62}$$

The solution of Eq. (62) is


$$\tilde{K} = \left(1 + \sqrt{\frac{R}{HBH^T + R}}\right)^{-1}K.\tag{63}$$

Therefore, EnSRF is summarized as below (**Table 4**):

$$\begin{aligned} \overline{\mathbf{x}^a} &= \overline{\mathbf{x}^b} + K\left[\mathbf{y}^o - H\overline{\mathbf{x}^b}\right]\\ \tilde{\mathbf{x}}^a &= \tilde{\mathbf{x}}^b - \tilde{K}H\tilde{\mathbf{x}}^b\\ \mathbf{x}^a &= \overline{\mathbf{x}^a} + \tilde{\mathbf{x}}^a\\ K &= BH^T(HBH^T + R)^{-1}\\ BH^T &= \frac{1}{N-1}\sum_{i=1}^{N}\left[\mathbf{x}_i^b - \overline{\mathbf{x}^b}\right]\left[h(\mathbf{x}_i^b) - \overline{h(\mathbf{x}^b)}\right]^T\\ HBH^T &= \frac{1}{N-1}\sum_{i=1}^{N}\left[h(\mathbf{x}_i^b) - \overline{h(\mathbf{x}^b)}\right]\left[h(\mathbf{x}_i^b) - \overline{h(\mathbf{x}^b)}\right]^T\\ \tilde{K} &= \left(1 + \sqrt{\frac{R}{HBH^T + R}}\right)^{-1}K \end{aligned}$$

**Table 4.** The analysis scheme of EnSRF.

It should be noted that there are two Kalman gains used in the EnSRF: the original *K* for updating the ensemble mean and a new gain $\tilde{K}$ for updating the anomalies. The EnSRF, which assimilates a single observation realization, thus has the same ensemble analysis mean as the classic EnKF with stochastically perturbed observations.

Initially, the term EnKF referred, in particular, to the stochastic ensemble Kalman filter that requires perturbing the observations. Subsequently, several deterministic EnKFs that avoid the use of perturbed observations were developed, e.g. the ETKF [21], the EAKF [22] and the EnSRF. These filter designs are labelled as variants of the EnKF because they are also based on the Kalman filtering formula and ensemble representations.
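A sketch of the EnSRF update of Table 4 for a single scalar observation, for which the square root in Eq. (63) is an ordinary scalar; multiple observations would be assimilated serially. The names and shapes are assumptions of this sketch:

```python
import numpy as np

def ensrf_update(X_b, y_o, H, r):
    """EnSRF analysis (Table 4). X_b: (L, N) ensemble; H: (1, L); r: scalar obs variance."""
    N = X_b.shape[1]
    x_mean = X_b.mean(axis=1)
    Xp = X_b - x_mean[:, None]                       # forecast anomalies
    HXp = (H @ Xp).ravel()                           # (N,) projected anomalies
    hbht = HXp @ HXp / (N - 1)                       # HBH^T, a scalar here
    bht = Xp @ HXp / (N - 1)                         # BH^T, shape (L,)
    K = bht / (hbht + r)                             # gain for the ensemble mean
    K_tilde = K / (1.0 + np.sqrt(r / (hbht + r)))    # Eq. (63): reduced gain
    x_mean_a = x_mean + K * (y_o - (H @ x_mean).item())  # update of the mean
    Xp_a = Xp - np.outer(K_tilde, HXp)                   # Eq. (60): anomaly update
    return Xp_a + x_mean_a[:, None]                      # x^a = mean + anomalies
```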

#### *4.2.4. Inflation approach*


The forecast error covariance is defined by Eq. (44):

$$B = \frac{1}{N-1}\sum_{i=1}^{N}(\mathbf{x}_i^b - \overline{\mathbf{x}^b})(\mathbf{x}_i^b - \overline{\mathbf{x}^b})^T = \frac{1}{N-1}\tilde{X}\tilde{X}^T.\tag{64}$$

Eq. (64) is an approximation to *B* using the forecast ensemble. Due to limited computational resources, the ensemble size *N* is often restricted to a small value for many realistic problems. A small ensemble size may cause a very small ensemble spread, so that the approximation of *B* by Eq. (64) is seriously underestimated. To solve this problem, *B* is multiplied by an inflation factor *λ* (slightly greater than 1). *λ* is determined empirically, e.g. through sensitivity experiments, with a typical value of 1.01. *λB* is used to replace *B* in the EnKF formulas. This approach is equivalent to the below approach:

$$\mathbf{x}_i^b = \lambda\left(\mathbf{x}_i^b - \overline{\mathbf{x}^b}\right) + \overline{\mathbf{x}^b}.\tag{65}$$
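Eq. (65) amounts to a one-line operation on the forecast ensemble, e.g.:

```python
import numpy as np

def inflate_ensemble(X_b, lam=1.01):
    """Eq. (65): multiplicative inflation of ensemble anomalies about the mean."""
    x_mean = X_b.mean(axis=1, keepdims=True)
    return lam * (X_b - x_mean) + x_mean
```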

#### *4.2.5. Localization of EnKF*

When EnKF is applied to high-dimensional atmospheric and oceanic models, the limited ensemble size will cause the estimated correlations to be noisy [11]. When the ensemble size is insufficient, it will produce spurious correlations between distant locations in the background covariance matrix *B*. Unless they are suppressed, these spurious correlations will cause observations from one location to affect the analysis in locations an arbitrarily large distance away, in an essentially random manner [23]. This needs to be remedied by the localization method.

Another reason for using localization is that it artificially reduces the spatial domain of influence of observations during the update. Localization dramatically reduces the necessary ensemble size, which is very important for operational systems. The two most common distance-based localization methods used in practice are local analysis and covariance localization.

Using local analysis, only measurements located within a certain distance from a grid point impact the analysis at that grid point. This allows for an algorithm where the analysis is computed grid point by grid point. It was found that severe localization could lead to imbalance, but with a large enough radius of influence (decorrelation length) for the measurements, this was not a problem. Hunt et al. used the local analysis method in their ETKF algorithm and developed the local ensemble transform Kalman filter (LETKF) [23].

To eliminate the small background error covariance associated with remote observations, Houtekamer and Mitchell use a Schur (element-wise) product of a correlation function with local support and the covariance of the background error calculated from the ensemble [14]. That is, the matrix *B* in the Kalman gain (Eq. (45)) is replaced by $\rho \circ B$, where "$\circ$" represents the element-wise product and the element $\rho$ relates to the distance *r* between the grid point and the observation as below:

$$\rho(r) = \left(1 + \alpha r + \frac{\alpha^2r^2}{3}\right)e^{-\alpha r}.\tag{66}$$

Here, *α* is a scalar parameter. To the best of the author's knowledge, this is the first case in which covariance localization was used in the EnKF.
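A sketch of Eq. (66) and the Schur-product localization; the distance matrix `dist` is an assumed input holding the distances between the points whose covariances *B* describes:

```python
import numpy as np

def localization_weight(r, alpha):
    """Eq. (66): compactly decaying correlation function for localization."""
    return (1.0 + alpha * r + (alpha * r)**2 / 3.0) * np.exp(-alpha * r)

def localize(B, dist, alpha):
    """Schur (element-wise) product rho o B: damp spurious long-range correlations."""
    return localization_weight(dist, alpha) * B
```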

Nowadays, a typical covariance localization approach is used to represent prior covariances using an element-wise product of ensemble covariance and a correlation function with compact support [24]. Anderson applied this approach to the Data Assimilation Research Testbed system [25], which has been used for realistic cases.

## **5. General form of ensemble-based filters for Gaussian models**

In the preceding sections, we introduced Kalman-based filters. Originally, the Kalman filter applies to a linear model and a linear measurement function. The EKF and EnKF were further developed to address nonlinear models. However, the measurement functions are still assumed to be linear. Eqs. (46) and (47) can directly evaluate nonlinear measurement functions, but they were proposed intuitively and not yet proven. In this section, we present a general form for a nonlinear measurement function and further prove Eqs. (46) and (47) mathematically using the general form.

For generality, we assume the nonlinear model as Eqs. (34) and (35):

$$\mathbf{x}\_{t+1} = f\left(\mathbf{x}\_t\right) + \eta\_t,\tag{67}$$

$$\mathbf{y}_t = h\left(\mathbf{x}_t\right) + \boldsymbol{\zeta}_t,\tag{68}$$

where *f* and *h* are nonlinear operators of model and measurement, *x* is the model state and *y* is the observation. $\boldsymbol{\eta}_t$ and $\boldsymbol{\zeta}_t$ are the model errors and observation errors, respectively, with covariances *Q* and *R*. Assume the estimate of the model state at a time step is a linear combination of the model forecast $\mathbf{x}^b$ and the observation $\mathbf{y}^o$, i.e. the filter itself is linear, so

$$\mathbf{x}^a = \mathbf{x}^b + K \left[ \mathbf{y}^o - h \left( \mathbf{x}^b \right) \right] \tag{69}$$

Denoting $\hat{\mathbf{x}}^a = \mathbf{x}^{tr} - \mathbf{x}^a$, $\hat{\mathbf{x}}^b = \mathbf{x}^{tr} - \mathbf{x}^b$ and $\hat{\mathbf{y}} = \mathbf{y}^o - h(\mathbf{x}^b)$, we have


$$
\hat{\mathbf{x}}^a = \hat{\mathbf{x}}^b - K\hat{\mathbf{y}} \tag{70}
$$

$$\begin{split} P^a &= E[\hat{\mathbf{x}}^a(\hat{\mathbf{x}}^a)^T] = E\left[(\hat{\mathbf{x}}^b - K\hat{\mathbf{y}})\left(\hat{\mathbf{x}}^b - K\hat{\mathbf{y}}\right)^T\right] \\ &= E[\hat{\mathbf{x}}^b(\hat{\mathbf{x}}^b)^T - \hat{\mathbf{x}}^b\hat{\mathbf{y}}^TK^T - K\hat{\mathbf{y}}(\hat{\mathbf{x}}^b)^T + K\hat{\mathbf{y}}\hat{\mathbf{y}}^TK^T] \\ &= P^b - P_{\hat{x}\hat{y}}K^T - KP_{\hat{y}\hat{x}} + KP_{\hat{y}\hat{y}}K^T. \end{split}\tag{71}$$

The optimal estimate requires the trace of $P^a$ to be minimal, namely,

$$\frac{\partial\left[\text{trace}(P^a)\right]}{\partial K} = -P_{\hat{x}\hat{y}} - P_{\hat{y}\hat{x}}^T + 2KP_{\hat{y}\hat{y}} = 0,\tag{72}$$

where we invoked the below properties:


$$\frac{\partial\left(\text{trace}\left[XAX^T\right]\right)}{\partial X} = X\left(A + A^T\right) = 2XA \text{ (for symmetric } A\text{)},\tag{73}$$

$$\frac{\partial\left(\text{trace}\left[XA^T\right]\right)}{\partial X} = \frac{\partial\left(\text{trace}\left[AX^T\right]\right)}{\partial X} = A.\tag{74}$$

Thus, we have the optimal estimate filter:

$$\mathbf{x}\_{\iota}^{a} = \mathbf{x}\_{\iota}^{b} + K \left[ \mathbf{y}^{o} - h \left( \mathbf{x}\_{\iota}^{b} \right) \right] \tag{75}$$

$$K = P_{\hat{x}\hat{y}}P_{\hat{y}\hat{y}}^{-1},\tag{76}$$

$$P^a = P^b - KP_{\hat{y}\hat{x}}.\tag{77}$$

Eqs. (75)–(77) give a general algorithm for a Gaussian nonlinear model and nonlinear measurement function. The first factor of Eq. (76), $P_{\hat{x}\hat{y}}$, can be interpreted as the cross-covariance between the state and observation errors, and the remaining factor, $P_{\hat{y}\hat{y}}$, as the error covariance of the difference between the model observation and the observation itself. Here, $\hat{\mathbf{y}}$ is defined as the error between the noisy observation $\mathbf{y}^o$ and its prediction $h(\mathbf{x}^b)$.

If the model is linear, obviously,

$$\mathbf{x}_{t+1}^b = M\mathbf{x}_t^a + \boldsymbol{\eta}_t,\tag{78}$$

$$B_{t+1} = MP_t^aM^T + Q.\tag{79}$$

If the measurement function is linear, i.e.

$$\hat{\mathbf{y}} = \mathbf{y}^o - h\left(\mathbf{x}^b\right) = \mathbf{y}^o - H\mathbf{x}^b = H\mathbf{x}^{tr} + \boldsymbol{\zeta} - H\mathbf{x}^b = H\hat{\mathbf{x}}^b + \boldsymbol{\zeta},\tag{80}$$

$$P_{\hat{x}\hat{y}} = <\hat{\mathbf{x}}^b, \hat{\mathbf{y}}^T> = P^bH^T,\tag{81}$$

$$P_{\hat{y}\hat{y}} = <\hat{\mathbf{y}}, \hat{\mathbf{y}}^T> = HP^bH^T + R.\tag{82}$$

So, the Kalman gain is

$$K = P^bH^T\left(HP^bH^T + R\right)^{-1}.\tag{83}$$

Eq. (83) is identical to Eq. (28). Therefore, Eq. (28), or KF, EKF and EnKF, is a special case of Eq. (76) under the assumption of linear measurement function.
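An ensemble estimate of the general-form gain of Eq. (76) can be sketched as follows; `Y_b` holds $h(\mathbf{x}_i^b)$ for each member, and including *R* in $P_{\hat{y}\hat{y}}$ follows Eq. (82). For a linear *h*, this coincides with Eqs. (45)–(47):

```python
import numpy as np

def general_form_gain(X_b, Y_b, R):
    """K = P_xy P_yy^(-1) (Eq. 76) estimated from an ensemble.
    X_b: (L, N) state ensemble; Y_b: (m, N) with columns h(x_i^b)."""
    N = X_b.shape[1]
    Xp = X_b - X_b.mean(axis=1, keepdims=True)
    Yp = Y_b - Y_b.mean(axis=1, keepdims=True)
    P_xy = Xp @ Yp.T / (N - 1)       # cross-covariance of state and observation errors
    P_yy = Yp @ Yp.T / (N - 1) + R   # covariance of y-hat = y^o - h(x^b), cf. Eq. (82)
    return P_xy @ np.linalg.inv(P_yy)
```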

In the standard KF, the state error covariance is updated at each analysis cycle during the measurement update process. Updating the error covariance matrix is important because it represents the change in forecast error covariance when a measurement is performed. The EnKF implementation does not require the covariance update equation because it can directly calculate the updated error covariance matrix from a set of ensemble members. Evensen [17] derived an analysis covariance equation that is consistent with the standard KF error covariance update, Eq. (29). But a true representation of the updated error covariance requires a large ensemble size, which is often computationally infeasible.

The general form of the Kalman gain makes use of the reformulated error covariance. In a broad sense, the above algorithm implicitly uses the prior covariance update equation (or the analysis error covariance matrix) to calculate the forecast error covariance. Thus, the above algorithm is fully consistent with the time update and measurement update formulation of the Kalman filter algorithm. On this basis, one can develop a new type of Kalman filter that chooses the ensemble members deterministically in such a way that they can capture the statistical moments of the nonlinear model accurately. In the next subsection, we will discuss the new type of Kalman filter, called sigma-point Kalman filter, based on the above algorithm.
