Adaptive Filter as Efficient Tool for Data Assimilation under Uncertainties

*Hong Son Hoang and Remy Baraille*

### **Abstract**

In this contribution, the problem of data assimilation as state estimation for dynamical systems under uncertainties is addressed. The emphasis is put on the high-dimensional systems context. The major difficulties in the design of data assimilation algorithms are the constraints on computational resources (computational power and memory) and the uncertainties (system parameters, statistics of model and observational errors). The idea of the adaptive filter (AF) is given in detail to show how it is possible to overcome uncertainties, as well as to explain the main principle and the tools for implementing the AF for complex dynamical systems. Simple numerical examples are given to illustrate the principal differences between the AF, the Kalman filter, and other methods. Simulation results are presented to compare the performance of the adaptive filter with that of the Kalman filter.

**Keywords:** adaptive filter, innovation process, minimum prediction error, simultaneous perturbation stochastic approximation, filter stability

#### **1. Introduction**

In this chapter, the adaptive filter (AF) is considered as a computational device that yields estimates of the system state by minimizing recursively (in time) the error between the predicted output of the device and the observed signal in real time. As the main objective of the AF is to produce estimates of the state in high-dimensional systems (HdSs), we shall focus attention on the mathematical form of the AF in a state-space form, as used in the Kalman filter (KF) [1]. In this chapter, an HdS is referred to as a system whose state dimension is of order $O(10^7)$–$O(10^8)$.

The assimilation problem in this chapter is formulated as a standard filtering problem. For simplicity, let the dynamical system be described by the equation

$$x(t+1) = \Phi x(t) + w(t), \quad x(0) = x_0, \quad t = 0, 1, \dots \tag{1}$$

where $x(t)$ is the system state at the time instant $t$. At each time instant $t$, we are given the observation of the system output

$$z(t+1) = Hx(t+1) + v(t+1), t = 0, 1, \dots \tag{2}$$

In (1) and (2), $w(t)$ is the model error (ME), $v(t)$ is the observation error (ObE), and $\Phi$ represents the system dynamics. In general, the system (1) and (2) may be nonlinear with $\Phi x = f(x)$, $Hx = h(x)$. The filtering problem for the partially observed dynamical system (1) and (2) is to obtain the best possible estimate for the state $x(t)$ at each instant $t$, given the set of observations $Z(1:t) = [z(1), \dots, z(t)]$.
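As an illustration, the following minimal sketch simulates the pair (1) and (2) for a toy linear system; the matrices `Phi`, `H` and the noise covariances `Q`, `R` are illustrative assumptions, not values from the chapter.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy instance of (1)-(2); Phi, H, Q, R below are illustrative assumptions.
n, p = 4, 2                                  # state / observation dimensions
Phi = 0.95 * np.eye(n)                       # system dynamics
H = np.eye(p, n)                             # observation operator
Q = 0.01 * np.eye(n)                         # covariance of the ME w(t)
R = 0.04 * np.eye(p)                         # covariance of the ObE v(t)

x = rng.standard_normal(n)                   # x(0) = x_0
states, observations = [], []
for t in range(100):
    x = Phi @ x + rng.multivariate_normal(np.zeros(n), Q)    # Eq. (1)
    z = H @ x + rng.multivariate_normal(np.zeros(p), R)      # Eq. (2)
    states.append(x)
    observations.append(z)
```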

There exist different techniques to solve estimation problems. The simplest approach is related to the linear estimator [2], since it requires only the first two moments. Linear estimation is frequently used in practice when there is a limitation on computational complexity. Among others, the widely used methods are maximum likelihood, least squares, the method of moments, Bayesian estimation, minimum mean square error (MMSE) estimation, etc. For more details, see [3].

There are limitations to optimal filters. In practice, the difficulties are numerous: the statistics of the signals may not be available or cannot be accurately estimated; there may be no time available for statistical estimation (real-time operation); the signals and systems may be non-stationary; the required memory and computational load may be prohibitive. All these difficulties become insurmountable, especially for HdSs.

For real-time applications, the AFs appear to be a valuable tool for solving estimation problems when there is no time for statistical estimation and when we are dealing with non-stationary signals and/or system environments. They can operate satisfactorily in unknown and possibly time-varying environments without user intervention. They improve their performance during operation by learning statistics from current signal observations. Finally, they can track variations in the signal operating environment [4].

It is well-known that the MMSE estimator in the class of Borel measurable (with respect to (wrt) $Z(1:t)$) functions is given by the conditional mean

$$\hat{x}(t) = E[x(t)/Z(1:t)] \tag{3}$$


Under standard conditions on the noise sequences $w(t)$, $v(t)$ (Gaussian, independent and identically distributed in time), the estimate (3) $\hat{x}(t)$ for $x(t)$ can be obtained from the KF in the recursive form

$$\hat{x}(t+1) = \hat{x}(t+1/t) + K\zeta(t+1), \tag{4}$$

$$\hat{x}(t+1/t) = \Phi\hat{x}(t), \quad \zeta(t+1) = z(t+1) - H\hat{x}(t+1/t) \tag{5}$$

$$K(t) = M(t)H^T\left[HM(t)H^T + R\right]^{-1} \tag{6}$$

$$M(t+1) = \Phi P(t)\Phi^T + Q \tag{7}$$

$$P(t) = \left[I - K(t)H\right]M(t)\left[I - K(t)H\right]^T + K(t)RK^T(t) \tag{8}$$

In (4)–(8), $Q$, $R$ are the covariance matrices of $w(t)$ and $v(t)$, respectively. One sees that the gain matrix $K = K(M)$ is a function of $M := M(t+1)$, the error covariance matrix (ECM) of the state prediction error (PE) $e(t+1/t)$, defined as $e(t+1/t) := x(t+1) - \hat{x}(t+1/t)$, and $\zeta(t+1)$ is known as the innovation vector. Note from (7) and (8) that the ECM $M(t+1)$ can be found by solving the nonlinear matrix Algebraic Riccati equation (ARE). Generally speaking, a solution of the ARE is not unique. Conditions must be introduced to ensure the existence of a unique non-negative definite solution [5]. It is remarkable that the ECMs $P$, $M$ in (7) and (8) do not depend on the observations; therefore, they can be computed in advance, offline, given the system matrices and the noise covariances. The same remark is valid for the gain matrix $K$ in (6). In contrast, the gain in the AF is observation-dependent [6] (see Section 2).
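For reference, the recursion (4)–(8) can be transcribed directly; the sketch below is a minimal dense-matrix implementation, workable only when the state dimension is small, which is exactly the limitation discussed next.

```python
import numpy as np

def kf_step(x_hat, P, z, Phi, H, Q, R):
    """One Kalman filter cycle, Eqs. (4)-(8)."""
    x_pred = Phi @ x_hat                               # prediction, Eq. (5)
    M = Phi @ P @ Phi.T + Q                            # prediction ECM, Eq. (7)
    zeta = z - H @ x_pred                              # innovation, Eq. (5)
    K = M @ H.T @ np.linalg.inv(H @ M @ H.T + R)       # gain, Eq. (6)
    x_new = x_pred + K @ zeta                          # filtered estimate, Eq. (4)
    I = np.eye(len(x_hat))
    P_new = (I - K @ H) @ M @ (I - K @ H).T + K @ R @ K.T   # filtered ECM, Eq. (8)
    return x_new, P_new
```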


Under the most favorable conditions (perfect knowledge of all system parameters and noise statistics), for a dynamical system with dimension of order $10^7$–$10^8$ it is impossible to solve the ARE (due to the computational burden), let alone to store $M$, $P$. To overcome these difficulties, the AF is proposed. Note that the KF is also an MMSE filter in the complete Hilbert space of random variables with

$E|\xi|^2 < \infty$, scalar product $(\xi_1, \xi_2)_H = E\langle \xi_1, \xi_2 \rangle$, and norm $\|\xi\| = \left\{E\left[|\xi|^2\right]\right\}^{1/2}$.

For nonlinear models, there are KF variants, among them the extended KF (EKF) [7], the unscented KF (UKF) [8], and the ensemble Kalman filter (EnKF) [9]. In the EnKF, the ECM is a sampled ECM whose samples are generated using samples of the state variable; consequently, the ECM of the KF becomes a sampled ECM. For an example of application of the EnKF to geophysical data assimilation with a high-dimensional model, see [10]. Another class of ensemble filtering techniques is the class of particle filters (PF) [11]. The basic idea of the PF (also of the EnKF) is to use a discrete set of $n$ weighted particles to represent the distribution of $x(t)$, where the distribution is updated at each time by changing the particle weights according to their likelihoods.
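The sampled ECM at the heart of the EnKF can be sketched in a few lines; this is a minimal illustration of the sample-covariance idea, not the full EnKF analysis step.

```python
import numpy as np

def sampled_ecm(X):
    """Sample ECM from an ensemble of forecast states X with shape (n, N);
    in the EnKF this replaces the propagated M(t) of the KF."""
    A = X - X.mean(axis=1, keepdims=True)    # ensemble anomalies
    return (A @ A.T) / (X.shape[1] - 1)      # unbiased sample covariance
```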

Despite a possible implementation of the KF variants, they might still be seriously biased because the accuracy of the KF update requires linearity of the observation function and Gaussianity of the distribution of the system state $x(t)$. In reality, the KF (4)–(8) may be biased and unstable, even divergent [12]. Today, the PF algorithms are ineffective for HdS data assimilation.

In this chapter, we shall show how the AF can be efficient in dealing with the uncertainties existing in the filtering problem (1) and (2). In Section 2, a brief outline of the AF is given. The main features of the AF, which are different from those of the KF, are presented. This concerns the optimality criteria, the stabilizing gain structure, and the optimization algorithms. Section 3 shows in detail how the AF is capable of dealing with uncertainties in the specification of the ME covariance. The hypothesis on a subspace of the ME is presented in Section 4, from which one sees how one can make an order reduction for representing the bias and the ME covariance. Simple numerical examples on one- and two-dimensional systems are given in Section 5 to illustrate in detail the differences between the AF and the traditional KF. Numerical experiments on low- and high-dimensional systems are given in Section 6 to demonstrate how the AF algorithm works. The performance comparison between the AF and the KF, both under perfect knowledge of the ME statistics and under ME uncertainties, is also presented. Conclusions and perspectives of the AF are summarized in Section 7.

#### **2. Adaptive filter**

The AF originated from [13]. It is constructed for estimating the state of a dynamical system based on partially observed quantities related in some way to the system state. As noted above, for linear systems contaminated by Gaussian noise, the MMSE estimate can be obtained by the KF. Since the publication of [1] in 1960, a vast number of works have applied the KF to engineering problems in all fields, and many modifications have been proposed. The reasons for the need to modify the KF are numerous, but mostly related to nonlinear dynamics, uncertainties in the specification of system parameters, bias of the ME, unknown statistics of the ME, and model reduction. With the rapid progress of computer technology (computing power, memory, …), various simplified versions of the KF have been suggested for solving filtering (or data assimilation) problems for HdSs, in particular in meteorology and oceanography.


Direct application of the KF to HdSs is impossible due to limits in computer power, memory, and computational time. In particular, the KF requires solving the matrix AREs (7) and (8) for computing the ECMs $M(t)$, $P(t)$. Storing such matrices is impossible, not to mention the computational time.


Different simplified approaches have been proposed to overcome the difficulties in the application of the KF. An example of a successful tool for solving data assimilation problems in HdSs is the EnKF [9]. In the EnKF, an ensemble of error samples, of small size, is generated on the basis of model states to approximate the ECMs. In the practice of data assimilation for HdSs, it is possible to generate only ensembles of moderate size (of order $O(100)$) by model integrations over the assimilation window (the time interval between two successive arrivals of observations), since one such integration takes several hours! The other approach, the PF, is based on sampling from conditional distributions. Theoretically, this approach is more appropriate for nonlinear problems because no linearization is required, as in the EKF (extended KF, based on a linearization technique). However, even for filtering problems with state dimensions of order $O(10)$, relatively large ensembles of size $O(10^4)$ would be required in order to produce reasonably good performance.

The AF in [13] is based on a different idea. Here, no linearization is required for nonlinear filtering problems. For the problem (1) and (2), the filter is of the form (4) and (5), but the gain $K = K(\theta)$ is assumed to be of a given stabilizing structure [6]. It means that $K$ is parametrized by some vector of unknown parameters $\theta \in \Theta$ so that the filter (4) and (5) with the gain $K(\theta)$, $\forall\theta \in \Theta$, is stable. It is well-known that under mild conditions, the solution of the ARE will tend (quickly) to the stationary solution $M_\infty$, and so the gain (6), to the stationary gain $K_\infty$. Moreover, the innovation $\zeta(t+1) = z(t+1) - \hat{z}(t+1/t)$, representing the error of the output prediction $\hat{z}(t+1/t) := H\hat{x}(t+1/t) = H\Phi\hat{x}(t)$, is unbiased and of minimum variance. This fact leads to the idea of seeking the optimal vector $\theta$ by minimizing

$$J(\theta) := E[\Psi(\zeta)] \to \min_{\theta} \tag{9}$$

here $E[\cdot]$ denotes the *average* in a *probabilistic sense*. For stationary systems (1) and (2), if we assume the validity of the ergodic hypothesis, the *average* value in a *probabilistic sense*, expressed in (9), is almost everywhere equivalent to the time average (for a large running time of the dynamical system). The optimal $\theta^*$ can be found by solving the equation

$$\nabla_{\theta} J(\theta) = \nabla_{\theta} E[\Psi(\zeta)] = E[\nabla_{\theta} \Psi(\zeta)] = 0 \tag{10}$$

A stochastic approximation (SA) algorithm for solving (10) can be written out as

$$\theta(t+1) = \theta(t) - \gamma(t)\nabla_{\theta(t)}\Psi[\zeta(t+1)] \tag{11}$$

Conditions on the sequence of positive scalars $\gamma(t)$ that ensure convergence of $\{\theta(t)\}$ in the procedure (11) are

$$\gamma(t) > 0, \sum\_{t=0}^{\infty} \gamma(t) = \infty, \sum\_{t=0}^{\infty} \gamma^2(t) < \infty \tag{12}$$

One of the main advantages of the SA algorithm (11) is that, instead of computing the gradient of the cost function (9) (which requires knowledge of the probability distribution), the algorithm (11) is based on knowledge of only the gradient of the sample cost function $\Psi$ (wrt $\theta$), which can be easily evaluated numerically.


*Comment 2.1*. Generally speaking, the convergence rate of the algorithm (11) and (12) is $O(1/t)$. It is possible to improve the convergence rate of the SA by averaging the iterates,

$$\theta\_m(t) = \frac{1}{t} \sum\_{t'=1}^t \theta(t') \tag{13}$$

For more details, see [14].
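A minimal sketch of the SA iteration (11), with step sizes $\gamma(t) = 1/(t+1)$ satisfying (12), together with the averaged iterate (13); `grad_sample` stands for a user-supplied sample gradient $\nabla_\theta\Psi[\zeta(t+1)]$ and is an assumed interface, not code from the chapter.

```python
import numpy as np

def sa_minimize(theta0, grad_sample, T):
    """SA iteration (11) with gamma(t) = 1/(t+1), which satisfies (12);
    also returns the averaged iterate (13) for a better convergence rate."""
    theta = np.asarray(theta0, dtype=float)
    running_sum = np.zeros_like(theta)
    for t in range(T):
        gamma = 1.0 / (t + 1)                           # step sizes satisfying (12)
        theta = theta - gamma * grad_sample(theta, t)   # update (11)
        running_sum += theta
    return theta, running_sum / T                       # final iterate and average (13)
```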


*Comment 2.2.* For HdSs, even with $\theta$ of moderate dimension, instead of the algorithm (11) and (12) (or its averaged version (13)), the SPSA (simultaneous perturbation stochastic approximation) algorithms in [15, 16] are preferable. That is due to the fact that integration of an HdS over the assimilation window is very expensive. These algorithms generate a stochastic perturbation $\delta\theta = (\delta\theta_1, \dots, \delta\theta_n)^T$ with components drawn as Bernoulli i.i.d. realizations. Each $i$-th component of the gradient-like (pseudo-gradient) vector is computed as the divided difference $\delta\Psi/\delta\theta_i$, where $\delta\Psi := \Psi[\theta + \delta\theta] - \Psi[\theta]$. This allows one to evaluate the gradient-like vector with only two or three integrations of the numerical model.

For details on the SPSA algorithm and its convergence rate, see [15, 16].
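A sketch of the pseudo-gradient described in Comment 2.2; the perturbation size `c` is a hypothetical tuning parameter, and each call to `Psi` corresponds to one integration of the model over the assimilation window.

```python
import numpy as np

def spsa_pseudo_gradient(Psi, theta, c=0.1, rng=np.random.default_rng()):
    """Gradient-like vector from two evaluations of the sample cost Psi:
    delta has Bernoulli +/-1 i.i.d. components, and the i-th component of the
    pseudo-gradient is (Psi[theta + c*delta] - Psi[theta]) / (c * delta_i)."""
    delta = rng.choice([-1.0, 1.0], size=theta.shape)   # Bernoulli i.i.d. perturbation
    d_psi = Psi(theta + c * delta) - Psi(theta)         # two model integrations
    return d_psi / (c * delta)
```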

#### **3. Covariance uncertainties and AF**

#### **3.1 Covariance uncertainties**

#### *3.1.1 Adjoint approach*

As seen from (11) and (12), implementation of SA algorithms is much simpler for searching optimal gain parameters than the other optimization methods. The SA algorithms require only numerical computation of the derivatives $\nabla_\theta\Psi[\zeta(t+1)]$ of the sample cost function $\Psi[\zeta(t+1)]$ wrt $\theta$, evaluated at $\theta(t)$; $\gamma(t)$ is a scalar which can be chosen a priori, for example, as $\gamma(t) = 1/t$. That is possible due to the ergodic hypothesis on the system (1) and (2), from which there exists an asymptotic optimal gain.

First, consider the situation when the vector of parameters consists of all the elements of $K$, i.e., $\theta = K$. Compute the innovation vector,

$$\zeta(t+1) = z(t+1) - \hat{z}(t+1/t) = z(t+1) - H\Phi\hat{x}(t)$$

$$\hat{x}(t) = \hat{x}(t/t-1) + K(\theta)\zeta(t)$$

$$\delta_{\theta}\zeta(t+1) = -H\Phi\,\delta_{\theta}\hat{x}(t) = -H\Phi\,\delta_{\theta}K(\theta)\,\zeta(t) = -H\Phi\,\delta K\,\zeta(t).$$

$$\delta_K \Psi[\zeta(t+1)] = \delta_K \langle \zeta(t+1), \zeta(t+1) \rangle = 2\langle \zeta(t+1), \delta_K \zeta(t+1) \rangle$$

$$= -2\langle \zeta(t+1), H\Phi\,\delta K\,\zeta(t) \rangle = -2\langle \Phi^T H^T \zeta(t+1), \delta K\,\zeta(t) \rangle.$$

Let us compute the derivatives of the sample cost function $\Psi$ wrt the elements $K_{ij}$ of the gain $K$. To do so, one needs to integrate the adjoint operator $\Phi^T$ s.t. the forcing $f := -H^T\zeta(t+1)$, which yields $\psi := -\Phi^T H^T \zeta(t+1)$ and hence

$$\delta\Psi/\delta K_{ij} = -\psi_i\,\zeta(t, j), \tag{14}$$

here $\psi_i$ is the $i$-th component of $\psi$ and $\zeta(t, j)$ is the $j$-th component of $\zeta(t)$. The AF now takes the form

$$\hat{x}(t+1) = \hat{x}(t+1/t) + K\zeta(t+1), \tag{15}$$


$$\hat{x}(t+1/t) = \Phi\hat{x}(t), \quad \zeta(t+1) = z(t+1) - H\hat{x}(t+1/t), \tag{16}$$

$$K(t+1) = K(t) - \gamma(t)\nabla_K \Psi[\zeta(t+1)], \tag{17}$$

where $\nabla_K\Psi[\zeta(t+1)]$ is the gradient vector whose components are computed by (14). In the AF (15)–(17), no matrix ARE (see (7) and (8) in the KF) is involved. The AF (15)–(17) is quite realizable for HdSs, since at each assimilation instant we need to integrate only the direct model to produce the forecast (16) and (eventually) an adjoint model over the assimilation window for computing $\nabla_K\Psi[\zeta(t+1)]$ [13].
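Putting (14)–(17) together gives the following sketch of one AF cycle with $\theta = K$ (the full-gain case of this subsection); in an HdS, the products with `Phi` and `Phi.T` stand in for one direct and one adjoint model integration.

```python
import numpy as np

def af_step(x_hat, K, zeta_prev, z, Phi, H, gamma):
    """One adaptive-filter cycle, Eqs. (15)-(17), with theta = K;
    zeta_prev is the innovation zeta(t) from the previous cycle."""
    x_pred = Phi @ x_hat                     # forecast, Eq. (16)
    zeta = z - H @ x_pred                    # innovation zeta(t+1), Eq. (16)
    x_new = x_pred + K @ zeta                # filtered estimate, Eq. (15)
    psi = -(Phi.T @ (H.T @ zeta))            # adjoint run with forcing f = -H^T zeta(t+1)
    grad = -np.outer(psi, zeta_prev)         # Eq. (14): dPsi/dK_ij = -psi_i * zeta_j(t)
    K_new = K - gamma * grad                 # gain update, Eq. (17)
    return x_new, K_new, zeta
```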

#### *3.1.2 Simultaneous perturbation stochastic approximation (SPSA) approach*

Remark that in the form (14), the adjoint operator $\Phi^T$ must be available to implement the AF. It is well-known that construction of a numerical code for $\Phi^T$ is a very difficult and heavy task, especially for meteorological and oceanic numerical models, which are HdSs and nonlinear (linearization is required).

A comparison study of the AF with other assimilation methods is done in [17]. Compared to the AF, the widely used variational method (VM) minimizes the distance between the available observations (for example, the whole set $Z[1:T]$) and the outputs of the dynamical system. This optimization problem is carried out in the phase space, hence it is very difficult and expensive. Theoretically, a simplification is possible subject to (s.t.) the condition of linearity of the dynamical system: in this case, one can reformulate the VM minimization problem as searching for the best estimate of the initial system state $x(0)$. For HdSs, to ensure a merely high-quality estimate for $x(0)$, it is necessary: (i) to take the observation window as large as possible; (ii) to parameterize the initial state by some parameters (using a slow manifold, for example). Iterative minimization procedures usually require $O(10)$ iterations involving integration of the direct and adjoint models over the window $[1, T]$. For an unstable dynamics, integration of the direct and adjoint equations over a long period naturally amplifies the initial errors during the assimilation process. For a more detailed comparison between the AF and the VM, see [17].

Thus, if the ergodic conditions hold, there exists an optimal stationary gain and the AF will in the limit approach the optimal one *in the given class of stable filters*. It is important to emphasize that up to this point, no covariance matrices $Q$, $R$ are specified. It means that the AF in this form is robust to uncertainties in the specification of the covariances of the ME and ObE.

#### **3.2 Stability of the AF**

One of the main features of the AF is related to its stability.

For simplicity of presentation, in the previous section the AF algorithm is written out under the assumption that all elements of the gain are adapted ($\theta = K$). In practice, application of the AF in this form is not recommended since instability may occur. It is easy to see that the transition matrix of the filter is given by

$$L = (I - KH)\Phi \tag{18}$$
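A quick numerical check of (18): the filter is stable when the spectral radius of $L$ is below one. The small helper below assumes dense matrices and is only an illustration.

```python
import numpy as np

def filter_is_stable(Phi, H, K):
    """Check stability of the filter transition matrix (18): L = (I - K H) Phi."""
    L = (np.eye(Phi.shape[0]) - K @ H) @ Phi
    return np.max(np.abs(np.linalg.eigvals(L))) < 1.0   # spectral radius < 1
```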

It is evident that if we do not take care of the structure of $K$, varying stochastically all elements of $K$ can lead to instability of $L$, and the filter may explode. Moreover,

for HdSs, the number of elements of $K$ is very large. It is therefore essential to choose a parametrized stabilizing structure for $K$ (depending on $\theta$) to ensure the stability of $L$ and to reduce the number of tuning parameters. This question is addressed in [6]. One of the possible structures for $K$ is of the form

$$K(t) = P_r \Theta K_e, \quad K_e := M_e H_e^T \left[ H_e M_e H_e^T + \alpha I \right]^{-1}, \quad H_e := H P_r \tag{19}$$

where $P_r \in R^{n \times r}$ is a matrix with dimensions $n \times r$, $r$ is the dimension of the reduced space (equal to the number of unstable eigenvectors (EiVecs) of $\Phi$), the matrix $M_e$ is a strictly positive definite symmetric matrix playing the role of the ECM in the reduced space $R_e$, and $\Theta$ is a diagonal matrix with diagonal elements $\theta_i$ whose values belong to $(\epsilon, 2-\epsilon)$, i.e., $\theta_i \in (\epsilon, 2-\epsilon)$, with $\epsilon \in (0, 1)$ whose value depends on the modulus of the first stable eigenvalue (EiV) [6]. We will refer to the filter s.t. (19) with $\Theta = I_d$ ($I_d$ is the identity matrix of appropriate dimension) as a *non-adaptive filter* (NAF). In the AF, the parameters $\theta_i$ are adjusted each time a new observation arrives, to minimize the cost function (9). Thus $\theta_i$ is a time-varying function. As to the matrix $P_r$, its choice is important to ensure filter stability. One simple and efficient procedure (called the Prediction Error Sampling Procedure—PeSP) to generate $P_r$ is the power orthogonal iteration method [18], which allows one to compute real leading Schur vectors (SchVecs) of $\Phi$. The advantage of using the SchVecs compared to the EiVecs is that they are real and their computation is stable. It is seen that the optimal AF is sought in a class of stable filters, which is stable even for an unstable numerical model. As to the VM, the optimal trajectory is found on the basis of only the numerical model with the initial state as a control vector. It means that for unstable dynamics, errors in the forcing or numerical errors arising during computations will be amplified and lead to large estimation error growth. More seriously, the VM requires a large set of observations and a large number of iterations (i.e., many forward and backward integrations of the direct and adjoint models), which naturally leads to an increase of the estimation error too.
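A sketch of how $P_r$ and the gain (19) can be formed; the power orthogonal iteration below is a simplified stand-in for the PeSP of [18], and `Me`, `alpha`, `Theta` are assumed inputs.

```python
import numpy as np

def leading_schur_vectors(Phi, r, iters=100, rng=np.random.default_rng(0)):
    """Power orthogonal iteration: an orthonormal basis P_r spanning the
    dominant (unstable and neutral) subspace of Phi, i.e., real leading
    SchVecs; in an HdS, Phi @ V amounts to r model integrations."""
    V, _ = np.linalg.qr(rng.standard_normal((Phi.shape[0], r)))
    for _ in range(iters):
        V, _ = np.linalg.qr(Phi @ V)         # propagate, then re-orthonormalize
    return V

def gain_19(Phi, H, r, Me, alpha, Theta):
    """Reduced-rank stabilizing gain of the form (19)."""
    Pr = leading_schur_vectors(Phi, r)
    He = H @ Pr                                                       # H_e := H P_r
    Ke = Me @ He.T @ np.linalg.inv(He @ Me @ He.T + alpha * np.eye(He.shape[0]))
    return Pr @ Theta @ Ke                                            # K = P_r Theta K_e
```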

#### **3.3 On improving the initial gain**

Consider the gain structure (19). Suppose that $M_e$ has been chosen in agreement with the required stability conditions. Before tuning the parameters $\theta_i$ to minimize the cost function (9), remark that stability of the filter is still ensured for the following gain:

$$K(t) = P_r \Lambda \Theta K_e, \tag{20}$$

where $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_r)$, $\lambda_k \in (0, 1)$, since then for $\Gamma = \Lambda\Theta = \mathrm{diag}(\gamma_1, \dots, \gamma_r)$ it implies $\gamma(t) = \lambda(t)\theta(t) \in (\epsilon, 2-\epsilon)$, where $\epsilon \in (0, 1)$. Writing the equation for the filtered error (FE) $e_f(t) := \hat{x}(t) - x(t)$, one sees that the matrix $L$ in (18) also represents the transition of the FE $e_f(t)$. It means that it is possible to choose a more optimal initial gain by solving, for example, the minimization problem

$$J\_o(\Lambda) = \|L(\Lambda)\|\_2^2 \to \min\_{\Lambda} \tag{21}$$

The problem (21) is solved without using the observations; hence it is solved offline. Once the optimal $\Lambda^*$ has been found, the standard AF is implemented with the filter gain

$$K(t) = P_r \Theta K_e^*, \quad K_e^* := \Lambda^* K_e \tag{22}$$


It is seen that using the structure (22) this optimization procedure does not require the information on the ME statistics.
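The offline problem (21) involves only $r$ variables, so a generic bounded optimizer suffices; the sketch below, with hypothetical bounds keeping $\lambda_k$ in $(0, 1)$, illustrates one way to obtain $\Lambda^*$ and the improved gain (22).

```python
import numpy as np
from scipy.optimize import minimize

def improve_initial_gain(Phi, H, Pr, Theta, Ke):
    """Solve (21) for Lambda = diag(l_1, ..., l_r), l_k in (0, 1), and
    return K_e^* := Lambda^* K_e as in (22)."""
    n, r = Pr.shape
    def J_o(lam):
        K = Pr @ np.diag(lam) @ Theta @ Ke           # gain of the form (20)
        L = (np.eye(n) - K @ H) @ Phi                # transition matrix (18)
        return np.linalg.norm(L, 2) ** 2             # squared spectral norm, (21)
    res = minimize(J_o, 0.5 * np.ones(r), bounds=[(1e-3, 1.0 - 1e-3)] * r)
    return np.diag(res.x) @ Ke                       # K_e^* of (22)
```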

#### **4. Joint estimation of state and model error in AF**

The previous section shows how the AF is designed to deal with the difficulty in specifying the covariances of the ME and ObE. This is done without exploiting a possibility to determine, more or less correctly, a subspace for the ME. If such a subspace can be determined without major difficulties, it would be beneficial for better estimating the AF gain and improving the filter performance. In [19], a hypothesis on the structure of the ME has been introduced and a number of experiments have been successfully conducted.

There is a long history of joint estimation of the state and ME in filtering algorithms, in particular with bias and covariance estimation. One of the most original approaches, dealing with the treatment of bias in recursive filtering (known as bias-separated estimation—BSE), was carried out by Friedland in [20]. He has shown that the MMSE *state estimator* for a linear dynamical system augmented with bias states can be decomposed into three parts: (1) a bias-free state estimator; (2) a bias estimator; and (3) a blender. The BSE approach has the advantage that it requires fewer numerical operations than the traditional augmented-state implementation and avoids the numerical ill-conditioning that may arise in augmented-state filtering.

It is common to treat the bias as part of the system state and then estimate the bias as well as the system state. There are two types of ME—deterministic (DME) and stochastic (SME). Generally speaking, a suitable equation can be introduced for the ME. In the presence of bias, under the assumption of a constant $b$, instead of (1) one has

$$x(t+1) = \Phi x(t) + b(t) + w(t), \quad b(t+1) = b(t), \quad t = 0, 1, 2, \dots \tag{23}$$

To introduce a subspace for the variables $w(t)$, $b(t)$ (the SME and DME in (23)), let

$$w = G_w \rho, \quad b = G_b d, \quad G_w \in R^{n \times n_w}, \quad G_b \in R^{n \times n_b}, \quad n \ge n_w, \quad n \ge n_b \tag{24}$$

Generally speaking, $G_w$, $G_b$ are unknown, and finding reasonable hypotheses for them is desirable but not self-evident. In [19], one hypothesis for $G_w$, $G_b$ has been introduced (it will be referred to as the Hypothesis on model error—HME).

The information on $G_w$, $G_b$, given in (24), allows one to better estimate the DME $b$ and the SME $w$ for improving the filter performance, especially for $n_b < n$, $n_w < n$ in an HdS setting. The difficulty encountered in the practice of operational forecasting systems is that (practically) nothing is given a priori on the space of the ME values. To overcome this difficulty, one simple hypothesis has been introduced in [19]. This hypothesis is postulated by taking into consideration the fact that for a large number of data assimilation problems in HdSs, the model time step $\delta t$ (chosen for ensuring stability of the numerical scheme and for guaranteeing a high precision of the discrete solution) is much smaller than $\Delta t$—the assimilation window (time interval between two successive observation arrivals).

Suppose that $\Delta t = n_a \delta t$, where $n_a$ is a positive integer.

*Hypothesis* (on the subspace of ME—HME) [19]. Under the condition that $n_a$ is relatively large, the ME belongs to the subspace spanned by all unstable and neutral EiVecs (or SchVecs) of the system dynamics $\Phi$.

#### **5. Simple numerical examples**

#### **5.1 One-dimensional system**

To see the difference between the AF and the KF in dealing with ME uncertainties, introduce the one-dimensional system

$$x(t+1) = \Phi x(t) + w(t), \quad z(t+1) = h x(t+1) + v(t+1), \quad t = 0, 1, \dots \tag{25}$$

In (25), $\Phi$ is the unique eigenvalue (also the singular value) of the system dynamics.

i. For simplicity, let $\Phi = 1$, $h = 1$. This corresponds to the situation when the system is *neutrally stable*. The filter fundamental matrix (18) is now $L(K) = (1 - K)$, which is stable if $K \in (0, 2)$. For the KF gain (4)–(8), as

$$K_{kf}(t) = \frac{M_{kf}(t)}{M_{kf}(t) + R}$$

we have $K_{kf}(t) \in (0, 1)$, where $M_{kf}(t)$ is the solution of (7). That is true for any $M_{kf}(t) \ge 0$, $R > 0$. It means that the KF is stable. Mention that if $Q > 0$, always $M_{kf}(t) > 0$. In general, $K_{kf}(t) = M_{kf}(t)\left[M_{kf}(t) + R\right]^{+}$, where $[A]^{+}$ is the pseudo-inverse of $A$ [21].

For the AF, we have in this case $P_r = 1$. Consider the gain $K_{af}(\theta) := P_r \theta K_e$, where $K_e$ is the gain of the form $K_e = \frac{M_e}{M_e + R}$, $M_e > 0$, $R > 0$, $M_e$ constant. We have then for the NAF ($\theta = 1$) $0 < K_e < 1$ and $K_{naf} = K_e$. For the AF, the transition matrix (18) reads $L_{af}(\theta) = (1 - \theta K_e)$. For $\theta \in (0, 2)$, $|L_{af}(\theta)| \in (0, 1)$, $K_{af}(\theta) \in (0, 2)$, and the AF is stable. It is evident that there is a larger margin for varying the gain in the AF than in the KF, since $K_{kf}(t) \in (0, 1)$. One sees that the stationary KF is a member of the class of stable AFs (19). The performance of the AF is optimized by solving the problem (9) using the procedure (11) and (12) or SPSA algorithms (*Comment 2.2*).

ii. Let $\Phi < 1$, i.e., the system (1) is *stable*. The results in (i) are valid for the AF structure. In this situation, the filter is stable even for $K_{af} = 0$.

iii. Let $|\Phi| > 1$, i.e., the system (1) is *unstable*. Consider two situations: (a) $\Phi > 1$; (b) $\Phi < -1$. As before, $P_r = 1$. For $\Phi > 1$ we have

$$\frac{\Phi - 1}{K_e \Phi} < \theta < \frac{\Phi + 1}{K_e \Phi} \tag{26}$$

In particular, when $\Phi \to 1$, approximately $\theta \in (0, 2/K_e)$. When $Q \gg R$ (as is usual in practice), approximately $\theta \in (0, 2)$ as in the situation (i). For large $\Phi \gg 1$, $\frac{\Phi - 1}{K_e \Phi} \to \frac{1}{K_e}$ (left-hand limit) and $\frac{\Phi + 1}{K_e \Phi} \to \frac{1}{K_e}$ (right-hand limit), and there remains no margin for varying $\theta$ (or $Q \gg R$), and $K_{af} \to 1$. It is important to emphasize that as $K_e$ is chosen by the designer, we can define the interval for varying $\theta$ if the amplitude of $\Phi$ is more or less known. In practice, one can vary $\theta \in [\epsilon, 2 + \epsilon]$ with small $\epsilon > 0$ for $\Phi$ close to 1, and with $\epsilon$ close to 1 for large $\Phi$.