**2. The general framework of several assimilation approaches**

Intuitively, one might think that an optimal simulation scheme is to directly replace model variables with observations during numerical integration. Such a direct replacement is usually not correct, since observations are not perfect and contain errors. A simple replacement would introduce observation errors into the model and ignore their possible impact on model behaviour, easily resulting in an imbalance between model dynamics and physics. Thus, the application of observations to numerical models must account for both model and observation errors, which play a critical role in the assimilation process.

We will illustrate the assimilation concept with a simple example; a detailed introduction can be found in [2].

For an unknown true state value, denoted by $T_t$, there are two samples, denoted by $T_1$ (e.g. a model simulation) and $T_2$ (an observation), which have errors $\epsilon_1$ and $\epsilon_2$, respectively. Thus, we have

$$T\_1 = T\_t + \epsilon\_1,\tag{1}$$

$$T\_2 = T\_t + \epsilon\_2.\tag{2}$$

We assume the measurements are unbiased and the error variances are known, i.e. $E(\epsilon_1) = E(\epsilon_2) = 0$, $\mathrm{Var}(\epsilon_1) = \sigma_1^2$, $\mathrm{Var}(\epsilon_2) = \sigma_2^2$. The question here is to seek an optimal estimate, denoted by $T_a$ (called the analysis in the assimilation field), for $T_t$ using $T_1$ and $T_2$. This optimal estimate is the central issue of data assimilation.
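To make this setting concrete, here is a minimal Python sketch of the toy problem; the truth $T_t$ and the error standard deviations $\sigma_1$, $\sigma_2$ are hypothetical values chosen for illustration, and the later sketches in this section reuse them.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical toy setting: an unknown truth sampled twice with known error stds.
T_t = 20.0                  # true state (unknown in practice)
sigma1, sigma2 = 2.0, 1.0   # known error standard deviations

# Two noisy samples of the truth, per Eqs. (1) and (2)
T1 = T_t + rng.normal(0.0, sigma1)   # e.g. a model simulation
T2 = T_t + rng.normal(0.0, sigma2)   # e.g. an observation

print(f"T1 = {T1:.3f}, T2 = {T2:.3f}")
```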

There are several methods for this solution, as demonstrated below.

#### **2.1. Least-squares method**

Assume the analysis $T_a$ is a linear combination of $T_1$ and $T_2$, that is, $T_a = a_1 T_1 + a_2 T_2$. Due to the assumption that both $T_1$ and $T_2$ are unbiased, $T_a$ should be unbiased as well, i.e. $E(T_a) = E(T_t)$, so $a_1 E(T_1) + a_2 E(T_2) = E(T_t)$, and then $a_1 + a_2 = 1$. The best (optimal) estimate should minimize the variance of $T_a$, as below:

$$\begin{split} \sigma\_a^2 = E\left[T\_a - T\_t\right]^2 &= E\left[a\_1 T\_1 + a\_2 T\_2 - T\_t\right]^2 = E\left[a\_1 \left(T\_1 - T\_t\right) + a\_2 \left(T\_2 - T\_t\right)\right]^2 \\ &= E\left(a\_1^2 \epsilon\_1^2 + a\_2^2 \epsilon\_2^2 + 2a\_1 a\_2 \epsilon\_1 \epsilon\_2\right) = a\_1^2 \sigma\_1^2 + \left(1 - a\_1\right)^2 \sigma\_2^2, \end{split} \tag{3}$$

here, we have assumed that the errors of $T_1$ and $T_2$ are uncorrelated, i.e. $E(\epsilon_1 \epsilon_2) = 0$. To minimize $\sigma_a^2$, set $\partial \sigma_a^2 / \partial a_1 = 2 a_1 \sigma_1^2 - 2\left(1 - a_1\right) \sigma_2^2 = 0$; thus

$$a\_1 = \frac{\sigma\_2^2}{\sigma\_1^2 + \sigma\_2^2}.\tag{4}$$

Namely,

$$T\_a = a\_1 T\_1 + (1 - a\_1) T\_2 = T\_1 + \frac{\sigma\_1^2}{\sigma\_1^2 + \sigma\_2^2} (T\_2 - T\_1). \tag{5}$$

With the weight of Eq. (4), the analysis of Eq. (5) attains the minimum variance, $\sigma_a^2 = \sigma_1^2 \sigma_2^2 / (\sigma_1^2 + \sigma_2^2)$, which is smaller than both $\sigma_1^2$ and $\sigma_2^2$.
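As a quick sanity check, the following sketch (with the hypothetical values $T_1 = 21.5$, $T_2 = 19.6$, $\sigma_1 = 2$, $\sigma_2 = 1$) computes the analysis of Eq. (5) and scans all candidate weights to confirm that the weight of Eq. (4) minimizes the analysis-error variance of Eq. (3).

```python
import numpy as np

# Hypothetical sample values and error standard deviations
T1, sigma1 = 21.5, 2.0   # model sample
T2, sigma2 = 19.6, 1.0   # observation

# Optimal weight (Eq. 4) and analysis (Eq. 5)
a1 = sigma2**2 / (sigma1**2 + sigma2**2)
Ta = a1 * T1 + (1.0 - a1) * T2

# Analysis-error variance as a function of the weight, from Eq. (3);
# a brute-force scan confirms that no other weight does better.
weights = np.linspace(0.0, 1.0, 1001)
var = weights**2 * sigma1**2 + (1.0 - weights)**2 * sigma2**2
assert abs(weights[np.argmin(var)] - a1) < 1e-3

print(f"a1 = {a1:.3f}, Ta = {Ta:.3f}, min variance = {var.min():.3f}")
```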

#### **2.2. Variational approach**

In general, assimilation methods can be classified into two categories: variational and sequential. Variational methods such as the three-dimensional variational (3D-Var) and four-dimensional variational (4D-Var) methods [3, 4] are batch methods, whereas sequential methods such as the Kalman filter (KF) [5] belong to estimation theory.

Both have had great success. The European Centre for Medium-Range Weather Forecasts (ECMWF) introduced the first 4D-Var method into its operational global analysis system in November 1997 [6–8]. The ensemble Kalman filter (EnKF) was first introduced into an operational ensemble prediction system by the Canadian Meteorological Centre (CMC) in January 2005 [9].

This chapter is a tutorial on ensemble-based sequential data assimilation methods, such as the EnKF and its variants. However, we will briefly demonstrate the idea of variational assimilation using the above example.

First, a cost function must be defined for the variational approach. For this simple example, we define the cost function as below:

$$J\left(T\right) = \frac{1}{2}\left[ \frac{\left(T - T\_1\right)^2}{\sigma\_1^2} + \frac{\left(T - T\_2\right)^2}{\sigma\_2^2} \right],\tag{6}$$

where the estimate $T$ is taken as a linear combination of $T_1$ and $T_2$,

$$T = a\_1 T\_1 + a\_2 T\_2. \tag{7}$$

The solution is to seek an analysis $T_a$, determined by $a_1$ and $a_2$, that minimizes the cost function, i.e. $J(T_a) = \min J(T)$. Obviously, we require $\partial J(T)/\partial a_1 = 0$ and $\partial J(T)/\partial a_2 = 0$. Substituting (6) and (7), we obtain

$$\frac{\partial J(T)}{\partial a\_1} = \frac{T - T\_1}{\sigma\_1^2} \frac{\partial T}{\partial a\_1} + \frac{T - T\_2}{\sigma\_2^2} \frac{\partial T}{\partial a\_1} = 0 \tag{8}$$

Eq. (7) leads to $\partial T / \partial a_1 = T_1$. Thus, the solution of (8), denoted by $T_a$, satisfies

$$T\_a = \frac{\sigma\_2^2}{\sigma\_1^2 + \sigma\_2^2} T\_1 + \frac{\sigma\_1^2}{\sigma\_1^2 + \sigma\_2^2} T\_2. \tag{9}$$
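This equivalence with the least-squares result can be verified numerically. The sketch below (same hypothetical values as before) minimizes the cost function (6) with a generic scalar optimizer and compares the minimizer against the closed-form analysis of Eq. (9).

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical sample values and error standard deviations
T1, sigma1 = 21.5, 2.0
T2, sigma2 = 19.6, 1.0

def J(T):
    """Cost function of Eq. (6)."""
    return 0.5 * ((T - T1)**2 / sigma1**2 + (T - T2)**2 / sigma2**2)

res = minimize_scalar(J)   # generic 1-D minimization (Brent's method)

# Closed-form analysis from Eq. (9)
Ta = (sigma2**2 * T1 + sigma1**2 * T2) / (sigma1**2 + sigma2**2)

print(f"numerical minimizer = {res.x:.4f}, Eq. (9) analysis = {Ta:.4f}")
assert abs(res.x - Ta) < 1e-4
```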

The above is a simple example of the 3D-Var assimilation approach, where we only consider the analysis error (cost function) at a single time. In many cases, however, we need to consider the error growth over a period, i.e. include the sum of errors over that period in the cost function Eq. (6). For example, the cost function of 4D-Var is defined as below:

$$J(T) = \frac{1}{2} \sum\_{s=1}^{N} \left[ \frac{\left(T(t\_s) - T\_1(t\_s)\right)^2}{\sigma\_1^2} + \frac{\left(T(t\_s) - T\_2(t\_s)\right)^2}{\sigma\_2^2} \right]. \tag{10}$$

Meanwhile, $T(t)$ follows a dynamical model, say $T(t_n) = \int_{t_0}^{t_n} F(T(t))\,dt = M_n(T(t_0))$, where $F$ is a nonlinear dynamical model, $M_n$ is the integral operator and $t_0$ is the initial time. Thus, the value of the cost function (10) is determined solely by the initial condition, and the objective is to seek the optimal initial condition $T(t_0)$ that minimizes (10) subject to the dynamical model $F$. This is a standard conditional extremum problem that can be solved by the Lagrange multiplier approach. However, the complexity of the dynamical model excludes any analytical solution, so the minimization must be carried out numerically, e.g. with the Newton conjugate gradient method. All such numerical methods require the gradient $\partial J / \partial T(t_0)$; a toy sketch follows.
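The sketch below illustrates the idea under loud assumptions: the "dynamical model" is a hypothetical scalar decay law $F(T) = -kT$ stepped with explicit Euler, the two sample streams are synthetic, and the gradient $\partial J/\partial T(t_0)$ is approximated by finite differences inside a generic optimizer rather than computed with tangent linear and adjoint models.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical toy model: dT/dt = F(T) = -k*T, stepped by explicit Euler.
k, dt, N = 0.5, 0.1, 20          # decay rate, step size, window length
sigma1, sigma2 = 2.0, 1.0        # error stds of the two sample streams

def forecast(T0):
    """M_n: integrate the model from t_0, returning T(t_1), ..., T(t_N)."""
    traj = np.empty(N)
    T = T0
    for n in range(N):
        T = T + dt * (-k * T)    # one Euler step of F(T) = -k*T
        traj[n] = T
    return traj

rng = np.random.default_rng(0)
true_traj = forecast(15.0)                       # truth from T(t_0) = 15
T1_s = true_traj + rng.normal(0, sigma1, N)      # model-like samples
T2_s = true_traj + rng.normal(0, sigma2, N)      # observations

def J(x):
    """4D-Var-style cost of Eq. (10); depends only on the initial condition."""
    traj = forecast(x[0])
    return 0.5 * np.sum((traj - T1_s)**2 / sigma1**2
                        + (traj - T2_s)**2 / sigma2**2)

# BFGS approximates dJ/dT(t_0) by finite differences; for realistic models
# this is far too expensive, which is why adjoint models are used in practice.
res = minimize(J, x0=np.array([10.0]), method="BFGS")
print(f"recovered initial condition: {res.x[0]:.3f} (truth: 15.0)")
```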

Again, obtaining an analytical expression for $\partial J / \partial T(t_0)$ is almost impossible due to the complexity of $F$. Researchers usually compute the gradient numerically using tangent linear and adjoint models; the details can be found in the references cited above. It should be noted that constructing tangent linear and adjoint models can be very difficult, sometimes even intractable. Thus, more and more researchers have turned to sequential assimilation methods instead of 4D-Var in recent years. Next, we introduce the concept of sequential assimilation using the above example.

#### **2.3. Bayesian approach**


Assume $T_1$ and $\sigma_1$ are the mean value and standard deviation of the model prediction, which imply a prior probability distribution of the truth $T$,

$$p(T) = \frac{1}{\sqrt{2\pi}\,\sigma\_1} e^{-\frac{(T-T\_1)^2}{2\sigma\_1^2}}\tag{11}$$

Obviously, this is a Gaussian distribution function, which can be denoted by $N(T_1, \sigma_1)$. Given the observation $T_2$ and its standard deviation $\sigma_2$, the posterior distribution of the truth can be expressed by Bayes' theorem:

$$p\left(T|T\_2\right) = \frac{p\left(T\_2|T\right)p\left(T\right)}{p\left(T\_2\right)} \propto \frac{1}{\sqrt{2\pi}\sigma\_2} e^{-\frac{(T-T\_2)^2}{2\sigma\_2^2}} \frac{1}{\sqrt{2\pi}\sigma\_1} e^{-\frac{(T-T\_1)^2}{2\sigma\_1^2}}.\tag{12}$$

$p(T_2)$ is ignored in (12) since it is independent of $T$ and usually acts as a normalization factor. The likelihood function $p(T_2|T)$ describes the probability that the observation takes the value $T_2$ given an estimate of $T$; it is commonly assumed to be Gaussian, $N(T, \sigma_2)$. The objective here is to estimate the truth by maximizing the posterior probability $p(T|T_2)$, i.e. to choose the estimate of the truth that is most probable given the observation. Maximizing $p(T|T_2)$ is equivalent to maximizing the logarithm of the right-hand side of (12), i.e.

$$\begin{split} \log(p(T \mid T\_{2})) &= \log(\frac{1}{\sqrt{2\pi}\sigma\_{2}}) - \frac{(T - T\_{2})^{2}}{2\sigma\_{2}^{2}} + \log(\frac{1}{\sqrt{2\pi}\sigma\_{1}}) - \frac{(T - T\_{1})^{2}}{2\sigma\_{1}^{2}} \\ &= \text{const} - \frac{1}{2} \left[ \frac{\left(T - T\_{2}\right)^{2}}{\sigma\_{2}^{2}} + \frac{\left(T - T\_{1}\right)^{2}}{\sigma\_{1}^{2}} \right]. \end{split} \tag{13}$$

Obviously, the maximum of $p(T|T_2)$ occurs at the minimum of the second term on the right-hand side of (13), i.e. the minimum of the cost function $J$ of (6). Thus, under the assumption of Gaussian distributions, maximizing the posterior probability (Bayesian approach) is equivalent to minimizing the cost function (variational assimilation approach). Further, if the model $F$ is linear and the probability distributions are Gaussian, it can be proved that the Kalman filter is equivalent to the 4D-Var adjoint assimilation method.
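The equivalence can again be checked numerically: the sketch below evaluates the unnormalized log-posterior of Eqs. (12) and (13) on a grid (same hypothetical values as before) and confirms that the maximum a posteriori estimate coincides with the variational analysis of Eq. (9).

```python
import numpy as np

# Hypothetical prior (model prediction) and observation, as in Eq. (11)
T1, sigma1 = 21.5, 2.0   # prior mean and standard deviation
T2, sigma2 = 19.6, 1.0   # observation and its error standard deviation

# Unnormalized log-posterior of Eq. (12) on a grid of candidate truths;
# the normalization p(T2) is constant in T and can be dropped.
T = np.linspace(10.0, 30.0, 20001)
log_post = -(T - T2)**2 / (2 * sigma2**2) - (T - T1)**2 / (2 * sigma1**2)

T_map = T[np.argmax(log_post)]   # maximum a posteriori estimate
Ta = (sigma2**2 * T1 + sigma1**2 * T2) / (sigma1**2 + sigma2**2)  # Eq. (9)

print(f"MAP = {T_map:.3f}, variational analysis = {Ta:.3f}")
assert abs(T_map - Ta) < 1e-3
```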
