… used in [14].

Again, we look at sigma-point generation, i.e. Eqs. (106) and (107) or (109) and (110). As we defined, an augmented state $[\mathbf{x};\, \boldsymbol{\eta};\, \boldsymbol{\zeta}]$ is applied here. Without losing generality, rewrite

$$\boldsymbol{\chi}\_{k,0}^{a} = \begin{bmatrix} \mathbf{x}\_{k,0} \\ \mathbf{0} \\ \mathbf{0} \end{bmatrix} \tag{111}$$

and

$$\boldsymbol{\chi}\_{k,i}^{a} = \begin{bmatrix} \mathbf{x}\_{k,i} \\ \boldsymbol{\eta}\_{k,i} \\ \boldsymbol{\zeta}\_{k,i} \end{bmatrix} = \begin{bmatrix} \mathbf{x}\_{k,0} \\ \mathbf{0} \\ \mathbf{0} \end{bmatrix} + \begin{bmatrix} \delta\mathbf{x}\_{k,i} \\ \boldsymbol{\eta}\_{k,i} \\ \boldsymbol{\zeta}\_{k,i} \end{bmatrix} \tag{112}$$

Similarly, we can write Eq. (107) or (110) using individual variables. From Eqs. (111) and (112), we can draw the following:

**•** Noise and model state analyses in constructing sigma points at step *k* are independent. It should be noted that $\mathbf{x}\_{k}$ is from Eq. (97), while the noise terms are drawn from a Gaussian distribution. The state analysis is only relevant to the noise drawn at time step *k*, and independent of the model noise and observation noise drawn for the analysis at time step *k* + 1. Thus, it is a reasonable assumption that $\mathbf{P}\_{k}^{a}$, used for constructing sigma points at time step *k*, is a diagonal block matrix, i.e.

$$\mathbf{P}\_{k}^{a} = \begin{pmatrix} \mathbf{P}\_{k} & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{Q} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{R} \end{pmatrix} \tag{113}$$

**•** There are no update equations for the noise, so the noise terms are randomly taken from Gaussian distributions, i.e. the index *i* in $\boldsymbol{\eta}\_{k,i}$ and $\boldsymbol{\zeta}\_{k,i}$ actually does not have meaning. Thus, the noise used for constructing sigma points at time step *k* + 1 is not related to that of time step *k*, as argued above, and Eq. (108) always holds unless the noise is designed considering temporal coherence, such as noise that is red in time.

**•** Based on the above, the actual ensemble size is 2*n* + 1 (with *n* the model state dimension), and not 2*L* + 1 (with *L* the augmented dimension). This is because neither the model noise nor the observation noise can produce an ensemble alone; model errors and observation errors must be joined together with $\mathbf{x}\_{k}$ to produce ensemble members.

Further discussions on optimal construction of sigma points should be conducted for a realistic application of the SPKF.

180 Nonlinear Systems - Design, Analysis, Estimation and Control

**7. Beyond Kalman filters: particle filter and its derivatives**

### **7.1. Standard particle filter**
We have introduced the Kalman filter (KF), extended Kalman filter (EKF), ensemble Kalman filter (EnKF) and sigma-point Kalman filter (SPKF) in previous sections. All of those filters belong to the class of sequential data assimilation methods, i.e. observation data are assimilated into the model system as soon as they become available. Bayesian estimation theory provides a general framework for the sequential data assimilation methods. If we assume the state-space model is given by Eqs. (34) and (35), the analysis step of a Bayesian-based assimilation method is deduced by Bayes' theorem:

$$p\left(\mathbf{x}\_{t}|\mathbf{y}\_{t}\right) = \frac{p\left(\mathbf{y}\_{t}|\mathbf{x}\_{t}\right)p\left(\mathbf{x}\_{t}\right)}{p\left(\mathbf{y}\_{t}\right)},\tag{115}$$

where $p(\mathbf{y}\_{t})$ acts as a normalization factor.

Recalling Section 2.3, Eq. (12) actually assumes that the prior probability density function $p(\mathbf{x})$ and the likelihood function $p(\mathbf{y}|\mathbf{x})$ are Gaussian distribution functions, and thus the posterior probability density function $p(\mathbf{x}|\mathbf{y})$ is also Gaussian. Based on the Gaussian assumption, the cost function of 3D-Var (i.e. Eq. (6)) can be derived, and it is equivalent to the Kalman filter Eqs. (27)–(29). All the Kalman-based filters (e.g. EKF, EnKF, EnSRF, SPKF, etc.) contain the inherent Gaussian assumption, and they are derived and validated for Gaussian systems in theory. However, this Gaussian assumption is often not applicable for nonlinear systems. Even an initially Gaussian error often becomes non-Gaussian while propagating forward with nonlinear models.
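This loss of Gaussianity under nonlinear propagation is easy to demonstrate numerically. The sketch below is a toy illustration of our own (the cubic map, its parameters and the sample size are not from the text): it pushes Gaussian samples through a nonlinear model and measures the skewness of the result.

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw samples from a Gaussian initial error distribution.
x = rng.normal(loc=1.0, scale=0.5, size=100_000)

# Propagate through a simple nonlinear model x -> x + 0.5 * x**3.
y = x + 0.5 * x**3

def skewness(s):
    # A Gaussian sample has skewness near zero.
    s = s - s.mean()
    return (s**3).mean() / (s**2).mean() ** 1.5

skew_in, skew_out = skewness(x), skewness(y)
```

The input ensemble has skewness near zero, while the propagated ensemble is strongly right-skewed, so a Kalman-type update that keeps only the mean and covariance would misrepresent the forecast error.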

The particle filter (PF) is a sequential data assimilation method that is able to deal with nonlinear and non-Gaussian state estimation problems. Like the EnKF, the PF also uses an ensemble, but the ensemble is used to approximately estimate the full probability density function rather than only the error covariance *B*. An ensemble member is also referred to as a particle in the PF literature. Suppose the prior probability density is a sum of Dirac delta functions

$$p\left(\mathbf{x}\_{t}\right) = \frac{1}{N}\sum\_{i=1}^{N} \delta\left(\mathbf{x}\_{t} - \mathbf{x}\_{t}^{i}\right) \tag{116}$$

where $\mathbf{x}\_{t}^{i}$, $i = 1, 2, \ldots, N$, are particles drawn from $p(\mathbf{x}\_{t})$. The posterior probability density is derived by applying Bayes' theorem directly, that is

$$p\left(\mathbf{x}\_{t}|\mathbf{y}\_{t}\right) \propto p\left(\mathbf{y}\_{t}|\mathbf{x}\_{t}\right)p\left(\mathbf{x}\_{t}\right) = \sum\_{i=1}^{N} w\_{t,i}\, \delta\left(\mathbf{x}\_{t} - \mathbf{x}\_{t}^{i}\right) \tag{117}$$

in which $w\_{t,i} \propto p(\mathbf{y}\_{t}|\mathbf{x}\_{t}^{i})$, and a normalization step is required to make $w\_{t,i}$, $i = 1, 2, \ldots, N$, sum up to 1. If we assume the likelihood function is Gaussian, $w\_{t,i}$ can be computed by

$$p(\mathbf{y}\_{t} \mid \mathbf{x}\_{t}^{i}) = \frac{1}{\sqrt{|2\pi R|}} \exp\left\{-\frac{1}{2}\left[\mathbf{y}\_{t} - h(\mathbf{x}\_{t}^{i})\right]^{\mathsf{T}} R^{-1} \left[\mathbf{y}\_{t} - h(\mathbf{x}\_{t}^{i})\right]\right\}.\tag{118}$$

Or else we can use any specified probability density function $p(\mathbf{y}\_{t}|\mathbf{x}\_{t})$ to compute the likelihood.
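In practice, weights computed directly from Eq. (118) can underflow for large innovations, so implementations usually work with log-weights. Below is a minimal sketch of our own (scalar observation; the function names are illustrative, not from the text) that computes normalized particle weights in a numerically stable way:

```python
import numpy as np

def gaussian_log_likelihood(y, hx, R):
    """Log of an Eq.-(118)-style Gaussian likelihood, scalar observation."""
    innov = y - hx                      # y_t - h(x_t^i) for each particle
    return -0.5 * (innov**2 / R + np.log(2.0 * np.pi * R))

def normalized_weights(y, hx, R):
    logw = gaussian_log_likelihood(y, hx, R)
    logw -= logw.max()                  # shift before exponentiating (stability)
    w = np.exp(logw)
    return w / w.sum()                  # weights now sum to 1

# Tiny usage example: 4 particles, observation y = 1.0, R = 0.5.
hx = np.array([0.0, 0.5, 1.0, 2.0])    # h(x_t^i) for each particle
w = normalized_weights(1.0, hx, 0.5)
```

The particle whose predicted observation matches $\mathbf{y}\_{t}$ best receives the largest weight, and the normalization step of Eq. (117) is the final division.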

With the posterior probability density function $p(\mathbf{x}\_{t}|\mathbf{y}\_{t})$, the analysis value and covariance can be computed by

$$\overline{\mathbf{x}}\_{t} = \int \mathbf{x}\, p(\mathbf{x} \mid \mathbf{y}\_{t})\, d\mathbf{x} = \sum\_{i=1}^{N} w\_{t,i}\, \mathbf{x}\_{t}^{i} \tag{119}$$

$$\operatorname{var}\left(\mathbf{x}\_{t}\right) = \int \mathbf{x}^{2}\, p(\mathbf{x} \mid \mathbf{y}\_{t})\, d\mathbf{x} - \overline{\mathbf{x}}\_{t}^{2} = \sum\_{i=1}^{N} w\_{t,i} \left(\mathbf{x}\_{t}^{i}\right)^{2} - \overline{\mathbf{x}}\_{t}^{2} \tag{120}$$

and higher-order moments of the posterior state can also be estimated.

Before stepping forward to the next stage, a resampling step is required to give each particle uniform weight. A typical resampling strategy is sequential importance resampling (SIR), which removes particles with very small weights and duplicates those with large weights. A detailed algorithm of SIR can be found in [35]. The resampling algorithm gives the indices and numbers of copies of those particles that should be duplicated, i.e. it computes $s(1), s(2), \ldots, s(N)$ according to the weights, where each $s(i) \in \{1, 2, \ldots, N\}$. Then $\mathbf{x}\_{t}^{s(i)}$, $i = 1, 2, \ldots, N$, are regarded as the new particles.
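A minimal resampling step can be sketched as follows. This is the common systematic variant (our own illustrative implementation, not necessarily the exact algorithm of [35]); it returns the indices of the particles to keep or duplicate:

```python
import numpy as np

def sir_resample(weights, rng):
    """Systematic resampling: returns indices into the particle array.

    Low-weight particles tend to be dropped, high-weight ones duplicated;
    afterwards every surviving particle carries uniform weight 1/N.
    """
    n = len(weights)
    # One uniform offset, then evenly spaced pointers through the CDF.
    positions = (rng.random() + np.arange(n)) / n
    cumw = np.cumsum(weights)
    cumw[-1] = 1.0                      # guard against round-off
    return np.searchsorted(cumw, positions)

rng = np.random.default_rng(1)
w = np.array([0.05, 0.05, 0.8, 0.1])
idx = sir_resample(w, rng)              # particle 2 is duplicated most
```

Systematic resampling is popular because it needs a single random number per step and has lower variance than plain multinomial resampling.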

In summary, the standard particle filter cycles through four steps: propagate the particles with the stochastic model, weight them with the observation likelihood (Eq. (118)), compute the analysis statistics (Eqs. (119) and (120)), and resample with SIR.


The standard particle filter [36] is also known as the bootstrap particle filter or SIR particle filter.
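To make the whole algorithm concrete, here is a compact bootstrap (SIR) particle filter for a toy scalar model. Everything in it (the model, the noise levels, multinomial resampling, all parameter values) is our own illustrative choice rather than a prescription from the text:

```python
import numpy as np

rng = np.random.default_rng(42)

N = 500                                   # number of particles
Q, R = 0.1, 0.05                          # model / observation error variances
f = lambda x: x + 0.1 * np.sin(x)         # toy nonlinear model
h = lambda x: x                           # identity observation operator

# Simulate a "truth" run and noisy observations.
T = 30
x_true = np.empty(T)
x_true[0] = 1.0
for t in range(1, T):
    x_true[t] = f(x_true[t - 1]) + rng.normal(0, np.sqrt(Q))
y_obs = x_true + rng.normal(0, np.sqrt(R), size=T)

# Bootstrap particle filter.
particles = rng.normal(1.0, 1.0, size=N)  # initial ensemble
estimates = np.empty(T)
for t in range(T):
    # Forecast: propagate each particle with the stochastic model.
    particles = f(particles) + rng.normal(0, np.sqrt(Q), size=N)
    # Analysis: Gaussian likelihood weights (cf. Eq. (118)), normalized.
    logw = -0.5 * (y_obs[t] - h(particles)) ** 2 / R
    w = np.exp(logw - logw.max())
    w /= w.sum()
    estimates[t] = np.sum(w * particles)  # posterior mean, cf. Eq. (119)
    # Resampling (multinomial here) to restore uniform weights.
    particles = particles[rng.choice(N, size=N, p=w)]

rmse = np.sqrt(np.mean((estimates - x_true) ** 2))
```

Each cycle is forecast, weight, estimate, resample, exactly the sequence described above; for this well-observed scalar problem the posterior-mean RMSE stays far below the initial ensemble spread.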

### **7.2. Variants of PF**

The particle filter is a highly promising technique because it does not invoke any Gaussian assumptions. It has been widely used and studied in many other fields. The PF estimates the full probability density function of the forecasted state based on an ensemble of states with different weights. However, the PF suffers from the problem of filter degeneracy, i.e. the procedure collapses to a very small number of highly weighted particles among a horde of almost useless particles carrying a tiny proportion of the probability mass. Even if resampling techniques are used, the degeneracy cannot be completely avoided with limited ensemble size. The number of particles must grow substantially with the dimension of the system to avoid degeneracy [37, 38], a requirement that is apparently too costly for large models such as GCMs. Various efforts have been made to resolve this issue, as documented in an excellent overview [39].

Several strategies are often employed to address the problem of filter degeneracy in applications of the particle filter. For example, Papadakis et al. proposed a weighted ensemble Kalman filter (WEnKF) [40] that uses an ensemble-based Kalman filter as the proposal density from which the particles are drawn. Van Leeuwen et al. developed a fully nonlinear particle filter by exploiting the freedom of the proposal transition density, which ensures not only that all particles ultimately occupy high-probability regions of state-space but also that most of the particles have similar weights [41]. The implicit particle filter uses gradient descent minimization combined with random maps to find the region of high probability, avoiding the calculation of Hessians [42]. Luo et al. have proposed an efficient particle filter that uses residual nudging to prevent the residual norm of the state estimates from exceeding a prespecified threshold [43]. These particle filters were very recently proposed and have attracted broad attention in the community of atmos./ocean. data assimilation. Below, we will briefly introduce the equivalent weights particle filter (EWPF) by Van Leeuwen [39, 41].

The equivalent weights particle filter is a fully nonlinear data assimilation method that works in a two-stage process. It uses the proposal density to ensure that the particles have almost equivalent weights, by which the filter degeneracy can be avoided.

In the standard PF, the particles at time step *t* are propagated by the original model, i.e. $\mathbf{x}\_{t+1}^{i} = f(\mathbf{x}\_{t}^{i}) + \boldsymbol{\eta}\_{t}^{i}$, which implies that the particles at time step *t* + 1 are drawn from the transition density $p(\mathbf{x}\_{t+1}|\mathbf{x}\_{t}^{i})$. In that case, the weight of each $\mathbf{x}\_{t+1}^{i}$ varies greatly and filter degeneracy is very likely to happen.

In the EWPF, another transition density, called the proposal density, is introduced. The proposal density depends on the future observation $\mathbf{y}\_{t+1}$ and all previous particles $\mathbf{x}\_{t}^{i}$, $i = 1, 2, \ldots, N$. With the help of the proposal density, each particle is propagated using a different model

$$\mathbf{x}\_{t+1}^{i} = g\left(\mathbf{x}\_{t}^{i}, \mathbf{y}\_{t+1}\right) + \boldsymbol{\eta}\_{t}^{i}. \tag{121}$$

The model *g* can be anything; for instance, one can add a relaxation term and change the random forcing:

$$\mathbf{x}\_{k+1}^{i} = f\left(\mathbf{x}\_{k}^{i}\right) + \boldsymbol{\eta}\_{k}^{i} + A\left(\mathbf{y}\_{t+1} - H\left(\mathbf{x}\_{k}^{i}\right)\right) \tag{122}$$

where $p(k)$ is a function of the time between observations, and each *k* indexes a model step without observation. *A* is a relaxation term that will 'drag' the particle towards the future observation. In [44], it is given by

$$A = p\left(k\right) Q H^{\mathsf{T}} R^{-1},\tag{123}$$

where the matrices *Q* and *R* correspond to the model error covariance and observation error covariance, respectively.

The second stage of the EWPF involves updating each particle at the observation time *t* + 1 via the formula

$$\mathbf{x}\_{t+1}^{i} = f(\mathbf{x}\_{t}^{i}) + \alpha\_{i}\, Q H^{\mathsf{T}} \left(H Q H^{\mathsf{T}} + R\right)^{-1} \left(\mathbf{y}\_{t+1} - H(f(\mathbf{x}\_{t}^{i}))\right) + \boldsymbol{\eta}\_{t}^{i} \tag{124}$$

where $\alpha\_{i}$ are scalars computed so as to make the weights of the particles equal. Using the expression for the weights and setting all weights equal to a target weight,

$$w\_{i} = p\left(\mathbf{y}\_{t+1}|\mathbf{x}\_{t+1}^{i}\left(\alpha\_{i}\right)\right) = w\_{\text{target}},\tag{125}$$

$\alpha\_{i}$ can be solved for by numerical methods.

Eqs. (122)–(125) show one example of how to construct the proposal model *g* in Eq. (121); it can also be done by running 4D-Var on each particle (the implicit particle filter), or by using the EnKF as the proposal density. For those methods, refer to Morzfeld et al. [42] and Papadakis et al. [40].
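The relaxation stage of Eqs. (122) and (123) can be sketched for a scalar state as follows. This is a toy illustration: the model, the form of $p(k)$ and all parameter values are our own choices, and the sketch omits the equal-weight update of Eqs. (124) and (125).

```python
import numpy as np

rng = np.random.default_rng(7)

Q, R, H = 0.1, 0.05, 1.0                  # scalar error covariances, obs operator
f = lambda x: x + 0.1 * np.sin(x)         # toy model
m = 10                                    # model steps between observations

def p_of_k(k):
    # Illustrative choice: relaxation strength ramps up towards the observation.
    return 0.1 * (k + 1) / m

def relax_to_observation(x, y_next):
    """Propagate particles towards y_next with the nudged model of Eq. (122)."""
    for k in range(m):
        A = p_of_k(k) * Q * H / R         # Eq. (123), scalar case
        x = (f(x) + rng.normal(0, np.sqrt(Q), size=x.shape)
             + A * (y_next - H * x))
    return x

particles = rng.normal(0.0, 1.0, size=200)
y_next = 2.0
nudged = relax_to_observation(particles, y_next)
```

After the relaxation steps, the ensemble has been 'dragged' much closer to the future observation than the free-running model would put it, which is exactly what keeps the subsequent weights from degenerating.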

### **7.3. Remarks on PF**


#### *7.3.1. Combined method of EnKF and PF*

The ensemble Kalman particle filter (EnKPF) is a combination of the EnKF and the SIR particle filter. It was recently introduced to address non-Gaussian features in data assimilation for highly nonlinear systems, by providing a continuous interpolation between the EnKF and SIR-PF analysis schemes [45].

As stated above, both the EnKF and PF methods are based on Bayesian estimation theory, but they approximate the probability density function of the state in different ways. The EnKF only approximates the mean and covariance of the state through a series of equally weighted ensemble members, whereas the particle filter weights the ensemble members according to their likelihoods. The EnKF contains the Gaussian assumption but requires only a relatively small ensemble size to prevent filter degeneracy, in contrast with the PF.

The EnKPF takes advantage of both methods by combining the analysis schemes of the EnKF and the SIR-PF using a controllable index (i.e. tuning parameter). In contrast with both the EnKF and the SIR-PF, the analysis scheme of the EnKPF not only updates the ensemble members but also considers the weights.

Assume that the forecast ensemble $\mathbf{x}\_{i}^{f}$, $i = 1, 2, \ldots, N$, and the observation data *y* are available, and that the forecast covariance $P^{f}$ can be calculated from the ensemble. The analysis scheme of the EnKPF is given below.

**1.** Choose $\gamma \in [0, 1]$ and apply the EnKF based on the inflated observation error covariance $R/\gamma$ as follows:

$$K\_1\left(\gamma\right) = P^{f} H^{\mathsf{T}} \left(H P^{f} H^{\mathsf{T}} + R/\gamma\right)^{-1} = \gamma P^{f} H^{\mathsf{T}} \left(\gamma H P^{f} H^{\mathsf{T}} + R\right)^{-1} \tag{126}$$

$$\mathbf{v}\_{i} = \mathbf{x}\_{i}^{f} + K\_{1}(\gamma)(\mathbf{y} - H\mathbf{x}\_{i}^{f}) \tag{127}$$

$$Q = \frac{1}{\gamma} K\_1(\gamma)\, R\, K\_1(\gamma)^{\mathsf{T}} \tag{128}$$

**2.** Compute the weights for each updated member as follows:

$$w\_i = \phi\left(\mathbf{y};\, H\mathbf{v}\_i,\; \frac{R}{1-\gamma} + H Q H^{\mathsf{T}}\right) \tag{129}$$

and normalize the weights by $w\_i = w\_i / \sum\_{j=1}^{N} w\_j$, in which *ϕ* is the probability density function of a Gaussian.

**3.** Calculate the resampling index $s(i)$ for each member according to the weights $w\_i$ using the SIR algorithm, then set

$$\mathbf{x}\_{i}^{u} = \mathbf{v}\_{s(i)} + K\_{1}\left(\gamma\right) \frac{\boldsymbol{\epsilon}\_{1,i}}{\sqrt{\gamma}} \tag{130}$$

where $\boldsymbol{\epsilon}\_{1,i}$ is a random observation error drawn from the Gaussian $N(0, R)$.

**4.** Compute $K\_2(1-\gamma) = (1-\gamma)\, Q H^{\mathsf{T}}\left[(1-\gamma) H Q H^{\mathsf{T}} + R\right]^{-1}$, generate $\boldsymbol{\epsilon}\_{2,i}$ from $N(0, R)$, and apply the EnKF with the inflated observation error again as follows:

$$\mathbf{x}\_{i}^{a} = \mathbf{x}\_{i}^{u} + K\_{2}\left(1 - \gamma\right) \left[\mathbf{y} + \frac{\boldsymbol{\epsilon}\_{2,i}}{\sqrt{1 - \gamma}} - H \mathbf{x}\_{i}^{u}\right] \tag{131}$$

*γ* can be determined recursively to achieve the optimal performance of the EnKPF. More details of the EnKPF can be found in [45, 46].
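A scalar version of the EnKPF analysis step of Eqs. (126)–(131) can be sketched as follows; the ensemble size, random seed and parameter values are our own illustrative choices, with $H = 1$:

```python
import numpy as np

rng = np.random.default_rng(3)

def gaussian_pdf(y, mean, var):
    return np.exp(-0.5 * (y - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

def enkpf_analysis(xf, y, R, gamma, rng):
    """One scalar EnKPF analysis step following Eqs. (126)-(131), H = 1."""
    n = len(xf)
    Pf = xf.var(ddof=1)                       # forecast variance
    # Step 1: EnKF part with inflated observation error R / gamma.
    K1 = gamma * Pf / (gamma * Pf + R)        # Eq. (126), scalar case
    nu = xf + K1 * (y - xf)                   # Eq. (127)
    Qv = K1 * R * K1 / gamma                  # Eq. (128)
    # Step 2: PF weights, Eq. (129), then normalization.
    w = gaussian_pdf(y, nu, R / (1 - gamma) + Qv)
    w /= w.sum()
    # Step 3: resample and perturb, Eq. (130).
    s = rng.choice(n, size=n, p=w)
    eps1 = rng.normal(0, np.sqrt(R), size=n)
    xu = nu[s] + K1 * eps1 / np.sqrt(gamma)
    # Step 4: second EnKF update with the remaining observation info, Eq. (131).
    K2 = (1 - gamma) * Qv / ((1 - gamma) * Qv + R)
    eps2 = rng.normal(0, np.sqrt(R), size=n)
    return xu + K2 * (y + eps2 / np.sqrt(1 - gamma) - xu)

xf = rng.normal(0.0, 1.0, size=400)           # forecast ensemble
xa = enkpf_analysis(xf, y=1.0, R=0.25, gamma=0.5, rng=rng)
```

With $\gamma \to 1$ the scheme reduces to a pure EnKF update, with $\gamma \to 0$ to a pure SIR-PF step; intermediate values blend the two, which is the continuous interpolation mentioned above.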

#### *7.3.2. Localization in PF*

Previous sections have introduced the localization technique in EnKF, which greatly improves the performance of EnKF in high-dimensional models. The advantages of localization motivate the search for a localization procedure in particle filtering.

Van Leeuwen discussed this topic in depth [39]. He argued that *one can calculate the weights locally, but it is not easy for resampling. In the resampling step low-weight particles are abandoned and high-weight particles are duplicated. However, with local weights, different particles are selected in different parts of the domain. The problem is that we have to have continuous (in space) model fields to propagate forward in time with the model. Just constructing a new particle that consists of one particle in one part of the model domain and another particle in another domain will lead to problems at the boundary between these two*.

The problem of spatial discontinuity makes localization in the particle filter infeasible at present. Most advanced particle filters (e.g. the EWPF and the implicit particle filter) use the idea of a global weight, i.e. the weight for each member is a scalar.

However, there are still some attempts at localization in the particle filter. For example, Poterjoy developed the localized particle filter (LPF), which updates particles locally using ideas borrowed from the EnKF [47]. The paper demonstrated some advantages of the new filter over the EnKF, especially when the observation networks consist of densely spaced measurements that relate nonlinearly to the model state. This is very interesting work on the particle filter, and it also has the potential to work with large atmos./ocean. data assimilation systems.
