**6. Sigma-point Kalman filters (SPKF)**

## **6.1. Basics of SPKF**

If the model is linear, obviously, the forecast state and its error covariance are

$$x_{t+1}^b = M x_t^a \tag{78}$$

$$B_{t+1} = M P_t^a M^T + Q \tag{79}$$

So, the Kalman gain can be written in the general form

$$K = P_{xy} P_{yy}^{-1} \tag{80}$$

where $\tilde{y} = y^o - h(x^b)$ is defined as the error between the noisy observation $y^o$ and its prediction $h(x^b)$, $P_{xy}$ is the cross covariance between the background error and $\tilde{y}$, and $P_{yy}$ is the covariance of the difference between the model observation and the observation itself. Here, if the measurement function is linear, i.e. $h(x) = Hx$, then

$$\tilde{y} = y^o - h(x^b) = y^o - Hx^b = Hx^{tr} + \varepsilon - Hx^b = H\tilde{x}^b + \varepsilon$$

and

$$P_{xy} = \langle \tilde{x}^b \tilde{y}^T \rangle = P^b H^T \tag{81}$$

$$P_{yy} = \langle \tilde{y} \tilde{y}^T \rangle = H P^b H^T + R \tag{82}$$

$$K = P^b H^T (H P^b H^T + R)^{-1} \tag{83}$$

Eq. (83) is identical to Eq. (28). Therefore, Eq. (28), or KF, EKF and EnKF, is a special case of Eq. (76) under the assumption of a linear measurement function.
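As a quick numerical sanity check of this equivalence (a minimal sketch; the small linear example below, with arbitrary $P^b$, $H$ and $R$, is ours and not from the chapter), the general gain of Eq. (80) and the KF gain of Eq. (83) coincide when $h(x) = Hx$:

```python
import numpy as np

# Hypothetical small linear example (values are arbitrary): 2 states, 1 observation.
Pb = np.array([[2.0, 0.5],
               [0.5, 1.0]])                 # background error covariance P^b
H = np.array([[1.0, 0.0]])                  # linear observation operator
R = np.array([[0.25]])                      # observation error covariance

# General gain (Eq. 80) with the linear identities (81)-(82):
Pxy = Pb @ H.T                              # <x~^b y~^T> = P^b H^T
Pyy = H @ Pb @ H.T + R                      # <y~ y~^T>  = H P^b H^T + R
K_general = Pxy @ np.linalg.inv(Pyy)

# Standard KF/EKF gain (Eq. 83, i.e. Eq. 28):
K_kf = Pb @ H.T @ np.linalg.inv(H @ Pb @ H.T + R)

assert np.allclose(K_general, K_kf)         # identical when h(x) = Hx
```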

In the standard KF, the state error covariance is updated at each analysis cycle during the measurement update. Updating the error covariance matrix is important because it represents the change in forecast error covariance when a measurement is assimilated. The EnKF implementation does not require the covariance update equation because it can calculate the updated error covariance directly from the set of ensemble members. Evensen [17] derived an analysis covariance equation that is consistent with the standard KF error covariance update of Eq. (28). But the true representation of the updated error covariance requires a large ensemble size, which is often computationally infeasible.

The general form of the Kalman gain makes use of the reformulated error covariance. In a broad sense, the above algorithm implicitly uses the prior covariance update equation (or the analysis error covariance matrix) to calculate the forecast error covariance. Thus, the above algorithm is fully consistent with the time update and measurement update formulation of the Kalman filter algorithm. On this basis, one can develop a new type of Kalman filter that chooses the ensemble members deterministically in such a way that they can capture the statistical properties of the state distribution.


EnKF was developed in order to overcome the linearization of nonlinear models. As introduced earlier, the idea behind EnKF is to 'integrate' the Fokker–Planck equation using an ensemble technique to estimate the forecast error covariance. Theoretically, if the ensemble size is infinite, the estimate approaches the true value. However, in reality, we can only use a finite ensemble size, often a very small one for many problems, leading to truncation errors. Thus, several concerns arise: how to wisely generate finite samples for the optimal estimate of the prediction error covariance, what the smallest ensemble size is for an efficient estimate of the error covariance, and how much of the true error covariance can be captured by the EnKF for a given ensemble size. In this section, we introduce a new ensemble technique for EnKF, called the sigma-point Kalman filter (SPKF).

The so-called sigma-point approach is based on deterministic sampling of the state distribution to calculate the approximate covariance matrices for the standard Kalman filter equations. The family of SPKF algorithms includes the unscented Kalman filter (UKF [26]), the central difference Kalman filter (CDKF [27]) and their square-root versions [28]. Another interpretation of the sigma-point approach is that it implicitly performs a statistical linearization of the nonlinear model through a weighted statistical linear regression (WSLR) to calculate the covariance matrices [29]. In SPKF, the model linearization is done through a linear regression between a number of points (called sigma points) drawn from a prior distribution of a random variable, rather than through a truncated Taylor series expansion at a single point. It has been found that this linearization is much more accurate than a truncated Taylor series linearization [28]. Eqs. (80)–(82) form the core of SPKF. A central issue here is how to generate the optimal ensemble members for applying these equations. There are two basic approaches, the aforementioned UKF and CDKF. For an *L*-dimensional dynamical system represented by a set of discretized state-space equations (67), it has been proven that 2*L* + 1 ensemble members, constructed by UKF or CDKF, can precisely estimate the mean and covariance. We omit the theoretical proof and only outline the UKF scheme below.
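As a small, self-contained illustration of this accuracy claim (the quadratic toy function and parameter values below are our own choices, not from the chapter), sigma points recover the mean of a quadratic transformation exactly, whereas a single-point Taylor linearization does not:

```python
import numpy as np

# Toy illustration (our own example): propagate x ~ N(mu, sigma^2) through
# f(x) = x^2 and compare the sigma-point estimate of the mean with a
# first-order Taylor (single-point linearization) estimate.
mu, sigma = 1.0, 0.5
f = lambda x: x ** 2

exact_mean = mu**2 + sigma**2               # E[x^2] = mu^2 + sigma^2 = 1.25
taylor_mean = f(mu)                         # Taylor expansion around mu: 1.0

# Unscented choice for a 1-D variable (lambda = 2 is an arbitrary valid choice).
L, lam = 1, 2.0
pts = np.array([mu, mu + np.sqrt(L + lam) * sigma, mu - np.sqrt(L + lam) * sigma])
wm = np.array([lam / (L + lam), 0.5 / (L + lam), 0.5 / (L + lam)])
ut_mean = wm @ f(pts)                       # 1.25: exact for this quadratic case

print(exact_mean, taylor_mean, ut_mean)
```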

Denote the 2*L* + 1 sigma points at time *k* for producing ensemble members by $\mathcal{X}_k = [\mathcal{X}_{k,0}, \mathcal{X}_{k,1}^{+}, \ldots, \mathcal{X}_{k,L}^{+}, \mathcal{X}_{k,1}^{-}, \ldots, \mathcal{X}_{k,L}^{-}]$, which is defined according to the following expressions:

$$\mathcal{X}_{k,0} = \overline{X}_k^a \tag{84}$$

$$\mathcal{X}_{k,i}^{+} = \overline{X}_k^a + \left[ c\sqrt{P_{X,k}^a} \right]_i \tag{85}$$

$$\mathcal{X}_{k,i}^{-} = \overline{X}_k^a - \left[ c\sqrt{P_{X,k}^a} \right]_i \tag{86}$$

where $L = n_x + n_\eta + n_\zeta$ is the sum of the dimensions of the model state, model noise and measurement noise. The augmented state vector $X = [x; \eta; \zeta]$ is an *L*-dimensional vector. $P_{X,k}^a$ is the covariance of the augmented state vector (analysis) at the previous step. $[c\sqrt{P_{X,k}^a}]_i$ is the *i*th row (column) of the weighted matrix square root of the covariance matrix (of dimension *L*). *c* is a scale parameter that will be specified later. The key point here is to produce 2*L* + 1 ensemble members by integrating the model from the 2*L* + 1 initial conditions of Eqs. (84)–(86); with these ensemble members, the filter Eqs. (80)–(82) are applied.
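A minimal sketch of Eqs. (84)–(86), assuming a Cholesky factor as the matrix square root and treating its columns as the offsets; the function and variable names are illustrative only:

```python
import numpy as np

def generate_sigma_points(x_mean, P, c):
    """Return the 2L+1 sigma points of Eqs. (84)-(86) as columns of an (L, 2L+1) array.

    x_mean : analysis mean of the augmented state (length L)
    P      : analysis covariance of the augmented state (L x L)
    c      : scale parameter, specified later by Eq. (99)
    """
    L = x_mean.size
    S = np.linalg.cholesky(P)                        # one common choice of matrix square root
    sigma = np.empty((L, 2 * L + 1))
    sigma[:, 0] = x_mean                             # Eq. (84)
    for i in range(L):
        sigma[:, i + 1] = x_mean + c * S[:, i]       # Eq. (85)
        sigma[:, L + i + 1] = x_mean - c * S[:, i]   # Eq. (86)
    return sigma
```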

The procedure is summarized as below:

**1.** Initially, perturb the initial condition $x_0$ by a small amount, denoted by $\tilde{x}_0$, using Evensen's method [17]; also randomly generate perturbations for the model noise $\eta$ and the observation noise $\zeta$, drawn from the normal distributions $N(0, Q)$ and $N(0, R)$. Thus, we can construct the augmented state vector and the corresponding covariance ($k = 0$):

$$\overline{X}_0^a = \left[ x_0; 0; 0 \right] \tag{87}$$

$$P_0^x = \langle \tilde{x}_0 \tilde{x}_0^T \rangle \tag{88}$$

$$P_{X,0} = \begin{pmatrix} P_0^x & 0 & 0 \\ 0 & Q & 0 \\ 0 & 0 & R \end{pmatrix} \tag{89}$$


**2.** From the above, we calculate the sigma points using Eqs. (84)–(86). Note that each sigma point, denoted by $\mathcal{X}_{k,i}$, has dimension *L*, e.g. the *i*th sigma point can be expressed as $\mathcal{X}_{k,i} = [x_{k,i}; \eta_{k,i}; \zeta_{k,i}]$.

**3.** Use the 2*L* + 1 sigma points to integrate the state-space model. For the *i*th sigma point, we have $x_{k+1,i}^f = f(x_{k,i}, \eta_{k,i})$. When *i* varies from 1 to 2*L* + 1, we produce 2*L* + 1 ensemble members, from which the analysis mean and covariance will be obtained; these are in turn used to produce the sigma points for the next step ($k + 1$), forming a recursive algorithm. Given the 2*L* + 1 ensemble members, the analysis mean and covariance are calculated as follows:

$$\overline{x}_{k+1}^f = \sum_{i=0}^{2L} w_i^{(m)} x_{k+1,i}^f \tag{90}$$

$$(P_{xx}^f)_{k+1} = \sum_{i=0}^{2L} w_i^{(c)} [x_{k+1,i}^f - \overline{x}_{k+1}^f][x_{k+1,i}^f - \overline{x}_{k+1}^f]^T \tag{91}$$

$$y_{k+1,i}^f = h\left(x_{k+1,i}^f, \zeta_{k+1,i}\right) \tag{92}$$

$$\overline{y}_{k+1}^f = \sum_{i=0}^{2L} w_i^{(m)} y_{k+1,i}^f \tag{93}$$

$$(P_{yy})_{k+1} = \sum_{i=0}^{2L} w_i^{(c)} [y_{k+1,i}^f - \overline{y}_{k+1}^f][y_{k+1,i}^f - \overline{y}_{k+1}^f]^T \tag{94}$$

$$(P_{xy})_{k+1} = \sum_{i=0}^{2L} w_i^{(c)} [x_{k+1,i}^f - \overline{x}_{k+1}^f][y_{k+1,i}^f - \overline{y}_{k+1}^f]^T \tag{95}$$

$$K_{k+1} = P_{xy} P_{yy}^{-1} \tag{96}$$

$$\overline{x}_{k+1}^a = \overline{x}_{k+1}^f + K_{k+1}\left[ y_{k+1} - \overline{y}_{k+1}^f \right] \tag{97}$$

$$P_{k+1}^a = (P_{xx}^f)_{k+1} - K_{k+1} P_{yy} K_{k+1}^T \tag{98}$$

where

$$c = \sqrt{L + \lambda} \tag{99}$$

$$w_0^{(m)} = \frac{\lambda}{L + \lambda} \tag{100}$$

$$w_0^{(c)} = \frac{\lambda}{L + \lambda} + 1 - \alpha^2 + \beta \tag{101}$$

$$w_i^{(m)} = w_i^{(c)} = \frac{1}{2(L+\lambda)}, \quad i = 1, 2, \dots, 2L \tag{102}$$

$$\lambda = \alpha^2 (L + \kappa) - L \tag{103}$$

*α* and *κ* are tuning parameters, with $0 < \alpha \le 1$ and $\kappa \ge 0$. Often *κ* is chosen as 0 by default and $\beta = 2$.

**4.** From $\overline{x}_{k+1}^a$, together with newly drawn random perturbations for the model noise *η* and observation noise *ζ* from the Gaussian distributions $N(0, Q)$ and $N(0, R)$, we calculate the sigma points using Eqs. (84)–(86), and repeat Steps 2 and 3, and so on, until the assimilation is completed for the entire period.
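To make the recursion concrete, here is a minimal sketch of the weights of Eqs. (99)–(103) together with one analysis cycle of Eqs. (90)–(98), assuming user-supplied model and observation functions `f(x, eta)` and `h(x, zeta)`; all names and default parameter values are illustrative, not prescribed by the chapter:

```python
import numpy as np

def ukf_weights(L, alpha=1.0, beta=2.0, kappa=0.0):
    """Scale parameter and weights of Eqs. (99)-(103).

    The chapter only states 0 < alpha <= 1 and kappa >= 0, with kappa = 0 and
    beta = 2 as typical defaults; alpha = 1 here is an arbitrary choice.
    """
    lam = alpha**2 * (L + kappa) - L                   # Eq. (103)
    c = np.sqrt(L + lam)                               # Eq. (99)
    wm = np.full(2 * L + 1, 1.0 / (2.0 * (L + lam)))   # Eq. (102)
    wc = wm.copy()
    wm[0] = lam / (L + lam)                            # Eq. (100)
    wc[0] = lam / (L + lam) + 1.0 - alpha**2 + beta    # Eq. (101)
    return c, wm, wc

def spkf_cycle(sigma, wm, wc, f, h, y_obs, n_x, n_eta):
    """One forecast/analysis cycle on augmented sigma points (Eqs. 90-98).

    Each column of `sigma` is an augmented point [x; eta; zeta].
    f(x, eta) propagates the state; h(x, zeta) maps it to observation space.
    Returns the analysis mean (Eq. 97) and analysis covariance (Eq. 98).
    """
    n_sig = sigma.shape[1]
    x_s = sigma[:n_x, :]
    eta_s = sigma[n_x:n_x + n_eta, :]
    zeta_s = sigma[n_x + n_eta:, :]

    # Step 3: integrate every sigma point through the model and map to obs space.
    xf = np.column_stack([f(x_s[:, i], eta_s[:, i]) for i in range(n_sig)])
    yf = np.column_stack([h(xf[:, i], zeta_s[:, i]) for i in range(n_sig)])  # Eq. (92)

    xf_mean = xf @ wm                                   # Eq. (90)
    yf_mean = yf @ wm                                   # Eq. (93)
    dx = xf - xf_mean[:, None]
    dy = yf - yf_mean[:, None]

    Pxx = (wc * dx) @ dx.T                              # Eq. (91)
    Pyy = (wc * dy) @ dy.T                              # Eq. (94)
    Pxy = (wc * dx) @ dy.T                              # Eq. (95)

    K = Pxy @ np.linalg.inv(Pyy)                        # Eq. (96)
    xa_mean = xf_mean + K @ (y_obs - yf_mean)           # Eq. (97)
    Pa = Pxx - K @ Pyy @ K.T                            # Eq. (98)
    return xa_mean, Pa
```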

## **6.2. Remarks on SPKF**

SPKF was recently introduced into the earth sciences [15, 30]. The main differences between SPKF and EnKF include:

**i.** SPKF chooses the ensemble members deterministically, while EnKF uses random perturbations to generate ensemble members;

**ii.** the number of sigma points is fixed at 2*L* + 1, while the ensemble size in EnKF is pre-specified;

**iii.** SPKF uses Eq. (98) to update the error covariance matrix, while EnKF does not update the error covariance matrix explicitly; and

**iv.** sigma points are calculated using Eqs. (84)–(86) every time an observation is available, while the ensemble members in EnKF are only perturbed at the initial time. A recent application of SPKF to a realistic oceanic model indicates that the SPKF performs better than the EnKF at a similar level of computational cost [31].

In SPKF, the number of sigma points is 2*L* + 1, where *L* is the dimension of the augmented state vector $X = [x; \eta; \zeta]$, i.e. $L = n_x + n_\eta + n_\zeta$ is the sum of the dimensions of the model state, model noise and observation noise. Usually, *L* is of the order of $10^3$–$10^4$, so the computational expense is a huge challenge for SPKF in realistic problems. A solution is to use the truncated singular-value decomposition (TSVD) to reduce the number of sigma points. As seen from Eqs. (84)–(86), $\sqrt{P_{X,k}^a}$ is an $L \times L$ matrix, and thus the dimension of $P_{X,k}^a$ determines the ensemble size. Suppose that $P_{X,k}^a$ can be expressed as


$$P\_{X,k}^a = E\_{X,k}^a \Sigma\_k (E\_{X,k}^a)^T \tag{104}$$

where $\Sigma_k = \mathrm{diag}(\sigma_k^1, \sigma_k^2, \ldots, \sigma_k^L)$ is a diagonal matrix of eigenvalues sorted in descending order, i.e. $\sigma_k^1 \ge \sigma_k^2 \ge \ldots \ge \sigma_k^L$, and $E_{X,k}^a = [e_{X,k,1}^a, e_{X,k,2}^a, \ldots, e_{X,k,L}^a]$ contains the corresponding eigenvectors. Truncating at the first *m* modes, we can write the sigma points (84)–(86) as below:

$$\mathcal{X}_{k,0} = \overline{X}_k^a \tag{105}$$

$$\mathcal{X}_{k,i}^{+} = \overline{X}_k^a + c\sqrt{\sigma_k^i}\, e_{X,k,i}^a \tag{106}$$

$$\mathcal{X}_{k,i}^{-} = \overline{X}_k^a - c\sqrt{\sigma_k^i}\, e_{X,k,i}^a \tag{107}$$

for $i = 1, 2, \ldots, m$. Thus, the ensemble size becomes $2m + 1$, where $m < L$. Some fast SVD algorithms can be used here, such as Lanczos and block Lanczos [32]. Applications of the truncated SVD can also be found in [33, 34].
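A minimal sketch of the truncated construction of Eqs. (104)–(107), using a full symmetric eigendecomposition for clarity (a Lanczos-type solver would replace it for large *L*); names are illustrative:

```python
import numpy as np

def reduced_sigma_points(x_mean, P, c, m):
    """Sigma points from the leading m eigenmodes of P (Eqs. 104-107)."""
    evals, evecs = np.linalg.eigh(P)              # symmetric eigendecomposition (ascending)
    order = np.argsort(evals)[::-1][:m]           # keep the m largest eigenvalues (Eq. 104)
    sig, E = evals[order], evecs[:, order]

    L = x_mean.size
    sigma = np.empty((L, 2 * m + 1))
    sigma[:, 0] = x_mean                          # Eq. (105)
    for i in range(m):
        offset = c * np.sqrt(sig[i]) * E[:, i]
        sigma[:, i + 1] = x_mean + offset         # Eq. (106)
        sigma[:, m + i + 1] = x_mean - offset     # Eq. (107)
    return sigma
```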

We can further simplify $P_{X,k}^a$ based on its definition (or Cholesky decomposition), i.e. $P_{X,k}^a = A_{X,k}^a (A_{X,k}^a)^T$, where $A_{X,k}^a$ is the data matrix from which the ensemble mean has been subtracted. Thus, Eqs. (84)–(86) can be written as follows:

$$\mathcal{X}\_{k,0} = \overline{X}\_k^a \tag{108}$$

$$\mathcal{X}_{k,i}^{+} = \overline{X}_k^a + [c A_{X,k}^a]_i \tag{109}$$

$$\mathcal{X}_{k,i}^{-} = \overline{X}_k^a - [c A_{X,k}^a]_i \tag{110}$$

where $[A_{X,k}^a]_i = [\tilde{x}_{k,i}^a; \tilde{\eta}_{k,i}; \tilde{\zeta}_{k,i}]$ denotes the *i*th column of the anomaly matrix. Eqs. (109) and (110) transfer the covariance matrix $P_{X,k}^a$ to the data matrix $A_{X,k}^a$ in constructing the sigma points. The largest advantage is that they avoid an explicit expression of $P_{X,k}^a$, which could be a very large matrix beyond the memory of current computers. However, Eqs. (109) and (110) cannot reduce the ensemble size 2*L* + 1. A solution is to decompose $A_{X,k}^a$, for example by principal component analysis, as used in [14]. Further discussion on the optimal construction of sigma points should be conducted for realistic applications of SPKF.
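A minimal sketch of Eqs. (108)–(110), building sigma points directly from a mean-subtracted data (anomaly) matrix so that $P_{X,k}^a$ is never formed explicitly; scaling conventions for the anomaly columns may differ in a real implementation:

```python
import numpy as np

def sigma_points_from_anomalies(x_mean, A, c):
    """Sigma points from an anomaly matrix A with P ~ A A^T (Eqs. 108-110)."""
    L, n_col = A.shape
    sigma = np.empty((L, 2 * n_col + 1))
    sigma[:, 0] = x_mean                                  # Eq. (108)
    for i in range(n_col):
        sigma[:, i + 1] = x_mean + c * A[:, i]            # Eq. (109)
        sigma[:, n_col + i + 1] = x_mean - c * A[:, i]    # Eq. (110)
    return sigma
```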

Again, let us look at the sigma-point generation, i.e. Eqs. (106) and (107) or (109) and (110). As defined above, an augmented state vector $[x; \eta; \zeta]$ is applied here. Without loss of generality, we rewrite them as below:

$$
\begin{bmatrix} \mathbf{x}\_{k,0} \\ \eta\_{k,0} \\ \zeta\_{k,0} \end{bmatrix} = \begin{bmatrix} \overline{\mathbf{x}}\_{k,0} \\ \mathbf{0} \\ \mathbf{0} \end{bmatrix} \tag{111}
$$

$$
\begin{bmatrix} x_{k,i} \\ \eta_{k,i} \\ \zeta_{k,i} \end{bmatrix} = \begin{bmatrix} \overline{x}_{k,0} \\ 0 \\ 0 \end{bmatrix} + c \begin{bmatrix} x_{k,i}^a \\ \eta_{k,i} \\ \zeta_{k,i} \end{bmatrix} \tag{112}
$$

Similarly, we can write Eq. (107) or (110) in terms of the individual variables. From Eqs. (111) and (112), we can draw the following conclusions:

**•** The noise and the model state analysis used in constructing the sigma points at step *k* are independent. It should be noted that $\overline{x}_k^a$ is from Eq. (97) and the noise terms are drawn from Gaussian distributions. If we assume that the noise is drawn randomly each time, $\overline{x}_k^a$ is only related to the noise drawn at time step *k* and is independent of the model noise and observation noise drawn for the analysis at time step *k* + 1; thus, $P_{X,k}$ is a block-diagonal matrix, i.e.

$$P\_{X,k} = \begin{pmatrix} P\_k^x & 0 & 0 \\ 0 & Q & 0 \\ 0 & 0 & R \end{pmatrix} \tag{113}$$


**•** At the initial time, the initial perturbation on the model states plus the drawn noise for model errors and measurement errors have mean and covariance as follows:

$$\overline{X}_0^a = \left[ x_0; 0; 0 \right], \quad P_{X,0} = \begin{pmatrix} P_0^x & 0 & 0 \\ 0 & Q & 0 \\ 0 & 0 & R \end{pmatrix} \tag{114}$$

Theoretically, there are $2(n_x + n_\eta + n_\zeta) + 1$ ensemble members, generated from the columns of $P_{X,0}$ ($i = 1, \ldots, n_x$; $n_x + 1, \ldots, n_x + n_\eta$; $n_x + n_\eta + 1, \ldots, n_x + n_\eta + n_\zeta$) through formulas (84)–(86). However, in the *i*th column, the rows corresponding to the model inputs ($x$, $\eta$, $\zeta$) have non-zero values in only one of these blocks. Obviously, the sigma points with zero values make the update equation $x_{k+1,i}^f = f(x_{k,i}, \eta_{k,i})$ invalid; thus, the actual ensemble size is smaller than $2(n_x + n_\eta + n_\zeta) + 1$.

When the truncation technique is applied to reduce the ensemble size, the ensemble spread might shrink because of the relatively small ensemble size. As in EnKF, an inflation approach for SPKF might be helpful, and it would be interesting to develop such a scheme. Also, we can localize SPKF, like the localized EnKF, to ease memory and computation issues.

All of these remarks on SPKF reflect the authors' own thinking and understanding. It would be interesting to further test and validate these ideas and properties using simple models.
