*C k* Moore-Penrose pseudoinverse of *Ck*.

blood on the stage." *Charles Spencer* (*Charlie) Chaplin*

 , *<sup>H</sup>* 

*p*(*xk*) Probability density function of a discrete random variable *xk*.

/ ˆ*wk N* , / ˆ *k N x* , / ˆ *k N y* Estimates of *wk*, *xk* and *yk* at time *k*, given data *zk* over an interval *k*

*<sup>k</sup>* Output of ˆ <sup>1</sup> , the inverse of the approximate Wiener-Hopf factor.

*<sup>k</sup>* Output of ˆ *<sup>H</sup>* , the adjoint of the inverse of the approximate

"The cinema is little more than a fad. It's canned drama. What audiences really want to see is flesh and

*Gk* Gain of the smoother developed by Rauch,Tung and Striebel.

( *Ak I* − <sup>1</sup>

1/ ) *<sup>T</sup> BQB P k kkk k* 

*CRC A*

1/ 1/

*x A BQB x*

realisation *xk+1* = *Akxk* + *wk,, γk = xk.* Verify that *Pk − <sup>T</sup> A P A = kk k* <sup>0</sup>

ˆ ˆ 0 *<sup>T</sup> k N k k kk k N*

**Problem 4.** For the model (1) – (2). assume that *Dk* = 0, *E*{*wk*} = *E*{*vk*} = 0, { }*<sup>T</sup> Ewwk k* = *Qk jk*

**Problem 5.** Under the assumptions of Problem 4, obtain an expression relating ˆ ˆ *<sup>H</sup>* and

, obtain the optimal realisable smoother solutions for output estimation, input

The random variable *xk* has a normal distribution with mean *μ* and

/ 1/

is equivalent to Rauch, Tung and Striebel's maximum-likelihood solution.

and 1/ ˆ *k N x* = 1/ 1 ˆ *k k x* + *Pk kk N* 1/ 1/

*CR z*

<sup>0</sup> *w* denote the output of linear time-varying system having the

1 1

 

and { }*<sup>T</sup> Ewvk k* = 0. Use the result of Problem 3 to show that ˆ ˆ *<sup>H</sup>* = *<sup>H</sup>*

*ei ei =* 1 1

*H*

*ei ei +* 2 2

*ei ei* and

*H*

*T T T k N kk k k k N kkk* *<sup>H</sup> A Pk k*

to confirm that

 *+* <sup>1</sup> 0 *<sup>T</sup> P Ak k* 

*ei*2 = −

 *+* 

,

smoother in which *Gk* = <sup>1</sup> / 1 1/ *<sup>T</sup> PAP kk k k k* is a gain matrix. Although this is not a minimummean-square-error solution, it outperforms the Kalman filter and can provide close to optimal performance whenever the underlying assumptions are reasonable.

The minimum-variance smoothers are state-space generalisations of the optimal noncausal Wiener solutions. They make use of the inverse of the approximate Wiener-Hopf factor ˆ <sup>1</sup> and its adjoint ˆ *<sup>H</sup>* . These smoothers achieve the best-possible performance, however, they are not minimum-order solutions. Consequently, any performance benefits need to be reconciled against the increased complexity.

#### **7.8 Problems**

#### **Problem 1.**

(i) Simplify the fixed-lag smoother

$$
\begin{bmatrix}
\hat{\mathbf{x}}\_{k+1/k} \\
\hat{\mathbf{x}}\_{k/k} \\
\hat{\mathbf{x}}\_{k-1/k} \\
\vdots \\
\hat{\mathbf{x}}\_{k-N+1/k}
\end{bmatrix} = \begin{bmatrix}
\boldsymbol{A}\_{k} & \mathbf{0} & \cdots & \mathbf{0} \\
\boldsymbol{I}\_{n} & \mathbf{0} & & \mathbf{0} \\
\mathbf{0} & \boldsymbol{I}\_{n} & & \vdots \\
\vdots & \vdots & \ddots & & \vdots \\
\mathbf{0} & \mathbf{0} & & \mathbf{I}\_{n} & \mathbf{0}
\end{bmatrix} \begin{bmatrix}
\hat{\boldsymbol{\mathbf{x}}}\_{k/k-1} \\
\hat{\boldsymbol{\mathbf{x}}}\_{k-1/k-1} \\
\vdots \\
\hat{\boldsymbol{\mathbf{x}}}\_{k-2/k-1} \\
\vdots \\
\hat{\mathbf{x}}\_{k-N/k-1}
\end{bmatrix} + \begin{bmatrix}
\boldsymbol{K}\_{0,k} \\
\boldsymbol{K}\_{1,k} \\
\boldsymbol{K}\_{2,k} \\
\vdots \\
\boldsymbol{K}\_{N,k}
\end{bmatrix} (\mathbf{z}\_{k} - \mathbf{C}\_{k} \hat{\boldsymbol{\mathbf{x}}}\_{k/k-1}) \mathbf{z}\_{k}
$$

to obtain an expression for the components of the smoothed state.

(ii) Derive expressions for the two predicted error covariance submatrices of interest.

#### **Problem 2.**

(i) With the quantities defined in Section 4 and the assumptions 1/ ˆ *k N x* ~ / ( ˆ *Ak kN x* , ) *<sup>T</sup> BQB k kk* , / ˆ *k N x* ~ / (ˆ *k k x* , / ) *Pk k* , use the maximum-likelihood method to derive

1 1 1 / / / / / ˆ ( ( ) )(ˆ ) ) ˆ *<sup>T</sup> <sup>T</sup> T T <sup>T</sup> k N kk k k k k k kN kk k k k k kN x I P A BQB A x P A BQB x* .

(i) Use the Matrix Inversion Lemma to obtain Rauch, Tung and Striebel's smoother

$$
\hat{\mathfrak{x}}\_{k/N} = \hat{\mathfrak{x}}\_{k/k} + G\_k(\hat{\mathfrak{x}}\_{k+1/N} - \hat{\mathfrak{x}}\_{k+1/k}) \dots
$$

(ii) Employ the additional assumptions / / { } ˆ*<sup>T</sup> Ex x kk kk* = 0, 1/ 1/ { } ˆ*<sup>T</sup> Ex x k Nk N* and 1/ / { } ˆ*<sup>T</sup> Ex x k N kN* to show that 1/ 1/ { } ˆ ˆ*<sup>T</sup> Ex x k kk k* = 1 1 { } *<sup>T</sup> Ex x k k* − *Pk k* 1/ , 1/ 1/ { } ˆ ˆ*<sup>T</sup> Ex x k Nk N* = 1 1 { } *<sup>T</sup> Ex x k k* − *k N* 1/ and / / 1/ 1/ ( ) *<sup>T</sup> kN kk k k k k N k P GP G* .

<sup>&</sup>quot;My invention, (the motion picture camera), can be exploited for a certain time as a scientific curiosity, but apart from that it has no commercial value whatsoever." *August Marie Louis Nicolas Lumiere*

Smoothing, Filtering and Prediction:

*x* ,

is a gain matrix. Although this is not a minimum-

<sup>170</sup> Estimating the Past, Present and Future

mean-square-error solution, it outperforms the Kalman filter and can provide close to

The minimum-variance smoothers are state-space generalisations of the optimal noncausal Wiener solutions. They make use of the inverse of the approximate Wiener-Hopf factor ˆ <sup>1</sup> and its adjoint ˆ *<sup>H</sup>* . These smoothers achieve the best-possible performance, however, they are not minimum-order solutions. Consequently, any performance benefits need to be

smoother in which *Gk* = <sup>1</sup>

reconciled against the increased complexity.

(i) Simplify the fixed-lag smoother

**7.8 Problems Problem 1.**

**Problem 2.**

) *<sup>T</sup> BQB k kk* , / ˆ *k N x* ~ / (ˆ *k k*

/ 1 1/ *<sup>T</sup> PAP kk k k k* 

optimal performance whenever the underlying assumptions are reasonable.

1/ / 1 0, / 1/ 1 1,

*k k k k k k k k n k k k*

*x A x K x I x K*

ˆ 0 0 ˆ ˆ 0 0 ˆ

ˆ 00 0 ˆ

1/ / 1 ,

*kN k n kNk N k*

*x I x K*

to obtain an expression for the components of the smoothed state.

/ / / / / ˆ ( ( ) )(ˆ ) ) ˆ *<sup>T</sup> <sup>T</sup> T T <sup>T</sup> k N kk k k k k k kN kk k k k k kN x I P A BQB A x P A BQB x* .

1 1 { } *<sup>T</sup> Ex x k k* − *k N* 1/ and / / 1/ 1/ ( ) *<sup>T</sup> kN kk k k k k N k P GP G* .

1/ 2/ 1 2, / 1

ˆ 0 ˆ ( ) ˆ

(ii) Derive expressions for the two predicted error covariance submatrices of interest.

(i) With the quantities defined in Section 4 and the assumptions 1/ ˆ *k N x* ~ / ( ˆ *Ak kN*

1 1 1

(i) Use the Matrix Inversion Lemma to obtain Rauch, Tung and Striebel's smoother

/ / 1/ 1/ ˆˆ ˆ ˆ ( ) *kN kk k k N k k x x Gx x* .

"My invention, (the motion picture camera), can be exploited for a certain time as a scientific curiosity,

but apart from that it has no commercial value whatsoever." *August Marie Louis Nicolas Lumiere*

(ii) Employ the additional assumptions / / { } ˆ*<sup>T</sup> Ex x kk kk* = 0, 1/ 1/ { } ˆ*<sup>T</sup> Ex x k Nk N* and 1/ / { } ˆ*<sup>T</sup> Ex x k N kN* to show that 1/ 1/ { } ˆ ˆ*<sup>T</sup> Ex x k kk k* = 1 1 { } *<sup>T</sup> Ex x k k* − *Pk k* 1/ , 1/ 1/ { } ˆ ˆ*<sup>T</sup> Ex x k Nk N* =

*x* , / ) *Pk k* , use the maximum-likelihood method to derive

*k k n k k k k k kk*

*x I x K z Cx*

(iii) Use *Gk* = <sup>1</sup> ( *Ak I* − <sup>1</sup> 1/ ) *<sup>T</sup> BQB P k kkk k* and 1/ ˆ *k N x* = 1/ 1 ˆ *k k x* + *Pk kk N* 1/ 1/ to confirm that the smoothed estimate within

$$
\begin{bmatrix}
\hat{\boldsymbol{\mathfrak{x}}}\_{k+1/N} \\
\boldsymbol{\mathcal{A}}\_{k/N}
\end{bmatrix} = \begin{bmatrix}
\boldsymbol{A}\_{k} & \boldsymbol{B}\_{k}\boldsymbol{Q}\_{k}\boldsymbol{B}\_{k}^{T} \\
\end{bmatrix} \begin{bmatrix}
\hat{\boldsymbol{\mathfrak{x}}}\_{k+1/N} \\
\boldsymbol{\mathcal{A}}\_{k+1/N}
\end{bmatrix} + \begin{bmatrix}
\boldsymbol{0} \\
\boldsymbol{\mathcal{C}}\_{k}^{T}\boldsymbol{R}\_{k}^{-1}\boldsymbol{z}\_{k}
\end{bmatrix}
$$

is equivalent to Rauch, Tung and Striebel's maximum-likelihood solution.

**Problem 3.** Let *α* = <sup>0</sup> *w* denote the output of linear time-varying system having the realisation *xk+1* = *Akxk* + *wk,, γk = xk.* Verify that *Pk − <sup>T</sup> A P A = kk k* <sup>0</sup> *<sup>H</sup> A Pk k +* <sup>1</sup> 0 *<sup>T</sup> P Ak k +*  1 0 0 *<sup>H</sup> Pk* .

**Problem 4.** For the model (1) – (2). assume that *Dk* = 0, *E*{*wk*} = *E*{*vk*} = 0, { }*<sup>T</sup> Ewwk k* = *Qk jk* , { }*<sup>T</sup> Evvj <sup>k</sup>* = *Rk jk* and { }*<sup>T</sup> Ewvk k* = 0. Use the result of Problem 3 to show that ˆ ˆ *<sup>H</sup>* = *<sup>H</sup>* 0 /1 ( *C P k kk* 1/ 0 ) *H T P C kk k* .

**Problem 5.** Under the assumptions of Problem 4, obtain an expression relating ˆ ˆ *<sup>H</sup>* and *<sup>H</sup>* for the case where *Dk* ≠ 0.

**Problem 6.** From *ei* = 2 1  , *<sup>H</sup>* *ei ei =* 1 1 *H* *ei ei +* 2 2 *H* *ei ei* and *ei*2 = − 1 2 *H H Q*  , obtain the optimal realisable smoother solutions for output estimation, input estimation and state estimation problems.

#### **7.9 Glossary**


<sup>&</sup>quot;The cinema is little more than a fad. It's canned drama. What audiences really want to see is flesh and blood on the stage." *Charles Spencer* (*Charlie) Chaplin*

*ei* A system (or map) that operates on the problem inputs *<sup>v</sup> i w* to

produce an estimation error *e*. It is convenient to make use of the factorisation *<sup>H</sup>* *ei ei =* 1 1 *H* *ei ei +* 2 2 *H* *ei ei* , where 2 2 *H* *ei ei* includes the filter/smoother solution and 1 1 *H* *ei ei* is a lower performance bound.

#### **7.10 References**


<sup>&</sup>quot;Who the hell wants to hear actors talk?" *Harry Morris Warner*

Chapter title

Author Name

### **Parameter Estimation**

#### **8.1 Introduction**

1

Smoothing, Filtering and Prediction:

*i*

*ei ei* includes the

*H*

*w* 

to

<sup>172</sup> Estimating the Past, Present and Future

*ei* A system (or map) that operates on the problem inputs *<sup>v</sup>*

*ei ei =* 1 1

[1] B. D. O. Anderson and J. B. Moore, *Optimal Filtering*, Prentice-Hall Inc, Englewood

[2] J. B. Moore, "Fixed-Lag Smoothing Results for Linear Dynamical Systems", *A.T.R.*, vol.

[3] J. B. Moore, "Discrete-Time Fixed-Lag Smoothing Algorithms", *Automatica*, vol. 9, pp.

[4] D. C. Fraser and J. E. Potter, "The Optimum Linear Smoother as a Combination of Two Optimum Linear Filters", *IEEE Transactions on Automatic Control*, vol. AC-14, no. 4, pp.

[5] H. E. Rauch, "Solutions to the Linear Smoothing Problem", *IEEE Transactions on* 

[6] H. E. Rauch, F. Tung and C. T. Striebel, "Maximum Likelihood Estimates of Linear

[7] G. A. Einicke, "Optimal and Robust Noncausal Filter Formulations", *IEEE Transactions* 

[8] G. A. Einicke, "Asymptotic Optimality of the Minimum Variance Fixed-Interval Smoother", *IEEE Transactions on Signal Process.*, vol. 55, no. 4, pp. 1543 – 1547, Apr. 200 [9] G. A. Einicke, J. C. Ralston, C. O. Hargrave, D. C. Reid and D. W. Hainsworth, "Longwall Mining Automation, An Application of Minimum-Variance Smoothing",

[10] H. K. Wimmer, "Monotonicity and Maximality of Solutions of Discrete-Time Algebraic Riccati Equations", *Journal of Mathematics, Systems, Estimation and Control*, vol. 2, no. 2,

[11] H. K. Wimmer and M. Pavon, "A comparison theorem for matrix Riccati difference

[12] L. E. Zachrisson, "On Optimal Smoothing of Continuous Time Kalman Processes,

[13] A. P. Sage and J. L. Melsa, *Estimation Theory with Applications to Communications and* 

[14] A. Asif, "Fast Rauch-Tung-Striebel Smoother-Base Image Restoration for Noncausal Images", *IEEE Signal Processing Letters*, vol. 11, no. 3, pp. 371 – 374, Mar. 2004. [15] T. Kailath, A. H. Sayed and B. Hassibi, *Linear Estimation*, Prentice-Hall, Inc., Upper

[16] R. A. Monzingo, "Discrete Optimal Linear Smoothing for Systems with Uncertain Observations", *IEEE Transactions on Information Theory*, vol. 21, no. 3, pp. 271 – 275, May 1975.

Dynamic Systems", *AIAA Journal*, vol. 3, no. 8, pp. 1445 – 1450, Aug., 1965.

filter/smoother solution and 1 1

factorisation *<sup>H</sup>* 

*Automatic Control*, vol. 8, no. 4, pp. 371 – 372, Oct.1963.

*on Signal Process.,* vol. 54, no. 3, pp. 1069 - 1077, Mar. 2006.

*IEEE Control Systems Magazine*, vol. 28, no. 6, pp. 28 – 37, Dec. 2008.

equations", *Systems & Control Letters*, vol. 19, pp. 233 – 239, 1992.

*Information Sciences*, vol. 1, pp. 143 – 172, 1969.

"Who the hell wants to hear actors talk?" *Harry Morris Warner*

Saddle River, New Jersey, 2000.

*Control*, McGraw-Hill Book Company, New York, 1971.

produce an estimation error *e*. It is convenient to make use of the

*H*

*H*

*ei ei* , where 2 2

*ei ei* is a lower performance bound.

*ei ei +* 2 2

*H*

**7.10 References** 

Cliffs, New Jersey, 1979.

7, no. 2, pp. 16 – 21, 1973.

387 – 390, Aug., 1969.

pp. 219 – 235, 1992

163 – 173, 1973.

Predictors, filters and smoothers have previously been described for state recovery under the assumption that the parameters of the generating models are correct. More often than not, the problem parameters are unknown and need to be identified. This section describes some standard statistical techniques for parameter estimation. Paradoxically, the discussed parameter estimation methods rely on having complete state information available. Although this is akin to a chicken-and-egg argument (state availability obviates the need for filters along with their attendant requirements for identified models), the task is not insurmountable.

Parameter Estimation 173

The role of solution designers is to provide a cost benefit. That is, their objectives are to deliver improved performance at an acceptable cost. Inevitably, this requires simplifications so that the problems become sufficiently tractable and amenable to feasible solution. For example, suppose that speech emanating from a radio is too noisy and barely intelligible. In principle, high-order models could be proposed to equalise the communication channel, demodulate the baseband signal and recover the phonemes. Typically, low-order solutions tend to offer better performance because of the difficulty in identifying large numbers of parameters under low-SNR conditions. Consider also the problem of monitoring the output of a gas sensor and triggering alarms when environmental conditions become hazardous. Complex models could be constructed to take into account diurnal pressure variations, local weather influences and transients due to passing vehicles. It often turns out that low-order solutions exhibit lower false alarm rates because there are fewer assumptions susceptible to error.

Thus, the absence of complete information need not inhibit solution development. Simple schemes may suffice, such as conducting trials with candidate parameter values and assessing the consequent error performance.

In maximum-likelihood estimation [1] – [5], unknown parameters *θ1*, *θ2*, …, *θM*, are identified given states, *xk*, by maximising a log-likelihood function, log *f*(*θ1*, *θ2*, …, *<sup>M</sup>* | )*<sup>k</sup> x* . For example, the subject of noise variance estimation was studied by Mehra in [6], where maximum-likelihood estimates (MLEs) were updated using the Newton-Raphson method. Rife and Boorstyn obtained Cramér-Rao bounds for some MLEs, which "indicate the best estimation that can be made with the available data" [7]. Nayak *et al* used the pseudo-

<sup>&</sup>quot;The sciences do not try to explain, they hardly even try to interpret, they mainly make models. By a model is meant a mathematical construct which, with the addition of certain verbal interpretations, describes observed phenomena. The justification of such a mathematical construct is solely and precisely that it is expected to work" *John Von Neuman*

**8.2 Maximum-Likelihood Estimation** 

A solution can be found by setting (| )*<sup>k</sup> <sup>p</sup>*

 *x* 

considerably simplifies the equations to be solved.

samples of a discrete random variable *xk*. An estimate, ˆ

function of all the observations is the product of the densities

by solving for a *θ* that satisfies log ( | )*<sup>k</sup> <sup>f</sup>*

argument *θ* that maximises the probability density function, that is,

<sup>ˆ</sup> arg max ( | )*<sup>k</sup> <sup>p</sup> <sup>x</sup>* 

> *x*

<sup>ˆ</sup> arg max log ( | )*<sup>k</sup> <sup>p</sup> <sup>x</sup>*

<sup>1</sup> <sup>2</sup> ( | ) ( | )( | ) ( | ) *<sup>k</sup> <sup>N</sup> f x p xp x p x*

(| )

<sup>ˆ</sup> arg max log ( | )*<sup>k</sup> f x*

1 arg max ( | ) *N*

*k*

 *x* 

*k*

1

by no means agree in their investigation thereof." *Nicolaus Copernicus*"

 ,

*k p x*

*K*

logarithm function is monotonic, a solution may be found equivalently by maximising

Let *p*(*θ|xk*) denote the probability density function of an unknown parameter *θ*, given

Suppose that *N* mutually independent samples of *xk* are available, then the joint density

which serves as a likelihood function. The MLE of *θ* may be found maximising the log-

*p x*

 

likelihood approach is applicable to a wide range of distributions. For example, the task of estimating the intensity of a Poisson distribution from measurements is demonstrated

"Therefore I would not have it unknown to Your Holiness, the only thing which induced me to look for another way of reckoning the movements of the heavenly bodies was that I knew that mathematicians

*k*

= <sup>1</sup>

log ( | ) *N*

*k p x*

*k*

= 0. For exponential families of distributions, the use of (2)

 

, can be obtained by finding the

. (1)

(2)

(3)

(4)

= 0. The above maximum-

= 0 and solving for the unknown *θ*. Since the

**8.2.1General Method** 

and setting log ( | )*<sup>k</sup> <sup>p</sup>*

below.

likelihood

inverse to estimate unknown parameters in [8]. Belangér subsequently employed a leastsquares approach to estimate the process noise and measurement noise variances [9]. A recursive technique for least-squares parameter estimation was developed by Strejc [10]. Dempster, Laird and Rubin [11] proved the convergence of a general purpose technique for solving joint state and parameter estimation problems, which they called the expectation-maximization (EM) algorithm. They addressed problems where complete (state) information is not available to calculate the log-likelihood and instead maximised the expectation of 1 2 log ( , ,..., | ) *<sup>M</sup> <sup>k</sup> f z* , given incomplete measurements, *zk*. That is, by virtue of Jensen's inequality the unknowns are found by using an objective function (also called an approximate log-likelihood function), 1 2 *E*{log ( , ,..., | )} *M k f z* , as a surrogate for

log *f*(*θ1*, *θ2*, …, *<sup>M</sup>* | )*<sup>k</sup> x* .

The system identification literature is vast and some mature techniques have evolved. It is acknowledged that subspace identification methods have been developed for general problems where a system's stochastic inputs, deterministic inputs and outputs are available. The subspace algorithms [12] – [14] consist of two steps. First, the order of the system is identified from stacked vectors of the inputs and outputs. Then the unknown parameters are determined from an extended observability matrix.

Continuous-time maximum-likelihood estimation has been mentioned previously. Here, the attention is focussed on the specific problem of joint state and parameter estimation exclusively from discrete measurements of a system's outputs. The developments proceed as follows. Section 8.2 reviews the maximum-likelihood estimation method for obtaining unknown parameters. The same estimates can be found using the method of least squares, which was pioneered by Gauss for fitting astronomical observations. Well known (filtering) EM algorithms for variance and state matrix estimation are described in Section 8.3. Improved parameter estimation accuracy can be obtained via smoothing EM algorithms, which are introduced in Section 8.4.

The filtering and smoothing EM algorithms discussed herein require caution. When perfect state information is available, the corresponding likelihood functions are exact. However, the use of imperfect state estimates leads to approximate likelihood functions, approximate Cramér-Rao bounds and biased MLEs. When the SNR is sufficiently high and the states are recovered exactly, the bias terms within the state matrix elements and process noise variances diminish to zero. Consequently, process noise variance and state matrix estimation is recommended only when the measurement noise is negligible. Conversely, measurement noise variance estimation is advocated when the SNR is sufficiently low.

<sup>&</sup>quot;A hen is only an egg's way of making another egg." *Samuel Butler*

Smoothing, Filtering and Prediction -

*z* , as a surrogate for

*z* , given incomplete measurements, *zk*. That is, by

 

<sup>174</sup> Estimating the Past, Present and Future

inverse to estimate unknown parameters in [8]. Belangér subsequently employed a leastsquares approach to estimate the process noise and measurement noise variances [9]. A recursive technique for least-squares parameter estimation was developed by Strejc [10]. Dempster, Laird and Rubin [11] proved the convergence of a general purpose technique for solving joint state and parameter estimation problems, which they called the expectation-maximization (EM) algorithm. They addressed problems where complete (state) information is not available to calculate the log-likelihood and instead maximised

virtue of Jensen's inequality the unknowns are found by using an objective function (also

The system identification literature is vast and some mature techniques have evolved. It is acknowledged that subspace identification methods have been developed for general problems where a system's stochastic inputs, deterministic inputs and outputs are available. The subspace algorithms [12] – [14] consist of two steps. First, the order of the system is identified from stacked vectors of the inputs and outputs. Then the unknown parameters

Continuous-time maximum-likelihood estimation has been mentioned previously. Here, the attention is focussed on the specific problem of joint state and parameter estimation exclusively from discrete measurements of a system's outputs. The developments proceed as follows. Section 8.2 reviews the maximum-likelihood estimation method for obtaining unknown parameters. The same estimates can be found using the method of least squares, which was pioneered by Gauss for fitting astronomical observations. Well known (filtering) EM algorithms for variance and state matrix estimation are described in Section 8.3. Improved parameter estimation accuracy can be obtained via smoothing EM algorithms,

The filtering and smoothing EM algorithms discussed herein require caution. When perfect state information is available, the corresponding likelihood functions are exact. However, the use of imperfect state estimates leads to approximate likelihood functions, approximate Cramér-Rao bounds and biased MLEs. When the SNR is sufficiently high and the states are recovered exactly, the bias terms within the state matrix elements and process noise variances diminish to zero. Consequently, process noise variance and state matrix estimation is recommended only when the measurement noise is negligible. Conversely, measurement noise variance estimation is advocated when the SNR is

the expectation of 1 2 log ( , ,..., | ) *<sup>M</sup> <sup>k</sup> f*

log *f*(*θ1*, *θ2*, …, *<sup>M</sup>* | )*<sup>k</sup>*

*x* .

which are introduced in Section 8.4.

sufficiently low.

are determined from an extended observability matrix.

"A hen is only an egg's way of making another egg." *Samuel Butler*

 

called an approximate log-likelihood function), 1 2 *E*{log ( , ,..., | )} *M k f*

#### **8.2 Maximum-Likelihood Estimation**

#### **8.2.1General Method**

Let *p*(*θ|xk*) denote the probability density function of an unknown parameter *θ*, given samples of a discrete random variable *xk*. An estimate, ˆ , can be obtained by finding the argument *θ* that maximises the probability density function, that is,

$$\bar{\theta} = \underset{\theta}{\text{arg}\,\text{max}} = p(\theta \mid \mathbf{x}\_k) \,. \tag{1}$$

A solution can be found by setting (| )*<sup>k</sup> <sup>p</sup> x* = 0 and solving for the unknown *θ*. Since the logarithm function is monotonic, a solution may be found equivalently by maximising

$$\hat{\theta} = \underset{\theta}{\text{arg}\,\text{max}} = \log p(\theta \mid \mathbf{x}\_k) \tag{2}$$

and setting log ( | )*<sup>k</sup> <sup>p</sup> x* = 0. For exponential families of distributions, the use of (2) considerably simplifies the equations to be solved.

Suppose that *N* mutually independent samples of *xk* are available, then the joint density function of all the observations is the product of the densities

$$\begin{aligned} f(\boldsymbol{\theta} \mid \mathbf{x}\_k) &= p(\boldsymbol{\theta} \mid \mathbf{x}\_1) p(\boldsymbol{\theta} \mid \mathbf{x}\_2) \cdots p(\boldsymbol{\theta} \mid \mathbf{x}\_N) \\ &= \prod\_{k=1}^{K} p(\boldsymbol{\theta} \mid \mathbf{x}\_k) \, \, \end{aligned} \tag{3}$$

which serves as a likelihood function. The MLE of *θ* may be found maximising the loglikelihood

$$\begin{aligned} \hat{\theta} &= \underset{\theta}{\text{arg}\,\text{max}} \; \log f(\theta \mid \mathbf{x}\_k) \\\\ &= \underset{\theta}{\text{arg}\,\text{max}} \; \sum\_{k=1}^{N} p(\theta \mid \mathbf{x}\_k) \end{aligned} \tag{4}$$

by solving for a *θ* that satisfies log ( | )*<sup>k</sup> <sup>f</sup> x* = <sup>1</sup> log ( | ) *k k p x* = 0. The above maximum-

likelihood approach is applicable to a wide range of distributions. For example, the task of estimating the intensity of a Poisson distribution from measurements is demonstrated below.

<sup>&</sup>quot;Therefore I would not have it unknown to Your Holiness, the only thing which induced me to look for another way of reckoning the movements of the heavenly bodies was that I knew that mathematicians by no means agree in their investigation thereof." *Nicolaus Copernicus*"

set out below.

Setting 0 1

solving the *M* equations

0 log ( | ) *<sup>k</sup> fa x a* 

*f x*

/ 2 / 2 1 1

. (9)

/ 2 2 2

*x ax*

1

 .

 

*k*

 = <sup>2</sup> 0 1 *N k k a x* 

 

> *a x* <sup>2</sup> *<sup>w</sup>* ) that

which results in the MLE

*R x Rx*

An example of estimating a model coefficient using the Gaussian log-likelihood approach is

*Example 2.* Consider an autoregressive order-one process *xk+*1 = *a0xk* + *wk* in which it is

0 1 1 0

*<sup>N</sup> N N k w w k k*

*k k*

*x x* 

<sup>1</sup> log ( | ) log (2 ) ( ) <sup>2</sup>

1 *N*

1 1

*x x*

 

*k k*

2 1

.

*x*

Often within filtering and smoothing applications there are multiple parameters to be identified. Denote the unknown parameters by *θ*1, *θ*2, …, *θM*, then the MLEs may be found by

> *x*

> *x*

*k*

*N*

*k N k k*

*k*

*<sup>N</sup> <sup>N</sup> <sup>N</sup> <sup>T</sup> k xx k xx k*

<sup>1</sup> log ( ) log (2 ) ( )( ) <sup>2</sup>

desired to estimate *a*<sup>0</sup> from samples of *xk*. It follows from *xk*+1 ~ <sup>0</sup> ( , *<sup>k</sup>*

0

 

1 2 1 log ( , , | ) <sup>0</sup> *M k <sup>f</sup>* 

1 2 2 log ( , , | ) <sup>0</sup> *M k <sup>f</sup>* 

> *f*

An vector parameter estimation example is outlined below. *Example 3.* Consider the third-order autoregressive model

which can be written in the state-space form

hope of advance." *Orville Wright*

 

1 2 log ( , , | ) 0. *M k M*

"If we all worked on the assumption that what is accepted as true is really true, there would be little

 

 *x*

*<sup>k</sup>* 3 2 2 11 0 *<sup>k</sup> <sup>k</sup> k k x ax ax ax w* (10)

 

ˆ

*a*

*fa x*

equal to zero gives <sup>1</sup>

*Example 1.* Suppose that *N* observations of integer *xk* have a Poisson distribution ( ) ! *xk k k <sup>e</sup> f x x* , where the intensity, *μ,* is unknown. The corresponding log-likelihood

function is

$$\log f(\boldsymbol{\mu} \mid \mathbf{x}\_{\boldsymbol{\lambda}}) = \log \left( \frac{e^{-\boldsymbol{\mu}} \boldsymbol{\mu}^{\boldsymbol{\lambda}\_1}}{\mathbf{x}\_1!} \frac{e^{-\boldsymbol{\mu}} \boldsymbol{\mu}^{\boldsymbol{\lambda}\_2}}{\mathbf{x}\_2!} \frac{e^{-\boldsymbol{\mu}} \boldsymbol{\mu}^{\boldsymbol{\lambda}\_3}}{\mathbf{x}\_3!} \dots \frac{e^{-\boldsymbol{\mu}} \boldsymbol{\mu}^{\boldsymbol{\lambda}\_N}}{\mathbf{x}\_N!} \right)$$

$$= \log \left( \frac{1}{\mathbf{x}\_1! \mathbf{x}\_2! \cdots \mathbf{x}\_N!} \boldsymbol{\mu}^{\boldsymbol{\lambda}\_1 + \mathbf{x}\_2 + \dots + \mathbf{x}\_N} e^{-\boldsymbol{\lambda} \boldsymbol{\mu}} \right) \tag{5}$$

$$=-\log(\mathfrak{x}\_1!\mathfrak{x}\_2!\cdots\mathfrak{x}\_N!) + \log(\mu^{\mathfrak{x}\_1+\mathfrak{x}\_2+\cdots+\mathfrak{x}\_N}) - N\mu \perp$$

Taking 1 log ( | ) <sup>1</sup> <sup>0</sup> *N k k k f x x N* yields

$$
\hat{\mu} = \frac{1}{N} \sum\_{k=1}^{N} \mathbf{x}\_k \ . \tag{6}
$$

Since 2 2 log ( | )*<sup>k</sup> f x* = <sup>2</sup> 1 1 *<sup>N</sup> k k x* is negative for all *μ* and *xk* ≥ 0, the stationary point (6)

occurs at a maximum of (5). That is to say, ˆ is indeed a maximum-likelihood estimate.

#### **8.2.2 State Matrix Estimation**

From the Central Limit Theorem, which was mentioned in Chapter 6, the mean of a sufficiently large sample of independent identically distributed random variables will asymptotically approach a normal distribution. Consequently, in many maximumlikelihood estimation applications it is assumed that random variables are normally distributed. Recall that the normal (or Gaussian) probability density function of a discrete random variable *xk* with mean *μ* and covariance *Rxx* is

$$p(\mathbf{x}\_k) = \frac{1}{(2\pi)^{N/2} \left| R\_{\mathbf{x}\mathbf{x}} \right|^{1/2}} \exp\left\{ -\frac{1}{2} (\mathbf{x}\_k - \boldsymbol{\mu})^T R\_{\mathbf{x}\mathbf{x}}^{-1} (\mathbf{x}\_k - \boldsymbol{\mu}) \right\},\tag{7}$$

in which *Rxx* denotes the determinant of *Rxx* . A likelihood function for a sample of *N* independently identically distributed random variables is

$$f(\mathbf{x}\_k) = \prod\_{k=1}^{N} p(\mathbf{x}\_k) = \frac{1}{(2\pi)^{N/2} \left| R\_{\text{xx}} \right|^{N/2}} \sum\_{k=1}^{N} \exp\left\{ -\frac{1}{2} (\mathbf{x}\_k - \boldsymbol{\mu})^T R\_{\text{xx}}^{-1} (\mathbf{x}\_k - \boldsymbol{\mu}) \right\}. \tag{8}$$

In general, it is more convenient to work with the log-likelihood function

<sup>&</sup>quot;How wonderful that we have met with a paradox. Now we have some hope of making progress." *Niels Henrik David Bohr*

Smoothing, Filtering and Prediction -

(5)

<sup>176</sup> Estimating the Past, Present and Future

*Example 1.* Suppose that *N* observations of integer *xk* have a Poisson distribution

*x x x xN*

 

*xxx x*

123

*N*

<sup>1</sup> <sup>ˆ</sup> *<sup>N</sup>*

*N* 

1 2 log( ! ! !) log( ) *xx xN*

1

From the Central Limit Theorem, which was mentioned in Chapter 6, the mean of a sufficiently large sample of independent identically distributed random variables will asymptotically approach a normal distribution. Consequently, in many maximumlikelihood estimation applications it is assumed that random variables are normally distributed. Recall that the normal (or Gaussian) probability density function of a discrete

*k k x*

.

log ( | ) log !!! !

1 2 <sup>1</sup> log !! !

*xx x*

*<sup>N</sup> xx x*

*eee e f x*

, where the intensity, *μ,* is unknown. The corresponding log-likelihood

*e*

*xx xN N*

*N*

*N*

. (6)

ˆ is indeed a maximum-likelihood estimate.

is negative for all *μ* and *xk* ≥ 0, the stationary point (6)

1

. (8)

distributed

1

, (7)

*T*

  

( ) !

*k*

Taking

Since

*Niels Henrik David Bohr*

2

*<sup>e</sup> f x*

function is

*xk*

<sup>1</sup> <sup>2</sup> <sup>3</sup>

1 2

log ( | ) <sup>1</sup> <sup>0</sup> *N*

 yields

= <sup>2</sup>

occurs at a maximum of (5). That is to say,

random variable *xk* with mean *μ* and covariance *Rxx* is

1/ 2 / 2

*R*

independently identically distributed random variables is

/ 2 / 2 <sup>1</sup> <sup>1</sup>

*xx*

(2 ) 2

<sup>1</sup> <sup>1</sup> ( ) exp ( ) ( )

<sup>1</sup> <sup>1</sup> () () exp ( ) ( ) (2 ) 2 *<sup>N</sup> <sup>N</sup> <sup>T</sup> <sup>k</sup> <sup>k</sup> <sup>N</sup> <sup>k</sup> xx k <sup>N</sup> <sup>k</sup> <sup>k</sup> xx f x px x Rx R*

In general, it is more convenient to work with the log-likelihood function

in which *Rxx* denotes the determinant of *Rxx* . A likelihood function for a sample of *N*

"How wonderful that we have met with a paradox. Now we have some hope of making progress."

*<sup>k</sup> <sup>k</sup> xx k <sup>N</sup>*

*p x x Rx*

*k*

2 log ( | )*<sup>k</sup> f x* 

**8.2.2 State Matrix Estimation** 

1 2

1

*k k f x x N*

> 1 1 *<sup>N</sup>*

*k k x*

*k*

*k*

*x* 

$$\log f(\mathbf{x}\_k) = -\log \left( 2\pi \right)^{N/2} \left| \mathbf{R}\_{\text{xx}} \right|^{N/2} - \frac{1}{2} \sum\_{k=1}^{N} (\mathbf{x}\_k - \boldsymbol{\mu})^T \mathbf{R}\_{\text{xx}}^{-1} (\mathbf{x}\_k - \boldsymbol{\mu}) \,. \tag{9}$$

An example of estimating a model coefficient using the Gaussian log-likelihood approach is set out below.

*Example 2.* Consider an autoregressive order-one process *xk+*1 = *a0xk* + *wk* in which it is desired to estimate *a*<sup>0</sup> from samples of *xk*. It follows from *xk*+1 ~ <sup>0</sup> ( , *<sup>k</sup> a x* <sup>2</sup> *<sup>w</sup>* ) that

$$\log f(a\_0 \mid \mathbf{x}\_{k+1}) = -\log \left(2\pi\right)^{N/2} \sigma\_w^N - \frac{1}{2} \sum\_{k=1}^N \sigma\_w^{-2} \left(\mathbf{x}\_{k+1} - a\_0 \mathbf{x}\_k\right)^2 \dots$$

Setting 0 1 0 log ( | ) *<sup>k</sup> fa x a* equal to zero gives <sup>1</sup> 1 *N k k k x x* = <sup>2</sup> 0 1 *N k k a x* which results in the MLE

$$
\hat{a}\_0 = \frac{\sum\_{k=1}^N \mathbf{x}\_{k+1} \mathbf{x}\_k}{\sum\_{k=1}^N \mathbf{x}\_k^2}.
$$

Often within filtering and smoothing applications there are multiple parameters to be identified. Denote the unknown parameters by *θ*1, *θ*2, …, *θM*, then the MLEs may be found by solving the *M* equations

$$\frac{\hat{\boldsymbol{\mathcal{O}}} \log f(\boldsymbol{\theta\_{1}}, \boldsymbol{\theta\_{2}}, \cdots \boldsymbol{\theta\_{M}} \mid \boldsymbol{x\_{k}})}{\hat{\boldsymbol{\mathcal{O}}} \boldsymbol{\theta\_{1}}} = \mathbf{0}$$
 
$$\frac{\hat{\boldsymbol{\mathcal{O}}} \log f(\boldsymbol{\theta\_{1}}, \boldsymbol{\theta\_{2}}, \cdots \boldsymbol{\theta\_{M}} \mid \boldsymbol{x\_{k}})}{\hat{\boldsymbol{\mathcal{O}}} \boldsymbol{\theta\_{2}}} = \mathbf{0}$$

$$\begin{aligned} \vdots \\ \frac{\partial \log f(\theta\_l, \theta\_2, \dots, \theta\_M \mid x\_k)}{\partial \theta\_M} = 0 \dots$$

An vector parameter estimation example is outlined below. 

*Example 3.* Consider the third-order autoregressive model

$$\mathbf{x}\_{k+3} + a\_2 \mathbf{x}\_{k+2} + a\_1 \mathbf{x}\_{k+1} + a\_0 \mathbf{x}\_k = \mathbf{w}\_k \tag{10}$$

which can be written in the state-space form

<sup>&</sup>quot;If we all worked on the assumption that what is accepted as true is really true, there would be little hope of advance." *Orville Wright*

2 2 1 <sup>1</sup> <sup>ˆ</sup> ( ) *N w k k x*

2 2 1 <sup>1</sup> <sup>ˆ</sup> ( ) <sup>1</sup> *N w k k x*

<sup>2</sup> <sup>2</sup>

<sup>2</sup> <sup>2</sup> ( { } { })

*<sup>w</sup>* + <sup>2</sup> *x* , <sup>2</sup> { { }} *EExk* = <sup>2</sup> { } *E*

 *<sup>w</sup>* = <sup>2</sup> 

> ( , 2 2 *w s*

. As is pointed out in [1], since ˆ

"Everyone hears only what he understands." *Johann Wolfgang von Goethe*

2 2 1

 

> 1 <sup>1</sup> <sup>2</sup> 1 *N*

.

 *<sup>w</sup>* + <sup>2</sup> *E*{ }

herein that the sample size is sufficiently large so that *N*-1 ≈ (*N* - 1)-1 and (15) may be approximated by (14). A caution about modelling error contributing bias is mentioned

*Example 5.* Suppose that the states considered in Example 4 are actually generated by *xk* = *μ* + *wk* + *sk*, where *sk* is an independent input that accounts for the presence of modelling error.

> *<sup>w</sup>* .

The Cramér-Rao Lower Bound (CRLB) establishes a limit of precision that can be achieved for any unbiased estimate of a parameter *θ*. It actually defines a lower bound for the

equals the parameter error variance. Determining lower bounds for parameter error

 

*k E xx*

1 *<sup>k</sup> <sup>N</sup> Ex E N*

<sup>1</sup> { } ( ) <sup>1</sup> *N*

*w k k*

*N*

*EE x N*

If the random samples are taken from a population without replacement, the samples are not independent, the covariance between two different samples is nonzero and the MLE (14) is biased. If the sampling is done with replacement then the sample values are independent

The corrected denominator within the above sample variance is only noticeable for small sample sizes, as the difference between (14) and (15) is negligible for large *N*. The MLE (15) is unbiased, that is, its expected value equals the variance of the population. To confirm this

*k k*

, <sup>2</sup> { { }} *EExk* = <sup>2</sup> *E*{ }

*<sup>w</sup>* as required. Unless stated otherwise, it is assumed

) leads to 2 2 <sup>2</sup>

*w s k k x*

*N*

 = <sup>2</sup> 

1 <sup>1</sup> ˆ ˆ ( ) *N*

is assumed to be unbiased, the variance

, in which

 and <sup>2</sup> { } *E <sup>w</sup>* =

(16)

, without replacement. (14)

, with replacement. (15)

*N*

*N*

and the following correction applies

property, observe that

Using <sup>2</sup> { } *E xk* = <sup>2</sup>

2 / 

below.

variance <sup>2</sup>

2 ˆ 

ˆ of ˆ 

*<sup>w</sup> N* within (16) yields <sup>2</sup> { } *E*

In this case, the assumption *xk* ~

**8.2.4Cramér-Rao Lower Bound** 

case (14) is no longer an unbiased estimate of <sup>2</sup>

$$
\begin{bmatrix} \mathbf{x}\_{1,k+1} \\ \mathbf{x}\_{2,k+1} \\ \mathbf{x}\_{3,k+1} \end{bmatrix} = \begin{bmatrix} -a\_2 & -a\_1 & -a\_0 \\ 1 & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & 1 & \mathbf{0} \end{bmatrix} \begin{bmatrix} \mathbf{x}\_{1,k} \\ \mathbf{x}\_{2,k} \\ \mathbf{x}\_{3,k} \end{bmatrix} + \begin{bmatrix} \mathbf{w}\_k \\ \mathbf{0} \\ \mathbf{0} \end{bmatrix}.\tag{11}
$$

Assuming 1, 1 *<sup>k</sup> x* ~ 2 1, 1 2, 0 3, ( *kkk ax ax ax* , <sup>2</sup> *<sup>w</sup>* ) and setting to zero the partial derivatives of the corresponding log-likelihood function with respect to *a*0, *a*1 and *a*2 yields

$$
\begin{bmatrix}
\sum\_{k=1}^{N} \mathbf{x}\_{3,k}^{2} & \sum\_{k=1}^{N} \mathbf{x}\_{2,k} \mathbf{x}\_{3,k} & \sum\_{k=1}^{N} \mathbf{x}\_{1,k} \mathbf{x}\_{3,k} \\
\sum\_{k=1}^{N} \mathbf{x}\_{1,k} \mathbf{x}\_{3,k} & \sum\_{k=1}^{N} \mathbf{x}\_{2,k} \mathbf{x}\_{1,k} & \sum\_{k=1}^{N} \mathbf{x}\_{1,k}^{2}
\end{bmatrix}
\begin{bmatrix}
a\_{0} \\
a\_{1} \\
a\_{2} \\
\end{bmatrix} = \left|
\sum\_{k=1}^{N} \mathbf{x}\_{1,k+1} \mathbf{x}\_{2,k} \right| . \tag{12}
$$

Hence, the MLEs are given by

$$
\begin{bmatrix} \widehat{\boldsymbol{a}}\_{0} \\ \widehat{\boldsymbol{a}}\_{1} \\ \widehat{\boldsymbol{a}}\_{2} \end{bmatrix} = - \begin{bmatrix} \sum\_{k=1}^{N} \mathbf{x}\_{3,k}^{2} & \sum\_{k=1}^{N} \mathbf{x}\_{2,k} \mathbf{x}\_{3,k} & \sum\_{k=1}^{N} \mathbf{x}\_{1,k} \mathbf{x}\_{3,k} \\ - \left[ \sum\_{k=1}^{N} \mathbf{x}\_{2,k} \mathbf{x}\_{3,k} & \sum\_{k=1}^{N} \mathbf{x}\_{2,k}^{2} & \sum\_{k=1}^{N} \mathbf{x}\_{2,k} \mathbf{x}\_{1,k} \\ \sum\_{k=1}^{N} \mathbf{x}\_{1,k} \mathbf{x}\_{3,k} & \sum\_{k=1}^{N} \mathbf{x}\_{2,k} \mathbf{x}\_{1,k} & \sum\_{k=1}^{N} \mathbf{x}\_{1,k}^{2} \end{bmatrix}^{-1} \begin{bmatrix} \sum\_{k=1}^{N} \mathbf{x}\_{1,k+1} \mathbf{x}\_{3,k} \\ \sum\_{k=1}^{N} \mathbf{x}\_{1,k+1} \mathbf{x}\_{2,k} \\ \sum\_{k=1}^{N} \mathbf{x}\_{1,k+1} \mathbf{x}\_{1,k} \\ \vdots \\ \sum\_{k=1}^{N} \mathbf{x}\_{1,k+1} \mathbf{x}\_{k,k} \end{bmatrix}. \tag{13}$$

#### **8.2.3Variance Estimation**

MLEs can be similarly calculated for unknown variances, as is demonstrated by the following example.

*Example 4.* Consider a random variable generated by *xk* = *μ* + *wk* where *μ* is fixed and *wk* is assumed to be a zero-mean Gaussian white input sequence. Since *xk* ~ ( , 2 *<sup>w</sup>* ), it follows that

$$\log f(\sigma\_w^2 \mid \mathbf{x}\_k) = -\frac{N}{2}\log 2\pi - \frac{N}{2}\log \sigma\_w^2 - \frac{1}{2}\sigma\_w^{-2}\sum\_{k=1}^N (\mathbf{x}\_k - \mu)^2$$

and

$$\frac{\partial \log f(\boldsymbol{\sigma}\_w^2 \mid \boldsymbol{x}\_k)}{\partial \boldsymbol{\sigma}\_w^2} = -\frac{N}{2} (\boldsymbol{\sigma}\_w^2)^{-1} + \frac{1}{2} (\boldsymbol{\sigma}\_w^2)^{-2} \sum\_{k=1}^N (\boldsymbol{x}\_k - \boldsymbol{\mu})^2 \cdot \boldsymbol{\Lambda}$$

From the solution of 2 2 log ( | ) *w k w f x* = 0, the MLE is

<sup>&</sup>quot;In science one tries to tell people, in such a way as to be understood by everyone, something that noone knew before. But in poetry, it's the exact opposite." *Paul Adrien Maurice Dirac*

Smoothing, Filtering and Prediction -

. (11)

. (12)

. (13)

 ( , 

*<sup>w</sup>* ) and setting to zero the partial derivatives

 

 

1

 *x* 

 

*k*

*x*

2 1 2 2 2

*w w k*

1

*N*

*N*

<sup>178</sup> Estimating the Past, Present and Future

100 0 010 0

0

*a*

2

1

*a*

3, 2, 3, 1, 3, 1, 1 3, 1 1 1 1

*x xx xx x x*

*k k k k k k k*

2, 3, 2, 2, 1, 1 1, 1 2,

*xx x xx a x x*

*xx xx x x x*

*N N N N*

*k k k k N N N N*

*k k k k k k k*

2 1, 3, 2, 1, 1, 1, 1 1, 1 1 1 1

> 3, 2, 3, 1, 3, 1, 1 3, 1 1 1 1

*x xx xx x x*

*k k k k k k k*

*k k k k k k k*

MLEs can be similarly calculated for unknown variances, as is demonstrated by the

*Example 4.* Consider a random variable generated by *xk* = *μ* + *wk* where *μ* is fixed and

<sup>1</sup> log ( | ) log 2 log ( ) 2 22

log ( | ) <sup>1</sup> () () ( ) 2 2

"In science one tries to tell people, in such a way as to be understood by everyone, something that no-

.

*w k*

*w k w wk*

2 2 2 2

 

*wk* is assumed to be a zero-mean Gaussian white input sequence. Since *xk* ~

*xx xx x x x*

*k k k k k k k*

2 1, 3, 2, 1, 1, 1, 1 1, 1 1 1 1

*k k k k k k k*

*k k k*

*x a a ax w*

of the corresponding log-likelihood function with respect to *a*0, *a*1 and *a*2 yields

*N N N N*

*k k k k N N N N*

1 1 1 1

*k k k k N N N N*

*k k k k*

2 1 2, 3, 2, 2, 1, 1, 1 2, 1 1 1 1

 

*N N f x*

2

*w k*

*f x N*

= 0, the MLE is

one knew before. But in poetry, it's the exact opposite." *Paul Adrien Maurice Dirac*

2

2 2 log ( | ) *w k w f x* 

*k k k k N N N N*

*a xx x xx x x*

*k k k k*

1, 1 2 1 0 1, 2, 1 2, 3, 1 3,

*x x x x*

*ax ax ax* , <sup>2</sup>

2

2

 

Assuming 1, 1 *<sup>k</sup> x* ~ 2 1, 1 2, 0 3, ( *kkk* 

2

Hence, the MLEs are given by

0

ˆ ˆ ˆ

*a*

2

*a*

**8.2.3Variance Estimation** 

following example.

*<sup>w</sup>* ), it follows that

From the solution of

2 

and

*k k k k*

$$
\hat{\sigma}\_w^2 = \frac{1}{N} \sum\_{k=1}^N (\mathbf{x}\_k - \boldsymbol{\mu})^2 \text{ / without replacement.} \tag{14}
$$

If the random samples are taken from a population without replacement, the samples are not independent, the covariance between two different samples is nonzero and the MLE (14) is biased. If the sampling is done with replacement then the sample values are independent and the following correction applies

$$
\hat{\sigma}\_w^2 = \frac{1}{N-1} \sum\_{k=1}^N (\mathbf{x}\_k - \mu)^2 \text{ , with replacement.} \tag{15}
$$

The corrected denominator within the above sample variance is only noticeable for small sample sizes, as the difference between (14) and (15) is negligible for large *N*. The MLE (15) is unbiased, that is, its expected value equals the variance of the population. To confirm this property, observe that

$$E\{\sigma\_w^2\} = E\left\{\frac{1}{N-1}\sum\_{k=1}^N (\mathbf{x}\_k - \mu)^2\right\}$$

$$= E\left\{\frac{1}{N-1}\sum\_{k=1}^N \mathbf{x}\_k^2 - 2\mu\mathbf{x}\_k + \mu^2\right\}\tag{16}$$

$$= \frac{N}{N-1}(E\{\mathbf{x}\_k^2\} - E\{\mu^2\})\,\mathrm{.}$$

Using <sup>2</sup> { } *E xk* = <sup>2</sup> *<sup>w</sup>* + <sup>2</sup> *x* , <sup>2</sup> { { }} *EExk* = <sup>2</sup> { } *E <sup>w</sup>* + <sup>2</sup> *E*{ } , <sup>2</sup> { { }} *EExk* = <sup>2</sup> *E*{ } = <sup>2</sup> and <sup>2</sup> { } *E <sup>w</sup>* = 2 / *<sup>w</sup> N* within (16) yields <sup>2</sup> { } *E <sup>w</sup>* = <sup>2</sup> *<sup>w</sup>* as required. Unless stated otherwise, it is assumed herein that the sample size is sufficiently large so that *N*-1 ≈ (*N* - 1)-1 and (15) may be approximated by (14). A caution about modelling error contributing bias is mentioned below.

*Example 5.* Suppose that the states considered in Example 4 are actually generated by *xk* = *μ* + *wk* + *sk*, where *sk* is an independent input that accounts for the presence of modelling error. In this case, the assumption *xk* ~ ( , 2 2 *w s* ) leads to 2 2 <sup>2</sup> 1 <sup>1</sup> ˆ ˆ ( ) *N w s k k x N* , in which case (14) is no longer an unbiased estimate of <sup>2</sup> *<sup>w</sup>* . +

#### **8.2.4Cramér-Rao Lower Bound**

The Cramér-Rao Lower Bound (CRLB) establishes a limit of precision that can be achieved for any unbiased estimate of a parameter *θ*. It actually defines a lower bound for the variance <sup>2</sup> ˆ of ˆ . As is pointed out in [1], since ˆ is assumed to be unbiased, the variance 2 ˆ equals the parameter error variance. Determining lower bounds for parameter error

<sup>&</sup>quot;Everyone hears only what he understands." *Johann Wolfgang von Goethe*

which is unbiased because *E*{ }

Fisher information is

withincreasing sample size.

*M M* Fisher information matrix

of Fisher information matrix inverse

 (0, <sup>2</sup> 

and therefore

*wk*, with *wk* ~

Therefore,

4 it is found that

2 2

2 2

 

*w k*

*f x N*

ˆ =

/

 

<sup>2</sup> log ( , | ) *w k f x* 

2

2 2 <sup>ˆ</sup> / 

variance of the MLE (19). It is also apparent from (20) that the error variance of

( ) () () () 1/ / 1 ( ) ( ) ( )

*u u u u k k k k kk u u u k k k k k x AKC K x x I LC L z* 

> 2 1 <sup>ˆ</sup> ( ) *<sup>i</sup>*

Formal vector CRLB theorems and accompanying proofs are detailed in [2] – [5].

ˆ ( ) ˆ

ˆ ( )

 *Fii*

*<sup>w</sup>* ). Recall from Example 6 that

1

22 23 2

*w w k*

1

*N*

 

*N w k k x*

*N N f x*

*Example 7.* Consider the problem of estimating both *μ* and <sup>2</sup>

= <sup>2</sup>

2

*w k*

*f x N*

2

 

log ( , | ) () () ( ) () 2

*w k*

West could twitter from the grave." *@DitaVonTeese*

1 1 *<sup>N</sup>*

2 log ( | ) ( ) { } *<sup>k</sup>*

*f x F E EN N* 

*k k E x N*

The above inequality suggests that a minimum of one sample is sufficient to bound the

The CRLB is extended for estimating a vector of parameters *θ*1, *θ*2, …, *θM* by defining the

for *i*, *j* = 1 … *M*. The parameter error variances are then bounded by the diagonal elements

2 2 2 2

 

*w w k*

.

2 1 2 2 2

, which implies

<sup>1</sup> log ( , | ) log 2 log ( ) 2 22

( )

and

log ( , | ) <sup>1</sup> () () ( ) 2 2

*w k*

*x*

"Laying in bed this morning contemplating how amazing it would be if somehow Oscar Wilde and Mae

*w k w wk*

<sup>=</sup><sup>1</sup>

*E N*

1 *<sup>N</sup> k*

2 2

 

*<sup>w</sup> N* . (20)

. (22)

1

 *x* 

 

 = <sup>2</sup> *Nw*

. In Example

*k*

2 2 2 log ( , | ) *w k f x* 

1

*N*

*x*

*N*

*w w*

= *μ*. From Theorem 1, the

, (21)

*<sup>w</sup>* from *N* samples of *xk* = *μ* +

ˆ decreases

variances is useful for model selection. Another way of selecting models involves comparing residual error variances [23]. A lucid introduction to Gaussian CRLBs is presented in [2]. An extensive survey that refers to the pioneering contributions of Fisher, Cramér and Rao appears in [4].

The bounds on the parameter variances are found from the inverse of the so-called Fisher information. A formal definition of the CRLB for scalar parameters is as follows.

**Theorem 1 (Cramér-Rao Lower Bound) [2] - [5]:** Assume that (| )*<sup>k</sup> f x* satisfies the following regularity conditions:

$$\text{(i)}\qquad\frac{\partial \log f(\theta \mid \mathbf{x}\_{\boldsymbol{k}})}{\partial \theta} \text{ and } \frac{\partial^2 \log f(\theta \mid \mathbf{x}\_{\boldsymbol{k}})}{\partial \theta^2} \text{ exist for all } \theta \text{, and }$$

$$\text{(iii)}\qquad E\left\{\frac{\partial \log f(\theta \mid \mathbf{x}\_k)}{\partial \theta}\right\} = 0, \text{ for all } \theta.$$

*Define the Fisher information by* 

$$F(\theta) = -E\left\{\frac{\hat{\sigma}^2 \log f(\theta \mid \mathbf{x}\_k)}{\hat{\sigma}\theta^2}\right\},\tag{17}$$

*where the derivative is evaluated at the actual value of θ. Then the variance* <sup>2</sup> ˆ  *of an unbiased estimate* ˆ  *satisfies* 

$$
\sigma\_{\vec{\theta}}^2 \ge F^{-1}(\theta) \,. \tag{18}
$$

Proofs for the above theorem appear in [2] – [5].

*Example 6.* Suppose that samples of *xk* = *μ* + *wk* are available, where *wk* is a zero-mean Gaussian white input sequence and *μ* is unknown. Since *wk* ~ (0, <sup>2</sup> *<sup>w</sup>* ),

$$\log f(\boldsymbol{\mu} \mid \mathbf{x}\_k) = -\frac{N}{2}\log 2\pi - \frac{N}{2}\log \sigma\_w^2 - \frac{1}{2}\sigma\_w^{-2}\sum\_{k=1}^N (\mathbf{x}\_k - \boldsymbol{\mu})^2$$

and

$$\frac{\partial \log f(\mu \mid \mathbf{x}\_k)}{\partial \mu} = \sigma\_w^{-2} \sum\_{k=1}^N (\mathbf{x}\_k - \mu) \,.$$

Setting log ( | ) <sup>0</sup> *<sup>k</sup> f x* yields the MLE

$$
\hat{\mu} = \frac{1}{N} \sum\_{k=1}^{N} \mathbb{1}\_{k} \,\prime \,\prime \,\tag{19}
$$

<sup>&</sup>quot;Wall street people learn nothing and forget everything." *Benjamin Graham*

Smoothing, Filtering and Prediction -

, (17)

. (18)

2 2 2 1

, (19)

 *x* 

*k*

*N*

 (0, <sup>2</sup> *<sup>w</sup>* ),

> 

ˆ 

 *of an unbiased* 

*x* satisfies the

<sup>180</sup> Estimating the Past, Present and Future

variances is useful for model selection. Another way of selecting models involves comparing residual error variances [23]. A lucid introduction to Gaussian CRLBs is presented in [2]. An extensive survey that refers to the pioneering contributions of Fisher, Cramér and Rao

The bounds on the parameter variances are found from the inverse of the so-called Fisher

 *exist for all θ, and* 

2

*Example 6.* Suppose that samples of *xk* = *μ* + *wk* are available, where *wk* is a zero-mean

<sup>1</sup> log ( | ) log 2 log ( ) 2 22

*k*

1

*k k x*

<sup>1</sup> <sup>ˆ</sup> *<sup>N</sup>*

"Wall street people learn nothing and forget everything." *Benjamin Graham*

*N* 

*k w wk*

 

2 1

 

*N*

*w k k*

log ( | ) ( )

.

*f x <sup>x</sup>*

information. A formal definition of the CRLB for scalar parameters is as follows.

**Theorem 1 (Cramér-Rao Lower Bound) [2] - [5]:** Assume that (| )*<sup>k</sup> f*

2

log ( | ) ( ) *<sup>k</sup> f x F E*

*where the derivative is evaluated at the actual value of θ. Then the variance* <sup>2</sup>

2 1 <sup>ˆ</sup> *F* ( ) 

Gaussian white input sequence and *μ* is unknown. Since *wk* ~

*N N f x*

2 log ( | )*<sup>k</sup> f x* 

appears in [4].

following regularity conditions:

*(ii)* log ( | ) 0, *<sup>k</sup> f x <sup>E</sup>*

*Define the Fisher information by* 

 *satisfies* 

Setting log ( | ) <sup>0</sup> *<sup>k</sup> f x* 

 *x* 

 *and* 

 *for all θ.* 

Proofs for the above theorem appear in [2] – [5].

yields the MLE

nothing

2

*(i)* log ( | )*<sup>k</sup> <sup>f</sup>*

*estimate* ˆ 

and

which is unbiased because *E*{ }ˆ = 1 1 *<sup>N</sup> k k E x N* <sup>=</sup><sup>1</sup> 1 *<sup>N</sup> k E N* = *μ*. From Theorem 1, the Fisher information is

$$F(\mu) = E\left\{-\frac{\left.\hat{\sigma}^2 \log f(\mu \mid \mathbf{x}\_k)\right|}{\left.\hat{\sigma}\mu^2\right.}\right\} = E\{N\sigma\_w^{-2}\} = N\sigma\_w^{-2}$$

and therefore

$$
\sigma\_{\hat{\mu}}^2 \ge \sigma\_w^2 / N \,. \tag{20}
$$

The above inequality suggests that a minimum of one sample is sufficient to bound the variance of the MLE (19). It is also apparent from (20) that the error variance of ˆ decreases withincreasing sample size.

The CRLB is extended for estimating a vector of parameters *θ*1, *θ*2, …, *θM* by defining the *M M* Fisher information matrix

$$
\begin{bmatrix}
\hat{\mathbf{x}}\_{k+1/k}^{(u)} \\
\hat{\mathbf{x}}\_{k/k}^{(u)}
\end{bmatrix} = \begin{bmatrix}
(A - K\_k^{(u)} \mathbf{C}) & K\_k^{(u)} \\
(I - L\_k^{(u)} \mathbf{C}) & L\_k^{(u)}
\end{bmatrix} \begin{bmatrix}
\hat{\mathbf{x}}\_{k/k-1}^{(u)} \\
\mathbf{z}\_k
\end{bmatrix}' \tag{21}
$$

for *i*, *j* = 1 … *M*. The parameter error variances are then bounded by the diagonal elements of Fisher information matrix inverse

$$
\sigma\_{\hat{\boldsymbol{\theta}}}^2 \ge \mathcal{F}\_{\boldsymbol{\mu}}^{-1}(\boldsymbol{\theta}) \,. \tag{22}
$$

Formal vector CRLB theorems and accompanying proofs are detailed in [2] – [5].

*Example 7.* Consider the problem of estimating both *μ* and <sup>2</sup> *<sup>w</sup>* from *N* samples of *xk* = *μ* + *wk*, with *wk* ~ (0, <sup>2</sup> *<sup>w</sup>* ). Recall from Example 6 that

$$\log f(\boldsymbol{\mu}, \sigma\_w^2 \mid \boldsymbol{x}\_k) = -\frac{N}{2}\log 2\pi - \frac{N}{2}\log \sigma\_w^2 - \frac{1}{2}\sigma\_w^{-2}\sum\_{k=1}^N (\boldsymbol{x}\_k - \boldsymbol{\mu})^2 \dots$$

Therefore, <sup>2</sup> log ( , | ) *w k f x* = <sup>2</sup> 1 ( ) *N w k k x* and 2 2 2 log ( , | ) *w k f x* = <sup>2</sup> *N w* . In Example

4 it is found that 2 2 1 2 2 2 2 1 log ( , | ) <sup>1</sup> () () ( ) 2 2 *N w k w w k w k f x N x* , which implies

$$\frac{\partial^2 \log f(\mu, \sigma\_w^2 \mid \boldsymbol{x}\_k)}{\partial(\sigma\_w^2)^2} = \frac{N}{2} (\sigma\_w^2)^{-2} - (\sigma\_w^2)^{-3} \sum\_{k=1}^N (\boldsymbol{x}\_k - \mu)^2$$

<sup>&</sup>quot;Laying in bed this morning contemplating how amazing it would be if somehow Oscar Wilde and Mae West could twitter from the grave." *@DitaVonTeese*

**8.3 Filtering EM Algorithms** 

monotonically nonincreasing.

developments below.

**8.3.2.1 EM Algorithm** 

system are modelled by

likelihood, the expectation of 1 2 log ( , ,..., | ) *<sup>M</sup> <sup>k</sup> f*

**8.3.2 Measurement Noise Variance Estimation** 

presented here to establish the monotonicity of variance iterations.

"I'm no model lady. A model is just an imitation of the real thing." *Mary* (*Mae) Jane West*

The EM algorithm [3], [7], [11], [15] – [17], [19] – [22] is a general purpose technique for solving joint state and parameter estimation problems. In maximum-likelihood estimation, it is desired to estimate parameters *θ1*, *θ2*, …, *θM*, given states by maximising the log-likelihood

> 

maximised instead. This basic technique was in use prior to Dempster, Laird and Rubin naming it the EM algorithm 1977 [11]. They published a general formulation of the algorithm, which consists of iterating an expectation step and a maximization step. Their expectation step involves least squares calculations on the incomplete observations using the current parameter iterations to estimate the underlying states. In the maximization step, the unknown parameters are re-estimated by maximising a joint log likelihood function using state estimates from the previous expectation step. This sequence is repeated for either a finite number of iterations or until the estimates and the log likelihood function are stable. Dempster, Laird and Rubin [11] also established parameter map conditions for the convergence of the algorithm, namely that the incomplete data log likelihood function is

 Wu [16] subsequently noted an equivalence between the conditions for a map to be closed and the continuity of a function. In particular, if the likelihood function satisfies certain modality, continuity and differentiability conditions, the parameter sequence converges to some stationary value. A detailed analysis of Wu's convergence results appears in [3]. Shumway and Stoffer [15] introduced a framework that is employed herein, namely, the use of a Kalman filter within the expectation step to recover the states. Feder and Weinstein [17] showed how a multiparameter estimation problem can be decoupled into separate maximum likelihood estimations within an EM algorithm. Some results on the convergence of EM algorithms for variance and state matrix estimation [19] – [20] are included within the

The problem of estimating parameters from incomplete information has been previously studied in [11] – [16]. It is noted in [11] that the likelihood functions for variance estimation do not exist in explicit closed form. This precludes straightforward calculation of the Hessians required in [3] to assert convergence. Therefore, an alternative analysis is

The expectation step described below employs the approach introduced in [15] and involves the use of a Kalman filter to obtain state estimates. The maximization step requires the calculation of decoupled MLEs similarly to [17]. Measurements of a linear time-invariant

*x* . When complete state information is not available to calculate the log-

*x* , given incomplete measurements, *zk*, is

**8.3.1Background** 

log *f*(*θ1*, *θ2*, …, *<sup>M</sup>* | )*<sup>k</sup>* 

$$=\frac{N}{2}(\sigma\_w^2)^{-2} - N(\sigma\_w^2)^{-2}$$

$$=-\frac{N}{2}\sigma\_w^{-4},$$

$$\frac{\hat{\sigma}^2 \log f(\mu, \sigma\_w^2 \mid \mathbf{x}\_k)}{\hat{\sigma}\mu \hat{\sigma}\sigma\_w^2} = -(\sigma\_w^2)^{-2}\sum\_{k=1}^N (\mathbf{x}\_k - \mu) \text{ and } E\left\{\frac{\hat{\sigma}^2 \log f(\mu, \sigma\_w^2 \mid \mathbf{x}\_k)}{\hat{\sigma}\mu \hat{\sigma}\sigma\_w^2}\right\} = 0.$$

The Fisher information matrix and its inverse are then obtained from (21) as

$$F(\boldsymbol{\mu}, \boldsymbol{\sigma}\_{\boldsymbol{w}}^{2}) = \begin{bmatrix} N\boldsymbol{\sigma}\_{\boldsymbol{w}}^{-2} & \mathbf{0} \\ \mathbf{0} & \mathbf{0}.5N\boldsymbol{\sigma}\_{\boldsymbol{w}}^{-4} \end{bmatrix} \prime \ F^{-1}(\boldsymbol{\mu}, \boldsymbol{\sigma}\_{\boldsymbol{w}}^{2}) = \begin{bmatrix} \boldsymbol{\sigma}\_{\boldsymbol{w}}^{2}/N & \mathbf{0} \\ \mathbf{0} & 2\boldsymbol{\sigma}\_{\boldsymbol{w}}^{4}/N \end{bmatrix} \cdot \boldsymbol{\varepsilon}$$

It is found from (22) that the lower bounds for the MLE variances are 2 2 <sup>ˆ</sup> / *<sup>w</sup> N* and 2 2 4 <sup>ˆ</sup> 2 / *<sup>w</sup> <sup>w</sup> N* . The impact of modelling error on parameter estimation accuracy is examined below.

*Example 8.* Consider the problem of estimating <sup>2</sup> *<sup>w</sup>* given samples of states which are generated by *xk* = *μ* + *wk* + *sk*, where *sk* is an independent sequence that accounts for the presence of modelling error. From the assumption *xk* ~ ( , 2 2 *w s* ), the associated log likelihood function is

$$\frac{\partial \log f(\sigma\_w^2 \mid \mathbf{x}\_k)}{\partial \sigma\_w^2} = -\frac{N}{2} (\sigma\_w^2 + \sigma\_s^2)^{-1} + \frac{1}{2} (\sigma\_w^2 + \sigma\_s^2)^{-2} \sum\_{k=1}^N (\mathbf{x}\_k - \mu)^2 \tag{2}$$

which leads to 2 2 2 2 log ( | ) ( ) *w k w f x* <sup>=</sup>2 22 ( ) <sup>2</sup> *w s N* , that is, <sup>2</sup> 2 ˆ *w* ≥ 2 22 2( ) / *w s N* . Thus, parameter estimation accuracy degrades as the variance of the modelling error increases. If 2 *<sup>s</sup>* is available *a priori* then setting 2 2 log ( | ) *w k w f x* = 0 leads to the improved estimate

$$
\hat{\sigma}\_{\omega}^{2} = -\sigma\_{s}^{2} + \frac{1}{N} \sum\_{k=1}^{K} (\boldsymbol{\pi}\_{k} - \boldsymbol{\mu})^{2} \dots
$$

<sup>&</sup>quot;There are only two kinds of people who are really fascinating; people who know absolutely everything, and people who know absolutely nothing." *Oscar Fingal O'Flahertie Wills Wilde*

Smoothing, Filtering and Prediction -

= 0.

<sup>ˆ</sup> / 

), the associated log

*w s N* . Thus,

*<sup>w</sup> N* and

<sup>182</sup> Estimating the Past, Present and Future

2 *<sup>w</sup> N* ,

1 2

. The impact of modelling error on parameter estimation accuracy is examined

generated by *xk* = *μ* + *wk* + *sk*, where *sk* is an independent sequence that accounts for the

log ( | ) <sup>1</sup> ( ) ( )( ) <sup>2</sup> <sup>2</sup>

parameter estimation accuracy degrades as the variance of the modelling error increases. If

 *<sup>s</sup>* + <sup>2</sup> 1 <sup>1</sup> ( ) *K*

*N*

"There are only two kinds of people who are really fascinating; people who know absolutely

everything, and people who know absolutely nothing." *Oscar Fingal O'Flahertie Wills Wilde*

*k k x*

.

2 2 log ( | ) *w k w f x* 

,

*w k*

<sup>=</sup>2 22 ( ) <sup>2</sup> *w s N* , that is, <sup>2</sup>

2 ˆ *<sup>w</sup>* = <sup>2</sup> 

*w*

*<sup>N</sup> F u*

 

2 2

 

*f x <sup>E</sup>* 

2

/ 0 (, ) 0 2/ *w*

 ( , 

2 21 2 22 2

*w s w s k*

 

2 log ( , | ) *w k w*

4

*N*

*<sup>w</sup>* given samples of states which are

 ≥ 2 22 2( ) / 

.

*w*

 2 2 *w s* 

1

2 ˆ *w* 

= 0 leads to the improved estimate

*x*

*N*

*N*

*N* 

1 () ( ) *N w k k*

4

,

*w*

It is found from (22) that the lower bounds for the MLE variances are 2 2

 and

2 2 2 2 () () <sup>2</sup> *<sup>w</sup> <sup>w</sup>*

= 2 2

2

<sup>0</sup> (, ) 0 0.5 *w*

*Example 8.* Consider the problem of estimating <sup>2</sup>

presence of modelling error. From the assumption *xk* ~

*w k*

*f x N*

2

2

2 2 2 2 log ( | ) ( ) *w k w f x* 

*<sup>s</sup>* is available *a priori* then setting

*x*

The Fisher information matrix and its inverse are then obtained from (21) as

*N*

<sup>4</sup>

2 log ( , | ) *w k w f x* 

2

*<sup>N</sup> F u*

*w*

2 2

2 2 4 <sup>ˆ</sup> 2 / *<sup>w</sup> <sup>w</sup> N*

below.

 

likelihood function is

which leads to

2 

 

 

#### **8.3 Filtering EM Algorithms**

#### **8.3.1Background**

The EM algorithm [3], [7], [11], [15] – [17], [19] – [22] is a general purpose technique for solving joint state and parameter estimation problems. In maximum-likelihood estimation, it is desired to estimate parameters *θ1*, *θ2*, …, *θM*, given states by maximising the log-likelihood log *f*(*θ1*, *θ2*, …, *<sup>M</sup>* | )*<sup>k</sup> x* . When complete state information is not available to calculate the loglikelihood, the expectation of 1 2 log ( , ,..., | ) *<sup>M</sup> <sup>k</sup> f x* , given incomplete measurements, *zk*, is maximised instead. This basic technique was in use prior to Dempster, Laird and Rubin naming it the EM algorithm 1977 [11]. They published a general formulation of the algorithm, which consists of iterating an expectation step and a maximization step. Their expectation step involves least squares calculations on the incomplete observations using the current parameter iterations to estimate the underlying states. In the maximization step, the unknown parameters are re-estimated by maximising a joint log likelihood function using state estimates from the previous expectation step. This sequence is repeated for either a finite number of iterations or until the estimates and the log likelihood function are stable. Dempster, Laird and Rubin [11] also established parameter map conditions for the convergence of the algorithm, namely that the incomplete data log likelihood function is monotonically nonincreasing.

 Wu [16] subsequently noted an equivalence between the conditions for a map to be closed and the continuity of a function. In particular, if the likelihood function satisfies certain modality, continuity and differentiability conditions, the parameter sequence converges to some stationary value. A detailed analysis of Wu's convergence results appears in [3]. Shumway and Stoffer [15] introduced a framework that is employed herein, namely, the use of a Kalman filter within the expectation step to recover the states. Feder and Weinstein [17] showed how a multiparameter estimation problem can be decoupled into separate maximum likelihood estimations within an EM algorithm. Some results on the convergence of EM algorithms for variance and state matrix estimation [19] – [20] are included within the developments below.

#### **8.3.2 Measurement Noise Variance Estimation**

#### **8.3.2.1 EM Algorithm**

The problem of estimating parameters from incomplete information has been previously studied in [11] – [16]. It is noted in [11] that the likelihood functions for variance estimation do not exist in explicit closed form. This precludes straightforward calculation of the Hessians required in [3] to assert convergence. Therefore, an alternative analysis is presented here to establish the monotonicity of variance iterations.

The expectation step described below employs the approach introduced in [15] and involves the use of a Kalman filter to obtain state estimates. The maximization step requires the calculation of decoupled MLEs similarly to [17]. Measurements of a linear time-invariant system are modelled by

<sup>&</sup>quot;I'm no model lady. A model is just an imitation of the real thing." *Mary* (*Mae) Jane West*

estimates ( )

Step 2. For *i* = 1, …, *p*, use ( )

( 1) 2 2, ( ) ˆ *<sup>u</sup> v*

**8.3.2.2 Properties** 

written as

from ( ) / ˆ *u k k x* yields

where ( ) / *u*

where ( )

*Onassis*

/ 1

*<sup>u</sup> k k* = () ()

obtained from

*k k x* = *xk* − ( )

/ ˆ *u*

*k k x* and ( )

/ 1 *u*

() () ( ) 1 ()

/ ˆ *u k k y* .

 , …, ( 1) 2 , ( ) ˆ *<sup>u</sup> p v* ).

/ ˆ *u*

Step 1. Operate the Kalman filter (29) – (30) designed with ˆ ( ) *<sup>u</sup> R* to obtain corrected output

The above EM algorithm involves a repetition of two steps: the states are deduced using the current variance estimates and then the variances are re-identified from the latest states. Consequently, a two-part argument is employed to establish the monotonicity of the variance sequence. For the expectation step, it is shown that monotonically non-increasing variance iterates lead to monotonically non-increasing error covariances. Then for the maximisation step, it is argued that monotonic error covariances result in a monotonic measurement noise variance sequence. The design Riccati difference equation (28) can be

1/ / 1 ( ) ( ) () *<sup>u</sup> u u u T u uT <sup>u</sup> P A K CP A K C K RK Q S k k k kk <sup>k</sup> k k <sup>k</sup>* , (31)

*k k k kk k k x I L Cx L v* , (32)

, (33)

*k k kk k x Ax Bw* . (34)

*<sup>u</sup> u T <sup>T</sup> k k kk A A BQB* . (35)

*k k x* are the corrected and predicted state errors,

/

*<sup>u</sup> k k* = () ()

/ / { ( )} *u uT Ex x kk kk* and

( ) () () ( ) () () ( )

( ) () () () / / 1 ( ) *<sup>u</sup> uu u*

( ) () () ( ) () () / / 1 ( ) ( ) () *<sup>u</sup> u u u T u uT k k k kk <sup>k</sup> k k I L C I L C L RL*

> ( ) ( ) 1/ / *u u*

Hence, the observed predicted error covariance obeys the recursion ( ) ( ) 1/ /

*k k x* = *xk* − ( )

respectively. The observed corrected error covariance is defined as ( )

where () () () <sup>ˆ</sup> ( ) ( )( ) *u uu u T S K R RK k k <sup>k</sup>* accounts for the presence of parameter error. Subtracting *xk*

/ 1 ˆ *<sup>u</sup>*

/1 /1 / 1 / 1 ( ) *<sup>u</sup> uT uT <sup>u</sup> kk kk CC C R C k k k k*

/1 /1 { ( )} *u uT Ex x kk kk* . The observed predicted state error satisfies

Some observations concerning the above error covariances are described below. These results are used subsequently to establish the monotonicity of the above EM algorithm.

"I want minimum information given with maximum politeness." *Jacqueline (Jackie) Lee Bouvier Kennedy* 

*k k <sup>y</sup>* instead of *yk* within (27) to obtain <sup>ˆ</sup> ( 1) *<sup>u</sup> <sup>R</sup>* = diag( ( 1) 2

1, ( ) ˆ *<sup>u</sup> v* ,

$$\mathbf{x}\_{k+1} = A\mathbf{x}\_k + Bw\_{k'} \tag{23}$$

$$\mathbf{y}\_k = \mathbf{C}\mathbf{x}\_k + D\mathbf{w}\_k \tag{24}$$

$$
\omega\_{\pm} = \mathcal{Y}\_{\pm} + \mathcal{v}\_{\pm}.\tag{25}
$$

where *A n n* , *B n m* , *C <sup>p</sup><sup>n</sup>* , *D <sup>p</sup><sup>m</sup>* and *wk*, *vk* are stationary processes with { } *E wk* = 0, { }*<sup>T</sup> Ewwj <sup>k</sup>* = *Q jk* , { } *E vk* = { }*<sup>T</sup> Ewvj <sup>k</sup>* = 0 and { }*<sup>T</sup> Evvj <sup>k</sup>* = *R jk* . To simplify the presentation, it is initially assumed that the direct feed-through matrix, *D*, is zero. A nonzero *D* will be considered later.

Suppose that it is desired to estimate *R* = diag( <sup>2</sup> 1,*<sup>v</sup>* , <sup>2</sup> 2,*<sup>v</sup>* , …, <sup>2</sup> *<sup>p</sup>*,*<sup>v</sup>* ) given *N* samples of *zk* and *yk*. Let *zi,k , yi,k* and *vi,k* denote the *i*th element of the vectors *zk* , *yk* and *vk*, respectively. Then (25) may be written in terms of its *i* components, *zi,k* = *yi,k* + *vi,k*, that is,

$$
\boldsymbol{w}\_{i,k} = \boldsymbol{z}\_{i,k} - \boldsymbol{y}\_{i,k} \,. \tag{26}
$$

From the assumption *vi,k* ~ (0, <sup>2</sup> , ) *i v* , an MLE for the unknown <sup>2</sup> *i v*, is obtained from the sample variance formula

$$
\hat{\sigma}\_{i,v}^2 = \frac{1}{N} \sum\_{k=1}^N (\mathbf{z}\_{i,k} - \mathbf{y}\_{i,k})^2 \,. \tag{27}
$$

An EM algorithm for updating the measurement noise variance estimates is described as follows. Assume that there exists an estimate ˆ ( ) *<sup>u</sup> R* = diag( () 2 1, ( ) ˆ *<sup>u</sup> <sup>v</sup>* , () 2 2, ( ) ˆ *<sup>u</sup> <sup>v</sup>* , …, () 2 , ( ) ˆ *<sup>u</sup> p v* ) of *R* at iteration *u*. A Kalman filter designed with ˆ ( ) *<sup>u</sup> R* may then be employed to produce corrected output estimates ( ) / ˆ *u k k y* . The filter's design Riccati equation is given by

$$P\_{k+1/k}^{(u)} = (A - K\_k^{(u)} \mathbf{C}) P\_{k/k-1}^{(u)} (A - K\_k^{(u)} \mathbf{C})^\top + K\_k^{(u)} \hat{R}^{(u)} (K\_k^{(u)})^\top + BQB^\top,\tag{28}$$

where ( ) *<sup>u</sup> Kk* = ( ) ( ) / 1 / 1 ( *uT uT AP C CP C k k k k* + ˆ() 1 ) *<sup>u</sup> <sup>R</sup>* is the predictor gain. The output estimates are calculated from

$$
\begin{bmatrix}
\hat{\mathbf{x}}\_{k \times 1/k}^{(u)} \\
\hat{\mathbf{x}}\_{k/k}^{(u)}
\end{bmatrix} = \begin{bmatrix}
(A - K\_k^{(u)} \mathbf{C}) & K\_k^{(u)} \\
(I - L\_k^{(u)} \mathbf{C}) & L\_k^{(u)}
\end{bmatrix} \begin{bmatrix}
\hat{\mathbf{x}}\_{k \times 1/k}^{(u)} \\
\mathbf{z}\_k
\end{bmatrix},
\begin{aligned}
\hat{\mathbf{y}}\_{k \times 1}^{(u)} &= \mathbf{C} \hat{\mathbf{x}}\_{k/k}^{(u)}
\end{aligned} \tag{29}
$$

where ( ) *<sup>u</sup> Lk* = ( ) ( ) / 1 / 1 ( *uT uT P C CP C k k k k* + ˆ() 1 ) *<sup>u</sup> <sup>R</sup>* is the filter gain.

*Procedure 1 [19].* Assume that an initial estimate ˆ (1) *R* of *R* is available. Subsequent estimates, ˆ ( ) *<sup>u</sup> R* , *u* > 1, are calculated by repeating the following two-step procedure.

<sup>&</sup>quot;There are known knowns. These are things we know that we know. There are known unknowns. That is to say, there are things that we know we don't know. But there are also unknown unknowns. There are things we don't know we don't know." *Donald Henry Rumsfeld*

Smoothing, Filtering and Prediction -

*<sup>p</sup>*,*<sup>v</sup>* ) given *N* samples of *zk* and

*<sup>v</sup>* , …, () 2

*kk kk <sup>y</sup> Cx* , (29)

*i v*, is obtained from the

, ( ) ˆ *<sup>u</sup>* 

unknowns.

*p v* ) of *R* at

(30)

. To simplify the

(23) (24) (25)

<sup>184</sup> Estimating the Past, Present and Future

where *A n n* , *B n m* , *C <sup>p</sup><sup>n</sup>* , *D <sup>p</sup><sup>m</sup>* and *wk*, *vk* are stationary processes with

presentation, it is initially assumed that the direct feed-through matrix, *D*, is zero. A nonzero

*yk*. Let *zi,k , yi,k* and *vi,k* denote the *i*th element of the vectors *zk* , *yk* and *vk*, respectively. Then

An EM algorithm for updating the measurement noise variance estimates is described as

iteration *u*. A Kalman filter designed with ˆ ( ) *<sup>u</sup> R* may then be employed to produce corrected

*Procedure 1 [19].* Assume that an initial estimate ˆ (1) *R* of *R* is available. Subsequent estimates,

"There are known knowns. These are things we know that we know. There are known unknowns. That is to say, there are things that we know we don't know. But there are also unknown unknowns. There

*k k y* . The filter's design Riccati equation is given by

1,*<sup>v</sup>* , <sup>2</sup> 

, { } *E vk* = { }*<sup>T</sup> Ewvj <sup>k</sup>* = 0 and { }*<sup>T</sup> Evvj <sup>k</sup>* = *R jk*

*i v* , an MLE for the unknown <sup>2</sup>

<sup>ˆ</sup> ( ) ( ) () *<sup>u</sup> u u u T u u uT <sup>T</sup> P A K C P A K C K R K BQB k k k kk <sup>k</sup> <sup>k</sup> <sup>k</sup>* , (28)

/ 1 / 1 ( *uT uT AP C CP C k k k k* + ˆ() 1 ) *<sup>u</sup> <sup>R</sup>* is the predictor gain. The output estimates are

, () () / / ˆ ˆ *<sup>u</sup> <sup>u</sup>*

 2,*<sup>v</sup>* , …, <sup>2</sup> 

*ik ik ik* ,,, *v z y* . (26)

. (27)

1, ( ) ˆ *<sup>u</sup>* 

 *<sup>v</sup>* , () 2 2, ( ) ˆ *<sup>u</sup>* 

*<sup>k</sup>* <sup>1</sup> *k k x Ax Bw* ,

*kk k y Cx Dw* ,

*k kk z y v* ,

{ } *E wk* = 0, { }*<sup>T</sup> Ewwj <sup>k</sup>* = *Q jk*

*D* will be considered later.

From the assumption *vi,k* ~

sample variance formula

output estimates ( )

calculated from

/ ˆ *u*

/

where ( ) *<sup>u</sup> Kk* = ( ) ( )

where ( ) *<sup>u</sup> Lk* = ( ) ( )

1/ / 1

(25) may be written in terms of its *i* components, *zi,k* = *yi,k* + *vi,k*, that is,

, ) 

2 2 , , , 1 <sup>1</sup> <sup>ˆ</sup> ( ) *N i v ik ik k z y <sup>N</sup>*

( ) () () ( ) () () ()

( ) () () () 1/ 1/ ( ) ( ) ( )

/ 1 / 1 ( *uT uT P C CP C k k k k* + ˆ() 1 ) *<sup>u</sup> <sup>R</sup>* is the filter gain.

ˆ ( ) *<sup>u</sup> R* , *u* > 1, are calculated by repeating the following two-step procedure.

*u u u u k k k k kk u u u k k k k k x AKC K x x I LC L z* 

ˆ ( ) ˆ

ˆ ( )

are things we don't know we don't know." *Donald Henry Rumsfeld*

(0, <sup>2</sup>

follows. Assume that there exists an estimate ˆ ( ) *<sup>u</sup> R* = diag( () 2

Suppose that it is desired to estimate *R* = diag( <sup>2</sup>

Step 1. Operate the Kalman filter (29) – (30) designed with ˆ ( ) *<sup>u</sup> R* to obtain corrected output estimates ( ) / ˆ *u k k y* .

$$\begin{aligned} \text{Step 2.} \quad & \text{For } i = 1, \ldots, p\_\prime \text{ use } \hat{\mathfrak{z}}\_{k/k}^{(u)} \text{ instead of } y\_k \text{ within (27) to obtain } \bar{R}^{(u+l)} = \text{diag}((\hat{\sigma}\_{1,\flat}^{(u+l)})^2, \ldots, (\hat{\sigma}\_{2,\flat}^{(u+l)})^2). \\ & \qquad (\hat{\sigma}\_{2,\flat}^{(u+l)})^2, \ldots, (\hat{\sigma}\_{p,\flat}^{(u+l)})^2). \end{aligned}$$

#### **8.3.2.2 Properties**

The above EM algorithm involves a repetition of two steps: the states are deduced using the current variance estimates and then the variances are re-identified from the latest states. Consequently, a two-part argument is employed to establish the monotonicity of the variance sequence. For the expectation step, it is shown that monotonically non-increasing variance iterates lead to monotonically non-increasing error covariances. Then for the maximisation step, it is argued that monotonic error covariances result in a monotonic measurement noise variance sequence. The design Riccati difference equation (28) can be written as

$$P\_{k+1/k}^{(u)} = (A - K\_k^{(u)} \mathbf{C}) P\_{k/k-1}^{(u)} (A - K\_k^{(u)} \mathbf{C})^T + K\_k^{(u)} R (K\_k^{(u)})^T + Q + S\_k^{(u)},\tag{31}$$

where () () () <sup>ˆ</sup> ( ) ( )( ) *u uu u T S K R RK k k <sup>k</sup>* accounts for the presence of parameter error. Subtracting *xk* from ( ) / ˆ *u k k x* yields

$$\tilde{\mathbf{x}}\_{k/k}^{(u)} = (I - L\_k^{(u)} \mathbf{C}) \tilde{\mathbf{x}}\_{k/k-1}^{(u)} - L\_k^{(u)} \boldsymbol{\upsilon}\_{k/\prime} \tag{32}$$

where ( ) / *u k k x* = *xk* − ( ) / ˆ *u k k x* and ( ) / 1 *u k k x* = *xk* − ( ) / 1 ˆ *<sup>u</sup> k k x* are the corrected and predicted state errors, respectively. The observed corrected error covariance is defined as ( ) / *<sup>u</sup> k k* = () () / / { ( )} *u uT Ex x kk kk* and obtained from

$$\begin{aligned} \boldsymbol{\Sigma}\_{k/k}^{(u)} &= (\boldsymbol{I} - \boldsymbol{L}\_k^{(u)} \boldsymbol{\mathsf{C}}) \boldsymbol{\Sigma}\_{k/k-1}^{(u)} (\boldsymbol{I} - \boldsymbol{L}\_k^{(u)} \boldsymbol{\mathsf{C}})^T + \boldsymbol{L}\_k^{(u)} \boldsymbol{R} (\boldsymbol{L}\_k^{(u)})^T \\ &= \boldsymbol{\Sigma}\_{k/k-1}^{(u)} - \boldsymbol{\Sigma}\_{k/k-1}^{(u)} \boldsymbol{\mathsf{C}}^T (\boldsymbol{\mathsf{C}} \boldsymbol{\Sigma}\_{k/k-1}^{(u)} \boldsymbol{\mathsf{C}}^T + \boldsymbol{R})^{-1} \boldsymbol{\mathsf{C}} \boldsymbol{\Sigma}\_{k/k-1}^{(u)} \end{aligned} \tag{33}$$

where ( ) / 1 *<sup>u</sup> k k* = () () /1 /1 { ( )} *u uT Ex x kk kk* . The observed predicted state error satisfies

$$
\tilde{\mathfrak{X}}\_{k+1/k}^{(u)} = A \tilde{\mathfrak{X}}\_{k/k}^{(u)} + B w\_k \,. \tag{34}
$$

Hence, the observed predicted error covariance obeys the recursion

$$
\Sigma\_{k+1/k}^{(u)} = A\Sigma\_{k/k}^{(u)} A^T + BQB^T. \tag{35}
$$

Some observations concerning the above error covariances are described below. These results are used subsequently to establish the monotonicity of the above EM algorithm.

<sup>&</sup>quot;I want minimum information given with maximum politeness." *Jacqueline (Jackie) Lee Bouvier Kennedy Onassis*

Since A, Q and R are time-invariant, it suffices to show that

*= 1, …, n and the pair (A, C) is observable;* 

*(ii) there exist* ˆ (1) *R ≥ R ≥ 0 and* ( 1)

*Then* ˆ ( 1) *<sup>u</sup> R ≤* ˆ ( ) *<sup>u</sup> R (or* ˆ ( ) *<sup>u</sup> R ≤* ˆ ( 1) *<sup>u</sup> R ) for all u > 1.* 

( ) <sup>2</sup>

( ) <sup>2</sup>

/

*Lemma 4 [19]: Under the conditions of Lemma 3,* 

*and thus* ˆ ( 1) *<sup>u</sup> R =* ( )

MLEs converge to the actual values.

*Therefore,* ˆ( 1) *<sup>u</sup> R ≤* ˆ ( ) *<sup>u</sup> R and* ( 1)

( 1) ( 1) ( 1) () () ( )

*L L I L C L L I LC*

( )( ) ()( )

*u uT u T u uT u T k k k k k k u u k k*

1/

0 0

*Note for an X and Y satisfying I ≥ Y ≥ X ≥ 0 that YYT - XXT ≥ (I – X)(I – X)T – (I – Y)(I – Y)T.* 

*(36) follows. �*  It is established below that monotonic non-increasing error covariances result in a

*(i) the data zk has been generated by (23) – (25) in which A, B, C, Q are known,* ( )

*Proof: Let Ci denote the ith row of C. The approximate MLE within Procedure 1 is written as* 

( 1) 2 () 2 , , / 1 <sup>1</sup> () ( ) <sup>ˆ</sup> <sup>ˆ</sup> *<sup>N</sup> <sup>u</sup> <sup>u</sup> i v ik i k k k*

 

*N*

1

*k*

*N* 

1

0, 0, ˆ lim *<sup>u</sup>*

*QR u*

*z Cx*

<sup>1</sup> ( )

/ , *u T C C i k k i iv*

*u T C C k k + R. Since* <sup>ˆ</sup> ( 1) *<sup>u</sup> <sup>R</sup> is affine to* ( )

*u i k k ik*

*Cx v*

*monotonically non-increasing, it follows that* ˆ ( 1) *<sup>u</sup> R ≤* ˆ ( ) *<sup>u</sup> R . �* 

If the estimation problem is dominated by measurement noise, the measurement noise

( 1)

"Getting information off the internet is like taking a drink from a fire hydrant." *Mitchell David Kapor*

*R R* 

/ ,

. (36)

*<sup>i</sup> A < 1, i* 

(37)

(38)

(39)

*<sup>u</sup> Pk k (from Lemma 1) imply* ( 1) *<sup>u</sup> L C ≤* ( ) *<sup>u</sup> L C ≤ I and thus* 

*<sup>u</sup> P ) for all u > 1.* 

/

. (40)

*<sup>u</sup> k k , which from Lemma 2 is* 

( 1) ( )

*IL C I LC*

1/ *<sup>u</sup> Pk k ≤* ( )

monotonic non-increasing measurement noise variance sequence.

*Lemma 3 [19]: In respect of Procedure 1 for estimating R, suppose the following:* 

1/0 *<sup>u</sup> P ≤* ( ) 1/0 *<sup>u</sup> P (or* ( ) 1/0 *<sup>u</sup> P ≤* ( 1) 1/0

*N*

*Lemma 1 [19]: In respect of Procedure 1 for estimating R, suppose the following:* 


*Then:* 


*(iii) R ≤* ˆ ( 1) *<sup>u</sup> R ≤* ˆ ( ) *<sup>u</sup> R implies* ( 1) 1/ *<sup>u</sup> Pk k ≤* ( ) 1/ *<sup>u</sup> Pk k (or* <sup>ˆ</sup> ( ) *<sup>u</sup> <sup>R</sup> <sup>≤</sup>* <sup>ˆ</sup> ( 1) *<sup>u</sup> <sup>R</sup> ≤ R implies* ( ) 1/ *<sup>u</sup> Pk k ≤* ( 1) 1/ *<sup>u</sup> Pk k )* 

*for all u ≥ 1.* 

#### *Proof:*

*(i) Condition (i) ensures that the problem is well-posed. Condition (ii) stipulates that* (1) *Sk ≥ 0, which is the initialisation step for an induction argument. For the inductive step, subtracting (33) from (31) yields* ( ) 1/ *<sup>u</sup> Pk k –* ( ) 1/ *<sup>u</sup> k k =* (*A –* () () / 1 )( *u u KCP k kk –* ( ) / 1)( *<sup>u</sup> k k A –*  ( ) ) *u T K Ck +* ( ) *<sup>u</sup> Sk and thus* ( ) / 1 *<sup>u</sup> k k ≤* ( ) / 1 *<sup>u</sup> Pk k implies* ( ) 1/ *<sup>u</sup> k k ≤* ( ) 1/ *<sup>u</sup> Pk k .* 


Thus the sequences of observed prediction and correction error covariances are bounded above by the design prediction and correction error covariances. Next, it is shown that the observed error covariances are monotonically non-increasing (or non-decreasing).

*Lemma 2 [19]: Under the conditions of Lemma 1:* 

*i)* ( 1) 1/ *u k k* ≤ ( ) 1/ *<sup>u</sup> k k* (*or* ( ) 1/ *<sup>u</sup> k k* ≤ ( 1) 1/ *u k k* ) *and*

*ii)* ( 1) / *u k k* ≤ ( ) / *<sup>u</sup> k k* (*or* ( ) / *<sup>u</sup> k k* ≤ ( 1) / *u k k* ).

*Proof: To establish that the solution of (33) is monotonic non-increasing, from Theorem 2 of Chapter 7, it is required to show that* 

$$
\begin{bmatrix}
\mathbf{Q} + \mathbf{K}\_{\boldsymbol{k}}^{(\boldsymbol{u}+\boldsymbol{1})} \mathbf{R} (\mathbf{K}\_{\boldsymbol{k}}^{(\boldsymbol{u}+\boldsymbol{1})})^{\mathrm{T}} & (\boldsymbol{A} - \mathbf{K}\_{\boldsymbol{k}}^{(\boldsymbol{u}+\boldsymbol{1})} \mathbf{C})^{\mathrm{T}} \\
\mathbf{A} - \mathbf{K}\_{\boldsymbol{k}}^{(\boldsymbol{u}+\boldsymbol{1})} \mathbf{C} & \mathbf{0}
\end{bmatrix} \mathsf{S} \begin{bmatrix}
\mathbf{Q} + \mathbf{K}\_{\boldsymbol{k}}^{(\boldsymbol{u})} \mathbf{R} (\mathbf{K}\_{\boldsymbol{k}}^{(\boldsymbol{u})})^{\mathrm{T}} & (\mathbf{A} - \mathbf{K}\_{\boldsymbol{k}}^{(\boldsymbol{u})} \mathbf{C})^{\mathrm{T}} \\
\mathbf{A} - \mathbf{K}\_{\boldsymbol{k}}^{(\boldsymbol{u})} \mathbf{C} & \mathbf{0}
\end{bmatrix}.
$$

<sup>&</sup>quot;Technology is so much fun but we can drown in our technology. The fog of information can drive out knowledge." *Daniel Joseph Boostin*

Smoothing, Filtering and Prediction -

1/ *<sup>u</sup> Pk k ≤* ( 1)

/ 1 )( *u u KCP k kk –* ( )

*T T T m T m*

*<sup>i</sup> A < 1, i* 

1/ *<sup>u</sup> Pk k )* 

/ 1)( *<sup>u</sup> k k A –* 

1/ *<sup>u</sup> Pk k ≤*

*,* 

186 Estimating the Past, Present and Future

*(i) the data zk has been generated by (23) – (25) in which A, B, C, Q are known,* ( )

*(ii) there exist* (2) *<sup>P</sup>*1/0 *<sup>≤</sup>* (1) *<sup>P</sup>*1/0 *and R ≤* <sup>ˆ</sup> (2) *<sup>R</sup> <sup>≤</sup>*<sup>ˆ</sup> (1) *R (or* (1) *<sup>P</sup>*1/0 *<sup>≤</sup>* (2) *<sup>P</sup>*1/0 *and* <sup>ˆ</sup> (1) *<sup>R</sup> <sup>≤</sup>*<sup>ˆ</sup> (2) *<sup>R</sup> ≤ R).* 

1/

*(i) Condition (i) ensures that the problem is well-posed. Condition (ii) stipulates that* (1) *Sk ≥ 0,* 

1/ *<sup>u</sup> Pk k –* ( )

/ 1

*(iii) The condition* <sup>ˆ</sup> ( 1) *<sup>u</sup> <sup>R</sup> <sup>≤</sup>* <sup>ˆ</sup> ( ) *<sup>u</sup> R ensures that* ˆ( 1) 1 ˆ() 1 ( ) ( )

Thus the sequences of observed prediction and correction error covariances are bounded above by the design prediction and correction error covariances. Next, it is shown that the

*Proof: To establish that the solution of (33) is monotonic non-increasing, from Theorem 2 of Chapter* 

≤

"Technology is so much fun but we can drown in our technology. The fog of information can drive out

1/0

observed error covariances are monotonically non-increasing (or non-decreasing).

1/ *u k k* ) *and*

0

*which is the initialisation step for an induction argument. For the inductive step,* 

*<sup>u</sup> Pk k implies* ( )

*<sup>u</sup> Pk k . �* 

1/

*<sup>u</sup> Pk k (or* <sup>ˆ</sup> ( ) *<sup>u</sup> <sup>R</sup> <sup>≤</sup>* <sup>ˆ</sup> ( 1) *<sup>u</sup> <sup>R</sup> ≤ R implies* ( )

*<sup>u</sup> k k =* (*A –* () ()

*<sup>u</sup> P within Theorem 2 of Chapter 7 results in* ( 1)

() () ( )

*Q K RK A K C*

( )( )

*u uT u T k k k*

0

.

( )

*AKC*

*u k*

1/ *<sup>u</sup> Pk k .* 

*Q A QA*

*A CR C A CR C* 

1/ *<sup>u</sup> k k ≤* ( )

*Lemma 1 [19]: In respect of Procedure 1 for estimating R, suppose the following:* 

1/ *<sup>u</sup> Pk k ≤* ( )

/ 1 *<sup>u</sup> k k ≤* ( )

*(ii) The result is immediate by considering A = I within the proof for (i).* 

1/0 *<sup>u</sup> P ≤* ( )

1/ *<sup>u</sup> k k* ≤ ( 1)

/ *<sup>u</sup> k k* ≤ ( 1) / *u k k* ).

( 1) ( 1) ( 1)

*Q K RK A K C*

*u uT u T k k k*

( )( )

*= 1, …, n, and the pair (A, C) is observable;* 

*Then:* 

*(i)* ( ) 1/ *<sup>u</sup> k k ≤* ( )

*(ii)* ( ) / *<sup>u</sup> k k* ≤ ( ) / *<sup>u</sup> Pk k* ;

*for all u ≥ 1.* 

*Proof:* 

1/ *<sup>u</sup> Pk k ;* 

*(iii) R ≤* ˆ ( 1) *<sup>u</sup> R ≤* ˆ ( ) *<sup>u</sup> R implies* ( 1)

*subtracting (33) from (31) yields* ( )

( ) ) *u T K Ck +* ( ) *<sup>u</sup> Sk and thus* ( )

*which together with* ( 1)

*Lemma 2 [19]: Under the conditions of Lemma 1:* 

1/ *<sup>u</sup> k k* (*or* ( )

/ *<sup>u</sup> k k* (*or* ( )

( 1)

*u k*

*AK C*

( ) 1/

1/ *u k k* ≤ ( )

/ *u k k* ≤ ( )

*7, it is required to show that* 

knowledge." *Daniel Joseph Boostin*

*i)* ( 1)

*ii)* ( 1)

Since A, Q and R are time-invariant, it suffices to show that

$$
\begin{bmatrix}
\boldsymbol{L}\_{k}^{(u+1)} (\boldsymbol{L}\_{k}^{(u+1)})^T & (\boldsymbol{I} - \boldsymbol{L}\_{k}^{(u+1)} \mathbf{C})^T \\
\boldsymbol{I} - \boldsymbol{L}\_{k}^{(u+1)} \mathbf{C} & \mathbf{0}
\end{bmatrix} \leq \begin{bmatrix}
\boldsymbol{L}\_{k}^{(u)} (\boldsymbol{L}\_{k}^{(u)})^T & (\boldsymbol{I} - \boldsymbol{L}\_{k}^{(u)} \mathbf{C})^T \\
\boldsymbol{I} - \boldsymbol{L}\_{k}^{(u)} \mathbf{C} & \mathbf{0}
\end{bmatrix}. \tag{36}
$$

*Note for an X and Y satisfying I ≥ Y ≥ X ≥ 0 that YYT - XXT ≥ (I – X)(I – X)T – (I – Y)(I – Y)T. Therefore,* ˆ( 1) *<sup>u</sup> R ≤* ˆ ( ) *<sup>u</sup> R and* ( 1) 1/ *<sup>u</sup> Pk k ≤* ( ) 1/ *<sup>u</sup> Pk k (from Lemma 1) imply* ( 1) *<sup>u</sup> L C ≤* ( ) *<sup>u</sup> L C ≤ I and thus (36) follows. �* 

It is established below that monotonic non-increasing error covariances result in a monotonic non-increasing measurement noise variance sequence.

*Lemma 3 [19]: In respect of Procedure 1 for estimating R, suppose the following:* 


*Then* ˆ ( 1) *<sup>u</sup> R ≤* ˆ ( ) *<sup>u</sup> R (or* ˆ ( ) *<sup>u</sup> R ≤* ˆ ( 1) *<sup>u</sup> R ) for all u > 1.* 

*Proof: Let Ci denote the ith row of C. The approximate MLE within Procedure 1 is written as* 

$$\left(\hat{\sigma}\_{i,v}^{(u+1)}\right)^2 = \frac{1}{N} \sum\_{k=1}^N (z\_{i,k} - \mathbb{C}\_i \hat{\mathbf{x}}\_{k/k}^{(u)})^2 \tag{37}$$

$$=\frac{1}{N}\sum\_{k=1}^{N}(\mathbf{C}\_{i}\tilde{\mathbf{x}}\_{k\,/k}^{(u)}+\mathbf{w}\_{i,k})^{2}\tag{38}$$

$$=\mathbf{C}\_{l}\Sigma\_{k/l}^{(u)}\mathbf{C}\_{l}^{T}+\sigma\_{l,v}^{2} \tag{39}$$

*and thus* ˆ ( 1) *<sup>u</sup> R =* ( ) / *u T C C k k + R. Since* <sup>ˆ</sup> ( 1) *<sup>u</sup> <sup>R</sup> is affine to* ( ) / *<sup>u</sup> k k , which from Lemma 2 is monotonically non-increasing, it follows that* ˆ ( 1) *<sup>u</sup> R ≤* ˆ ( ) *<sup>u</sup> R . �* 

If the estimation problem is dominated by measurement noise, the measurement noise MLEs converge to the actual values.

*Lemma 4 [19]: Under the conditions of Lemma 3,* 

$$\lim\_{Q \to 0, \boldsymbol{R}^{-1} \to 0, \boldsymbol{\mu} \to \boldsymbol{\nu}} \hat{\boldsymbol{R}}^{(\boldsymbol{\mu} + 1)} = \boldsymbol{R} \; . \tag{40}$$

<sup>&</sup>quot;Getting information off the internet is like taking a drink from a fire hydrant." *Mitchell David Kapor*

that is

(0, <sup>2</sup>

*George David Lundberg*

**8.3.3 Process Noise Variance Estimation** 

calculated from the sample variance formula

2

, 1 , 1

Substituting *wk* = *Axk* – *xk*+1 into (45) and noting that <sup>2</sup>

1

over *k* [1, *N*] to obtain corrected state estimates ( )

*k k x* and ( )

/ ˆ *u*

 , ( 1) 2 2, ( ) ˆ *<sup>u</sup> w*

*k*

can be found by repeating the following two-step algorithm.

, , , 1 1 *<sup>N</sup> <sup>T</sup> i w ik ik k*

 

1

1

*k*

*N* 

<sup>1</sup> <sup>ˆ</sup> ( )( ) *<sup>N</sup> <sup>T</sup> kk kk*

*Q Ax x Ax x <sup>N</sup>*

*k*

*N*

In respect of the model (23), suppose that it is desired to estimate *Q* given *N* samples of *xk*+1. The vector states within (23) can be written in terms of its *i* components, *ik i k ik* , 1 , *x Ax w* ,

where *wi,k* = *Biwk*, *Ai* and *Bi* refer the *i*th row of *A* and *B*, respectively. Assume that *wi,k* ~

*w w*

1 *<sup>N</sup> T T ik ki*

1 1 *<sup>N</sup> T T i kk i k B ww B N* .

1 1

*Procedure 2 [19].* Assume that an initial estimate ˆ (1) *Q* of *Q* is available. Subsequent estimates

Step 1. Operate the filter recursions (29) designed with ˆ ( ) *<sup>u</sup> Q* on the measurements (25)

1/ 1 ˆ *<sup>u</sup>*

 , …, ( 1) 2 , ( ) ˆ *<sup>u</sup> n w* ).

"Information on the Internet is subject to the same rules and regulations as conversations at a bar."

*Bw w B*

*i w*, is to be estimated. An MLE for the scalar <sup>2</sup>

<sup>1</sup> ( )( ) *<sup>N</sup> <sup>T</sup> ik i k ik i k*

*i w*, = *<sup>T</sup> B QB i i* yields

, (46)

/ ˆ *u*

*k k x* and ( )

*k k x* instead of *xk* and *xk+1* within (46) to obtain

1/ 1 ˆ *<sup>u</sup> k k x* .

*N*

*x Ax x Ax*

*w Ax x ik i k ik* , , 1 , (41)

*i w*, = *<sup>T</sup> B QB i i* can be

(42)

(43)

(44)

(45)

**8.3.3.1 EM Algorithm** 

, ) 

*i w* , where <sup>2</sup>

which can be updated as follows.

Step 2. For *i* = 1, …, *n*, use ( )

ˆ ( 1) *<sup>u</sup> Q* = diag( ( 1) 2

1, ( ) ˆ *<sup>u</sup> w*

*Proof: By inspection of* ( ) *<sup>u</sup> Lk =* ( ) ( ) / 1 / 1 ( *uT uT P C CP C k k k k +* () 1 ) *<sup>u</sup> <sup>R</sup> , it follows that* <sup>1</sup> ( ) 0, 0, lim *<sup>u</sup> <sup>k</sup> QR u <sup>L</sup> = 0. Therefore,* <sup>1</sup> ( ) / 0, 0, lim ˆ *<sup>u</sup> k k QR u <sup>x</sup> = 0 and* <sup>1</sup> 0, 0 lim *<sup>k</sup> Q R <sup>z</sup> = vk , which implies (40), since the MLE (37) is unbiased for large N. �* 

**Example 9.** In respect of the problem (23) – (25), assume *A* = 0.9, *B* = *C* = 1 and <sup>2</sup> *<sup>w</sup>* = 0.1 are known. Suppose that <sup>2</sup> *<sup>v</sup>* = 10 but is unknown. Samples zk and ( ) / ˆ *u k k x* were generated from *N* = 20,000 realisations of zero-mean Gaussian wk and vk. The sequence of MLEs obtained using Procedure 1, initialised with (1) 2 ( ) ˆ *<sup>v</sup>* = 14 and 12 are indicated by traces (i) and (ii) of Fig. 1, respectively. The variance sequences are monotonically decreasing, which is consistent with Lemma 3. The figure shows that the MLEs converge (to a local maximum of the approximate log-likelihood function), and are reasonably close to the actual value of <sup>2</sup> *v* = 10. This illustrates the high measurement noise observation described by Lemma 4. An alternative to the EM algorithm involves calculating MLEs using the Newton-Raphson method [5], [6]. The calculated Newton-Raphson measurement noise variance iterates, initialised with (1) 2 ( ) ˆ *<sup>v</sup>* = 14 and 12 are indicated by traces (iii) and (iv) of Fig. 1, respectively. It can be seen that the Newton-Raphson estimates converge to those of the EM algorithm, albeit at a slower rate.

Figure 1. Variance MLEs (27) versus iteration number for Example 9: (i) EM algorithm with (1) 2 ( ) ˆ *<sup>v</sup> =* 14, (ii) EM algorithm with (1) 2 ( ) ˆ *<sup>v</sup> =* 12, (iii) Newton-Raphson with (1) 2 ( ) ˆ *<sup>v</sup> =* 14 and (iv) Newton-Raphson with (1) 2 ( ) ˆ*v =* 12.

<sup>&</sup>quot;The Internet is the world's largest library. It's just that all the books are on the floor." *John Allen Paulos*

Smoothing, Filtering and Prediction -

0, 0, lim *<sup>u</sup> <sup>k</sup> QR u <sup>L</sup> = 0.* 

*<sup>v</sup> =* 14,

*<sup>v</sup> =* 14 and (iv) Newton-Raphson

*k k x* were generated from *N*

( )

*<sup>w</sup>* = 0.1 are

*v*

<sup>188</sup> Estimating the Past, Present and Future

*unbiased for large N. �* 

*<sup>v</sup>* = 10 but is unknown. Samples zk and ( )

= 20,000 realisations of zero-mean Gaussian wk and vk. The sequence of MLEs obtained

Fig. 1, respectively. The variance sequences are monotonically decreasing, which is consistent with Lemma 3. The figure shows that the MLEs converge (to a local maximum of the approximate log-likelihood function), and are reasonably close to the actual value of <sup>2</sup>

= 10. This illustrates the high measurement noise observation described by Lemma 4. An alternative to the EM algorithm involves calculating MLEs using the Newton-Raphson method [5], [6]. The calculated Newton-Raphson measurement noise variance iterates,

It can be seen that the Newton-Raphson estimates converge to those of the EM algorithm,

(iv)

Figure 1. Variance MLEs (27) versus iteration number for Example 9: (i) EM algorithm with (1) 2 ( ) ˆ

"The Internet is the world's largest library. It's just that all the books are on the floor." *John Allen Paulos*

*<sup>v</sup> =* 12, (iii) Newton-Raphson with (1) 2 ( ) ˆ

lim *<sup>k</sup> Q R*

**Example 9.** In respect of the problem (23) – (25), assume *A* = 0.9, *B* = *C* = 1 and <sup>2</sup>

/ 1 / 1 ( *uT uT P C CP C k k k k +* () 1 ) *<sup>u</sup> <sup>R</sup> , it follows that* <sup>1</sup>

*<sup>v</sup>* = 14 and 12 are indicated by traces (iii) and (iv) of Fig. 1, respectively.

1 2 3 4 5 6

m

*<sup>z</sup> = vk , which implies (40), since the MLE (37) is* 

/ ˆ *u*

*<sup>v</sup>* = 14 and 12 are indicated by traces (i) and (ii) of

*Proof: By inspection of* ( ) *<sup>u</sup> Lk =* ( ) ( )

/ 0, 0, lim ˆ *<sup>u</sup> k k QR u*

( )

using Procedure 1, initialised with (1) 2 ( ) ˆ

*<sup>x</sup> = 0 and* <sup>1</sup> 0, 0

4

6

8

Variance

10

(i),(ii)

(iii)

12

*Therefore,* <sup>1</sup>

known. Suppose that <sup>2</sup>

initialised with (1) 2 ( ) ˆ

albeit at a slower rate.

(ii) EM algorithm with (1) 2 ( ) ˆ

with (1) 2 ( ) ˆ*v =* 12.

#### **8.3.3 Process Noise Variance Estimation**

#### **8.3.3.1 EM Algorithm**

In respect of the model (23), suppose that it is desired to estimate *Q* given *N* samples of *xk*+1. The vector states within (23) can be written in terms of its *i* components, *ik i k ik* , 1 , *x Ax w* , that is

$$
\omega\_{l,k} = A\_i \mathbf{x}\_k - \mathbf{x}\_{l,k+1} \tag{41}
$$

where *wi,k* = *Biwk*, *Ai* and *Bi* refer the *i*th row of *A* and *B*, respectively. Assume that *wi,k* ~ (0, <sup>2</sup> , ) *i w* , where <sup>2</sup> *i w*, is to be estimated. An MLE for the scalar <sup>2</sup> *i w*, = *<sup>T</sup> B QB i i* can be calculated from the sample variance formula

$$\sigma\_{i,w}^2 = \frac{1}{N} \sum\_{k=1}^N \varpi\_{i,k} w\_{i,k}^T \tag{42}$$

$$=\frac{1}{N}\sum\_{k=1}^{N}(\mathbf{x}\_{i,k+1} - A\_i\mathbf{x}\_k)(\mathbf{x}\_{i,k+1} - A\_i\mathbf{x}\_k)^T\tag{43}$$

$$=\frac{1}{N}\sum\_{k=1}^{N}B\_{i}\boldsymbol{w}\_{k}\boldsymbol{w}\_{k}^{T}\boldsymbol{B}\_{i}^{T}\tag{44}$$

$$\mathbf{B} = \mathbf{B}\_i \left(\frac{1}{N} \sum\_{k=1}^N \mathbf{w}\_k \mathbf{w}\_k^T \right) \mathbf{B}\_i^T \ . \tag{45}$$

Substituting *wk* = *Axk* – *xk*+1 into (45) and noting that <sup>2</sup> *i w*, = *<sup>T</sup> B QB i i* yields

$$\hat{Q} = \frac{1}{N} \sum\_{k=1}^{N} (A\mathbf{x}\_k - \mathbf{x}\_{k+1})(A\mathbf{x}\_k - \mathbf{x}\_{k+1})^T \tag{46}$$

which can be updated as follows.

*Procedure 2 [19].* Assume that an initial estimate ˆ (1) *Q* of *Q* is available. Subsequent estimates can be found by repeating the following two-step algorithm.


<sup>&</sup>quot;Information on the Internet is subject to the same rules and regulations as conversations at a bar." *George David Lundberg*

*Lemma 5 [19]: In respect of Procedure 2 for estimating Q, suppose the following:* 

1/0 *<sup>u</sup> P ≤* ( ) 1/0 *<sup>u</sup> P (or* ( )

() 2 () ( ) , , 1 1/ 1 , 1 1 <sup>1</sup> ( ) <sup>ˆ</sup> ( ) *N u u u T i w i k k k k ik k*

> () () ( ) , 1 1/ , 1 ( )( ) *u uT u T L C C RL ik k k i k*

 

*L Cx v L*

*which from Lemma 2 are monotonically non-increasing, it follows that* ˆ ( 1) *<sup>u</sup> Q ≤* ˆ ( ) *<sup>u</sup> Q . �* 

It is observed that the approximate MLEs asymptotically approach the actual values when

( 1)

lim *i k Q R L C*

*which implies (50), since the MLE (46) is unbiased for large N. �* 

 *<sup>w</sup>* = 0.1 but is unknown. Procedure 2 and the Newton-Raphson method [5], [6] were used to jointly estimate the states and the unknown variance. Some example variance

estimates are seen to be monotonically decreasing, which is in agreement with Lemma 5. At the final iteration, the approximate MLEs do not quite reach the actual value of (1) 2 ( ) ˆ

because the presence of measurement noise results in imperfect state reconstruction and introduces a small bias (see Example 5). The figure also shows that MLEs calculated via the

"Four years ago nobody but nuclear physicists had ever heard of the Internet. Today even my cat, Socks, has his own web page. I'm amazed at that. I meet kids all the time, been talking to my cat on the

*Q Q* 

*= 1, ..., n and the pair (A, C) is observable;* 

*Then* ˆ ( 1) *<sup>u</sup> Q ≤* ˆ ( ) *<sup>u</sup> Q (or* ˆ ( ) *<sup>u</sup> Q ≤* ˆ ( 1) *<sup>u</sup> Q ) for all u > 1.* 

*N*

1 1/ ( *u uT LC C k kk +* ( )

1

Newton-Raphson method converge at a slower rate.

0, 0, ˆ lim *<sup>u</sup>*

*Q Ru*

*Example 10.* For the model described in Example 8, suppose that <sup>2</sup>

*(ii) there exist* ˆ (1) *Q ≥ Q ≥ 0 and* ( 1)

*Lemma 6 [19]: Under the conditions of Lemma 5*,

*Proof: It is straight forward to show that* <sup>1</sup> , 0, 0

*Proof: Using (47)within (46) gives* 

*and thus* ˆ ( 1) *<sup>u</sup> Q =* () ()

the SNR is sufficiently high.

iterations, initialised with (1) 2 ( ) ˆ

Internet." *William Jefferson (Bill) Clinton*

(1) 2 ( ) ˆ

*(i) the data zk has been generated by (23) – (25) in which A, B, C, R are known,* ( )

1/0 *<sup>u</sup> P ≤* ( 1) 1/0

2

<sup>1</sup> )( ) *u T R Lk . Since* <sup>ˆ</sup> ( 1) *<sup>u</sup> <sup>Q</sup> varies with* () ()

*<sup>u</sup> P ) for all u > 1.* 

. (50)

 *= I and therefore* <sup>1</sup>

*<sup>w</sup>* = 0.14 and 0.12, are shown in Fig. 2. The EM algorithm

1 ,1 ( ) *u uT L L k jk and* ( )

*<sup>i</sup> A < 1, i* 

(49)

1/ *<sup>u</sup> k k ,* 

( )

 *= xk ,* 

/ 0, 0, lim ˆ *<sup>u</sup> k k Q Ru <sup>x</sup>*

*v =* 0.01 is known, and

*<sup>w</sup>* = 0.1,

Figure 2. Variance MLEs (46) versus iteration number for Example 10: (i) EM algorithm with (1) 2 ( ) ˆ *<sup>w</sup> =* 0.14, (ii) EM algorithm with (1) 2 ( ) ˆ *<sup>w</sup> =* 0.12, (iii) Newton-Raphson with (1) 2 ( ) ˆ *<sup>w</sup> =* 0.14 and (iv) Newton-Raphson with (1) 2 ( ) ˆ*<sup>w</sup> =* 0.12.

#### **8.3.3.2Properties**

Similarly to Lemma 1, it can be shown that a monotonically non-increasing (or nondecreasing) sequence of process noise variance estimates results in a monotonically nonincreasing (or non-decreasing) sequence of design and observed error covariances, see [19]. The converse case is stated below, namely, the sequence of variance iterates is monotonically non-increasing, provided the estimates and error covariances are initialized appropriately. The accompanying proof makes use of

$$
\hat{\mathbf{x}}\_{k+1\mid k+1}^{(u)} - A\hat{\mathbf{x}}\_{k\mid k}^{(u)} = \hat{\mathbf{x}}\_{k+1\mid k}^{(u)} + L\_{k+1}^{(u)}(\mathbf{z}\_{k+1} - \mathbf{C}\hat{\mathbf{x}}\_{k+1\mid k}^{(u)}) - A\hat{\mathbf{x}}\_{k\mid k}^{(u)}
$$

$$
= A\hat{\mathbf{x}}\_{k\mid k}^{(u)} + L\_{\mathbf{i},k+1}(\mathbf{z}\_{k+1} - \mathbf{C}\hat{\mathbf{x}}\_{k+1\mid k}^{(u)}) - A\hat{\mathbf{x}}\_{k\mid k}^{(u)}\tag{47}
$$

$$
= L\_{\mathbf{k}}^{(u)}(\mathbf{C}\hat{\mathbf{x}}\_{k+1\mid k}^{(u)} + \upsilon\_{k+1})\ .\tag{48}
$$

The components of (47) are written as

$$
\hat{\mathbf{x}}\_{i,k+1/k+1}^{(u)} - a\_i \hat{\mathbf{x}}\_{k/k}^{(u)} = L\_{i,k+1}^{(u)} (\mathbb{C} \tilde{\mathbf{x}}\_{k+1/k}^{(u)} + \boldsymbol{\upsilon}\_{k+1}) \, \, \, \, \tag{48}
$$

where ( ) , 1 *<sup>u</sup> Li k* is the *ith* row of ( ) 1 *<sup>u</sup> Lk* .

<sup>&</sup>quot;I must confess that I've never trusted the Web. I've always seen it as a coward's tool. Where does it live? How do you hold it personally responsible? Can you put a distributed network of fibre-optic cable on notice? And is it male or female? In other words, can I challenge it to a fight?" *Stephen Tyrone Colbert*

Smoothing, Filtering and Prediction -

*<sup>w</sup> =*

(47)

*<sup>w</sup> =* 0.14 and (iv) Newton-

<sup>190</sup> Estimating the Past, Present and Future

(iv)

Figure 2. Variance MLEs (46) versus iteration number for Example 10: (i) EM algorithm with (1) 2 ( ) ˆ

*<sup>w</sup> =* 0.12, (iii) Newton-Raphson with (1) 2 ( ) ˆ

Similarly to Lemma 1, it can be shown that a monotonically non-increasing (or nondecreasing) sequence of process noise variance estimates results in a monotonically nonincreasing (or non-decreasing) sequence of design and observed error covariances, see [19]. The converse case is stated below, namely, the sequence of variance iterates is monotonically non-increasing, provided the estimates and error covariances are initialized appropriately.

/ , 1 1 1/ / ˆ ( ˆ ˆ ) *<sup>u</sup> <sup>u</sup> <sup>u</sup> A k k ik k k k k k x L z Cx Ax*

"I must confess that I've never trusted the Web. I've always seen it as a coward's tool. Where does it live? How do you hold it personally responsible? Can you put a distributed network of fibre-optic cable on notice? And is it male or female? In other words, can I challenge it to a fight?" *Stephen Tyrone Colbert*

*ik k i k k ik k k k x a x L Cx v* , (48)

( ) () () () ( ) ( ) 1/ 1 / 1/ 1 1 1/ / ˆ ˆˆ ( ˆ ˆ ) *<sup>u</sup> uu u <sup>u</sup> <sup>u</sup> k k kk k k k k k k kk x Ax x L z Cx Ax*

1/ 1 ( ) *u u L Cx v k kk k* .

( ) () () () , 1/ 1 / , 1 1/ 1 ˆ ˆ ( ) *<sup>u</sup> uu u*

( ) ( ) ( )

1 *<sup>u</sup> Lk* .

1 2 3 4 5 6

m

0.06

0.07

0.08

0.09

Variance

0.14, (ii) EM algorithm with (1) 2 ( ) ˆ

*<sup>w</sup> =* 0.12.

The accompanying proof makes use of

() ()

The components of (47) are written as

*<sup>u</sup> Li k* is the *ith* row of ( )

Raphson with (1) 2 ( ) ˆ

**8.3.3.2Properties** 

where ( )

, 1

0.1

(i)

(ii)

(iii)

0.11

*Lemma 5 [19]: In respect of Procedure 2 for estimating Q, suppose the following:* 


*Then* ˆ ( 1) *<sup>u</sup> Q ≤* ˆ ( ) *<sup>u</sup> Q (or* ˆ ( ) *<sup>u</sup> Q ≤* ˆ ( 1) *<sup>u</sup> Q ) for all u > 1.* 

*Proof: Using (47)within (46) gives* 

$$\begin{aligned} \left(\widehat{\sigma}\_{i,w}^{(u)}\right)^2 &= \frac{1}{N} L\_{i,k+1}^{(u)} \left(\sum\_{k=1}^N \mathbb{C} \widetilde{\mathbf{x}}\_{k+1/k} + \boldsymbol{\upsilon}\_{k+1}\right)^2 \left(L\_{i,k+1}^{(u)}\right)^T \\ &= L\_{i,k+1}^{(u)} \left(\boldsymbol{\Sigma}\_{k+1/k}^{(u)} \boldsymbol{\mathsf{C}}^T + \boldsymbol{R}\right) \left(L\_{i,k+1}^{(u)}\right)^T \end{aligned} \tag{49}$$

*and thus* ˆ ( 1) *<sup>u</sup> Q =* () () 1 1/ ( *u uT LC C k kk +* ( ) <sup>1</sup> )( ) *u T R Lk . Since* <sup>ˆ</sup> ( 1) *<sup>u</sup> <sup>Q</sup> varies with* () () 1 ,1 ( ) *u uT L L k jk and* ( ) 1/ *<sup>u</sup> k k , which from Lemma 2 are monotonically non-increasing, it follows that* ˆ ( 1) *<sup>u</sup> Q ≤* ˆ ( ) *<sup>u</sup> Q . �* 

It is observed that the approximate MLEs asymptotically approach the actual values when the SNR is sufficiently high.

*Lemma 6 [19]: Under the conditions of Lemma 5*,

$$\lim\_{Q^{-1}\to 0, \mathbb{R}\to 0, \mu\to\nu\epsilon} \hat{\mathbb{Q}}^{(\mu+1)} = \mathbb{Q}\ . \tag{50}$$

*Proof: It is straight forward to show that* <sup>1</sup> , 0, 0 lim *i k Q R L C = I and therefore* <sup>1</sup> ( ) / 0, 0, lim ˆ *<sup>u</sup> k k Q Ru <sup>x</sup> = xk , which implies (50), since the MLE (46) is unbiased for large N. �* 

*Example 10.* For the model described in Example 8, suppose that <sup>2</sup> *v =* 0.01 is known, and (1) 2 ( ) ˆ *<sup>w</sup>* = 0.1 but is unknown. Procedure 2 and the Newton-Raphson method [5], [6] were used to jointly estimate the states and the unknown variance. Some example variance iterations, initialised with (1) 2 ( ) ˆ *<sup>w</sup>* = 0.14 and 0.12, are shown in Fig. 2. The EM algorithm estimates are seen to be monotonically decreasing, which is in agreement with Lemma 5. At the final iteration, the approximate MLEs do not quite reach the actual value of (1) 2 ( ) ˆ *<sup>w</sup>* = 0.1, because the presence of measurement noise results in imperfect state reconstruction and introduces a small bias (see Example 5). The figure also shows that MLEs calculated via the Newton-Raphson method converge at a slower rate.

<sup>&</sup>quot;Four years ago nobody but nuclear physicists had ever heard of the Internet. Today even my cat, Socks, has his own web page. I'm amazed at that. I meet kids all the time, been talking to my cat on the Internet." *William Jefferson (Bill) Clinton* on

Figure 4. Estimated magnitude of Earth rotation rate for Example 11.

The components of the states within (23) are now written as

*N N fa x*

, 1 ,, , 1 *n i k i j ik ik j x ax w* 

estimating *ai,j* from samples of *xi,k*. The assumption *xi,k+1* ~ , ,

where *ai,j* denotes the element in row *i* and column *j* of *A*. Consider the problem of

2 2

, ,1 , , ,1 , ,

 

*ij jk iw jw ik ij ik*

"It's important for us to explain to our nation that life is important. It's not only the life of babies, but

<sup>1</sup> log ( ) | ) log 2 log . 22 2

it's life of children living in, you know, the dark dungeons of the internet." *George Walker Bush*

, (51)

1 ( , *n*

*j a x* 

1 1

*k j*

*N n*

*x ax*

 (52)

*i j i k*

, ) 

2

*i w* , leads to the

<sup>2</sup>

satisfactory (see [19] for further details).

**8.3.4 State Matrix Estimation** 

**8.3.4.1EM Algorithm** 

log-likelihood

The estimated variances after 10 EM iterations are shown in Fig. 3. The figure demonstrates that approximate MLEs (46) approach steady state values from above, which is consistent with Lemma 5. The estimated Earth rotation rate magnitude versus time is shown in Fig. 4. At 100 seconds, the estimated magnitude of the Earth rate is 72.53 micro-radians per second, that is, one revolution every 24.06 hours. This estimated Earth rate is about 0.5% in error compared with the mean sidereal day of 23.93 hours [25]. Since the estimated Earth rate is in reasonable agreement, it is suggested that the MLEs for the unknown variances are

Figure 3. (i) <sup>2</sup> 1, ˆ *<sup>w</sup>* , (ii) <sup>2</sup> 2, ˆ *<sup>w</sup>* , (iii) <sup>2</sup> 3, ˆ *<sup>w</sup>* and (iv) <sup>2</sup> 4, ˆ *<sup>w</sup>* , normalised by their steady state values, versus EM iteration number for Example 11.

*Example 11.* Consider the problem of calculating the initial alignment of an inertial navigation system. Alignment is the process of estimating the Earth rotation rate and rotating the attitude direction cosine matrix, so that it transforms the body-frame sensor signals to a locally-level frame, wherein certain components of accelerations and velocities approach zero when the platform is stationary. This can be achieved by a Kalman filter that uses the model (23), where *xk* <sup>4</sup> comprises the errors in earth rotation rate, tilt, velocity and position vectors respectively, and *wk* <sup>4</sup> is a deterministic signal which is a nonlinear function of the states (see [24]). The state matrix is calculated as *A* = *I* + *Ts* + <sup>1</sup> <sup>2</sup> ( ) 2! *Ts* +

$$\frac{1}{3!} (\Phi T\_{\downarrow})^{\flat}, \text{ where } T\_{\circ} \text{ is the sampling interval, } \Phi = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & g & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \text{ is a continuous-time state.}$$

matrix and *g* is the universal gravitational constant. The output mapping within (24) is *C* 0001 *.* Raw three-axis accelerometer and gyro data was recorded from a stationary Litton LN270 Inertial Navigation System at a 500 Hz data rate. In order to generate a compact plot, the initial variance estimates were selected to be 10 times the steady state values.

<sup>&</sup>quot;On the Internet, nobody knows you're a dog." *Peter Steiner*

Smoothing, Filtering and Prediction -

(i) (ii) (iii) (iv)

*<sup>w</sup>* , normalised by their steady state values, versus EM

<sup>1</sup> <sup>2</sup> ( ) 2! *Ts* +

is a continuous-time state

<sup>192</sup> Estimating the Past, Present and Future

0 2 4 6 8 10

m

0

 *<sup>w</sup>* , (iii) <sup>2</sup> 3, ˆ

*Ts* , where *Ts* is the sampling interval,  *=* 

"On the Internet, nobody knows you're a dog." *Peter Steiner*

*<sup>w</sup>* and (iv) <sup>2</sup>

4, ˆ

function of the states (see [24]). The state matrix is calculated as *A* = *I* + *Ts* +

*Example 11.* Consider the problem of calculating the initial alignment of an inertial navigation system. Alignment is the process of estimating the Earth rotation rate and rotating the attitude direction cosine matrix, so that it transforms the body-frame sensor signals to a locally-level frame, wherein certain components of accelerations and velocities approach zero when the platform is stationary. This can be achieved by a Kalman filter that uses the model (23), where *xk* <sup>4</sup> comprises the errors in earth rotation rate, tilt, velocity and position vectors respectively, and *wk* <sup>4</sup> is a deterministic signal which is a nonlinear

matrix and *g* is the universal gravitational constant. The output mapping within (24) is *C* 0001 *.* Raw three-axis accelerometer and gyro data was recorded from a stationary Litton LN270 Inertial Navigation System at a 500 Hz data rate. In order to generate a compact plot, the initial variance estimates were selected to be 10 times the

5

10

Normalized Variances

Figure 3. (i) <sup>2</sup>

<sup>1</sup> <sup>3</sup> ( ) 3!

steady state values.

1, ˆ *<sup>w</sup>* , (ii) <sup>2</sup> 2, ˆ

iteration number for Example 11.

15

20

Figure 4. Estimated magnitude of Earth rotation rate for Example 11.

The estimated variances after 10 EM iterations are shown in Fig. 3. The figure demonstrates that approximate MLEs (46) approach steady state values from above, which is consistent with Lemma 5. The estimated Earth rotation rate magnitude versus time is shown in Fig. 4. At 100 seconds, the estimated magnitude of the Earth rate is 72.53 micro-radians per second, that is, one revolution every 24.06 hours. This estimated Earth rate is about 0.5% in error compared with the mean sidereal day of 23.93 hours [25]. Since the estimated Earth rate is in reasonable agreement, it is suggested that the MLEs for the unknown variances are satisfactory (see [19] for further details).

#### **8.3.4 State Matrix Estimation**

#### **8.3.4.1EM Algorithm**

The components of the states within (23) are now written as

$$\mathbf{x}\_{i,k+1} = \sum\_{j=1}^{n} \mathbf{a}\_{i,j} \mathbf{x}\_{i,k} + \mathbf{w}\_{i,k} \,\prime \tag{51}$$

where *ai,j* denotes the element in row *i* and column *j* of *A*. Consider the problem of estimating *ai,j* from samples of *xi,k*. The assumption *xi,k+1* ~ , , 1 ( , *n i j i k j a x* <sup>2</sup> , ) *i w* , leads to the

log-likelihood

$$\log f(\mathbf{a}\_{i,j}) \mid \mathbf{x}\_{j,k+1} = -\frac{N}{2}\log 2\pi - \frac{N}{2}\log \sigma\_{i,\nu}^2 - \frac{1}{2}\sigma\_{j,\nu}^{-2}\sum\_{k=1}^{N} \left(\mathbf{x}\_{i,k+1} - \sum\_{j=1}^{n} a\_{i,j}\mathbf{x}\_{i,k}\right)^2. \tag{52}$$

<sup>&</sup>quot;It's important for us to explain to our nation that life is important. It's not only the life of babies, but it's life of children living in, you know, the dark dungeons of the internet." *George Walker Bush*

An iterative procedure for re-estimating an unknown state matrix is proposed below.

*Procedure 3 [20].* Assume that there exists an initial estimate ˆ (1) *A* satisfying ˆ (1) | ( )|

/ ˆ *u*

*i j <sup>a</sup>* within <sup>ˆ</sup> ( 1) *<sup>u</sup> <sup>A</sup>* if <sup>ˆ</sup> ( 1) | ( )| *<sup>u</sup>*

( ) () () ( ) () () ( )

() () () () () () () () ( )

*(i) the data zk has been generated by (23) – (25) in which B, C, Q, R are known;* 

results in monotonically non-increasing error covariance sequences.

*Lemma 7 [20]. In respect of Procedure 3 for estimating A, suppose the following:* 

*<sup>i</sup> A < 1, i = 1, …, n, the pair (A, C) is observable;* 

1/ *<sup>u</sup> k k );* 

1/ *<sup>u</sup> Pk k ≤* ( )

1/

"It may not always be profitable at first for businesses to be online, but it is certainly going to be

1/0 *<sup>u</sup> P ≤* ( ) 1/0 *<sup>u</sup> P (or* ( ) 1/0 *<sup>u</sup> P ≤* ( 1) 1/0

1/ *<sup>u</sup> Pk k ≤* ( )

( ) / *<sup>u</sup> P k k ≤* ( ) / *<sup>u</sup> k k );* 

/ 1 / 1

accounts for the presence of modelling error. In the following, the notation of Lemma 1 is employed to argue that a monotonically non-increasing state matrix estimate sequence

/ ˆ *u*

obtain corrected state estimates ( )

, ˆ *u*

The design Riccati difference equation (55) can be written as

Step 2. Copy ˆ ( ) *<sup>u</sup> A* to ˆ ( 1) *<sup>u</sup> A* . Use ( )

1, …, *n*. Include ( 1)

The condition ˆ ( 1) | ( )| *<sup>u</sup>*

asymptotically stable.

*(ii)* ˆ (1) | ( )| 

*(i)* ( ) 1/ *<sup>u</sup> k k ≤* ( )

*(ii)* ( ) / *<sup>u</sup> k k ≤* ( ) / *<sup>u</sup> P (or k k*

*for all u ≥ 1.* 

*(iii) there exist* ˆ (1) *A ≥ A and* ( 1)

1/ *<sup>u</sup> Pk k (or* ( )

*(iii)* ˆ ( 1) *<sup>u</sup> A ≤* ˆ ( ) *<sup>u</sup> A which implies* ( 1)

unprofitable not to be online." *Ester Dyson*

**8.3.4.2Properties** 

where

*Then:* 

1, …, *n*. Subsequent estimates are calculated using the following two-step EM algorithm.

Step 1. Operate the Kalman filter (29) using (54) on the measurements *zk* over *k* [1, *N*] to

*k k x* and ( )

1/ 1 ˆ *<sup>u</sup> k k x* .

*<sup>i</sup> A* < 1 within Step 2 ensures that the estimated system is

1/ / 1 ( ) ( ) () *<sup>u</sup> u u u T u uT <sup>u</sup> P A K CP A K C K RK Q S k k k kk <sup>k</sup> k k <sup>k</sup>* , (57)

<sup>ˆ</sup> <sup>ˆ</sup> ( ) ( )( ) ( ) *u u u u u uT u u u T S A K CP A K C A K CP A K C <sup>k</sup> k kk <sup>k</sup> k kk <sup>k</sup>* (58)

*k k x* within (56) to obtain candidate estimates ( 1)

*<sup>i</sup> A* < 1, *i* = 1, …, *n*.

*<sup>u</sup> P ) for all u > 1.* 

*<sup>u</sup> Pk k (* <sup>ˆ</sup> ( ) *<sup>u</sup> <sup>A</sup> <sup>≤</sup>* <sup>ˆ</sup> ( 1) *<sup>u</sup> <sup>A</sup> which implies* ( )

1/ *<sup>u</sup> Pk k ≤* ( 1)

1/ *<sup>u</sup> Pk k )* 

*<sup>i</sup> A* < 1, *i* =

, ˆ *u i j a* , *i, j* =

By setting , ,1 , log ( ) | ) *ij jk i j fa x a* = 0, an MLE for *ai,j* is obtained as [20]

$$\hat{\mathbf{a}}\_{i,j} = \frac{\sum\_{k=1}^{N} \left( \mathbf{x}\_{i,k+1} - \sum\_{j=1, j \neq i}^{n} \mathbf{a}\_{i,j} \mathbf{x}\_{i,k} \right) \mathbf{x}\_{j,k}}{\sum\_{k=1}^{N} \mathbf{x}\_{j,k}^{2}}. \tag{53}$$

Incidentally, the above estimate can also be found using the least-squares method [2], [10] and minimising the cost function 2 ,1 , , 1 1 . *N n i k ij ik k j x ax* The expectation of , <sup>ˆ</sup>*i j <sup>a</sup>* is [20]

$$E\{\hat{a}\_{i,j}\} = E\left\{\frac{\sum\_{k=1}^{N} \left(\sum\_{j=1}^{n} a\_{i,j} \mathbf{x}\_{i,k} + \mathbf{w}\_{i,k} - \sum\_{j=1, j\neq i}^{n} a\_{i,j} \mathbf{x}\_{i,k}\right) \mathbf{x}\_{j,k}}{\sum\_{k=1}^{N} \mathbf{x}\_{j,k}^{2}}\right\}$$

$$= a\_{i,j} + E\left\{\frac{\sum\_{k=1}^{N} \mathbf{w}\_{i,k} \mathbf{x}\_{j,k}}{\sum\_{k=1}^{N} \mathbf{x}\_{j,k}^{2}}\right\}$$

$$= a\_{i,j} \,\,\,\,\,$$

Since *wi,k* and *xi,k* are independent. Hence, the MLE (53) is unbiased.

Suppose that an estimate ˆ ( ) *<sup>u</sup> A* = ( ) , { } *<sup>u</sup> <sup>i</sup> <sup>j</sup> a* of *A* is available at an iteration *u*. The predicted state estimates within (29) can be calculated from

$$
\hat{\mathbf{x}}\_{k\*1/k}^{(u)} = (\hat{A}^{(u)} - K\_k^{(u)} \mathbf{C}) \hat{\mathbf{x}}\_{k\*1/k}^{(u)} + K\_k^{(u)} \mathbf{z}\_k \tag{54}
$$

where ( ) *<sup>u</sup> Kk* = () () ( ) / 1 / 1 <sup>ˆ</sup> ( *uu T u T <sup>A</sup> P C CP C k k k k* + <sup>1</sup> *<sup>R</sup>*) , in which ( ) / 1 *<sup>u</sup> Pk k* is obtained from the design Riccati equation

$$P\_{k+1/k}^{(u)} = (\hat{A}^{(u)} - K\_k^{(u)} \mathbf{C}) P\_{k/k-1}^{(u)} (\hat{A}^{(u)} - K\_k^{(u)} \mathbf{C})^T + K\_k^{(u)} R (K\_k^{(u)})^T + Q \ . \tag{55}$$

An approximate MLE for *ai,j* is obtained by replacing *xk* by ( ) / ˆ *u k k x* within (53) which results in

$$
\hat{\mathfrak{X}}\_{i,j}^{(u+1)} = \frac{\sum\_{k=1}^{N} \left( \hat{\mathfrak{X}}\_{i,k+1/k+1}^{(u)} - \sum\_{j=1, j \neq i}^{n} \hat{a}\_{i,j}^{(u)} \hat{\mathfrak{X}}\_{i,k/k}^{(u)} \right) \hat{\mathfrak{X}}\_{j,k/k}^{(u)}}{\sum\_{k=1}^{N} (\hat{\mathfrak{X}}\_{j,k/k}^{(u)})^2} \,. \tag{56}
$$

<sup>&</sup>quot;The internet is like a gold-rush; the only people making money are those who sell the pans." *Will Hobbs*

Smoothing, Filtering and Prediction -

. (53)

<sup>194</sup> Estimating the Past, Present and Future

= 0, an MLE for *ai,j* is obtained as [20]

1 1,

 

*ax*

*k j ji i j N*

*N n*

, 1 ,, ,

*x ax x*

*i k i j i k j k*

2 , 1

*N n*

*k j*

*i j N*

 

> *j k k*

Incidentally, the above estimate can also be found using the least-squares method [2], [10]

,1 , , 1 1

*x ax* 

*i k ij ik*

*N n n*

1 1 1,

, ,

*w x*

*ik jk*

2 , 1

*x*

*j k k*

1

( ) () () () () 1/ 1/ ˆ ˆ ( )ˆ *<sup>u</sup> u uu <sup>u</sup>*

( ) () () () () () () ()

*k*

 

( ) () () () , 1/ 1 , ,/ ,/

ˆ ˆˆ ˆ

*x ax x*

*<sup>N</sup> <sup>n</sup> <sup>u</sup> uu u ik k i j ik k j k k*

> () 2 , / 1

"The internet is like a gold-rush; the only people making money are those who sell the pans." *Will Hobbs*

( ) ˆ

*x*

*jk k*

*k i j N*

 

*N*

*k j j ji*

2

.

,, , ,, ,

*ax w ax x*

*ij ik ik ij ik jk*

2 , 1

*x*

*j k k*

The expectation of , <sup>ˆ</sup>*i j <sup>a</sup>* is [20]

*ij* 

*<sup>i</sup> <sup>j</sup> a* of *A* is available at an iteration *u*. The predicted state

*<sup>u</sup> Pk k* is obtained from the design

*k k x* within (53) which results in

. (56)

*k k k kk kk x A K Cx K z* , (54)

/ 1

/ ˆ *u*

, in which ( )

<sup>ˆ</sup> <sup>ˆ</sup> ( ) ( ) () *<sup>u</sup> u u u u u T u uT P A K CP A K C K RK Q k k k kk <sup>k</sup> k k* . (55)

By setting , ,1

, log ( ) | ) *ij jk i j fa x a* 

and minimising the cost function

*<sup>i</sup>*, *<sup>j</sup> a* ,

Suppose that an estimate ˆ ( ) *<sup>u</sup> A* = ( )

where ( ) *<sup>u</sup> Kk* = () () ( )

Riccati equation

estimates within (29) can be calculated from

/ 1 / 1 <sup>ˆ</sup> ( *uu T u T <sup>A</sup> P C CP C k k k k* + <sup>1</sup> *<sup>R</sup>*)

1/ / 1

,

ˆ

*a*

An approximate MLE for *ai,j* is obtained by replacing *xk* by ( )

1 1, ( 1)

*k j ji u i j <sup>N</sup> <sup>u</sup>*

 

,

,

*Ea E*

,

*a E*

Since *wi,k* and *xi,k* are independent. Hence, the MLE (53) is unbiased.

, { } *<sup>u</sup>*

{ } ˆ

ˆ

An iterative procedure for re-estimating an unknown state matrix is proposed below.

*Procedure 3 [20].* Assume that there exists an initial estimate ˆ (1) *A* satisfying ˆ (1) | ( )| *<sup>i</sup> A* < 1, *i* = 1, …, *n*. Subsequent estimates are calculated using the following two-step EM algorithm.


The condition ˆ ( 1) | ( )| *<sup>u</sup> <sup>i</sup> A* < 1 within Step 2 ensures that the estimated system is asymptotically stable.

#### **8.3.4.2Properties**

The design Riccati difference equation (55) can be written as

$$P\_{k+1/k}^{(u)} = (A - K\_k^{(u)} \mathbf{C}) P\_{k/k-1}^{(u)} (A - K\_k^{(u)} \mathbf{C})^\top + K\_k^{(u)} R (K\_k^{(u)})^\top + Q + S\_k^{(u)},\tag{57}$$

where

$$S\_k^{(u)} = (\hat{A}^{(u)} - K\_k^{(u)} \mathbb{C}) P\_{k/k - 1}^{(u)} (\hat{A}^{(u)} - K\_k^{(u)} \mathbb{C})^T - (A - K\_k^{(u)} \mathbb{C}) P\_{k/k - 1}^{(u)} (A - K\_k^{(u)} \mathbb{C})^T \tag{58}$$

accounts for the presence of modelling error. In the following, the notation of Lemma 1 is employed to argue that a monotonically non-increasing state matrix estimate sequence results in monotonically non-increasing error covariance sequences.

*Lemma 7 [20]. In respect of Procedure 3 for estimating A, suppose the following:* 


*Then:* 

$$\text{(i)}\quad \Sigma\_{k\ast 1/k}^{(u)} \le P\_{k\ast 1/k}^{(u)} \text{ (or } P\_{k\ast 1/k}^{(u)} \le \Sigma\_{k\ast 1/k}^{(u)}\text{);}$$

$$\text{(iii)}\quad \Sigma\_{k/k}^{(u)} \le P\_{k/k}^{(u)} \text{ (or } P\_{k/k}^{(u)} \le \Sigma\_{k/k}^{(u)}\text{);}$$

*(iii)* ˆ ( 1) *<sup>u</sup> A ≤* ˆ ( ) *<sup>u</sup> A which implies* ( 1) 1/ *<sup>u</sup> Pk k ≤* ( ) 1/ *<sup>u</sup> Pk k (* <sup>ˆ</sup> ( ) *<sup>u</sup> <sup>A</sup> <sup>≤</sup>* <sup>ˆ</sup> ( 1) *<sup>u</sup> <sup>A</sup> which implies* ( ) 1/ *<sup>u</sup> Pk k ≤* ( 1) 1/ *<sup>u</sup> Pk k )* 

*for all u ≥ 1.* 

<sup>&</sup>quot;It may not always be profitable at first for businesses to be online, but it is certainly going to be unprofitable not to be online." *Ester Dyson*

An illustration is presented below.

0.61

Figure 5. Sequence of ˆ ( ) *<sup>u</sup> A* versus iteration number for Example 12.

of *A* = 0.6 when the measurement noise becomes negligible.

**8.4 Smoothing EM Algorithms** 

**8.4.1.1 EM Algorithm** 

**8.4.1Process Noise Variance Estimation** 

of their operating system." *William Henry (Bill) Gates III*

*Example 12.* In respect of the model (23) – (25), suppose that *B* = *C =* 1, <sup>2</sup>

0.62

0.63

ˆ*A*(*u*)

0.64

0.65

0.66

<sup>1</sup> <sup>2</sup> <sup>3</sup> <sup>4</sup> <sup>5</sup> 0.6

and *A* = 0.6 is unknown. Simulations were conducted with 100 realizations of Gaussian process noise and measurement noise of length *N* = 500,000 for *R* = 0.1, 0.01 and 0.001. The EM algorithms were initialised with ˆ (1) *A* = 0.9999. It was observed that the resulting estimate sequences were all monotonically decreasing, however, this becomes imperceptible at *R* = 0.001, due to the limited resolution of the plot. The mean estimates are shown in Fig. 5. As expected from Lemma 8, ˆ ( ) *<sup>u</sup> A* asymptotically approaches the true value

In the previous EM algorithms, the expectation step involved calculating filtered estimates. Similar EM procedures are outlined in [26] and here where smoothed estimates are used at iteration *u* within the expectation step. The likelihood functions described in Sections 8.2.2 and 8.2.3 are exact, provided that the underlying assumptions are correct and actual random variables are available. Under these conditions, the ensuing parameter estimates maximise the likelihood functions and their limit of precision is specified by the associated CRLBs. However, the use of filtered or smoothed quantities leads to approximate likelihood functions, MLEs and CRLBs. It turns out that the approximate MLEs approach the true parameter values under prescribed SNR conditions. It will be shown that the use of

"The best way to prepare is to write programs, and to study great programs that other people have written. In my case, I went to the garbage cans at the Computer Science Center and I fished out listings

*u*

R=0.1 R=0.01 R=0.001

*<sup>w</sup>* 0.2 are known

The proof follows *mutatis mutandis* from that of Lemma 1. A heuristic argument is outlined below which suggests that non-increasing error variances lead to a non-increasing state matrix estimate sequence. Suppose that there exists a residual error ( ) *<sup>u</sup> <sup>k</sup> s <sup>n</sup>* at iteration *u* such that

$$
\hat{\mathfrak{X}}\_{k+1/k+1}^{(u)} = \hat{A}^{(u)} \hat{\mathfrak{X}}\_{k/k}^{(u)} + \mathfrak{s}\_k^{(u)} \,. \tag{59}
$$

The components of (59) are denoted by

$$
\hat{\mathbf{x}}\_{i,k\times1/k\times1}^{(u)} = \sum\_{j=1}^{n} \mathbf{a}\_{i,j}^{(u)} \hat{\mathbf{x}}\_{i,k\times k}^{(u)} + \mathbf{s}\_{i,k}^{(u)} \; \prime \tag{60}
$$

where ( ) , *u i k s* is the *i*th element of ( ) *<sup>u</sup> <sup>k</sup> s* . It follows from (60) and (48) that

$$\mathbf{s}\_{k}^{(u)} = \mathbf{L}\_{k}^{(u)} (\mathbf{C}\widetilde{\mathbf{x}}\_{k+1/k}^{(u)} + \boldsymbol{\upsilon}\_{k+1}) \tag{61}$$

and

$$\mathbf{s}\_{i,k}^{(u)} = \mathbf{L}\_{i,k}^{(u)} (\mathbf{C}\widetilde{\mathbf{x}}\_{k+1/k}^{(u)} + \boldsymbol{\upsilon}\_{k+1}) \,. \tag{62}$$

Using (61) and (63) within (57) yields

$$\begin{split} \hat{a}^{(u+1)}\_{i,j} &= \hat{a}^{(u)}\_{i,j} + \left( \sum\_{k=1}^{N} \mathbf{s}^{(u)}\_{i,k} \hat{\mathbf{x}}^{(u)}\_{j,k/k} \right) \left( \sum\_{k=1}^{N} \left( \hat{\mathbf{x}}^{(u)}\_{j,k/k} \right)^{2} \right)^{-1} \\ &= \hat{a}^{(u)}\_{i,j} + I^{(u)}\_{i,k} \mathbb{C} \left( \sum\_{k=1}^{N} (\hat{\mathbf{x}}^{(u)}\_{k+1/k} + \mathbf{C}^{\pi} \boldsymbol{\upsilon}\_{k+1}) \hat{\mathbf{x}}^{(u)}\_{j,k/k} \right) \left( \sum\_{k=1}^{N} (\hat{\mathbf{x}}^{(u)}\_{j,k/k})^{2} \right)^{-1} \end{split} \tag{63}$$

where *C*# denotes the Moore-Penrose pseudo-inverse of *C*. It is shown in Lemma 2 under prescribed conditions that ( 1) *<sup>u</sup> L C* ≤ ( ) *<sup>u</sup> L C* ≤ *I*. Since the non-increasing sequence ( ) *<sup>u</sup> L C* is a factor of the second term on the right-hand-side of (63), the sequence ( 1) , ˆ *u i j a* is expected to be non-increasing.24

*Lemma 8 [20]: Under the conditions of Lemma 7, suppose that C is full rank, then* 

$$\lim\_{Q^{-1}\to 0,\mathbb{R}\to 0,\boldsymbol{\mu}\to\boldsymbol{\nu}}\hat{A}^{(\boldsymbol{\mu}\ast\boldsymbol{1})}=\boldsymbol{A}\ .\tag{64}$$

*Proof: It is straight forward to show that* <sup>1</sup> , 0, 0, lim *i k Q Ru L C = I and therefore* <sup>1</sup> ( ) / 0, 0, lim ˆ *<sup>u</sup> k k Q Ru <sup>x</sup> = xk, which implies (64) since the MLE (53) is unbiased. �* 

<sup>24&</sup>quot;New scientific ideas never spring from a communal body, however organized, but rather from the head of an individually inspired researcher who struggles with his problems in lonely thought and unites all his thought on one single point which is his whole world for the moment." *Max Karl Ernst Ludwig Planck*

Smoothing, Filtering and Prediction -

*<sup>k</sup> s <sup>n</sup>* at iteration *u*

<sup>196</sup> Estimating the Past, Present and Future

The proof follows *mutatis mutandis* from that of Lemma 1. A heuristic argument is outlined below which suggests that non-increasing error variances lead to a non-increasing state

*<sup>k</sup> s* . It follows from (60) and (48) that

*k k kk k x Ax s* . (59)

*k k kk k s L Cx v* (61)

*ik ik k k k s L Cx v* . (62)

1

, ˆ *u*

*.* (64)

 *= I and therefore* <sup>1</sup>

*i j a* is expected to be

/ 0, 0, lim ˆ *<sup>u</sup> k k Q Ru <sup>x</sup>*

( )

 *= xk,* 

(63)

1

, (60)

matrix estimate sequence. Suppose that there exists a residual error ( ) *<sup>u</sup>*

( ) () () () 1/ 1 / ˆ ˆ ˆ *<sup>u</sup> uu u*

( ) () () () , 1/ 1 , ,/ , 1 <sup>ˆ</sup> <sup>ˆ</sup> *<sup>n</sup> <sup>u</sup> uu u ik k i j ik k ik j x ax s* 

> () () () 1/ 1 ( ) *uuu*

() () () , , 1/ 1 ( ) *uuu*

( 1) ( ) () () () 2 , , , ,/ , / 1 1

factor of the second term on the right-hand-side of (63), the sequence ( 1)

*Lemma 8 [20]: Under the conditions of Lemma 7, suppose that C is full rank, then* 

*A A* 

0, 0, ˆ lim *<sup>u</sup>*

*Q Ru*

1

*Proof: It is straight forward to show that* <sup>1</sup> , 0, 0,

 

() () () # () () 2 , , 1/ 1 ,/ , / 1 1 <sup>ˆ</sup> ( ) () ˆ ˆ *<sup>N</sup> <sup>N</sup> u u <sup>u</sup> <sup>u</sup> <sup>u</sup> ij ik k k k jk k jk k k k a L C x Cv x x*

,

 

where *C*# denotes the Moore-Penrose pseudo-inverse of *C*. It is shown in Lemma 2 under prescribed conditions that ( 1) *<sup>u</sup> L C* ≤ ( ) *<sup>u</sup> L C* ≤ *I*. Since the non-increasing sequence ( ) *<sup>u</sup> L C* is a

( 1)

lim *i k Q Ru L C*

*which implies (64) since the MLE (53) is unbiased. �* 

24"New scientific ideas never spring from a communal body, however organized, but rather from the head of an individually inspired researcher who struggles with his problems in lonely thought and unites all his thought on one single point which is his whole world for the moment." *Max Karl Ernst* 

ˆˆ ˆ ˆ( ) *N N u u u u u ij ij ik jk k jk k k k*

*a a sx x*

such that

where ( ) , *u*

and

*Ludwig Planck*

non-increasing.24

The components of (59) are denoted by

*i k s* is the *i*th element of ( ) *<sup>u</sup>*

Using (61) and (63) within (57) yields

An illustration is presented below.

Figure 5. Sequence of ˆ ( ) *<sup>u</sup> A* versus iteration number for Example 12.

*Example 12.* In respect of the model (23) – (25), suppose that *B* = *C =* 1, <sup>2</sup> *<sup>w</sup>* 0.2 are known and *A* = 0.6 is unknown. Simulations were conducted with 100 realizations of Gaussian process noise and measurement noise of length *N* = 500,000 for *R* = 0.1, 0.01 and 0.001. The EM algorithms were initialised with ˆ (1) *A* = 0.9999. It was observed that the resulting estimate sequences were all monotonically decreasing, however, this becomes imperceptible at *R* = 0.001, due to the limited resolution of the plot. The mean estimates are shown in Fig. 5. As expected from Lemma 8, ˆ ( ) *<sup>u</sup> A* asymptotically approaches the true value of *A* = 0.6 when the measurement noise becomes negligible.

#### **8.4 Smoothing EM Algorithms**

#### **8.4.1Process Noise Variance Estimation**

#### **8.4.1.1 EM Algorithm**

In the previous EM algorithms, the expectation step involved calculating filtered estimates. Similar EM procedures are outlined in [26] and here where smoothed estimates are used at iteration *u* within the expectation step. The likelihood functions described in Sections 8.2.2 and 8.2.3 are exact, provided that the underlying assumptions are correct and actual random variables are available. Under these conditions, the ensuing parameter estimates maximise the likelihood functions and their limit of precision is specified by the associated CRLBs. However, the use of filtered or smoothed quantities leads to approximate likelihood functions, MLEs and CRLBs. It turns out that the approximate MLEs approach the true parameter values under prescribed SNR conditions. It will be shown that the use of within

<sup>&</sup>quot;The best way to prepare is to write programs, and to study great programs that other people have written. In my case, I went to the garbage cans at the Computer Science Center and I fished out listings of their operating system." *William Henry (Bill) Gates III*

( )

smoother using ( )

straightforward to show that

generated by () () *u u <sup>v</sup> w*

sequence () ()

ˆ (1) | ( )| 

(i) ( ) 1/ *<sup>u</sup> Pk k* ≤ ( )

(ii) ( 1) 1/ *<sup>u</sup> Pk k* ≤ ( )

*Since* () ()

(iii) ( 1) ( 1)

1 1 ( ) *u uH*

*Then:* 

/ 1

Define the stacked vectors *v* = <sup>1</sup> [ *<sup>T</sup> v* , …, ] *T T*

*w* 

non-decreasing, depending on the initial conditions.

(2) *<sup>P</sup>*1/0 *of (57) for* <sup>ˆ</sup> (1) *<sup>Q</sup> <sup>≥</sup>* <sup>ˆ</sup> (2) *Q satisfy* (1) *<sup>P</sup>*1/0 <sup>≤</sup> (2) *<sup>P</sup>*1/0 ).

≤ () ()

*ei ei is common to* () () ( ) *u uH*

/ 1 *<sup>u</sup> Pk k ≤* ( )

/ 1 *<sup>u</sup> Pk k* ≤ ( ) / 1 *<sup>u</sup> Pk k (or* ( )

*ei Q CP P S C k kk k k <sup>k</sup>*

"We have always been shameless about stealing great ideas." *Steven Paul Jobs*

( 1) ( 1) 2 2 <sup>2</sup> ( ) *u uH ei ei* 

/ ( )] ˆ *u TT wN N* and ( ) *<sup>u</sup> w* = *w* – ( ) ˆ *<sup>u</sup> w* = ( )

 **�**

<sup>2</sup> ( ) *u uT w w* <sup>=</sup> () ()

and () () <sup>1</sup> 1 1 <sup>2</sup> ( ) ( ) *u uH H H H ei ei Q Q Q*

/ 1 *<sup>u</sup> Pk k (or* ( )

1/ *<sup>u</sup> Pk k and* ( 1)

<sup>2</sup> ( ) *u uH ei ei* 

*Substituting (67) into (68) yields* 

() () () () () 0 / 1 1/ <sup>0</sup> ˆ ˆ( ) ( ) *u uH H u u u HT CP P S C k kk k k* 

> , where () () ( ) *u uH*

 ( ) () () 1 <sup>1</sup> <sup>2</sup> ˆ ˆ ( ( )) ( ) *<sup>u</sup> H u uH <sup>H</sup> ei Q*

*<sup>i</sup> A < 1, i = 1, …, n, the pair (A, C) is observable and D is of full rank;* 

1/

<sup>2</sup> ( ) *u uH*

≤ () ()

 ( ) () () () <sup>1</sup> <sup>1</sup> <sup>2</sup> 0 / 1 1/ <sup>0</sup> ( ) )( ) *<sup>u</sup> H H u u u HT <sup>H</sup>*

<sup>2</sup> ( ) *u uH*

*Lemma 9: In respect of Procedure 4 for estimating Q, suppose the following:* 

1/ [( ) *u T w <sup>N</sup>* , …, ( )

*ei ei =* () ()

*(i) the system (23) – (24) is non-minimum phase, in which A, B, C, D, R are known,* 

*(ii) the solutions* (1) *<sup>P</sup>*1/ 0 , (2) *<sup>P</sup>*1/0 *of (57) for* <sup>ˆ</sup> (2) *<sup>Q</sup> <sup>≥</sup>* <sup>ˆ</sup> (1) *Q satisfy* (2) *<sup>P</sup>*1/0 <sup>≤</sup> (1) *<sup>P</sup>*1/0 *(or the solutions* (1) *<sup>P</sup>*1/ 0 ,

1/ *<sup>u</sup> Pk k* ≤ ( 1)

1/ *<sup>u</sup> Pk k and* ( )

<sup>2</sup> ( ) *u uH*

*ei ei* 

2 2 <sup>2</sup> ( ) *u uH*

/ 1 *<sup>u</sup> Pk k* ≤ ( 1)

*, it suffices to show that* 

*ei ei* . (69)

. (70)

*ei ei* ≤ ( 1) ( 1)

/ 1 *<sup>u</sup> Pk k* 

<sup>2</sup> ( ) *u uH ei ei* 

 *) for all k, u ≥ 1;* 

*) for u ≥ 1*.

*<sup>u</sup> Pk k ) for all k, u ≥ 1;* 

*ei ei (or* () ()

*Proof: (i) and (ii) This follows from S(u+1) ≤ S(u) within condition (iii) of Theorem 2 of Chapter 8.* 

*ei ei and* ( 1) ( 1) ( ) *u uH*

. It is shown in the lemma below that the

*<sup>u</sup> Pk k* and ( ) *<sup>u</sup> Kk* . Employing the notation and approach of Chapter 7, it is

1 1 ( ) *u uH*

, (68)

*ei ei* is monotonically non-increasing or monotonically

*<sup>K</sup> v* , *w* = <sup>1</sup> [ *<sup>T</sup> w* , …, ] *T T wN* , ( ) ˆ *<sup>u</sup> w* = ( )

*ei ei +* () ()

*<sup>k</sup> .* (67)

/ ( )] *u TT wN N* . The input estimation error is

2 2 ( ) *u uH*

*ei ei* , in which

1/ [( ) ˆ *u T w <sup>N</sup>* , …,

smoothed (as opposed to filtered) quantities results in smaller approximate CRLBs, which suggests improved parameter estimation accuracy.

Suppose that the system having the realisation (23) – (24) is non-minimum phase and *D* is of full rank. Under these conditions <sup>1</sup> exists and the minimum-variance smoother (described in Chapter 7) may be employed to produce input estimates. Assume that an estimate ˆ ( ) *<sup>u</sup> Q* = diag( () 2 1, ( ) ˆ *<sup>u</sup> <sup>w</sup>* , () 2 2, ( ) ˆ *<sup>u</sup> <sup>w</sup>* , …, () 2 , ( ) ˆ *<sup>u</sup> n w* ) of *Q* is are available at iteration *u*. The smoothed input estimates, ( ) / ˆ *<sup>u</sup> wk N* , are calculated from

$$
\begin{bmatrix} \mathbf{x}\_{k \times 1/k}^{(u)} \\ \mathbf{a}\_{k}^{(u)} \end{bmatrix} = \begin{bmatrix} A\_{k} - \mathbf{K}\_{k}^{(u)} \mathbf{C}\_{k} & \mathbf{K}\_{k}^{(u)} \\ - (\boldsymbol{\Omega}\_{k}^{(u)})^{-1/2} \mathbf{C}\_{k} & (\boldsymbol{\Omega}\_{k}^{(u)})^{-1/2} \end{bmatrix} \begin{bmatrix} \mathbf{x}\_{k' \times k}^{(u)} \\ \mathbf{z}\_{k} \end{bmatrix} \tag{65}
$$

$$
\begin{bmatrix}
\boldsymbol{\xi}\_{k-1}^{(u)} \\
\boldsymbol{\nu}\_{k-1}^{(u)} \\
\boldsymbol{\nu}\_{k-1/N}^{(u)}
\end{bmatrix} = \begin{bmatrix}
\boldsymbol{A}\_{k}^{T} - \mathbf{C}\_{k}^{T} (\mathbf{K}\_{k}^{(u)})^{T} & \mathbf{0} & \mathbf{C}\_{k}^{T} (\boldsymbol{\Omega}\_{k}^{(u)})^{-1/2} \\
\mathbf{C}\_{k}^{T} (\mathbf{K}\_{k}^{(u)})^{T} & \mathbf{A}\_{k}^{T} & -\mathbf{C}\_{k}^{T} (\boldsymbol{\Omega}\_{k}^{(u)})^{-1/2} \\
\end{bmatrix} \begin{bmatrix}
\boldsymbol{\xi}\_{k}^{(u)} \\
\boldsymbol{\nu}\_{k}^{(u)} \\
\boldsymbol{\nu}\_{k}^{(u)}
\end{bmatrix},\tag{66}
$$

where ( ) *<sup>u</sup> Kk* = ( ) / 1 ( *u T AP C k kk k* + <sup>ˆ</sup> () () 1 )( ) *uT u BQ D kk k k* , ( ) *<sup>u</sup> k* = ( ) / 1 *u T CP C k kk k* + <sup>ˆ</sup> ( ) *u T DQ D kk k* + *Rk* and ( ) / 1 *<sup>u</sup> Pk k* evolves from the Riccati difference equation ( ) 1/ *<sup>u</sup> Pk k* = ( ) / 1 *u T Ak kk k P A* − ( ) / 1 ( *u T Ak kk k P C* + ( ) ( ) / 1 <sup>ˆ</sup> )( *uT u T BQ D C P C k k k k kk k* + <sup>ˆ</sup> ( ) *u T DQ D kk k* + 1 () / 1 ) ( *u T R CP A k k kk k* + ( ) ) *u T DQ B kk k* + <sup>ˆ</sup> ( ) *u T BQ B kk k* . A smoothing EM algorithm for iteratively re-estimating ˆ ( ) *<sup>u</sup> Q* is described below.

*Procedure 4.* Suppose that an initial estimate ˆ (1) *Q* = diag( (1) 2 1, ( ) ˆ *<sup>w</sup>* , (1) 2 2, ( ) ˆ *<sup>w</sup>* , …, (1) 2 , ( ) ˆ *n w* ) is available. Then subsequent estimates ˆ ( ) *<sup>u</sup> Q* , *u* > 1, are calculated by repeating the following two steps.


#### **8.4.1.2Properties**

In the following it is shown that the variance estimates arising from the above procedure result in monotonic error covariances. The additional term within the design Riccati difference equation (57) that accounts for the presence of parameter error is now given by ( ) <sup>ˆ</sup> ( ) ( ) *<sup>u</sup> <sup>u</sup> <sup>T</sup> S BQ QB <sup>k</sup>* . Let <sup>ˆ</sup> ( ) *<sup>u</sup>* denote an approximate spectral factor arising in the design of a

<sup>&</sup>quot;Don't worry about people stealing your ideas. If your ideas are any good, you'll have to ram them down people's throats." *Howard Hathaway Aiken*

Smoothing, Filtering and Prediction -

(65)

(66)

/ 1 *<sup>u</sup> Pk k*

/ 1 ( *u T Ak kk k P C* +

, ( ) ˆ*n w* ) is

) using ( )

/ ˆ *<sup>u</sup> wk N*

<sup>198</sup> Estimating the Past, Present and Future

smoothed (as opposed to filtered) quantities results in smaller approximate CRLBs, which

(described in Chapter 7) may be employed to produce input estimates. Assume that an

, ( ) ˆ *<sup>u</sup>* 

*<sup>w</sup>* , …, () 2

/ ˆ *<sup>u</sup> wk N* , are calculated from

( ) ( ) () () 1/ / 1 ( ) ( ) 1/ 2 ( ) 1/ 2 () () *u u u u k k k kk k k k u u u*

*k k k k k x A KC K x*

( ) ( ) ( ) 1/ 2 ( )

*u T T uT T u u k k kk k k k u T uT T T u u k k k k k k k <sup>u</sup> <sup>u</sup> u T uT u T u T u k N <sup>k</sup> kkk kk k k k*

() 0 () ( ) ( )

> / 1 ) ( *u T R CP A k k kk k*

available. Then subsequent estimates ˆ ( ) *<sup>u</sup> Q* , *u* > 1, are calculated by repeating the following

*<sup>w</sup>* , …, () 2

In the following it is shown that the variance estimates arising from the above procedure result in monotonic error covariances. The additional term within the design Riccati difference equation (57) that accounts for the presence of parameter error is now given by ( ) <sup>ˆ</sup> ( ) ( ) *<sup>u</sup> <sup>u</sup> <sup>T</sup> S BQ QB <sup>k</sup>* . Let <sup>ˆ</sup> ( ) *<sup>u</sup>* denote an approximate spectral factor arising in the design of a

"Don't worry about people stealing your ideas. If your ideas are any good, you'll have to ram them

, ( ) ˆ *<sup>u</sup>* 

1, ( ) ˆ *<sup>u</sup> w*

 , ( 1) 2 2, ( ) ˆ *<sup>u</sup> w*

, ( ) *<sup>u</sup> k* = ( )

( ) ( ) ( ) 1/ 2 ( )

( ) ( ) () () ( ) ( ) ( ) 1/ 2 1/

ˆ ˆ ˆ ˆ ( ) ( )

 

*w QDK QB QD*

EM algorithm for iteratively re-estimating ˆ ( ) *<sup>u</sup> Q* is described below.

*Procedure 4.* Suppose that an initial estimate ˆ (1) *Q* = diag( (1) 2

 *<sup>w</sup>* , () 2 2, ( ) ˆ *<sup>u</sup>* 

from Step 1 instead of *wk* within the MLE formula (46).

/ ˆ *<sup>u</sup> wk N* .

1, ( ) ˆ *<sup>u</sup>* 

Step 2. Calculate the elements of ˆ ( 1) *<sup>u</sup> Q* = diag( ( 1) 2

/ 1 ( *u T AP C k kk k* + <sup>ˆ</sup> () () 1 )( ) *uT u BQ D kk k k*

evolves from the Riccati difference equation ( )

<sup>ˆ</sup> )( *uT u T BQ D C P C k k k k kk k* + <sup>ˆ</sup> ( ) *u T DQ D kk k* + 1 ()

smoothed input estimates ( )

down people's throats." *Howard Hathaway Aiken*

*A CK C CK A C*

 *C z* 

having the realisation (23) – (24) is non-minimum phase and *D*

,

/ 1

*<sup>u</sup> Pk k* = ( )

1, ( ) ˆ

1/

,

*u T CP C k kk k* + <sup>ˆ</sup> ( ) *u T DQ D kk k* + *Rk* and ( )

*u T Ak kk k P A* − ( )

*<sup>w</sup>* , …, (1) 2

/ 1

 *<sup>w</sup>* , (1) 2 2, ( ) ˆ

*n w* )) within (65) − (66) to calculate

 , …, ( 1) 2 , ( ) ˆ *<sup>u</sup> n w*

+ ( ) ) *u T DQ B kk k* + <sup>ˆ</sup> ( ) *u T BQ B kk k* . A smoothing

exists and the minimum-variance smoother

*n w* ) of *Q* is are available at iteration *u*. The

suggests improved parameter estimation accuracy.

1, ( ) ˆ *<sup>u</sup>* 

 *<sup>w</sup>* , () 2 2, ( ) ˆ *<sup>u</sup>* 

is of full rank. Under these conditions <sup>1</sup>

Suppose that the system

estimate ˆ ( ) *<sup>u</sup> Q* = diag( () 2

smoothed input estimates, ( )

1

where ( ) *<sup>u</sup> Kk* = ( )

( ) ( )

two steps.

**8.4.1.2Properties** 

1

/ 1

Step 1. Use ˆ ( ) *<sup>u</sup> Q* = diag( () 2

smoother using ( ) / 1 *<sup>u</sup> Pk k* and ( ) *<sup>u</sup> Kk* . Employing the notation and approach of Chapter 7, it is straightforward to show that

$$
\hat{\Delta}^{(u)} (\hat{\Delta}^{(u)})^H = \Delta \Delta^H + \mathbb{C}\_k \mathcal{G}\_0 \left( P\_{k/k - 1}^{(u)} - P\_{k \ast 1/k}^{(u)} + S^{(u)} \right) \mathcal{G}\_0^H \mathbb{C}\_k^T \,. \tag{67}
$$

Define the stacked vectors *v* = <sup>1</sup> [ *<sup>T</sup> v* , …, ] *T T <sup>K</sup> v* , *w* = <sup>1</sup> [ *<sup>T</sup> w* , …, ] *T T wN* , ( ) ˆ *<sup>u</sup> w* = ( ) 1/ [( ) ˆ *u T w <sup>N</sup>* , …, ( ) / ( )] ˆ *u TT wN N* and ( ) *<sup>u</sup> w* = *w* – ( ) ˆ *<sup>u</sup> w* = ( ) 1/ [( ) *u T w <sup>N</sup>* , …, ( ) / ( )] *u TT wN N* . The input estimation error is generated by () () *u u <sup>v</sup> w w*  **�** , where () () ( ) *u uH*  *ei ei =* () () 1 1 ( ) *u uH*  *ei ei +* () () 2 2 ( ) *u uH* *ei ei* , in which

$$\mathcal{R}\_{\text{tr}2}^{(u)} = \mathbb{Q} \mathcal{G}^H \Big( (\hat{\Delta}^{(u)} (\hat{\Delta}^{(u)})^H)^{-1} - (\Delta \Delta^H)^{-1} \Big) \Delta \,\,\,\,\tag{68}$$

and () () <sup>1</sup> 1 1 <sup>2</sup> ( ) ( ) *u uH H H H ei ei Q Q Q*  . It is shown in the lemma below that the sequence () () <sup>2</sup> ( ) *u uT w w* <sup>=</sup> () () <sup>2</sup> ( ) *u uH*  *ei ei* is monotonically non-increasing or monotonically non-decreasing, depending on the initial conditions.

*Lemma 9: In respect of Procedure 4 for estimating Q, suppose the following:* 


*Then:* 


$$\text{(iii)}\quad \left\|\mathsf{Z}\_{\acute{a}}^{(u+1)}(\mathsf{Z}\_{\acute{a}}^{(u+1)})^{\mathbb{H}}\right\|\_{2} \leq \left\|\mathsf{Z}\_{\acute{a}}^{(u)}(\mathsf{Z}\_{\acute{a}}^{(u)})^{\mathbb{H}}\right\|\_{2} \text{ (or } \left\|\mathsf{Z}\_{\acute{a}}^{(u)}(\mathsf{Z}\_{\acute{a}}^{(u)})^{\mathbb{H}}\right\|\_{2} \leq \left\|\mathsf{Z}\_{\acute{a}}^{(u+1)}(\mathsf{Z}\_{\acute{a}}^{(u+1)})^{\mathbb{H}}\right\|\_{2}) \text{ for } u \geq 1. \text{ (5)}$$

*Proof: (i) and (ii) This follows from S(u+1) ≤ S(u) within condition (iii) of Theorem 2 of Chapter 8. Since* () () 1 1 ( ) *u uH*  *ei ei is common to* () () ( ) *u uH*  *ei ei and* ( 1) ( 1) ( ) *u uH ei ei* *, it suffices to show that* 

$$\left\| \mathcal{R}\_{\boldsymbol{\alpha}2}^{(\boldsymbol{u}+\boldsymbol{l})} (\mathcal{R}\_{\boldsymbol{\alpha}2}^{(\boldsymbol{u}+\boldsymbol{l})})^{\mathbb{H}} \right\|\_{2} \leq \left\| \mathcal{R}\_{\boldsymbol{\alpha}2}^{(\boldsymbol{u})} (\mathcal{R}\_{\boldsymbol{\alpha}2}^{(\boldsymbol{u})})^{\mathbb{H}} \right\|\_{2}.\tag{69}$$

*Substituting (67) into (68) yields* 

$$\mathcal{R}\_{\mathbb{R}2}^{(u)} = \mathbb{Q}\mathcal{G}^{\mathbb{H}} \left( \Delta \boldsymbol{\Delta}^{\boldsymbol{H}} + \mathbb{C}\_{\boldsymbol{k}} \mathcal{G}\_{0} \left( P\_{\boldsymbol{k}/k-1}^{(u)} - P\_{\boldsymbol{k}+1/k}^{(u)} + \boldsymbol{S}^{(u)} \right) \mathcal{G}\_{0}^{\boldsymbol{H}} \boldsymbol{C}\_{\boldsymbol{k}}^{\top} \right)^{-1} - \left( \Delta \boldsymbol{\Delta}^{\boldsymbol{H}} \right)^{-1} \right) \boldsymbol{\Delta} . \tag{70}$$

<sup>&</sup>quot;We have always been shameless about stealing great ideas." *Steven Paul Jobs*

**8.4.2State Matrix Estimation** 

/ ˆ *u*

ˆ ( ) *<sup>u</sup> A* to obtain ( )

estimates ( 1) , ˆ *u*

estimation error ( ) *<sup>u</sup> <sup>x</sup>* , that is, () () *u u <sup>v</sup>*

*n.* 

**8.4.2.2Properties**  Denote *x* = <sup>1</sup> [ *<sup>T</sup> x* , …, ] *T T*

> ˆ ( 1) | ( )| *<sup>u</sup>*

(2) *P*1/0 ).

*Einstein*

*Then* ( 1) ( 1)

<sup>2</sup> ( ) *u uH ei ei* 

≤ () ()

( ) / ( )] *u TT N N x* . Let

Step 2. Copy ˆ ( ) *<sup>u</sup> A* to ˆ ( 1) *<sup>u</sup> A* . Use ( )

/ ˆ *u k N x* .

Smoothed state estimates are obtained from the smoothed inputs via

( ) ( ) ( ) 1/ / / ˆ ˆˆ *<sup>u</sup> <sup>u</sup> <sup>u</sup>*

*Procedure 5.* Assume that there exists an initial estimate ˆ (1) *A* of *A* such that ˆ (1) | ( )|

/ ˆ *u*

1/ [( ) ˆ *u T*

 **�**

*Lemma 12: In respect of Procedure 5 for estimating A and x, suppose the following:* 

matrix iterates result in a monotonic sequence of state error covariances.

*<sup>i</sup> A < 1, the pair (A, C) is observable and D is of full rank;* 

<sup>2</sup> ( ) *u uH*

*ei ei (or* () ()

"You do not really understand something unless you can explain it to your grandmother." *Albert* 

*<sup>N</sup> x* , …, ( )

*w* 

*(i) the system (23) – (24) is non-minimum phase, in which B, C, D, Q, R are known,* 

*(ii) there exist solutions* (1) *P*1/0 , (2) *P*1/0 *of (57) for AAT* ≤ (2) (2) ( )*<sup>T</sup> A A* ≤ (1) (1) ( )*<sup>T</sup> A A satisfying* (2) *P*1/0 ≤ (1) *P*1/0 *(or the solutions* (1) *P*1/0 , (2) *P*1/0 *of (31) for* (1) (1) ( )*<sup>T</sup> A A* ≤ (2) (2) ( )*<sup>T</sup> A A* ≤ *AAT satisfying* (1) *P*1/0 ≤

<sup>2</sup> ( ) *u uH*

*ei ei* ≤ ( 1) ( 1)

( ) *<sup>u</sup>* be redefined as the system that maps the inputs *<sup>v</sup>*

*i j a* , *i, j* = 1, …, *n*. Include ( 1)

*Nx* , ( ) ˆ *<sup>u</sup> x* = ( )

*x*

*k N x* are used below to iteratively re-estimate state matrix elements.

1, …, *n*. Subsequent estimates, ˆ ( ) *<sup>u</sup> A* , *u* > 1, are calculated using the following two-step EM

Step 1. Operate the minimum-variance smoother recursions (65), (66), (74) designed with

, ˆ *u*

/ ( )] ˆ *u TT*

*k N k kN k kN x Ax Bw* . (74)

*k N x* instead of *xk* within (53) to obtain candidate

*N N x* and ( ) *<sup>u</sup> x* = *x* – ( ) ˆ *<sup>u</sup> x* = ( )

. It is stated below that the estimated state

*w* 

<sup>2</sup> ( ) *u uH ei ei* 

*) for u ≥ 1*.

*i j <sup>a</sup>* within <sup>ˆ</sup> ( 1) *<sup>u</sup> <sup>A</sup>* if <sup>ˆ</sup> ( 1) | ( )| *<sup>u</sup>*

*<sup>i</sup> A* < 1, i =

*<sup>i</sup> A* < 1, *i* = 1, …,

1/ [( ) *u T <sup>N</sup> x* , …,

to smoother state

**8.4.2.1EM Algorithm** 

The resulting ( )

algorithm.

*Note for linear time-invariant systems X, Y1 ≥ Y2, that* 

$$(XX^H)^{-1} - (XX^H + Y\_1)^{-1} \ge (XX^H)^{-1} - (XX^H + Y\_2)^{-1} \,. \tag{71}$$

*Since* ( 1) ( 1) ( 1) 0 / 1 1/ <sup>0</sup> <sup>2</sup> ( ) *u u uH PPS kk k k ≤* () () () 0 / 1 1/ <sup>0</sup> <sup>2</sup> ( ) *u u uH PPS kk k k , (69) follows from (70) and (71). �* 

As is the case for the filtering EM algorithm, the process noise variance estimates asymptotically approach the exact values when the SNR is sufficiently high.

*Lemma 10: Under the conditions of Lemma 9,* 

$$\lim\_{Q^{-1}\to 0, R\to 0, u\to\ast\ast} \hat{\mathcal{Q}}^{(u)} = \mathcal{Q} \,. \tag{72}$$

*Proof: By inspection of the input estimator, IE =* <sup>1</sup> ( ) *H H Q =* ( *<sup>H</sup> <sup>H</sup> Q Q*  *+* <sup>1</sup> *R*) *, it follows that* <sup>1</sup> 0, 0, lim *IE Q Ru =* <sup>1</sup>  *and therefore* <sup>1</sup> ( ) / 0, 0, lim ˆ *<sup>u</sup> k N Q Ru <sup>w</sup> = wk, which implies (72), since the MLE (46) is unbiased for large N. �* 

It is observed anecdotally that the variance estimates produced by the above smoothing EM algorithm are more accurate than those from the corresponding filtering procedure. This is consistent with the following comparison of approximate CRLBs.

#### *Lemma 11 [26]:*

$$-\left(\frac{\left\|\left<\boldsymbol{\hat{o}}^{2}\log f\left(\boldsymbol{\sigma}\_{l,w}^{2}\mid\hat{\boldsymbol{x}}\_{k/N}\right)\right>}{\left(\boldsymbol{\hat{o}}\boldsymbol{\sigma}\_{l,w}^{2}\right)^{2}}\right)^{-1}\leqslant-\left(\frac{\left\|\left<\boldsymbol{\hat{o}}^{2}\log f\left(\boldsymbol{\sigma}\_{l,w}^{2}\mid\hat{\boldsymbol{x}}\_{k/k}\right)\right>}{\left(\boldsymbol{\hat{o}}\boldsymbol{\sigma}\_{l,w}^{2}\right)^{2}}\right)^{-1}\.\tag{73}$$

*Proof: The vector state elements within (23) can be written in terms of smoothed state estimates, i k*, 1 *x =* / ˆ *Ai kN x + wi k*, *= Ai k x + wi k*, *– AikN*/ *x , where k N*/ *x = xk –* / ˆ *k N x . From the approach of Example 8, the second partial derivative of the corresponding approximate log-likelihood function with respect to the process noise variance is* 

$$\frac{\left(\widehat{\boldsymbol{\sigma}}^{2}\log f(\boldsymbol{\sigma}\_{l,w}^{2}|\,\widehat{\boldsymbol{x}}\_{k/N})\right)}{\left(\widehat{\boldsymbol{\sigma}}\boldsymbol{\sigma}\_{l,w}^{2}\right)^{2}} = -\frac{N}{2}(\boldsymbol{\sigma}\_{l,w}^{2} + A\_{i}E\{\widetilde{\boldsymbol{x}}\_{k/N}\widetilde{\boldsymbol{x}}\_{k/N}^{T}\}\boldsymbol{A}\_{i}^{T})^{-2}\boldsymbol{\Lambda}$$

*Similarly, the use of filtered state estimates leads to* 

$$\frac{\left(\widehat{\boldsymbol{\sigma}}^{2}\log f(\boldsymbol{\sigma}\_{i,w}^{2}|\,\widehat{\boldsymbol{\mathfrak}}\_{k/k})\right)}{\left(\widehat{\boldsymbol{\sigma}}\boldsymbol{\sigma}\_{i,w}^{2}\right)^{2}} = -\frac{N}{2} (\boldsymbol{\sigma}\_{i,w}^{2} + A\_{i}\boldsymbol{E}\{\widetilde{\boldsymbol{\mathfrak}}\_{k/k}\widetilde{\boldsymbol{\mathfrak}}\_{k/k}^{T}\}\boldsymbol{A}\_{i}^{T})^{-2}\boldsymbol{\dots}$$

*The minimum-variance smoother minimizes both the causal part and the non-causal part of the estimation error, whereas the Kalman filter only minimizes the causal part. Therefore,* / / { } *<sup>T</sup> Ex x kN kN <*  / / { } *<sup>T</sup> Ex x kk kk . Thus, the claim (73) follows. �*

<sup>&</sup>quot;The power of an idea can be measured by the degree of resistance it attracts." *David Yoho*

Smoothing, Filtering and Prediction -

*, (69) follows from (70) and* 

 *+* <sup>1</sup> *R*)

 *= wk, which implies (72),* 

. (73)

*, it* 

. (72)

  *=* ( *<sup>H</sup> <sup>H</sup> Q Q* 

( )

/ 0, 0, lim ˆ *<sup>u</sup> k N Q Ru <sup>w</sup>*

*x + wi k*, *– AikN*/ *x , where k N*/ *x = xk –* / ˆ *k N x . From the approach of* 

*iw i k N k N i*

*iw i k k k k i*

<sup>200</sup> Estimating the Past, Present and Future

1 1 1 1

*≤* () () ()

( )

 *and therefore* <sup>1</sup>

<sup>1</sup> <sup>1</sup> <sup>2</sup> <sup>2</sup> <sup>2</sup> <sup>2</sup> , / , / 2 2 2 2 , , log ( | ) log ( | ) ˆ ˆ ( ) ( )

*f x*

*iw k N iw k k i w i w*

*Proof: The vector state elements within (23) can be written in terms of smoothed state estimates,* 

*Example 8, the second partial derivative of the corresponding approximate log-likelihood function* 

log ( | ) <sup>ˆ</sup> ( { }) () 2 *iw k N T T*

log ( | ) <sup>ˆ</sup> ( { }) () 2 *iw k k T T*

*The minimum-variance smoother minimizes both the causal part and the non-causal part of the estimation error, whereas the Kalman filter only minimizes the causal part. Therefore,* / / { } *<sup>T</sup> Ex x kN kN <*  / / { } *<sup>T</sup> Ex x kk kk . Thus, the claim (73) follows. �*

"The power of an idea can be measured by the degree of resistance it attracts." *David Yoho*

 *.*

*f x <sup>N</sup> AE x x A*

 *.*

*f x <sup>N</sup> AE x x A*

*since the MLE (46) is unbiased for large N. �*  It is observed anecdotally that the variance estimates produced by the above smoothing EM algorithm are more accurate than those from the corresponding filtering procedure. This is

*Q Q*

*(71). �*  As is the case for the filtering EM algorithm, the process noise variance estimates

*PPS kk k k*

asymptotically approach the exact values when the SNR is sufficiently high.

0, 0, ˆ lim *<sup>u</sup>*

*Q Ru*

<sup>1</sup> <sup>2</sup> ( )( )( )( ) *<sup>H</sup> <sup>H</sup> <sup>H</sup> <sup>H</sup> XX XX Y XX XX Y* . (71)

0 / 1 1/ <sup>0</sup> <sup>2</sup> ( ) *u u uH*

*IE =* <sup>1</sup> ( ) *H H Q* 

, / 2 2 2 2 , / /

, / 2 2 2 2 , / /

*Note for linear time-invariant systems X, Y1 ≥ Y2, that* 

1

 *=* <sup>1</sup> 

consistent with the following comparison of approximate CRLBs.

*f x* 

2 2

*Similarly, the use of filtered state estimates leads to* 

,

2 2

*i w*

,

*i w*

0 / 1 1/ <sup>0</sup> <sup>2</sup> ( ) *u u uH PPS kk k k* 

*Lemma 10: Under the conditions of Lemma 9,* 

*Proof: By inspection of the input estimator,* 

lim *IE Q Ru*

*x + wi k*, *= Ai k*

*with respect to the process noise variance is* 

*Since* ( 1) ( 1) ( 1)

*follows that* <sup>1</sup> 0, 0,

*Lemma 11 [26]:* 

*i k*, 1 *x =* / ˆ *Ai kN*

#### **8.4.2State Matrix Estimation**

#### **8.4.2.1EM Algorithm**

Smoothed state estimates are obtained from the smoothed inputs via

$$
\hat{\mathbf{x}}\_{k\*1/N}^{(u)} = A\_k \hat{\mathbf{x}}\_{k/N}^{(u)} + B\_k \hat{w}\_{k/N}^{(u)}.\tag{74}
$$

The resulting ( ) / ˆ *u k N x* are used below to iteratively re-estimate state matrix elements.

*Procedure 5.* Assume that there exists an initial estimate ˆ (1) *A* of *A* such that ˆ (1) | ( )| *<sup>i</sup> A* < 1, i = 1, …, *n*. Subsequent estimates, ˆ ( ) *<sup>u</sup> A* , *u* > 1, are calculated using the following two-step EM


#### **8.4.2.2Properties**

Denote *x* = <sup>1</sup> [ *<sup>T</sup> x* , …, ] *T T Nx* , ( ) ˆ *<sup>u</sup> x* = ( ) 1/ [( ) ˆ *u T <sup>N</sup> x* , …, ( ) / ( )] ˆ *u TT N N x* and ( ) *<sup>u</sup> x* = *x* – ( ) ˆ *<sup>u</sup> x* = ( ) 1/ [( ) *u T <sup>N</sup> x* , …, ( ) / ( )] *u TT N N x* . Let ( ) *<sup>u</sup>* be redefined as the system that maps the inputs *<sup>v</sup> w* to smoother state estimation error ( ) *<sup>u</sup> <sup>x</sup>* , that is, () () *u u <sup>v</sup> x w*  **�** . It is stated below that the estimated state matrix iterates result in a monotonic sequence of state error covariances.

*Lemma 12: In respect of Procedure 5 for estimating A and x, suppose the following:* 


$$\text{Then } \left\| \mathbf{Z}\_{d}^{(u+1)} (\mathbf{Z}\_{d}^{(u+1)})^{H} \right\|\_{2} \leq \left\| \mathbf{Z}\_{d}^{(u)} (\mathbf{Z}\_{d}^{(u)})^{H} \right\|\_{2} \text{ (or } \left\| \mathbf{Z}\_{d}^{(u)} (\mathbf{Z}\_{d}^{(u)})^{H} \right\|\_{2} \leq \left\| \mathbf{Z}\_{d}^{(u+1)} (\mathbf{Z}\_{d}^{(u+1)})^{H} \right\|\_{2} \text{) for } u \geq 1. \text{ (5.2)}$$

<sup>&</sup>quot;You do not really understand something unless you can explain it to your grandmother." *Albert Einstein*

*j a x* 

, ,/ 1 *n*

*i j ik N*

*likelihood function with respect to ai,j is* 

2

*a*

*a*

0.915 0.92 0.925 0.93 0.935 0.94 0.945 0.95

machine that could compute things." *Michael Dell*

ˆ*A*(*u*)

*Similarly, the use of filtered state estimates leads to*  2

*Proof: Using smoothed states within (51) yields i k*, 1 *x =* , ,/

1 <sup>ˆ</sup> *<sup>n</sup> i j ik N*

*iw i k N k N i j k*

*iw i k k k k i j k*

 *+ wi k*, *=* , ,

1 *n*

*j a x* 

*i j i k*

 *+ wi k*, *–* 

*j a x* 

*, where k N*/ *<sup>x</sup> = xk –* / <sup>ˆ</sup> *k N <sup>x</sup> . The second partial derivative of the corresponding log-*

log ( | ) <sup>ˆ</sup> ( { }) () 2

log ( | ) <sup>ˆ</sup> ( { }) () 2

, / 2 1 2 2 , / / , , 1

, / 2 1 2 2 , / / , , 1

*.* 

*<sup>N</sup> ij k k T T*

*i j k fa x <sup>N</sup> <sup>A</sup> Ex x A x*

*The result (78) follows since* / / { } *<sup>T</sup> Ex x kN kN <* / / { } *<sup>T</sup> Ex x kk kk . �*

*Example 13.:* Consider a system where *B* = *C* = *D = Q* = 1, *R* = {0.0001, 0.0002, 0.0003} are known and *A* = 0.9 but is unknown. Simulations were conducted using 30 noise realizations with *N* = 500,000. The results of the above smoothing EM algorithm and the filtering EM algorithms, initialized with ˆ (0) *A* = 1.03*A*, are respectively shown by the dotted and dashed lines within Fig. 6. The figure shows that the estimates improve with increasing *u*, which is consistent with Lemma 15. The estimates also improve with increasing SNR which illustrates Lemmas 8 and 14. It is observed anecdotally that the smoother EM algorithm outperforms the filter EM algorithm for estimation of *A* at high signal-to-noise-ratios.

<sup>0</sup> <sup>1</sup> <sup>2</sup> <sup>3</sup> <sup>4</sup> 0.91

Fig. 6. State matrix estimates calculated by the smoother EM algorithm and filter EM algorithm for

"From the time I was seven, when I purchased my first calculator, I was fascinated by the idea of a

Example 13. It can be seen that the ˆ ( ) *<sup>u</sup> A* better approach the nominal *A* at higher SNR.

*u*

filter & smoother, R=.0001

filter & smoother, R=.0003

filter & smoother, R=.0002

*.*

*<sup>N</sup> ij k N T T*

*i j k fa x <sup>N</sup> <sup>A</sup> Ex x A x*

The proof is omitted since it follows *mutatis mutandis* from that of Lemma 9. Suppose that the smoother (65), (66) designed with the estimates ( ) , ˆ *u <sup>i</sup> <sup>j</sup> a* is employed to calculate input estimates ( ) / ˆ *<sup>u</sup> wk N* . An approximate log-likelihood function for the unknown *<sup>i</sup>*, *<sup>j</sup> a* given samples of ( ) / ˆ *<sup>u</sup> wk N* is

$$\log f(a\_{i,j} \mid \hat{w}\_{i,k \mid K}^{(u)}) = -\frac{N}{2} \log 2\pi - \frac{N}{2} \log(\sigma\_{i,u}^{(u)})^2 - \frac{1}{2} (\sigma\_{i,u}^{(u)})^{-2} \sum\_{k=1}^{N} \hat{w}\_{i,k \mid K}^{(u)} (\hat{w}\_{i,k \mid N}^{(u)})^T. \tag{75}$$

Now let ( ) *<sup>u</sup>* denote the map from *<sup>v</sup> w* to the smoother input estimation error ( ) *<sup>u</sup> w* = *w* – ( ) ˆ *<sup>u</sup> w* at iteration *u*. It is argued below that the sequence of state matrix iterates maximises (75).

$$\text{Lemma 13: Under the conditions of Lemma 12, } \left\| \mathcal{R}^{(u+1)} (\mathcal{R}^{(u+1)})^\# \right\|\_2 \le \left\| \mathcal{R}^{(u)} (\mathcal{R}^{(u)})^\# \right\|\_2 \text{ for } u \ge 1.$$

The proof follows *mutatis mutandis* from that of Lemma 9. The above Lemma implies

$$E\{\tilde{w}^{(u+1)} (\tilde{w}^{(u+1)})^T\} \le E\{\tilde{w}^{(u)} (\tilde{w}^{(u)})^T\}.\tag{76}$$

It follows from ( ) ˆ *<sup>u</sup> w* = *w* − ( ) *<sup>u</sup> w* that () () { ( )} *u uT Ew w* = *E w*{ + ( ) )( *<sup>u</sup> w w* + ( ) ( )} *u T w* = () () { ( )} *u uT Ew w* + *Q*, which together with (76) implies ( 1) ( 1) { ( )} ˆ ˆ *u uT Ew w* ≤ () () { ( )} ˆ ˆ *u uT Ew w* and ( 1) , ,/ log ( | ) ˆ *<sup>u</sup> i jw i k N fa w* ≥ ( ) , ,/ log ( | ) ˆ *<sup>u</sup> <sup>i</sup> jw ik K fa w* for all *u* ≥ 1. Therefore, it is expected that the sequence of state matrix estimates will similarly vary monotonically. Next, it is stated that the state matrix estimates asymptotically approach the exact values when the SNR is sufficiently high.

*Lemma 14: Under the conditions of Lemma 9,* 

$$\lim\_{Q^{-l}\to 0,\mathbb{R}\to 0,\boldsymbol{\mu}\to\boldsymbol{\nu}}\hat{A}^{(\boldsymbol{\mu})}=A\ .\tag{77}$$

*Proof: From the proof of Lemma 10,* <sup>1</sup> ( ) / 0, 0, lim ˆ *<sup>u</sup> k N Q Ru <sup>w</sup> = wk, therefore, the states within (74) are reconstructed exactly. Thus, the claim (77) follows since the MLE (53) is unbiased. �* 

It is expected that the above EM smoothing algorithm offers improved state matrix estimation accuracy.

*Lemma 15:* 

$$-\left(\frac{\left\|\hat{\sigma}^2 \log f(a\_{i,j} \mid \hat{\mathbf{x}}\_{k/N})\right\|}{\left(\hat{\sigma}a\_{i,j}\right)^2}\right)^{-1} \le -\left(\frac{\left\|\hat{\sigma}^2 \log f(a\_{i,j} \mid \hat{\mathbf{x}}\_{k/k})\right\|}{\left(\hat{\sigma}a\_{i,j}\right)^2}\right)^{-1}.\tag{78}$$

<sup>&</sup>quot;The test of a first-rate intelligence is the ability to hold two opposed ideas in mind at the same time and still retain the ability to function." *Francis Scott Key Fitzgerald*

Smoothing, Filtering and Prediction -

*<sup>i</sup> <sup>j</sup> a* is employed to calculate input

<sup>2</sup> ( )

. (77)

 *= wk, therefore, the states within (74) are* 

. (78)

*u uH for u ≥ 1.* 

<sup>202</sup> Estimating the Past, Present and Future

The proof is omitted since it follows *mutatis mutandis* from that of Lemma 9. Suppose that

( ) () 2 () 2 () () , ,/ , , ,/ ,/

*<sup>N</sup> <sup>u</sup> <sup>u</sup> <sup>u</sup> u uT ij ik K i w i w ik K ik N*

( ) ˆ *<sup>u</sup> w* at iteration *u*. It is argued below that the sequence of state matrix iterates maximises

It follows from ( ) ˆ *<sup>u</sup> w* = *w* − ( ) *<sup>u</sup> w* that () () { ( )} *u uT Ew w* = *E w*{ + ( ) )( *<sup>u</sup> w w* + ( ) ( )} *u T w* = () () { ( )} *u uT Ew w* + *Q*, which together with (76) implies ( 1) ( 1) { ( )} ˆ ˆ *u uT Ew w* ≤ () () { ( )} ˆ ˆ *u uT Ew w* and

of state matrix estimates will similarly vary monotonically. Next, it is stated that the state matrix estimates asymptotically approach the exact values when the SNR is sufficiently

( )

/ 0, 0, lim ˆ *<sup>u</sup> k N Q Ru <sup>w</sup>*

<sup>1</sup> <sup>1</sup> <sup>2</sup> <sup>2</sup> , / , / 2 2 , , log ( | ) log ( | ) ˆ ˆ ( ) ( )

*i j i j fa x fa x a a*

*ij k N ij k k*

"The test of a first-rate intelligence is the ability to hold two opposed ideas in mind at the same time and

*reconstructed exactly. Thus, the claim (77) follows since the MLE (53) is unbiased. �*  It is expected that the above EM smoothing algorithm offers improved state matrix

( )

*<sup>A</sup> <sup>A</sup>*

The proof follows *mutatis mutandis* from that of Lemma 9. The above Lemma implies

<sup>1</sup> log ( | ) log 2 log( ) ( ) ( ) . <sup>ˆ</sup> ˆ ˆ 2 2 <sup>2</sup>

*w*  , ˆ *u*

1

*u uH ≤* () ()

( 1) ( 1) () () { ( )} { ( )} *u uT u uT Ew w Ew w* . (76)

*<sup>i</sup> jw ik K fa w* for all *u* ≥ 1. Therefore, it is expected that the sequence

to the smoother input estimation error ( ) *<sup>u</sup> w* = *w* –

 *w w* (75)

*k*

<sup>2</sup> ( )

/ ˆ *<sup>u</sup> wk N* . An approximate log-likelihood function for the unknown *<sup>i</sup>*, *<sup>j</sup> a* given

the smoother (65), (66) designed with the estimates ( )

*N N fa w*

( ) *<sup>u</sup>* denote the map from *<sup>v</sup>*

*Lemma 13: Under the conditions of Lemma 12,* ( 1) ( 1)

, ,/ log ( | ) ˆ *<sup>u</sup>*

1

still retain the ability to function." *Francis Scott Key Fitzgerald*

0, 0, ˆ lim *<sup>u</sup>*

*Q Ru*

estimates ( )

samples of ( )

( 1)

*i jw i k N fa w* ≥ ( )

*Lemma 14: Under the conditions of Lemma 9,* 

*Proof: From the proof of Lemma 10,* <sup>1</sup>

, ,/ log ( | ) ˆ *<sup>u</sup>*

estimation accuracy.

*Lemma 15:* 

high.

Now let

(75).

/ ˆ *<sup>u</sup> wk N* is

*Proof: Using smoothed states within (51) yields i k*, 1 *x =* , ,/ 1 <sup>ˆ</sup> *<sup>n</sup> i j ik N j a x + wi k*, *=* , , 1 *n i j i k j a x + wi k*, *–* 

, ,/ 1 *n i j ik N j a x , where k N*/ *<sup>x</sup> = xk –* / <sup>ˆ</sup> *k N <sup>x</sup> . The second partial derivative of the corresponding log-*

*likelihood function with respect to ai,j is* 

$$\frac{\left(\boldsymbol{\partial}^{\boldsymbol{z}}\log f(\boldsymbol{a}\_{i,j}|\boldsymbol{\hat{X}}\_{k\wedge N})\right)}{\left(\boldsymbol{\partial}\boldsymbol{a}\_{i,j}\right)^{2}} = -\frac{N}{2}(\boldsymbol{\sigma}\_{i,\boldsymbol{w}}^{2} + A\_{i}E\{\tilde{\boldsymbol{x}}\_{k\wedge N}\tilde{\boldsymbol{x}}\_{k\wedge N}^{T}\}\boldsymbol{A}\_{i}^{T})^{-1}\sum\_{k=1}^{N}\boldsymbol{x}\_{j,k}^{2}\dots$$

*Similarly, the use of filtered state estimates leads to* 

$$\frac{\left(\hat{\boldsymbol{\sigma}}^{2}\log f(\boldsymbol{a}\_{i,j} \mid \hat{\boldsymbol{\mathbf{x}}}\_{k\mid k})\right)}{\left(\hat{\boldsymbol{\sigma}}\boldsymbol{a}\_{i,j}\right)^{2}} = -\frac{N}{2} (\boldsymbol{\sigma}\_{i,w}^{2} + A\_{i}E\{\tilde{\boldsymbol{\mathbf{x}}}\_{k\mid k}\tilde{\boldsymbol{\mathbf{x}}}\_{k\mid k}^{T}\}\boldsymbol{A}\_{i}^{\top})^{-1}\sum\_{k=1}^{N} \boldsymbol{\mathfrak{x}}\_{j,k}^{2}\dots$$

*The result (78) follows since* / / { } *<sup>T</sup> Ex x kN kN <* / / { } *<sup>T</sup> Ex x kk kk . �*

*Example 13.:* Consider a system where *B* = *C* = *D = Q* = 1, *R* = {0.0001, 0.0002, 0.0003} are known and *A* = 0.9 but is unknown. Simulations were conducted using 30 noise realizations with *N* = 500,000. The results of the above smoothing EM algorithm and the filtering EM algorithms, initialized with ˆ (0) *A* = 1.03*A*, are respectively shown by the dotted and dashed lines within Fig. 6. The figure shows that the estimates improve with increasing *u*, which is consistent with Lemma 15. The estimates also improve with increasing SNR which illustrates Lemmas 8 and 14. It is observed anecdotally that the smoother EM algorithm outperforms the filter EM algorithm for estimation of *A* at high signal-to-noise-ratios.

Fig. 6. State matrix estimates calculated by the smoother EM algorithm and filter EM algorithm for Example 13. It can be seen that the ˆ ( ) *<sup>u</sup> A* better approach the nominal *A* at higher SNR.

<sup>&</sup>quot;From the time I was seven, when I purchased my first calculator, I was fascinated by the idea of a machine that could compute things." *Michael Dell*

*where* ( ) / *u*

*where* ( ) / *u k k y = y –* ( )

**8.5 Conclusion** 

likelihood functions.

number of samples is large.

diminish to zero, in which case <sup>1</sup>

2 , 0, 0 lim <sup>ˆ</sup>*i v Q R* = <sup>2</sup>

then <sup>1</sup>

*Stephens*

*k N y = y –* ( )

/ ˆ *u*

/ ˆ *u*

2 2

,

*i v*

*k N y . Similarly, the use of filtered state estimates leads to* 

, ,/ 2 2 2 2 , ,/ ,/

*iv ik k ik k*

1 *n*

*j*

*i w*, and <sup>1</sup> , 0, 0

*i v*, . Thus, measurement noise estimation should only be attempted

lim <sup>ˆ</sup>*<sup>i</sup> <sup>j</sup> Q R <sup>a</sup>*

*i j ik ik*

, *ai,j*, *xi,k*, *wi,k* .

= *ai,j*. Therefore, the

*ax w*

*k k y . The claim (80) follows since* / / { } *<sup>T</sup> Ey y kN kN <* / / { } *<sup>T</sup> Ey y kk kk . �*

log ( | ) <sup>ˆ</sup> ( { }) () 2 *iv ik k T*

From the Central Limit Theorem, the mean of a large sample of independent identically distributed random variables asymptotically approaches a normal distribution. Consequently, parameter estimates are often obtained by maximising Gaussian log-

Unknown process noise variances and state matrix elements can be estimated by

Similarly, unknown measurement noise variances can be estimated by considering *i* singleoutput observations of the form *zi,k* = *yi,k* + *vi,k* , where *yi,k* + *vi,k* . The resulting MLEs are listed in Table 1 and are unbiased provided that the assumed models are correct and the

The above parameter estimates rely on the availability of complete *xi,k* and *yi,k* information. Usually, both states and parameters need to be estimated from measurements. The EM algorithm is a common technique for solving joint state and parameter estimation problems. It has been shown that the estimation sequences vary monotonically and depend on the initial conditions. However, the use of imperfect states from filters or smoothers within the MLE calculations leads to biased parameter estimates. An examination of the approximate Cramér-Rao lower bounds shows that the use of smoothed states as opposed to filtered

When the SNR is sufficiently high, the states are recovered exactly and the bias terms

process noise variance and state matrix estimation procedures described herein are only advocated when the measurement noise is negligible. Conversely, when the SNR is sufficiently low, that is, when the estimation problem is dominated by measurement noise,

when the signal is absent. If parameter estimates are desired at intermediate SNRs then the

"If automobiles had followed the same development cycle as the computer, a Rolls-Royce would today cost \$100, get a million miles per gallon, and explode once a year, killing everyone inside." *Mark* 

subspace identification techniques such as [13], [14] are worthy of consideration.

 = <sup>2</sup> 

2 , 0, 0 lim <sup>ˆ</sup>*i w Q R* 

considering *i* single-input state evolutions of the form *xi,k+1* = ,, ,

states is expected to provide improved parameter estimation accuracy.

*f y <sup>N</sup> Ey y*

 ,

#### **8.4.3 Measurement Noise Variance Estimation**

The discussion of an EM procedure for measurement noise variance estimation is presented in a summary form because it follows analogously to the algorithms described previously.

*Procedure 6.* Assume that an initial estimate ˆ (1) *R* of *R* is available. Subsequent estimates ˆ ( ) *<sup>u</sup> R* , *u* > 1, are calculated by repeating the following two-step procedure.


It can be shown using the approach of Lemma 9 that the sequence of measurement noise variance estimates are either monotonically non-increasing or non-decreasing depending on the initial conditions. When the SNR is sufficiently low, the measurement noise variance estimates converge to the actual value.

*Lemma 16: In respect of Procedure 6,* 

$$\lim\_{R^{-l}\to 0, Q\to 0, u\to \ast \circ} R^{(u)} = R \text{ .}\tag{79}$$

*Proof: By inspection of the output, OE =* ( *<sup>H</sup> <sup>H</sup>*  *Q Q +* <sup>1</sup> *R*) *, it follows that*  <sup>1</sup> 0, 0, lim *IE R Qu = 0, which together with the observation* <sup>1</sup> 0, 0, lim { }*<sup>T</sup> R Qu E zz = R implies (79), since the MLE (27) is unbiased for large N.* �

Once again, the variance estimates produced by the above procedure are expected to be more accurate than those relying on filtered estimates.

*Lemma 17:* 

$$-\left(\frac{\partial^2 \log f(\sigma\_{l,v}^2 \mid \hat{y}\_{k/N})}{\left(\hat{\sigma}\sigma\_{l,v}^2\right)^2}\right)^{-1} < -\left(\frac{\partial^2 \log f(\sigma\_{l,v}^2 \mid \hat{y}\_{k/k})}{\left(\hat{\sigma}\sigma\_{l,v}^2\right)^2}\right)^{-1}.\tag{80}$$

*Proof: The second partial derivative of the corresponding log-likelihood function with respect to the process noise variance is* 

$$\frac{\partial^2 \log f(\sigma\_{i,v}^2 \mid \hat{y}\_{i,k \wedge N})}{(\partial \sigma\_{i,v}^2)^2} = -\frac{N}{2} (\sigma\_{i,v}^2 + E\{\tilde{y}\_{i,k \wedge K} \tilde{y}\_{i,k \wedge K}^T\})^{-2} \text{ .}$$

<sup>&</sup>quot;It is unworthy of excellent men to lose hours like slaves in the labor of calculation which could be relegated to anyone else if machines were used." *Gottfried Wilhelm von Leibnitz*

Smoothing, Filtering and Prediction -

1, ( ) ˆ *<sup>u</sup> v* ,

*, it follows that* 

. (80)

 *= R implies (79),* 

<sup>204</sup> Estimating the Past, Present and Future

The discussion of an EM procedure for measurement noise variance estimation is presented in a summary form because it follows analogously to the algorithms described previously.

*Procedure 6.* Assume that an initial estimate ˆ (1) *R* of *R* is available. Subsequent estimates

Step 1. Operate the minimum-variance smoother (7.66), (7.68), (7.69) designed with ˆ ( ) *<sup>u</sup> R* to

It can be shown using the approach of Lemma 9 that the sequence of measurement noise variance estimates are either monotonically non-increasing or non-decreasing depending on the initial conditions. When the SNR is sufficiently low, the measurement noise variance

( )

*since the MLE (27) is unbiased for large N.* � Once again, the variance estimates produced by the above procedure are expected to be

*OE =* ( *<sup>H</sup> <sup>H</sup>* 

, ,/ 2 2 2 2 , ,/ ,/

*iv ik K ik K*

*Q Q +* <sup>1</sup> *R*)

> *R Qu E zz*

*R R*

<sup>1</sup> <sup>1</sup> <sup>2</sup> <sup>2</sup> <sup>2</sup> <sup>2</sup> , / , / 2 2 2 2 , , log ( | ) log ( | ) ˆ ˆ ( ) ( )

*f y*

*i v i v*

*iv k N iv k k*

*Proof: The second partial derivative of the corresponding log-likelihood function with respect to the* 

log ( | ) <sup>ˆ</sup> ( { }) ( ) <sup>2</sup> *iv ik N T*

"It is unworthy of excellent men to lose hours like slaves in the labor of calculation which could be

*f y <sup>N</sup> Ey y*

 ,

 *= 0, which together with the observation* <sup>1</sup> 0, 0,

*k N <sup>y</sup>* instead of *yk* within (27) to obtain ˆ( 1) *<sup>u</sup> <sup>R</sup>* = diag( ( 1) 2

. (79)

lim { }*<sup>T</sup>*

/ ˆ *u k N y* .

ˆ ( ) *<sup>u</sup> R* , *u* > 1, are calculated by repeating the following two-step procedure.

**8.4.3 Measurement Noise Variance Estimation** 

obtain corrected output estimates ( )

/ ˆ *u*

1

more accurate than those relying on filtered estimates.

*f y* 

2 2

,

*i v*

relegated to anyone else if machines were used." *Gottfried Wilhelm von Leibnitz*

0, 0, lim *<sup>u</sup>*

*R Qu*

Step 2. For *i* = 1, …, *p*, use ( )

 , …, ( 1) 2 , ( ) ˆ *<sup>u</sup> n v* ).

estimates converge to the actual value.

*Proof: By inspection of the output,* 

*process noise variance is* 

<sup>1</sup> 0, 0, lim *IE R Qu*

*Lemma 17:* 

*Lemma 16: In respect of Procedure 6,* 

( 1) 2 2, ( ) ˆ *<sup>u</sup> v*

*where* ( ) / *u k N y = y –* ( ) / ˆ *u k N y . Similarly, the use of filtered state estimates leads to estimates* 

$$\frac{\partial^2 \log f(\sigma\_{i,\upsilon}^2 \mid \hat{y}\_{i,k/k})}{\left(\partial \sigma\_{i,\upsilon}^2\right)^2} = -\frac{N}{2} (\sigma\_{i,\upsilon}^2 + E\{\tilde{y}\_{i,k/k} \tilde{y}\_{i,k/k}^\top\})^{-2} \mu$$

*where* ( ) / *u k k y = y –* ( ) / ˆ *u k k y . The claim (80) follows since* / / { } *<sup>T</sup> Ey y kN kN <* / / { } *<sup>T</sup> Ey y kk kk . �*

#### **8.5 Conclusion**

From the Central Limit Theorem, the mean of a large sample of independent identically distributed random variables asymptotically approaches a normal distribution. Consequently, parameter estimates are often obtained by maximising Gaussian loglikelihood functions.

Unknown process noise variances and state matrix elements can be estimated by *n*

considering *i* single-input state evolutions of the form *xi,k+1* = ,, , 1 *i j ik ik j ax w* , *ai,j*, *xi,k*, *wi,k* .

Similarly, unknown measurement noise variances can be estimated by considering *i* singleoutput observations of the form *zi,k* = *yi,k* + *vi,k* , where *yi,k* + *vi,k* . The resulting MLEs are listed in Table 1 and are unbiased provided that the assumed models are correct and the number of samples is large.

The above parameter estimates rely on the availability of complete *xi,k* and *yi,k* information. Usually, both states and parameters need to be estimated from measurements. The EM algorithm is a common technique for solving joint state and parameter estimation problems. It has been shown that the estimation sequences vary monotonically and depend on the initial conditions. However, the use of imperfect states from filters or smoothers within the MLE calculations leads to biased parameter estimates. An examination of the approximate Cramér-Rao lower bounds shows that the use of smoothed states as opposed to filtered states is expected to provide improved parameter estimation accuracy.

When the SNR is sufficiently high, the states are recovered exactly and the bias terms diminish to zero, in which case <sup>1</sup> 2 , 0, 0 lim <sup>ˆ</sup>*i w Q R* = <sup>2</sup> *i w*, and <sup>1</sup> , 0, 0 lim <sup>ˆ</sup>*<sup>i</sup> <sup>j</sup> Q R <sup>a</sup>* = *ai,j*. Therefore, the process noise variance and state matrix estimation procedures described herein are only advocated when the measurement noise is negligible. Conversely, when the SNR is sufficiently low, that is, when the estimation problem is dominated by measurement noise, then <sup>1</sup> 2 , 0, 0 lim <sup>ˆ</sup>*i v Q R* = <sup>2</sup> *i v*, . Thus, measurement noise estimation should only be attempted when the signal is absent. If parameter estimates are desired at intermediate SNRs then the subspace identification techniques such as [13], [14] are worthy of consideration.

<sup>&</sup>quot;If automobiles had followed the same development cycle as the computer, a Rolls-Royce would today cost \$100, get a million miles per gallon, and explode once a year, killing everyone inside." *Mark Stephens*

*<sup>n</sup>* .

noise *vk*.

**8.7 Glossary** 

*xk* ~ (0, <sup>2</sup> 

( ) , ˆ *u i w* , ( ) , ˆ *u* 

ˆ ( ) ( ) *<sup>u</sup>* 

*A*, given *xk* and *xk*+1.

(ii) Derive a formula for the MLE , ˆ*<sup>i</sup> <sup>j</sup> a* of *ai,j*.

Cramér Rao lower bound for an unknown *A*.

Cramér Rao lower bound for an unknown *f*0.

approximations for sine and cosine, see [2].)

MLE Maximum likelihood estimate. CRLB Cramér Rao Lower Bound

2 . *wi,k* , *vi,k* , *zi,k i*th elements of vectors *wk* , *vk* , *zk*.

*<sup>i</sup> <sup>A</sup>* The *i* eigenvalues of <sup>ˆ</sup> ( ) *<sup>u</sup> <sup>A</sup>* .

information, words, instructions." *Clinton Richard Dawkins*

*Ai* , *Ci i*th row of state-space matrices *A* and *C*.

*F*(*θ*) The Fisher information of a parameter *θ*.

*i v* Estimates of variances of *wi,k* and *vi,k* at iteration *u*.

ˆ ( ) *<sup>u</sup> A* , ˆ ( ) *<sup>u</sup> R* , ˆ ( ) *<sup>u</sup> Q* Estimates of state matrix *A*, covariances *R* and *Q* at iteration *u*.

"What lies at the heart of every living thing is not a fire, not warm breath, not a 'spark of life'. It is

SNR Signal to noise ratio.

Cramér Rao lower bound .

**Problem 3.** Consider the state evolution *<sup>k</sup>* <sup>1</sup> *k k x Ax w* , where *A n n* is unknown and *wk*

(i) Write down a Gaussian log-likelihood function for the unknown components *ai,j* of

(iii) Show that , { } ˆ *E ai <sup>j</sup>* = *ai,j*. Replace the actual states *xk* with the filtered state / ˆ *k k x*

**Problem 4.** Consider measurements of a sinusoidal signal modelled by *yk* = *A*cos(2π*fk* + φ) + *vk*, with amplitude *A* > 0, frequency 0 < *f* < 0.5, phase φ and Gaussian white measurement

(i) Assuming that φ and *f* are known, determine the Fisher information and the

(ii) Assuming that *A* and φ are known, determine the fisher information and the

(iii) Assuming that *A* and *f* are known, determine the Fisher information and the

(iv) Assuming that the vector parameter [*A*T*, <sup>T</sup> f* , φT]T is known, determine the Fisher information matrix and the Cramér Rao lower bound. (Hint: use small angle

) The random variable *xk* is normally distributed with mean μ and variance

within the obtained formula to yield an approximate MLE for *ai,j*. (iv) Obtain a high SNR asymptote for the approximate MLE.


Table 1. MLEs for process noise variance, state matrix element and measurement noise variance.

#### **8.6 Problems**

#### **Problem 1.**


**Problem 2.** Suppose that *N* samples of *xk*+1 = *Axk* + *wk* are available, where *wk* ~ (0, <sup>2</sup> *<sup>w</sup>* ), in which <sup>2</sup> *<sup>w</sup>* is an unknown parameter.


<sup>&</sup>quot;The question of whether computers can think is like the question of whether submarines can swim." *Edsger Wybe Dijkstra*

Smoothing, Filtering and Prediction -

<sup>206</sup> Estimating the Past, Present and Future

*i w*, 2 2

,

2 2

ˆ

*a*

*N*

1 1,

 

*k j ji i j N*

*N n*

, ,1 , , 1 1 <sup>1</sup> <sup>ˆ</sup> ( ) *N n i w i k ij ik k j*

 

, 1 ,, ,

*x ax x*

*i k i j i k j k*

2 , 1

*x*

, , , 1 <sup>1</sup> <sup>ˆ</sup> ( ) *N i v ik ik k z y <sup>N</sup>*

 

 

> *j k k*

> > (0, <sup>2</sup>

*<sup>w</sup>* ), obtain an

 (0, <sup>2</sup> *<sup>w</sup>* ),

*x ax*

ASSUMPTIONS MAIN RESULTS

, ) *i w*

Table 1. MLEs for process noise variance, state matrix element and measurement noise variance.

*wk*, where *an*-1, *an*-2, …, *a*0 are unknown. From the assumption *wk* ~

**Problem 2.** Suppose that *N* samples of *xk*+1 = *Axk* + *wk* are available, where *wk* ~

 *<sup>w</sup>* of <sup>2</sup> *<sup>w</sup>* .

*<sup>w</sup>* provided that *N* is large.

equation for MLEs of the unknown coefficients.

*<sup>w</sup>* is an unknown parameter.

(ii) Derive a formula for the MLE <sup>2</sup> ˆ

 *<sup>w</sup>* = <sup>2</sup> 

(iv) Find the Cramér Rao lower bound for <sup>2</sup> ˆ

a high SNR asymptote for this approximate MLE.

(i) Consider the second order difference equation *xk*+2 + *a*1*xk*+1 + *a*0*xk* = *wk*. Assuming

(ii) Consider the *n*th order autoregressive system *xk*+n + *a*n-1*xk*+n-1 + *a*n-2*xk*+n-2 + … + *a*0*xk* =

(i) Write down a Gaussian log-likelihood function for the unknown parameter, given *xk*.

 *<sup>w</sup>* . (v) Replace the actual states *<sup>k</sup> x* with filtered state / ˆ *k k x* within the MLE formula. Obtain

"The question of whether computers can think is like the question of whether submarines can swim."

*<sup>w</sup>* ), obtain an equation for the MLEs of the unknown *a*1 and *a*0.

<sup>2</sup> *wi k*, ~ (0, ) 

*xi,k+1*~ , , 1 ( , *n*

*j a x* 

*i j i k*

2

<sup>2</sup>

*i k*, ~ (0, ) *i v*, *v* 

Process noise

State matrix

Measurement noise

variance

**8.6 Problems** 

that *wk* ~

 (0, <sup>2</sup> 

(iii) Show that <sup>2</sup> { } ˆ *E*

**Problem 1.**

in which <sup>2</sup> 

*Edsger Wybe Dijkstra*

elements

variance

**Problem 3.** Consider the state evolution *<sup>k</sup>* <sup>1</sup> *k k x Ax w* , where *A n n* is unknown and *wk <sup>n</sup>* .


**Problem 4.** Consider measurements of a sinusoidal signal modelled by *yk* = *A*cos(2π*fk* + φ) + *vk*, with amplitude *A* > 0, frequency 0 < *f* < 0.5, phase φ and Gaussian white measurement noise *vk*.

(i) Assuming that φ and *f* are known, determine the Fisher information and the Cramér Rao lower bound for an unknown *A*.

(ii) Assuming that *A* and φ are known, determine the fisher information and the Cramér Rao lower bound for an unknown *f*0.

(iii) Assuming that *A* and *f* are known, determine the Fisher information and the Cramér Rao lower bound .

(iv) Assuming that the vector parameter [*A*T*, <sup>T</sup> f* , φT]T is known, determine the Fisher information matrix and the Cramér Rao lower bound. (Hint: use small angle approximations for sine and cosine, see [2].)

#### **8.7 Glossary**


<sup>&</sup>quot;What lies at the heart of every living thing is not a fire, not warm breath, not a 'spark of life'. It is information, words, instructions." *Clinton Richard Dawkins*

Limited, 2005.

– 264, 1982.

198

2000.

vol. 11,no. 1, pp. 95 – 103, Mar. 1983.

*Processing*, vol 57, no. 1, Jan. 2009.

25, Iss. 6, pp. 131 – 146, Nov. 200

*Magazine*, vol. 13, pp. 47 – 60, Nov. 1996.

*Distributions*, Wiley, Chichester and New York, 1985.

Cal., University Science Books, pp. 52 and 698, 1992.

errors during DNA replication and repair." *Tomoyuki Shibata*

Estimation", IEEE Signal Processing letters, 2012 (to appear).

[13] T. Katayama and G. Picci, "Realization of stochastic systems with exogenous inputs and

[14] T. Katayama, *Subspace Methods for System Identification*, Springer-Verlag London

[15] R. H. Shumway and D. S. Stoffer, "An approach to time series smoothing and forecasting using the EM algorithm," *Journal of Time Series Analysis*, vol. 3, no. 4, pp. 253

[16] C. F. J. Wu, "On the convergence properties of the EM algorithm," *Annals of Statistics*,

[17] M. Feder and E. Weinstein, "Parameter estimation of superimposed signals using the EM algorithm," *IEEE Transactions on Signal Processing*, vol. 36, no. 4, pp. 477 – 489, Apr.

[18] G. A. Einicke, "Optimal and Robust Noncausal Filter Formulations", *IEEE Transactions* 

[19] G. A. Einicke, J. T. Malos, D. C. Reid and D. W. Hainsworth, "Riccati Equation and EM Algorithm Convergence for Inertial Navigation Alignment", *IEEE Transactions on Signal* 

[20] G. A. Einicke, G. Falco and J. T. Malos, "EM Algorithm State Matrix Estimation for Navigation", *IEEE Signal Processing Letters*, vol. 17, no. 5, pp. 437 – 440, May 2010. [21] T. K. Moon, "The Expectation-Maximization Algorithm", *IEEE Signal Processing* 

[22] D. G. Tzikas, A. C. Likas and N. P. Galatsanos, "The Variational Approximation for Bayesian Inference: Life After the EM Algorithm", *IEEE Signal Processing Magazine*, vol.

[23] D. M. Titterington, A. F. M. Smith and U. E. Makov, *Statistical Analysis of Finite Mixture* 

[24] R. P. Savage, *Strapdown Analytics*, Strapdown Associates, vol. 2, ch. 15, pp. 15.1 – 15.142,

[25] P. K. Seidelmann, ed., *Explanatory supplement to the Astronomical Almanac*, Mill Valley,

[26] G. A. Einicke, G. Falco, M. T. Dunn and D. C. Reid, "Iterative Smoother-Based Variance

"The faithful duplication and repair exhibited by the double-stranded DNA structure would seem to be incompatible with the process of evolution. Thus, evolution has been explained by the occurrence of

*on Signal Processing,* vol. 54, no. 3, pp. 1069 - 1077, Mar. 2006.

subspace identification methods", *Automatica*, vol. 35, pp. 1635 – 1652, 1999.


#### **8.8 References**


<sup>&</sup>quot;In my lifetime, we've gone from Eisenhower to George W. Bush. We've gone from John F. Kennedy to Al Gore. If this is evolution, I believe that in twelve years, we'll be voting for plants." *Lewis Niles Black*

Smoothing, Filtering and Prediction -

*ei ei =* () ()

*ei ei +* 

1 1 ( ) *u uH*

<sup>208</sup> Estimating the Past, Present and Future

( ) *<sup>u</sup> Sk* Additive term within the design Riccati difference equation to account for the presence of modelling error at time *k* and iteration *u*.

( ) *<sup>u</sup>* A system (or map) that operates on the filtering/smoothing problem inputs

[1] L. L. Scharf, *Statistical Signal Processing: Detection, Estimation, and Time Series Analysis*,

[2] S. M. Kay, *Fundamentals of Statistical Signal Processing: Estimation Theory*, Prentice Hall,

[3] G. J. McLachlan and T. Krishnan, *The EM Algorithm and Extensions*, John Wiley & Sons,

[4] H. L. Van Trees and K. L. Bell (Editors), *Baysesian Bounds for Parameter Estimation and* 

[5] A. Van Den Bos, *Parameter Estimation for Scientists and Engineers*, John Wiley & Sons,

[6] R. K. Mehra, "On the identification of variances and adaptive Kalman filtering", *IEEE* 

[7] D. C. Rife and R. R. Boorstyn, "Single-Tone Parameter Estimation from Discrete-time Observations", *IEEE Transactions on Information Theory*, vol. 20, no. 5, pp. 591 – 598, Sep.

[8] R. P. Nayak and E. C. Foundriat, "Sequential Parameter Estimation Using Pseudoinverse", *IEEE Transactions on Automatic Control*, vol. 19, no. 1, pp. 81 – 83, Feb.

[9] P. R. Bélanger, "Estimation of Noise Covariance Matrices for a Linear Time-Varying

[10] V. Strejc, "Least Squares Parameter Estimation", *Automatica*, vol. 16, pp. 535 – 550, Sep.

[11] A. P. Dempster, N. M. Laid and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," *Journal of the Royal Statistical Society*, vol 39, no. 1, pp. 1 – 38,

[12] P. Van Overschee and B. De Moor, "A Unifying Theorem for Three Subspace System

"In my lifetime, we've gone from Eisenhower to George W. Bush. We've gone from John F. Kennedy to Al Gore. If this is evolution, I believe that in twelve years, we'll be voting for plants." *Lewis Niles Black*

*ei ei* , where () ()

Addison-Wesley Publishing Company Inc., Massachusetts, USA, 1990.

*Nonlinear Filtering/Tracking*, John Wiley & Sons, Inc., New Jersey, 2007.

*Transactions on Automatic Control*, vol. 15, pp. 175 – 184, Apr. 1970.

Stochastic Process", *Automatica*, vol. 10, pp. 267 – 275, 1974.

Identification Algorithms", *Automatica*, 1995.

convenient to make use of the factorisation () () ( ) *u uH*

*ei ei* is a lower performance bound.

2 2 ( ) *u uH*

to produce the input, state or output estimation error at iteration *u*. It is

*ei ei* includes the filter or smoother solution

*Ki,k*, *Li,k i*th row of predictor and filter gain matrices *Kk* and *Lk*.

*ai,j* Element in row *i* and column *j* of *A*. # *Ck* Moore-Penrose pseudo-inverse of *Ck*.

> () () 2 2 ( ) *u uH*

and () () 1 1 ( ) *u uH*

Englewood Cliffs, New Jersey, ch. 7, pp. 157 – 204, 1993.

**8.8 References** 

Inc., New York, 1997.

New Jersey, 2007.

1974.

1974.

1980.

1977.


<sup>&</sup>quot;The faithful duplication and repair exhibited by the double-stranded DNA structure would seem to be incompatible with the process of evolution. Thus, evolution has been explained by the occurrence of errors during DNA replication and repair." *Tomoyuki Shibata*

Chapter title

Author Name

### **Robust Prediction, Filtering and Smoothing**

#### 1 **9.1 Introduction**

Robust Prediction, Filtering and Smoothing <sup>211</sup>

The previously-discussed optimum predictor, filter and smoother solutions assume that the model parameters are correct, the noise processes are Gaussian and their associated covariances are known precisely. These solutions are optimal in a mean-square-error sense, that is they provide the best average performance. If the above assumptions are correct, then the filter's mean-square-error equals the trace of design error covariance. The underlying modelling and noise assumptions are a often convenient fiction. They do, however, serve to allow estimated performance to be weighed against implementation complexity.

In general, robustness means "the persistence of a system's characteristic behaviour under perturbations or conditions of uncertainty" [1]. In an estimation context, robust solutions refer to those that accommodate uncertainties in problem specifications. They are also known as worst-case or peak error designs. The standard predictor, filter and smoother structures are retained but a larger design error covariance is used to account for the presence of modelling error.

Designs that cater for worst cases are likely to exhibit poor average performance. Suppose that a bridge designed for average loading conditions returns an acceptable cost benefit. Then a design that is focussed on accommodating infrequent peak loads is likely to provide degraded average cost performance. Similarly, a worst-case shoe design that accommodates rarely occurring large feet would provide poor fitting performance on average. That is, robust designs tend to be conservative. In practice, a trade-off may be desired between optimum and robust designs.

The material canvassed herein is based on the H∞ filtering results from robust control. The robust control literature is vast, see [2] – [33] and the references therein. As suggested above, the H∞ solutions of interest here involve observers having gains that are obtained by solving Riccati equations. This Riccati equation solution approach relies on the Bounded Real Lemma – see the pioneering work by Vaidyanathan [2] and Petersen [3]. The Bounded Real Lemma is implicit with game theory [9] – [19]. Indeed, the continuous-time solutions presented in this section originate from the game theoretic approach of Doyle, Glover, Khargonekar, Francis Limebeer, Anderson, Khargonekar, Green, Theodore and Shaked, see [4], [13], [15], [21]. The discussed discrete-time versions stem from the results of Limebeer, Green, Walker, Yaesh, Shaked, Xie, de Souza and Wang, see [5], [11], [18], [19], [21]. In the parlance of game theory: "a statistician is trying to best estimate a linear combination of the states of a system that is driven by nature; nature is trying to cause the statistician's estimate

<sup>&</sup>quot;On a huge hill, Cragged, and steep, Truth stands, and he that will Reach her, about must, and about must go." *John Donne*

*I t* ( ) 

and noting that <sup>0</sup> ( ( )) *<sup>T</sup>*

The ∞-norm of

conditions under which

*has a solution on [0, T]. Then* 

is, 

is defined as

∞, if there exists a *γ* such that

"Information is the resolution of uncertainty." *Claude Elwood Shannon*

*, suppose that the Riccati differential equation* 

*y*() () () *t Ctxt* , (4)

(5)

() () 0 *<sup>T</sup> <sup>T</sup> <sup>T</sup> <sup>T</sup> <sup>T</sup> V x t dt y t y t dt w t w t dt* (6)

0 2

. (8)

. (9)

, (10)

∞ are specified below. The accompanying sufficiency proof

=

. (7)

where *w*(*t*) *<sup>m</sup>* , *B*(*t*) *n m* and *C*(*t*) *<sup>p</sup> <sup>n</sup>* . Assume temporarily that { ( ) ( )} *<sup>T</sup> Ewtw*

<sup>2</sup> ( ( )) ( ) ( ) ( ) ( ) 0 *<sup>T</sup> <sup>T</sup> Vxt y tyt w twt* 

for a *γ* . Integrating (5) from *t* = 0 to *t* = *T* gives

<sup>0</sup> <sup>0</sup> <sup>0</sup> ( ( )) ( ) ( )

0

*<sup>T</sup> <sup>T</sup> <sup>T</sup> <sup>T</sup> <sup>T</sup> <sup>T</sup>*

2

2 <sup>2</sup> <sup>0</sup>

2 2

sup sup *w w*

<sup>0</sup> <sup>0</sup> <sup>2</sup>

<sup>2</sup> () () () () () () () () () () () *<sup>T</sup> <sup>T</sup> <sup>T</sup> Pt Pt At A tPt C tCt PtBtB tPt*

*≤ γ for any w* 

( ) ( ) ( ) (0) (0) (0) ( ) ( )

*x T P T x T x P x y t y t dt w t w t dt*

() ()

Under the assumptions *x*(0) = 0 and *P*(*T*) = 0, the above inequality simplifies to

( ) () () ( ) () ()

*<sup>T</sup> <sup>T</sup> <sup>T</sup> <sup>T</sup> y t y t y t dt w t w t w t dt*

2 0 2

2 2 2 2

The Lebesgue ∞-space is the set of systems having finite ∞-norm and is denoted by ∞. That

2

, (11)

*y w*

namely, the supremum (or maximum) ratio of the output and input 2-norms is finite. The

combines the approaches of [15], [31]. A further five proofs for this important result appear in [21].

*Lemma 1: The continuous-time Bounded Real Lemma [15], [13], [21]: In respect of the above system* 

 *2.* 

*y w w w*

. The Bounded Real Lemma [13], [15], [21], states that *w* 2 implies *y* 2 if

2

*V x t dt* = *xT*(*T*)*P*(*T*)*x*(*T*) – *xT*(0)*P*(0)*x*(0), another objective is

to be as erroneous as possible, while trying to minimize the energy it invests in driving the system" [19].

Pertinent state-space H∞ predictors, filters and smoothers are described in [4] – [19]. Some prediction, filtering and smoothing results are summarised in [13] and methods for accommodating model uncertainty are described in [14], [18], [19]. The aforementioned methods for handling model uncertainty can result in conservative designs (that depart far from optimality). This has prompted the use of linear matrix inequality solvers in [20], [23] to search for optimal solutions to model uncertainty problems.

It is explained in [15], [19], [21] that a saddle-point strategy for the games leads to robust estimators, and the resulting robust smoothing, filtering and prediction solutions are summarised below. While the solution structures remain unchanged, designers need to tweak the scalar within the underlying Riccati equations.

This chapter has two main parts. Section 9.2 describes robust continuous-time solutions and the discrete-time counterparts are presented in Section 9.3. The previously discussed techniques each rely on a trick. The optimum filters and smoothers arise by completing the square. In maximum-likelihood estimation, a function is differentiated with respect to an unknown parameter and then set to zero. The trick behind the described robust estimation techniques is the Bounded Real Lemma, which opens the discussions.

#### **9.2 Robust Continuous-time Estimation**

#### **9.2.1 Continuous-Time Bounded Real Lemma**

First, consider the unforced system

$$\dot{\mathbf{x}}(t) = A(t)\mathbf{x}(t) \tag{1}$$

over a time interval *t* [0, *T*], where *A*(*t*) *n n* . For notational convenience, define the stacked vector *x* = {*x*(*t*), *t* [0, *T*]}. From Lyapunov stability theory [36], the system (1) is asymptotically stable if there exists a function *V*(*x*(*t*)) > 0 such that ( ( )) 0 *Vxt* . A possible Lyapunov function is *V*(*x*(*t*)) = () () () *<sup>T</sup> x tPtxt* , where *P*(*t*) = ( ) *<sup>T</sup> P t n n* is positive definite. To ensure *x* 2 it is required to establish that

$$\dot{V}(\mathbf{x}(t)) = \dot{\mathbf{x}}^T(t)P(t)\mathbf{x}(t) + \mathbf{x}^T(t)\dot{P}(t)\mathbf{x}(t) + \mathbf{x}^T(t)P(t)\dot{\mathbf{x}}(t)<0. \tag{2}$$

Now consider the output of a linear time varying system, *y* = *w,* having the state-space representation

$$\dot{\mathbf{x}}(t) = A(t)\mathbf{x}(t) + B(t)w(t) \,. \tag{3}$$

<sup>&</sup>quot;Uncertainty is one of the defining features of Science. Absolute proof only exists in mathematics. In the real world, it is impossible to prove that theories are right in every circumstance; we can only prove that they are wrong. This provisionality can cause people to lose faith in the conclusions of science, but it shouldn't. The recent history of science is not one of well-established theories being proven wrong. Rather, it is of theories being gradually refined." *New Scientist vol. 212 no. 2835*

Smoothing, Filtering and Prediction:

<sup>212</sup> Estimating the Past, Present and Future

to be as erroneous as possible, while trying to minimize the energy it invests in driving the

Pertinent state-space H∞ predictors, filters and smoothers are described in [4] – [19]. Some prediction, filtering and smoothing results are summarised in [13] and methods for accommodating model uncertainty are described in [14], [18], [19]. The aforementioned methods for handling model uncertainty can result in conservative designs (that depart far from optimality). This has prompted the use of linear matrix inequality solvers in [20], [23]

It is explained in [15], [19], [21] that a saddle-point strategy for the games leads to robust estimators, and the resulting robust smoothing, filtering and prediction solutions are summarised below. While the solution structures remain unchanged, designers need to

This chapter has two main parts. Section 9.2 describes robust continuous-time solutions and the discrete-time counterparts are presented in Section 9.3. The previously discussed techniques each rely on a trick. The optimum filters and smoothers arise by completing the square. In maximum-likelihood estimation, a function is differentiated with respect to an unknown parameter and then set to zero. The trick behind the described robust estimation

over a time interval *t* [0, *T*], where *A*(*t*) *n n* . For notational convenience, define the stacked vector *x* = {*x*(*t*), *t* [0, *T*]}. From Lyapunov stability theory [36], the system (1) is

Lyapunov function is *V*(*x*(*t*)) = () () () *<sup>T</sup> x tPtxt* , where *P*(*t*) = ( ) *<sup>T</sup> P t n n* is positive definite.

"Uncertainty is one of the defining features of Science. Absolute proof only exists in mathematics. In the real world, it is impossible to prove that theories are right in every circumstance; we can only prove that they are wrong. This provisionality can cause people to lose faith in the conclusions of science, but it shouldn't. The recent history of science is not one of well-established theories being proven wrong.

( ( )) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) 0 *<sup>T</sup> <sup>T</sup> <sup>T</sup> Vxt x tPtxt x tPtxt x tPtxt* . (2)

asymptotically stable if there exists a function *V*(*x*(*t*)) > 0 such that ( ( )) 0 *Vxt*

*xt Atxt* () () () (1)

*xt Atxt Btwt* () () () () () , (3)

. A possible

*w,* having the state-space

to search for optimal solutions to model uncertainty problems.

techniques is the Bounded Real Lemma, which opens the discussions.

tweak the scalar within the underlying Riccati equations.

**9.2 Robust Continuous-time Estimation** 

First, consider the unforced system

representation

**9.2.1 Continuous-Time Bounded Real Lemma** 

To ensure *x* 2 it is required to establish that

Now consider the output of a linear time varying system, *y* =

Rather, it is of theories being gradually refined." *New Scientist vol. 212 no. 2835*

system" [19].

$$
\hat{y}(t) = \mathbf{C}(t)\mathbf{x}(t),\tag{4}
$$

where *w*(*t*) *<sup>m</sup>* , *B*(*t*) *n m* and *C*(*t*) *<sup>p</sup> <sup>n</sup>* . Assume temporarily that { ( ) ( )} *<sup>T</sup> Ewtw* = *I t* ( ) . The Bounded Real Lemma [13], [15], [21], states that *w* 2 implies *y* 2 if

$$
\dot{V}(\mathbf{x}(t)) + \boldsymbol{y}^T(t)\boldsymbol{y}(t) - \boldsymbol{\gamma}^2 \boldsymbol{w}^T(t)\boldsymbol{w}(t) < 0 \tag{5}
$$

for a *γ* . Integrating (5) from *t* = 0 to *t* = *T* gives

$$\int\_{0}^{T} \dot{V}(\mathbf{x}(t)) \, dt + \int\_{0}^{T} y^{T}(t)y(t) \, dt - \gamma^{2} \int\_{0}^{T} w^{T}(t)w(t) \, dt < 0 \tag{6}$$

and noting that <sup>0</sup> ( ( )) *<sup>T</sup> V x t dt* = *xT*(*T*)*P*(*T*)*x*(*T*) – *xT*(0)*P*(0)*x*(0), another objective is

$$\frac{\mathbf{x}^{\top}(T)P(T)\mathbf{x}(T) - \mathbf{x}^{\top}(0)P(0)\mathbf{x}(0) + \int\_{0}^{t} y^{\top}(t)y(t) \ \mathrm{d}t}{\int\_{0}^{\tau} w^{\top}(t)w(t) \ \mathrm{d}t} \le \boldsymbol{\gamma}^{2}. \tag{7}$$

Under the assumptions *x*(0) = 0 and *P*(*T*) = 0, the above inequality simplifies to

$$\frac{\left\|\boldsymbol{y}(t)\right\|\_{2}^{2}}{\left\|\boldsymbol{w}(t)\right\|\_{2}^{2}} = \frac{\int\_{0}^{\boldsymbol{\tau}} \boldsymbol{y}^{\top}(t)\boldsymbol{y}^{\top}(t)dt}{\int\_{0}^{\boldsymbol{\tau}} \boldsymbol{w}^{\top}(t)\boldsymbol{w}(t)\,dt} \leq \boldsymbol{\gamma}^{\boldsymbol{z}}.\tag{8}$$

The ∞-norm of is defined as

$$\left\|\mathcal{L}\right\|\_{\circ} = \frac{\left\|\mathcal{Y}\right\|\_{z}}{\left\|w\right\|\_{z}} = \frac{\left\|\mathcal{L}w\right\|\_{z}}{\left\|w\right\|\_{z}}\,. \tag{9}$$

The Lebesgue ∞-space is the set of systems having finite ∞-norm and is denoted by ∞. That is, ∞, if there exists a *γ* such that for

$$\sup\_{\|\boldsymbol{\mu}\|\_{\boldsymbol{\nu}} \neq 0} \left\| \mathcal{G} \right\|\_{\boldsymbol{\nu}} = \sup\_{\|\boldsymbol{\nu}\|\_{\boldsymbol{\nu}} \neq 0} \frac{\left\| \boldsymbol{\nu} \right\|\_{2}}{\left\| \boldsymbol{\nu} \right\|\_{2}} \leq \boldsymbol{\gamma} \,\tag{10}$$

namely, the supremum (or maximum) ratio of the output and input 2-norms is finite. The conditions under which ∞ are specified below. The accompanying sufficiency proof combines the approaches of [15], [31]. A further five proofs for this important result appear in [21].

*Lemma 1: The continuous-time Bounded Real Lemma [15], [13], [21]: In respect of the above system , suppose that the Riccati differential equation* 

$$-\dot{P}(t) = P(t)A(t) + A^\top(t)P(t) + \mathbb{C}^\top(t)\mathbb{C}(t) + \gamma^{-2}P(t)B(t)\mathbb{B}^\top(t)P(t) \tag{11}$$

*has a solution on [0, T]. Then ≤ γ for any w 2.* 

<sup>&</sup>quot;Information is the resolution of uncertainty." *Claude Elwood Shannon*

and 

2 0 

where

 ()() *<sup>T</sup> <sup>T</sup>*

and { ( ) ( )} *<sup>T</sup> Ewtv*

**9.2.2.2H∞ Solution** 

noisy measurements of

measurements,

.

2

1

It is desired to find a causal solution

*ei* = 2 1 [

*i t i t dt* < 0 for some

*E*{*w*(*t*)} = 0, { ( ) ( )} *<sup>T</sup> Ewtw*

at time *t* so that the output estimation error,

= 0.

 = *Qt t* () ( ) 

and possessing of a wish discovers the folly of the chase." *William Congreve*

Figure 1. The general filtering problem. The objective is to estimate the output of

∑

*v* 

*w y*<sup>2</sup> *z* 

is in 2. The error signal (18) is generated by a system denoted by *e* =

from

that produces estimates <sup>1</sup> *y*ˆ (|) *t t* of *y*1(*t*) from the

∑

*e* 

*<sup>T</sup> <sup>T</sup>*

*ei* , where *i* =

*e t t e t t dt* –

 = *Rt t* () ( ) 

*v w* 

> (19) (20)

<sup>2</sup> *zt y t vt* () () () , (17)

1 1 *et t* ( | ) () ( | ) *y t y*ˆ *t t* , (18)

. For convenience, it is assumed here that *w*(*t*) *<sup>m</sup>* ,

<sup>2</sup> () () () () *<sup>T</sup> Kt PtC tR t* (21)

] . Hence, the objective is to achieve <sup>0</sup> (| )(| )

*y*<sup>1</sup> *–* 

, *v*(*t*) *<sup>p</sup>* , *E*{*v*(*t*)} = 0, { ( ) ( )} *<sup>T</sup> Evtv*

A parameterisation of all solutions for the H∞ filter is developed in [21]. A minimumentropy filter arises when the contractive operator within [21] is zero and is given by

1

"Uncertainty and expectation are the joys of life. Security is an insipid thing, through the overtaking

*xt t At KtC t xt t Ktzt* ˆ( | ) ( ) ( ) ( ) ( | ) ( ) ( ), <sup>2</sup> <sup>ˆ</sup> *<sup>x</sup>*ˆ(0) 0,

<sup>1</sup> <sup>1</sup> *y*ˆ ( | ) ( | ) ( | ), *t t C t txt t* ˆ

*Proof: From (2) – (5),* 

$$\begin{split} \dot{V}(t) + \boldsymbol{y}^{\top}(t)\boldsymbol{y}(t) - \boldsymbol{\gamma}^{2} \boldsymbol{w}^{\top}(t)\boldsymbol{x}(t) \\ &= \boldsymbol{\text{x}}^{\top}(t)\boldsymbol{\text{C}}^{\top}(t)\boldsymbol{\text{C}}(t)\boldsymbol{x}(t) - \boldsymbol{\gamma}^{2} \boldsymbol{w}^{\top}(t)\boldsymbol{w}(t) + \boldsymbol{\text{x}}^{\top}(t)\dot{P}(t)\boldsymbol{x}(t) \\ &+ (A(t)\boldsymbol{x}(t) + B(t)\boldsymbol{x}(t))^{\top}P(t)\boldsymbol{x}(t) + \boldsymbol{x}^{\top}(t)P(t)(A(t)\boldsymbol{x}^{\top}(t) + B(t)\boldsymbol{x}(t)) \\ &= \boldsymbol{\gamma}^{-2} \boldsymbol{x}^{\top}(t)P(t)B(t)\boldsymbol{B}^{\top}(t)P(t)\boldsymbol{x}(t) - \boldsymbol{\gamma}^{2} \boldsymbol{w}^{\top}(t)\boldsymbol{w}(t)) + \boldsymbol{w}^{\top}(t)\boldsymbol{B}^{\top}(t)P(t)\boldsymbol{x}(t) + \boldsymbol{x}^{\top}(t)P(t)B(t)\boldsymbol{x}(t) \\ &= -\boldsymbol{\gamma}^{2} \left(\boldsymbol{w}(t) - \boldsymbol{\gamma}^{-2}\boldsymbol{B}(t)P(t)\boldsymbol{x}(t)\right)^{\top}(\boldsymbol{w}(t) - \boldsymbol{\gamma}^{-2}\boldsymbol{B}(t)P(t)\boldsymbol{x}(t)), \end{split}$$

*which implies (6) and (7). Inequality (8) is established under the assumptions x(0) = 0 and P(T) = 0. �*

In general, where { ( ) ( )} *<sup>T</sup> Ewtw* = *Qt t* () ( ) , the scaled matrix *B t*( ) = 1/ 2 *BtQ t* () () may be used in place of *B*(*t*) above. When the plant has a direct feedthrough matrix, that is,

$$\mathbf{y}(t) = \mathbf{C}(t)\mathbf{x}(t) + D(t)w(t) \,. \tag{12}$$

*D*(*t*) *<sup>p</sup> <sup>m</sup>* , the above Riccati differential equation is generalised to

$$\begin{aligned} -\dot{P}\_l(t) &= P\_l(t)(A(t) + B(t)M^{-1}(t)D^\top(t)\mathbb{C}(t)) \ &+ (A(t) + B(t)M^{-1}(t)D^\top(t)\mathbb{C}(t))^\top P\_l(t) \\ &+ \gamma^{-2}B(t)M^{-1}(t)B^\top(t) + \mathbb{C}^\top(t)(I + D(t)M^{-1}(t)D^\top(t))\mathbb{C}(t)) \end{aligned} \tag{13}$$

where *M*(*t*) = *γ*2*I – DT*(*t*)*D*(*t*) *>* 0. A proof is requested in the problems.

Criterion (8) indicates that the ratio of the system's output and input energies is bounded above by *γ*2 for any *w* 2, including worst-case *w*. Consequently, solutions satisfying (8) are often called worst-case designs.

#### **9.2.2Continuous-Time H∞ Filtering**

#### **9.2.2.1Problem Definition**

Now that the Bounded Real Lemma has been defined, the H∞ filter can be set out. The general filtering problem is depicted in Fig. 1. It is assumed that the system has the state-space realisation

$$
\dot{\mathbf{x}}(t) = A(t)\mathbf{x}(t) + B(t)w(t), \ \mathbf{x}(0) = 0,\tag{14}
$$

$$
\partial\_2 \mathbf{y}\_2(t) = \mathbb{C}\_2(t)\mathbf{x}(t) \,. \tag{15}
$$

Suppose that the system has the realisation (14) and

$$\mathbf{y}\_1(t) = \mathbb{C}\_1(t)\mathbf{x}(t) \,. \tag{16}$$

<sup>4 &</sup>quot;All exact science is dominated by the idea of approximation." *Earl Bertrand Arthur William*

Smoothing, Filtering and Prediction:

, the scaled matrix *B t*( ) = 1/ 2 *BtQ t* () () may be

has a direct feedthrough matrix, that is,

*y*() () () () () *t Ctxt Dtwt* , (12)

<sup>1</sup> ( ( ) ( ) ( ) ( ) ( )) ( ) *<sup>T</sup> <sup>T</sup> At BtM tD tCt P t*

1 1 *y* () () () *t C txt* . (16)

*�*

(13)

 has the

(14) (15)

<sup>214</sup> Estimating the Past, Present and Future

Criterion (8) indicates that the ratio of the system's output and input energies is bounded above by *γ*2 for any *w* 2, including worst-case *w*. Consequently, solutions satisfying (8)

Now that the Bounded Real Lemma has been defined, the H∞ filter can be set out. The

general filtering problem is depicted in Fig. 1. It is assumed that the system

*xt Atxt Btwt* ( ) ( ) ( ) ( ) ( ), *x*(0) 0,

2 2 *y* () () () *t C txt* .

4 "All exact science is dominated by the idea of approximation." *Earl Bertrand Arthur William*

has the realisation (14) and

*which implies (6) and (7). Inequality (8) is established under the assumptions x(0) = 0 and P(T) = 0.* 

() () () () () () () () *T T <sup>T</sup> w tB tPtxt x tPtBtwt*

( ( ) ( ) ( ) ( )) ( ) ( ) *<sup>T</sup> A txt Btwt Ptxt* ( ) ( )( ( ) ( ) ( ) ( )) *<sup>T</sup> <sup>T</sup> x tPt Atx t Btwt*

*wt BtPtxt wt BtPtxt*

 = *Qt t* () ( ) 

*BtM tB t C t I DtM tD t Ct* ,

,4

*D*(*t*) *<sup>p</sup> <sup>m</sup>* , the above Riccati differential equation is generalised to

1 1 1 ( ) ( )( ( ) ( ) ( ) ( ) ( )) *<sup>T</sup> P t P t At BtM tD tCt* <sup>1</sup>

<sup>2</sup> <sup>1</sup> <sup>1</sup> ( ) ( ) ( ) ( )( ( ) ( ) ( )) ( ) *T T <sup>T</sup>*

where *M*(*t*) = *γ*2*I – DT*(*t*)*D*(*t*) *>* 0. A proof is requested in the problems.

*x tPtBtB tPtxt w twt*

<sup>2</sup> <sup>2</sup> <sup>2</sup> ( ( ) ( ) ( ) ( )) ( ( ) ( ) ( ) ( )) *<sup>T</sup>*

*Proof: From (2) – (5),* 

<sup>2</sup> () () () () () *<sup>T</sup> <sup>T</sup> V t y t y t w twt* 

In general, where { ( ) ( )} *<sup>T</sup> Ewtw*

are often called worst-case designs.

**9.2.2Continuous-Time H∞ Filtering** 

**9.2.2.1Problem Definition** 

state-space realisation

Suppose that the system

 <sup>2</sup> () () () () () () () () () *T T <sup>T</sup> <sup>T</sup> x tC tCtxt w twt x tPtxt* 

<sup>2</sup> <sup>2</sup> ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )) *<sup>T</sup> <sup>T</sup> <sup>T</sup>*

 

used in place of *B*(*t*) above. When the plant

Figure 1. The general filtering problem. The objective is to estimate the output of from noisy measurements of .

It is desired to find a causal solution that produces estimates <sup>1</sup> *y*ˆ (|) *t t* of *y*1(*t*) from the measurements,

$$z(t) = y\_z(t) + v(t) \, \tag{17}$$

at time *t* so that the output estimation error,

$$e(t \mid t) = y\_1(t) - \hat{y}\_1(t \mid t) \,, \tag{18}$$

is in 2. The error signal (18) is generated by a system denoted by *e* = *ei* , where *i* = *v w* and *ei* = 2 1 [ ] . Hence, the objective is to achieve <sup>0</sup> (| )(| ) *<sup>T</sup> <sup>T</sup> e t t e t t dt* – 2 0 ()() *<sup>T</sup> <sup>T</sup> i t i t dt* < 0 for some . For convenience, it is assumed here that *w*(*t*) *<sup>m</sup>* , *E*{*w*(*t*)} = 0, { ( ) ( )} *<sup>T</sup> Ewtw* = *Qt t* () ( ) , *v*(*t*) *<sup>p</sup>* , *E*{*v*(*t*)} = 0, { ( ) ( )} *<sup>T</sup> Evtv* = *Rt t* () ( ) and { ( ) ( )} *<sup>T</sup> Ewtv* = 0.

#### **9.2.2.2H∞ Solution**

A parameterisation of all solutions for the H∞ filter is developed in [21]. A minimumentropy filter arises when the contractive operator within [21] is zero and is given by

$$
\dot{\hat{\mathbf{x}}}(t|t) = \left(A(t) - K(t)\mathbb{C}\_2(t)\right)\hat{\mathbf{x}}(t|t) + K(t)\mathbf{z}(t), \quad \hat{\mathbf{x}}(0) = 0,\tag{19}
$$

$$
\hat{y}\_1(t \mid t) = \mathbb{C}\_1(t \mid t)\hat{\mathbf{x}}(t \mid t),
\tag{20}
$$

where

$$K(t) = P(t)\mathbb{C}\_2^{\top}(t)R^{-1}(t)\tag{21}$$

<sup>&</sup>quot;Uncertainty and expectation are the joys of life. Security is an insipid thing, through the overtaking and possessing of a wish discovers the folly of the chase." *William Congreve*

*0. Therefore,* 

<sup>2</sup> 0 lim *<sup>v</sup>* 

Suppose that

energy of *et t* (|) is

*on [0, T] for some γ*

<sup>2</sup> () () () () *<sup>T</sup> K PC R*

*ei* 

**9.2.2.4Trading-Off H∞ Performance** 

= 1 and <sup>2</sup> <sup>0</sup>

at high signal-to-noise-ratios.

2 = 

denote the transfer function of

"Life is uncertain. Eat dessert first." *Ernestine Ulmer*

achieves the performance *<sup>H</sup>*

lim *v*

 

 *into the above Riccati differential equation yields* 

<sup>1</sup> <sup>2</sup>

*<sup>T</sup> <sup>T</sup>*

 

*solution implies* ( ) ( )( ) *<sup>T</sup> x T PT xT –* (0) (0) (0) *<sup>T</sup> x Px +* <sup>0</sup> (| )(| )

 *2 => e*  

<sup>2</sup> <sup>2</sup> 1 1 ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( ) ( )( ( ) ( )( ( ) ( ) ( )) ( ) *<sup>T</sup> <sup>T</sup> <sup>T</sup> <sup>T</sup> P AP P A BQB P C R C C C P*

() 0 *<sup>T</sup> P* .

*Taking adjoints to address the problem (23) leads to (22), for which the existence of a positive define* 

*e t t e t t dt –* <sup>2</sup>

In a robust filter design it is desired to meet an H∞ performance objective for a minimum possible *γ*. A minimum *γ* can be found by conducting a search and checking for the existence of positive definite solutions to the Riccati differential equation (22). This search is

In some applications it may be possible to estimate *a priori* values for *γ*. Recall for output

From the arguments of Chapters 1 – 2 and [28], for single-input-single-output plants

When the problem is stationary (or time-invariant), the filter gain is precalculated as

1 2 2 2 11 0 ( ) *T T <sup>T</sup> <sup>T</sup> AP PA P C R C C C P BQB*

2 2 2 <sup>2</sup> (|) (|) ( ) ( ) *<sup>j</sup> <sup>j</sup> <sup>H</sup>*

*ei ei ei <sup>j</sup> <sup>j</sup> e t t e t t dt R R s ds R s ds* , (25)

*ei* = 1, which implies <sup>2</sup> <sup>0</sup>

,

 

1

 

> 

*Thus, under the assumption x(0) = 0,* <sup>0</sup> (| )(| )

*∞, that is, w, v* 

tractable because *P t*( ) is a convex function of *γ*2, since

estimation problems that the error is generated by *ei e i*

*ei ei* 

*<sup>T</sup>* <sup>1</sup> *K PC R* , where *P* is the solution of the algebraic Riccati equation

*, in which τ = T – t is a time-to-go variable. Substituting* 

 

*<sup>T</sup> <sup>T</sup>*

2 2 2 *P t*( ) 

 

. (24)

<sup>1</sup> is a time-invariant single-input-single-output system and let *Rei*(*s*)

lim 

*v*

   

*e t t e t t dt –* <sup>2</sup>

<sup>0</sup> ()() *<sup>T</sup> <sup>T</sup>*

= <sup>6</sup>

, where

*H ei ei* = <sup>2</sup>

< *γ*2, it follows that an *a priori* design estimate is *γ* =

*ei* . Then Parseval's Theorem states that the average total

 

*2. �*

 

<sup>0</sup> ()() *<sup>T</sup> <sup>T</sup>*

*i t i t dt < 0.* 

*i t i t dt <sup>&</sup>lt;*( ) ( )( ) *<sup>T</sup> x T PT xT <* 

1 1 ( ) ( ) ( )) ( ) *<sup>T</sup>*

*PtC tC t Pt* > 0.

*ei* = <sup>1</sup> [ ( )] 

*<sup>v</sup>* . Since the H∞ filter

*I* .

> *v*

is the filter gain and *P*(*t*) = *PT*(*t*) > 0 is the solution of the Riccati differential equation

$$\dot{P}(t) = A(t)P(t) + P(t)A^\top(t) + B(t)Q(t)B^\top(t)\tag{22}$$

$$-P(t)(\mathbf{C}\_2^\top(t)R^{-1}(t)\mathbf{C}\_2(t) - \gamma^{-2}\mathbf{C}\_1^\top(t)\mathbf{C}\_1(t))P(t),\; P(0) = 0. \tag{23}$$

It can be seen that the H∞ filter has a structure akin to the Kalman filter. A point of difference is that the solution to the above Riccati differential equation solution depends on *C*1(*t*), the linear combination of states being estimated.

#### **9.2.2.3Properties**

Define *A t*( ) = *A(t*) – *K*(*t*)*C2*(*t*). Subtracting (19) – (20) from (14) – (15) yields the error system

$$
\begin{bmatrix}
\dot{\tilde{\boldsymbol{x}}}(t\mid t) \\
\boldsymbol{e}(t\mid t) \\
\boldsymbol{e}(t\mid t)
\end{bmatrix} = \begin{bmatrix}
\overline{\boldsymbol{A}}(t) & [-\boldsymbol{K}(t) & \boldsymbol{B}(t)] \\
\boldsymbol{C}\_{1}(t) & [0 & 0] \\
\end{bmatrix} \begin{bmatrix}
\tilde{\boldsymbol{x}}(t\mid t) \\
\boldsymbol{v}(t) \\
\boldsymbol{w}(t)
\end{bmatrix}, \quad \tilde{\boldsymbol{\chi}}(0) = \boldsymbol{0},
\tag{23}
$$

where *xt t* (|) = *x*(*t*) – *xt t* ˆ(|) and *ei* = 1 ( ) [ ( ) ( )] ( ) [0 0] *A t Kt Bt C t* . The adjoint of *ei* is given by

*H ei* = <sup>1</sup> () () ( ) 0 ( ) 0 *T T T T At Ct K t B t* . It is shown below that the estimation error satisfies the desired

performance objective.

*ei*

*i* ,

*Lemma 2: In respect of the H∞ problem (14) – (18), the solution (19) – (20) achieves the performance*  ( ) ( )( ) *<sup>T</sup> x T PT xT –* (0) (0) (0) *<sup>T</sup> x Px +* <sup>0</sup> (| )(| ) *<sup>T</sup> <sup>T</sup> e t t e t t dt –* <sup>2</sup> <sup>0</sup> ()() *<sup>T</sup> <sup>T</sup> i t i t dt < 0.* 

*Proof: Following the approach in [15], [21], by applying Lemma 1 to the adjoint of (23), it is required that there exists a positive definite symmetric solution to* 

$$-\dot{P}(\tau) = \overline{A}(\tau)P(\tau) + P(\tau)\overline{A}^T(\tau) + B(\tau)Q(\tau)B^\top(\tau) + K(\tau)R(\tau)K^\top(\tau)$$

$$+ \gamma^{-2}P(\tau)\mathbf{C}\_1^\top(\tau)\mathbf{C}\_1(\tau))P(\tau) \; \; \; \; P(\tau)\Big|\_{\tau=T} = 0,$$

<sup>&</sup>quot;Although economists have studied the sensitivity of import and export volumes to changes in the exchange rate, there is still much uncertainty about just how much the dollar must change to bring about any given reduction in our trade deficit." *Martin Stuart Feldstein*

Smoothing, Filtering and Prediction:

(22)

(23)

<sup>216</sup> Estimating the Past, Present and Future

is the filter gain and *P*(*t*) = *PT*(*t*) > 0 is the solution of the Riccati differential equation

<sup>2</sup> <sup>2</sup> 1 1 ( )( ( ) ( ) ( ) ( ) ( )) ( ) *<sup>T</sup> <sup>T</sup> Pt C tR tC t C tC t Pt* , *P*(0) 0.

It can be seen that the H∞ filter has a structure akin to the Kalman filter. A point of difference is that the solution to the above Riccati differential equation solution depends on *C*1(*t*), the

Define *A t*( ) = *A(t*) – *K*(*t*)*C2*(*t*). Subtracting (19) – (20) from (14) – (15) yields the error system

(|)

*xt t*

*v t*

*w t*

( ) [ ( ) ( )] ( ) [0 0] *A t Kt Bt*

. It is shown below that the estimation error satisfies the desired

 

 = 0,

<sup>0</sup> ()() *<sup>T</sup> <sup>T</sup>*

*i t i t dt < 0.* 

   

. The adjoint of

*ei* is given by

*Lemma 2: In respect of the H∞ problem (14) – (18), the solution (19) – (20) achieves the performance* 

*Proof: Following the approach in [15], [21], by applying Lemma 1 to the adjoint of (23), it is required* 

() () () () () () () () () () () *<sup>T</sup> <sup>T</sup> <sup>T</sup> P AP P A BQB K RK*

"Although economists have studied the sensitivity of import and export volumes to changes in the exchange rate, there is still much uncertainty about just how much the dollar must change to bring

 

 

, ( ) *<sup>T</sup> <sup>P</sup>*

*e t t e t t dt –* <sup>2</sup>

() () () () () () () () *<sup>T</sup> <sup>T</sup> Pt AtPt Pt A t BtQtB t*

1

*i* ,

*et t C t*

( ) ( )( ) *<sup>T</sup> x T PT xT –* (0) (0) (0) *<sup>T</sup> x Px +* <sup>0</sup> (| )(| )

*that there exists a positive definite symmetric solution to* 

about any given reduction in our trade deficit." *Martin Stuart Feldstein*

<sup>2</sup>

*xt t A t Kt Bt*

(|) ( ) [ ( ) ( )] ( ) (|) ( ) [0 0] ( )

*ei* =

*<sup>T</sup> <sup>T</sup>*

 

 *PC C P* 

1 1 ( ) ( ) ( )) ( ) *<sup>T</sup>*

, *x*(0) 0,

1

*C t*

<sup>1</sup> <sup>2</sup>

linear combination of states being estimated.

**9.2.2.3Properties** 

*H ei* =

*ei*

where *xt t* (|) = *x*(*t*) – *xt t* ˆ(|) and

*T T*

*At Ct*

*T T*

*K t B t*

performance objective.

<sup>1</sup> () () ( ) 0 ( ) 0

*on [0, T] for some γ , in which τ = T – t is a time-to-go variable. Substituting* 1 <sup>2</sup> () () () () *<sup>T</sup> K PC R into the above Riccati differential equation yields* 

$$-\dot{P}(\tau) = A(\tau)P(\tau) + P(\tau)A^\top(\tau) + B(\tau)Q(\tau)B^\top(\tau) - P(\tau)(\mathbf{C}\_2^\top(\tau)\mathbf{R}^{-1}(\tau)(\mathbf{C}\_2(\tau) - \gamma^{-2}\mathbf{C}\_1^\top(\tau)\mathbf{C}\_1(\tau))P(\tau),$$

$$P(\tau)\Big|\_{t=T} = 0.$$

*Taking adjoints to address the problem (23) leads to (22), for which the existence of a positive define solution implies* ( ) ( )( ) *<sup>T</sup> x T PT xT –* (0) (0) (0) *<sup>T</sup> x Px +* <sup>0</sup> (| )(| ) *<sup>T</sup> <sup>T</sup> e t t e t t dt –* <sup>2</sup> <sup>0</sup> ()() *<sup>T</sup> <sup>T</sup> i t i t dt < 0. Thus, under the assumption x(0) = 0,* <sup>0</sup> (| )(| ) *<sup>T</sup> <sup>T</sup> e t t e t t dt –* <sup>2</sup> <sup>0</sup> ()() *<sup>T</sup> <sup>T</sup> i t i t dt <sup>&</sup>lt;*( ) ( )( ) *<sup>T</sup> x T PT xT < 0. Therefore, ei ∞, that is, w, v 2 => e 2. �*

#### **9.2.2.4Trading-Off H∞ Performance**

In a robust filter design it is desired to meet an H∞ performance objective for a minimum possible *γ*. A minimum *γ* can be found by conducting a search and checking for the existence of positive definite solutions to the Riccati differential equation (22). This search is tractable because *P t*( ) is a convex function of *γ*2, since 2 2 2 *P t*( ) = <sup>6</sup> 1 1 ( ) ( ) ( )) ( ) *<sup>T</sup> PtC tC t Pt* > 0.

In some applications it may be possible to estimate *a priori* values for *γ*. Recall for output estimation problems that the error is generated by *ei e i* , where *ei* = <sup>1</sup> [ ( )]  *I* . From the arguments of Chapters 1 – 2 and [28], for single-input-single-output plants <sup>2</sup> 0 lim *<sup>v</sup>* = 1 and <sup>2</sup> <sup>0</sup> lim *v ei* = 1, which implies <sup>2</sup> <sup>0</sup> lim  *v H ei ei* = <sup>2</sup> *<sup>v</sup>* . Since the H∞ filter achieves the performance *<sup>H</sup> ei ei*  < *γ*2, it follows that an *a priori* design estimate is *γ* = *v* at high signal-to-noise-ratios.

When the problem is stationary (or time-invariant), the filter gain is precalculated as *<sup>T</sup>* <sup>1</sup> *K PC R* , where *P* is the solution of the algebraic Riccati equation

$$0 = AP + PA^{\top} - P(\mathbf{C}\_2^{\top} \mathbf{R}^{-1} \mathbf{C}\_2 - \gamma^{-2} \mathbf{C}\_1^{\top} \mathbf{C}\_1)P + BQB^{\top}. \tag{24}$$

Suppose that 2 = <sup>1</sup> is a time-invariant single-input-single-output system and let *Rei*(*s*) denote the transfer function of *ei* . Then Parseval's Theorem states that the average total energy of *et t* (|) is

$$\left\|c(t\mid t)\right\|\_{2}^{2} = \int\_{-\alpha}^{-\alpha} \left|c(t\mid t)\right|^{2} dt = \int\_{-j\alpha}^{j\alpha} R\_{\alpha} R\_{\alpha}^{H}(s) \, ds = \int\_{-j\alpha}^{j\alpha} \left|R\_{\alpha}(s)\right|^{2} ds \, \, \tag{25}$$

<sup>&</sup>quot;Life is uncertain. Eat dessert first." *Ernestine Ulmer*

**9.2.3Accommodating Uncertainty** 

Figure 3. Representation of additive model

<sup>∑</sup>

*p* 

Δ

*w* 

uncertainty Δ, an auxiliary problem is defined as

where *p*(*t*) is an additional exogenous input satisfying

Consider the scaled H∞ filtering problem where

auxiliary H∞ filtering problem.

**9.2.3.1Additive Uncertainty** 

where

in which ε2 = (1 + *δ*2)-1.

right but irrelevant." *Manfred Eigen*

The above filters are designed for situations in which the inputs *v*(*t*) and *w*(*t*) are uncertain. Next, problems in which model uncertainty is present are discussed. The described approaches involve converting the uncertainty into a fictitious noise source and solving an

uncertainty. Figure 4. Input scaling in lieu of a problem

signal to account for discrepancies due to the uncertainty. It is argued below that a solution to the H∞ filtering problem can be found by solving an auxiliary problem in which the input is scaled by ε as shown in Fig. 4. In lieu of the filtering problem possessing the

<sup>2</sup> is known and Δ is unknown, as depicted in Fig. 3. The *p*(*t*) represents a fictitious

Consider a time-invariant output estimation problem in which the nominal model is

*x t Ax t Bw t Bp t* () () () () , *x*(0) 0 ,

<sup>2</sup> *zt C txt vt* () () () () ,

<sup>1</sup> <sup>1</sup> *et t C txt C txt* ( | ) () () () () ˆ ,

<sup>2</sup> *zt C txt vt* () () () () ,

<sup>1</sup> <sup>1</sup> *et t C txt C txt* ( | ) () () () () ˆ ,

, *x*(0) 0 ,

"A theory has only the alternative of being right or wrong. A model has a third possibility - it may be

2 2 2 <sup>2</sup> <sup>2</sup> *p w* ,

*x t Ax t B w t* () () ()

that possesses an uncertainty.

X

ε

*w* 

. (30)

<sup>2</sup> + Δ,

> (27) (28) (29)

> (31) (32) (33)

which equals the area under the error power spectral density, ( ) *<sup>H</sup> RR s ei ei* . Recall that the optimal filter (in which *γ* = ∞) minimises (25), whereas the H∞ filter minimises

$$\sup\_{\|\|\_{\mathbb{R}} \neq 0} \left\| R\_{cl} R\_{cl}^H \right\|\_{\mathbb{R}} = \sup\_{\|\|\_{\mathbb{R}} \neq 0} \frac{\left\| c \right\|\_2^2}{\left\| \bar{t} \right\|\_2^2} < \gamma^2 \,. \tag{26}$$

In view of (25) and (26), it follows that the H∞ filter minimises the maximum magnitude of ( ) *<sup>H</sup> RR s ei ei* . Consequently, it is also called a 'minimax filter'. However, robust designs, which accommodate uncertain inputs tend to be conservative. Therefore, it is prudent to investigate using a larger *γ* to achieve a trade-off between H∞ and minimum-mean-squareerror performance criteria.

Figure 2. R () *<sup>H</sup> ei ei R s* versus frequency for Example 1: optimal filter (solid line) and H∞ filter (dotted line).

*Example 1.* Consider a time-invariant output estimation problem where *A* = -1, *B* = *C*2 *= C*1 = 1, <sup>2</sup> *<sup>w</sup>* = 10 and <sup>2</sup> *<sup>v</sup>* = 0.1. The magnitude of the error spectrum exhibited by the optimal filter (designed with *γ*2 = 108) is indicated by the solid line of Fig. 2. From a search, a minimum of *γ*2 = 0.099 was found such that the algebraic Riccati equation (24) has a positive definite solution, which concurs with the *a priori* estimate of *γ*<sup>2</sup> ≈ <sup>2</sup> *<sup>v</sup>* . The magnitude of the error spectrum exhibited by the H∞ filter is indicated by the dotted line of Fig. 2. The figure demonstrates that the filter achieves ( ) *<sup>H</sup> RR s ei ei* < *γ*2. Although the H∞ filter reduces the peak of the error spectrum by 10 dB, it can be seen that the area under the curve is larger, that is, the mean square error increases. Consequently, some intermediate value of *γ* may need to be considered to trade off peak error (spectrum) and average error performance.

<sup>&</sup>quot;If the uncertainty is larger than the effect, the effect itself becomes moot." *Patrick Frank*

Smoothing, Filtering and Prediction:

. (26)

<sup>218</sup> Estimating the Past, Present and Future

which equals the area under the error power spectral density, ( ) *<sup>H</sup> RR s ei ei* . Recall that the

2 2 2

*e*

*i*

In view of (25) and (26), it follows that the H∞ filter minimises the maximum magnitude of ( ) *<sup>H</sup> RR s ei ei* . Consequently, it is also called a 'minimax filter'. However, robust designs, which accommodate uncertain inputs tend to be conservative. Therefore, it is prudent to investigate using a larger *γ* to achieve a trade-off between H∞ and minimum-mean-square-

<sup>100</sup> <sup>105</sup> 10−4

Frequency, Hz

*ei ei R s* versus frequency for Example 1: optimal filter (solid line) and H∞ filter (dotted line).

*<sup>v</sup>* = 0.1. The magnitude of the error spectrum exhibited by the optimal

*<sup>v</sup>* . The magnitude of the

*Example 1.* Consider a time-invariant output estimation problem where *A* = -1, *B* = *C*2 *= C*1 =

filter (designed with *γ*2 = 108) is indicated by the solid line of Fig. 2. From a search, a minimum of *γ*2 = 0.099 was found such that the algebraic Riccati equation (24) has a positive

error spectrum exhibited by the H∞ filter is indicated by the dotted line of Fig. 2. The figure demonstrates that the filter achieves ( ) *<sup>H</sup> RR s ei ei* < *γ*2. Although the H∞ filter reduces the peak of the error spectrum by 10 dB, it can be seen that the area under the curve is larger, that is, the mean square error increases. Consequently, some intermediate value of *γ* may need to be

definite solution, which concurs with the *a priori* estimate of *γ*<sup>2</sup> ≈ <sup>2</sup>

considered to trade off peak error (spectrum) and average error performance.

"If the uncertainty is larger than the effect, the effect itself becomes moot." *Patrick Frank*

optimal filter (in which *γ* = ∞) minimises (25), whereas the H∞ filter minimises

<sup>2</sup> <sup>0</sup> <sup>0</sup> <sup>2</sup> sup sup *<sup>H</sup> ei ei <sup>i</sup> <sup>i</sup>*

2 2

*R R*

10−2


ei|

R

H

100

larger

error performance criteria.

Figure 2. R () *<sup>H</sup>*

*<sup>w</sup>* = 10 and <sup>2</sup>

1, <sup>2</sup> 

#### **9.2.3Accommodating Uncertainty**

The above filters are designed for situations in which the inputs *v*(*t*) and *w*(*t*) are uncertain. Next, problems in which model uncertainty is present are discussed. The described approaches involve converting the uncertainty into a fictitious noise source and solving an auxiliary H∞ filtering problem.

Figure 3. Representation of additive model

uncertainty. Figure 4. Input scaling in lieu of a problem that possesses an uncertainty.

#### **9.2.3.1Additive Uncertainty**

Consider a time-invariant output estimation problem in which the nominal model is <sup>2</sup> + Δ, where <sup>2</sup> is known and Δ is unknown, as depicted in Fig. 3. The *p*(*t*) represents a fictitious signal to account for discrepancies due to the uncertainty. It is argued below that a solution to the H∞ filtering problem can be found by solving an auxiliary problem in which the input is scaled by ε as shown in Fig. 4. In lieu of the filtering problem possessing the uncertainty Δ, an auxiliary problem is defined as

$$\dot{\mathbf{x}}(t) = A\mathbf{x}(t) + Bw(t) + Bp(t) \; \; \mathbf{x}(0) = \mathbf{0} \; \; \tag{27}$$

$$z(t) = \mathbb{C}\_2(t)\mathbf{x}(t) + \upsilon(t) \tag{28}$$

$$e(t \mid t) = \mathbb{C}\_1(t)\mathbf{x}(t) - \mathbb{C}\_1(t)\hat{\mathbf{x}}(t) \tag{29}$$

where *p*(*t*) is an additional exogenous input satisfying

$$\left\| p \right\|\_{2}^{2} < \delta^{2} \left\| w \right\|\_{2}^{2}, \quad \delta \in \mathbb{R} \tag{30}$$

Consider the scaled H∞ filtering problem where

$$
\dot{\mathbf{x}}(t) = A\mathbf{x}(t) + B\boldsymbol{\varepsilon}w(t) \; , \; \mathbf{x}(0) = \mathbf{0} \; , \tag{31}
$$

$$\mathbf{z}(t) = \mathbb{C}\_{\mathbf{z}}(t)\mathbf{x}(t) + \mathbf{v}(t) \; \tag{32}$$

$$e(t \mid t) = \mathbb{C}\_1(t)\mathbf{x}(t) - \mathbb{C}\_1(t)\hat{\mathbf{x}}(t) \tag{33}$$

in which ε2 = (1 + *δ*2)-1.

<sup>&</sup>quot;A theory has only the alternative of being right or wrong. A model has a third possibility - it may be right but irrelevant." *Manfred Eigen*

[18], [19].

**9.2.6.1Background** 

which is described below.

Δ

model uncertainty.

*w* 

Figure 5. Representation of multiplicative

∑

*p* 

(36), (37), where *B* = <sup>1</sup> *B*

**9.2.6Continuous-Time H∞ Smoothing** 

filtering problem satisfies

, *w t*( )<sup>=</sup>

( ) ( ) *w t p t* 

which implies (39). Thus, state matrix parameter uncertainty can be accommodated by including a scaled input in the solution of an auxiliary H∞ filtering problem. Similar solutions to problems in which other state-space parameters are uncertain appear in [14],

There are three kinds of H∞ smoothers: fixed point, fixed lag and fixed interval (see the tutorial [13]). The next development is concerned with continuous-time H∞ fixed-interval smoothing. The smoother in [10] arises as a combination of forward states from an H∞ filter and adjoint states that evolve according to a Hamiltonian matrix. A different fixed-interval smoothing problem to [10] is found in [16] by solving for saddle conditions within differential games. A summary of some filtering and smoothing results appears in [13]. Robust prediction, filtering and smoothing problems are addressed in [22]; the H∞ predictor, filter and smoother require the solution of a Riccati differential equation that evolves forward in time, whereas the smoother additionally requires another to be solved in reversetime. Another approach for combining forward and adjoint estimates is described [32]

Continuous-time, fixed-interval smoothers that differ from the formulations within [10], [13], [16], [22], [32] are reported in [34] – [35]. A robust version of [34] – [35] appears in [33],

where the Fraser-Potter formula is used to construct a smoothed estimate.

2

"The purpose of models is not to fit the data but to sharpen the questions." *Samuel Karlin*

2 2 2 2 2 2 <sup>2</sup> <sup>2</sup> <sup>2</sup> <sup>2</sup> *e w*

  and 0 < ε < 1. Then the solution of this H∞

Figure 6. Robust smoother error structure.

*fi*

*w f* 

*v* 

*uj*

 <sup>∑</sup> *<sup>H</sup>*

*e* 

*u* 

( *p v* ) , (41)

*Lemma 3 [26]: Suppose for a γ ≠ 0 that the scaled H∞ problem (31) – (33) is solvable, that is,* <sup>2</sup> <sup>2</sup> *e <*  2 2 2 2 ( *w +* <sup>2</sup> <sup>2</sup> *v . Then, this guarantees the performance* )

$$\left\|\boldsymbol{v}\right\|\_{2}^{2} < \boldsymbol{\gamma}^{2} \left(\left\|\boldsymbol{w}\right\|\_{2}^{2} + \left\|\boldsymbol{p}\right\|\_{2}^{2} + \left\|\boldsymbol{v}\right\|\_{2}^{2}\right) \tag{34}$$

*for the solution of the auxiliary problem (27) – (29).* 

*Proof: From the assumption that problem (31) – (33) is solvable, it follows that* <sup>2</sup> <sup>2</sup> *e <* <sup>2</sup> 2 2 2 ( *w +*  2 <sup>2</sup> *v . Substituting for* ) *ε, using (30) and rearranging yields (34). �*

#### **9.2.4 Multiplicative Uncertainty**

Next, consider a filtering problem in which the model is *G(I* + Δ), as depicted in Fig. 5. It is again assumed that *G* and Δ are known and unknown transfer function matrices, respectively. This problem may similarly be solved using Lemma 3. Thus a filter that accommodates additive or multiplicative uncertainty simply requires scaling of an input. The above scaling is only sufficient for a H∞ performance criterion to be met. The design may well be too conservative and it is worthwhile to explore the merits of using values for δ less than the uncertainty's assumed norm bound.

#### **9.2.5Parametric Uncertainty**

Finally, consider a time-invariant output estimation problem in which the state matrix is uncertain, namely,

$$
\dot{\mathbf{x}}(t) = (A + \Delta\_A)\mathbf{x}(t) + Bw(t), \ \mathbf{x}(0) = \mathbf{0},\tag{35}
$$

$$\mathbf{x}(t) = \mathbf{C}\_2(t)\mathbf{x}(t) + \mathbf{v}(t) \tag{36}$$

$$\mathbf{e}(t \mid t) = \mathbb{C}\_1(t)\mathbf{x}(t) - \mathbb{C}\_1(t)\hat{\mathbf{x}}(t) \tag{37}$$

where Δ*<sup>A</sup> n n* is unknown. Define an auxiliary H∞ filtering problem by

$$
\dot{\mathbf{x}}(t) = A\mathbf{x}(t) + Bw(t) + p(t), \quad \mathbf{x}(0) = 0,\tag{38}
$$

(36) and (37), where *p*(*t*) = Δ*Ax*(*t*) is a fictitious exogenous input. A solution to this problem would achieve

$$\left\|\boldsymbol{c}\right\|\_{2}^{2} < \boldsymbol{\gamma}^{2} \left(\left\|\boldsymbol{w}\right\|\_{2}^{2} + \left\|\boldsymbol{p}\right\|\_{2}^{2} + \left\|\boldsymbol{v}\right\|\_{2}^{2}\right) \tag{39}$$

for a *γ* ≠ 0. From the approach of [14], [18], [19], consider the scaled filtering problem

$$
\dot{\mathbf{x}}(t) = A\mathbf{x}(t) + B\overline{w}(t), \quad \mathbf{x}(0) = 0,\tag{40}
$$

<sup>&</sup>quot;Remember that all models are wrong; the practical question is how wrong do they have to be to not be useful." *George Edward Pelham Box*

filtering problem satisfies

Smoothing, Filtering and Prediction:

<sup>2</sup> *e <* <sup>2</sup> 2 2

 ( *w +* 

( *p v* ) (34)

<sup>2</sup> *e <* 

2

(35) (36) (37)

the is

<sup>220</sup> Estimating the Past, Present and Future

*Lemma 3 [26]: Suppose for a γ ≠ 0 that the scaled H∞ problem (31) – (33) is solvable, that is,* <sup>2</sup>

<sup>2</sup> *v . Substituting for* ) *ε, using (30) and rearranging yields (34). �*

Next, consider a filtering problem in which the model is *G(I* + Δ), as depicted in Fig. 5. It is again assumed that *G* and Δ are known and unknown transfer function matrices, respectively. This problem may similarly be solved using Lemma 3. Thus a filter that accommodates additive or multiplicative uncertainty simply requires scaling of an input. The above scaling is only sufficient for a H∞ performance criterion to be met. The design may well be too conservative and it is worthwhile to explore the merits of using values for δ

Finally, consider a time-invariant output estimation problem in which the state matrix is

(36) and (37), where *p*(*t*) = Δ*Ax*(*t*) is a fictitious exogenous input. A solution to this problem

"Remember that all models are wrong; the practical question is how wrong do they have to be to not be

*x t Ax t Bw t p t* ( ) ( ) ( ) ( ), *x*(0) 0, (38)

( *p v* ) (39)

*x t Ax t Bw t* ( ) ( ) ( ), *x*(0) 0, (40)

( ) ( ) ( ) ( ), *<sup>A</sup> x t A x t Bw t x*(0) 0,

<sup>2</sup> *zt C txt vt* () () () () ,

<sup>1</sup> <sup>1</sup> *et t C txt C txt* ( | ) () () () () ˆ ,

2 2 2 2 2 <sup>2</sup> <sup>2</sup> <sup>2</sup> <sup>2</sup> *e w*

for a *γ* ≠ 0. From the approach of [14], [18], [19], consider the scaled filtering problem

where Δ*<sup>A</sup> n n* is unknown. Define an auxiliary H∞ filtering problem by

2 2 2 2 2 <sup>2</sup> <sup>2</sup> <sup>2</sup> <sup>2</sup> *e w*

*Proof: From the assumption that problem (31) – (33) is solvable, it follows that* <sup>2</sup>

<sup>2</sup> *v . Then, this guarantees the performance* )

*for the solution of the auxiliary problem (27) – (29).* 

less than the uncertainty's assumed norm bound.

**9.2.4 Multiplicative Uncertainty** 

**9.2.5Parametric Uncertainty** 

uncertain, namely,

would achieve

useful." *George Edward Pelham Box*

2 2 2 2

( *w +* <sup>2</sup>

 

2

$$\left\|\boldsymbol{e}\right\|\_{2}^{2} < \gamma^{2} \left(\left\|\boldsymbol{w}\right\|\_{2}^{2} + \varepsilon^{2} \left\|\boldsymbol{p}\right\|\_{2}^{2} + \left\|\boldsymbol{v}\right\|\_{2}^{2}\right) \,,\tag{41}$$

which implies (39). Thus, state matrix parameter uncertainty can be accommodated by including a scaled input in the solution of an auxiliary H∞ filtering problem. Similar solutions to problems in which other state-space parameters are uncertain appear in [14], [18], [19].

#### **9.2.6Continuous-Time H∞ Smoothing**

#### **9.2.6.1Background**

There are three kinds of H∞ smoothers: fixed point, fixed lag and fixed interval (see the tutorial [13]). The next development is concerned with continuous-time H∞ fixed-interval smoothing. The smoother in [10] arises as a combination of forward states from an H∞ filter and adjoint states that evolve according to a Hamiltonian matrix. A different fixed-interval smoothing problem to [10] is found in [16] by solving for saddle conditions within differential games. A summary of some filtering and smoothing results appears in [13]. Robust prediction, filtering and smoothing problems are addressed in [22]; the H∞ predictor, filter and smoother require the solution of a Riccati differential equation that evolves forward in time, whereas the smoother additionally requires another to be solved in reversetime. Another approach for combining forward and adjoint estimates is described [32] where the Fraser-Potter formula is used to construct a smoothed estimate.

Continuous-time, fixed-interval smoothers that differ from the formulations within [10], [13], [16], [22], [32] are reported in [34] – [35]. A robust version of [34] – [35] appears in [33], which is described below.

Figure 5. Representation of multiplicative model uncertainty.

Figure 6. Robust smoother error structure.

<sup>&</sup>quot;The purpose of models is not to fit the data but to sharpen the questions." *Samuel Karlin*

*(i) => f*  

*the property* 

*property <sup>H</sup> uj <sup>2</sup> 2 => <sup>H</sup> uj ∞.* 

*(ii) Finally, i* 

**9.2.7Performance** 

*Fig. 6, in which <sup>w</sup>*

*2. If and only if: (i)* 

<sup>2</sup> *e ≤* <sup>2</sup> *f* <sup>2</sup> *u => e* 

*fi <sup>2</sup>*

 *2, e = ei i* 

*solutions to (22) and (46) for* 

 *2.* 

 

*2 implies e* 

*and <sup>H</sup>* 

smoother (43) – (45), is given by

*i*

*v* 

> *fi*

It is easily shown that the error system,

It will be shown subsequently that the robust fixed-interval smoother (43) – (45) has the

*∞, then (i) f, u, e* 

*2. The necessity of (i) follows from the assumption i* 

*∞ (see [p. 83, 21]). Similarly, j* 

 *> 0, then the smoother (43) – (45) achieves* 

*fi* 

> 

 

*ei <sup>2</sup> 2 => ei* 

 

*2 together with the property* 

1 1 ( ) ( ) ( ) 0 () () ( ) ( ) ) () () () 0 () () ( ) (| ) () () () 0 0 ( ) ( *<sup>T</sup> <sup>T</sup> <sup>T</sup> T*

, *x*(0) 0 ,

where *xt t* (|) = *x*(*t*) – *xt t* ˆ(|) . The conditions for the smoother attaining the desired

*Lemma 5 [33]: In respect of the smoother error system (47), if there exist symmetric positive define* 

*ei denote the map from i to e. Assume that w and v* 

*2 and (ii)* 

 

*ei* , for the model (14) – (15), the data (17) and the

*x t*

*w t*

*v t*

*ei is equivalent to the arrangement of two systems* 

() 0 *T* ,

*fi is defined by (23) in which C2(t) = C(t). From Lemma 2, the* 

*∞. The <sup>H</sup>* 

*t*

*2, which with Condition (ii) => u* 

  *i and u = <sup>H</sup>*

*ei ∞.* 

> 

*ei* 

*uj is given by the system* 

 

*uj* 

*2, which with Condition* 

 *2. Also,* 

*2 together with the* 

*2 together with* 

*∞. �*

*∞, that is, i*

(47)

*fi*

(48) (49)

*j shown in* 

error structure shown in Fig. 6, which is examined below.

 *and <sup>f</sup> <sup>j</sup>*

*2. Similarly,* <sup>2</sup> *j ≤* <sup>2</sup> *f* <sup>2</sup> *v => j* 

 

*2 => fi* 

*et T Ct RtK t*

performance objective are described below. <sup>15</sup>

*Proof: Since xt t* (|) *is decoupled from ξ(t),* 

 

*existence of a positive definite solution to (22) implies* 

*uj shown in Fig. 6. The* 

*Lemma 4 [33]: Consider the arrangement of two linear systems f = fi*

*v* 

*∞ and (ii) <sup>H</sup> uj* 

*Proof: (i) To establish sufficiency, note that* <sup>2</sup> *i ≤* <sup>2</sup> *w*  <sup>2</sup> *v => d* 

*x t A t Bt Kt*

 *,* <sup>2</sup> 

 

<sup>1</sup> <sup>1</sup> () ()() () ()( | ) () ()() *<sup>T</sup> <sup>T</sup> <sup>T</sup>*

*A CR y CR v* ,

"Doubt is uncomfortable, certainty is ridiculous." *François-Marie Arouet de Voltaire*

 

() () () () *<sup>T</sup> u RK*

 .

*t C tR tCt A t C tR t*

() 0 *T* ,

*. Let* 

#### **9.2.6.2Problem Definition**

Once again, it is assumed that the data is generated by (14) – (17). For convenience, attention is confined to output estimation, namely 2 = <sup>1</sup> within Fig. 1. Input and state estimation problems can be handled similarly using the solution structures described in Chapter 6. It is desired to find a fixed-interval smoother solution that produces estimates <sup>1</sup> *y*ˆ (| ) *t T* of <sup>1</sup> *y t*( ) so that the output estimation error

$$e(t \mid T) = y\_1(t) - \hat{y}\_1(t \mid T) \tag{42}$$

is in 2. As before, the map from the inputs *i* = *v w* to the error is denoted by *ei* = 1 1 [ ] and the objective is to achieve <sup>0</sup> (| )(| ) *<sup>T</sup> <sup>T</sup> e t T e t T dt* – <sup>2</sup> 0 ()() *<sup>T</sup> <sup>T</sup> i t i t dt* < 0 for some .

#### **9.2.6.3 H∞ Solution**

The following H∞ fixed-interval smoother exploits the structure of the minimum-variance smoother but uses the gain (21) calculated from the solution of the Riccati differential equation

(22) akin to the H∞ filter. An approximate Wiener-Hopf factor inverse, ˆ <sup>1</sup> , is given by

$$
\begin{bmatrix}
\dot{\hat{\mathbf{x}}}(t|t) \\
\boldsymbol{a}(t)
\end{bmatrix} = \begin{bmatrix}
\boldsymbol{A}(t) - \boldsymbol{K}(t)\mathbf{C}(t) & \boldsymbol{K}(t) \\
\end{bmatrix} \begin{bmatrix}
\hat{\mathbf{x}}(t|t) \\
\boldsymbol{z}(t)
\end{bmatrix}.\tag{43}
$$

An inspection reveals that the states within (43) are the same as those calculated by the H∞ filter (19). The adjoint of ˆ <sup>1</sup> , which is denoted by ˆ *<sup>H</sup>* , has the realisation

$$
\begin{bmatrix}
\mathcal{J}(t)
\end{bmatrix} = \begin{bmatrix}
\boldsymbol{A}^{\top}(t) - \boldsymbol{\mathcal{C}}^{\top}(t)\boldsymbol{K}^{\top}(t) & -\boldsymbol{\mathcal{C}}^{\top}(t)\boldsymbol{R}^{-1/2}(t) \\
\boldsymbol{K}^{\top}(t) & \boldsymbol{R}^{-1/2}(t)
\end{bmatrix} \begin{bmatrix}
\dot{\xi}(t) \\
\boldsymbol{a}(t)
\end{bmatrix}.\tag{44}$$

Output estimates are obtained as

$$
\hat{y}(t|T) = z(t) - R(t)\beta(t) \,. \tag{45}
$$

However, an additional condition requires checking in order to guarantee that the smoother actually achieves the above performance objective; the existence of a solution <sup>2</sup> *P t*( ) = <sup>2</sup> ( ) *<sup>T</sup> P t* > 0 is required for the auxiliary Riccati differential equation

$$\begin{aligned} -\dot{P}\_2(t) &= \overline{A}(t)P\_2(t) + P\_2(t)\overline{A}^T(t) + K(t)R^2(t)K^T(t) \\ &+ \gamma\_2^{-2} P\_2(t)C^T(t)R^{-1}(t)\{\mathcal{C}(t)P(t)C^T(t) + R(t)\}R^{-1}(t)C(t)P\_2(t), \; P\_2(T) = 0 \end{aligned} \tag{46}$$

where *At At KtCt* () () () () .

<sup>&</sup>quot;Certainty is the mother of quiet and repose, and uncertainty the cause of variance and contentions." *Edward Coke*

#### **9.2.7Performance**

Smoothing, Filtering and Prediction:

<sup>1</sup> within Fig. 1. Input and state estimation

to the error is denoted by

0 

 ()() *<sup>T</sup> <sup>T</sup>*

. (43)

*ei* =

(46)

*i t i t dt* < 0

that produces estimates <sup>1</sup> *y*ˆ (| ) *t T* of

<sup>222</sup> Estimating the Past, Present and Future

Once again, it is assumed that the data is generated by (14) – (17). For convenience, attention

problems can be handled similarly using the solution structures described in Chapter 6. It is

The following H∞ fixed-interval smoother exploits the structure of the minimum-variance smoother but uses the gain (21) calculated from the solution of the Riccati differential equation

An inspection reveals that the states within (43) are the same as those calculated by the H∞

However, an additional condition requires checking in order to guarantee that the smoother actually achieves the above performance objective; the existence of a solution <sup>2</sup> *P t*( ) = <sup>2</sup> ( ) *<sup>T</sup> P t*

*P tC tR t C tPtC t Rt R tCtP t* , <sup>2</sup> *P T*() 0 ,

"Certainty is the mother of quiet and repose, and uncertainty the cause of variance and contentions."

1/ 2 1/ 2

. (44)

 

. (45)

(22) akin to the H∞ filter. An approximate Wiener-Hopf factor inverse, ˆ <sup>1</sup> , is given by

1/ 2 1/ 2 ˆ(|) () () () () ˆ(|) ( ) () () () ( ) *xt t At KtCt Kt xt t*

( ) () () () () () ( ) ( ) ( ) ( ) ( )

*t A t C tK t C tR t t t K t R t t*

*y*ˆ( | ) () () () *t T zt Rt t*

2 2 <sup>2</sup> () () () () () () () () () () *<sup>T</sup> <sup>T</sup>*

 *t R tCt R t z t* 

filter (19). The adjoint of ˆ <sup>1</sup> , which is denoted by ˆ *<sup>H</sup>* , has the realisation

*T TT T*

*T*

> 0 is required for the auxiliary Riccati differential equation

<sup>2</sup> 2 2 () () () () () () () () *<sup>T</sup> <sup>T</sup> P t AtP t P t A t KtR tK t*

<sup>2</sup> <sup>1</sup> <sup>1</sup>

*v w* 

*<sup>T</sup> <sup>T</sup>*

1 1 *et T* ( | ) () ( | ) *y t y*ˆ *t T* (42)

*e t T e t T dt* – <sup>2</sup>

2 = 

– output

] and the objective is to achieve <sup>0</sup> (| )(| )

**9.2.6.2Problem Definition** 

1 1 [

.

**9.2.6.3 H∞ Solution** 

for some

is confined to output estimation, namely

<sup>1</sup> *y t*( ) so that the output estimation error

<sup>2</sup>

Output estimates are obtained as

where *At At KtCt* () () () () .

*Edward Coke*

desired to find a fixed-interval smoother solution

is in 2. As before, the map from the inputs *i* =

It will be shown subsequently that the robust fixed-interval smoother (43) – (45) has the error structure shown in Fig. 6, which is examined below.

*Lemma 4 [33]: Consider the arrangement of two linear systems f = fi i and u = <sup>H</sup> uj j shown in Fig. 6, in which <sup>w</sup> i v and <sup>f</sup> <sup>j</sup> v . Let ei denote the map from i to e. Assume that w and v 2. If and only if: (i) fi ∞ and (ii) <sup>H</sup> uj ∞, then (i) f, u, e 2 and (ii) ei ∞. Proof: (i) To establish sufficiency, note that* <sup>2</sup> *i ≤* <sup>2</sup> *w*  <sup>2</sup> *v => d 2, which with Condition (i) => f 2. Similarly,* <sup>2</sup> *j ≤* <sup>2</sup> *f* <sup>2</sup> *v => j 2, which with Condition (ii) => u 2. Also, 2. The necessity of (i) follows from the assumption i 2 together with* 

<sup>2</sup> *e ≤* <sup>2</sup> *f* <sup>2</sup> *u => e the property fi <sup>2</sup> 2 => fi ∞ (see [p. 83, 21]). Similarly, j 2 together with the property <sup>H</sup> uj <sup>2</sup> 2 => <sup>H</sup> uj ∞.* 

*(ii) Finally, i 2, e = ei i 2 together with the property ei <sup>2</sup> 2 => ei ∞. �*

It is easily shown that the error system, *ei* , for the model (14) – (15), the data (17) and the smoother (43) – (45), is given by

$$
\begin{bmatrix}
\dot{\tilde{x}}(t) \\
\epsilon(t \mid T)
\end{bmatrix} = \begin{bmatrix}
\overline{A}(t) & 0 & B(t) & -K(t) \\
& \mathbb{C}(t) & R(t)\mathbb{K}^{\top}(t) & 0 & 0
\end{bmatrix} \begin{bmatrix}
\dot{\tilde{x}}(t) \\
\dot{\tilde{\varphi}}(t) \\
w(t) \\
v(t)
\end{bmatrix}, \tilde{\mathbf{x}}(0) = 0,\tag{47}
$$

$$
\underline{\mathcal{E}}(T) = 0 \, \, \, \, \tag{48}
$$

where *xt t* (|) = *x*(*t*) – *xt t* ˆ(|) . The conditions for the smoother attaining the desired performance objective are described below. <sup>15</sup>

*Lemma 5 [33]: In respect of the smoother error system (47), if there exist symmetric positive define solutions to (22) and (46) for ,* <sup>2</sup>  *> 0, then the smoother (43) – (45) achieves ei ∞, that is, i 2 implies e 2.* 

*Proof: Since xt t* (|) *is decoupled from ξ(t), ei is equivalent to the arrangement of two systems fi and <sup>H</sup> uj shown in Fig. 6. The fi is defined by (23) in which C2(t) = C(t). From Lemma 2, the existence of a positive definite solution to (22) implies fi ∞. The <sup>H</sup> uj is given by the system* 

$$-\dot{\tilde{\varphi}}(\tau) = \overline{\boldsymbol{A}}^{\top}(\tau)\tilde{\boldsymbol{\varphi}}(\tau) - \boldsymbol{C}^{\top}(\tau)\boldsymbol{R}^{-1}(\tau)\tilde{\boldsymbol{y}}(\tau|\tau) - \boldsymbol{C}^{\top}(\tau)\boldsymbol{R}^{-1}(\tau)\boldsymbol{v}(\tau), \ \tilde{\boldsymbol{\varphi}}(T) = \boldsymbol{0} \tag{48}$$

$$
\mu(\boldsymbol{\tau}) = R(\boldsymbol{\tau}) K^{\boldsymbol{T}}(\boldsymbol{\tau}) \boldsymbol{\xi}(\boldsymbol{\tau}) \,. \tag{49}
$$

<sup>&</sup>quot;Doubt is uncomfortable, certainty is ridiculous." *François-Marie Arouet de Voltaire*

**9.2.8Performance Comparison** 

*t*

yields

2 

−50

smoother (43) – (45).

−45

−40

MSE, dB

−35

−30

described in [10], [13], [16], namely,

which reverts to the Kalman filter at <sup>2</sup>

−10 −5 <sup>0</sup> <sup>5</sup> <sup>10</sup> −55

Figure 7. Fixed-interval smoother performance comparison for Gaussian process noise: (i) Kalman filter; (ii) Maximum likelihood smoother; (iii) Minimum-variance smoother; (iv) H∞ filter; (v) H∞ smoother [10], [13], [16]; and (vi) H∞

"Inquiry is fatal to certainty." *William James Durant*

SNR, dB

into the second row of (53) yields

where *G*(*t*) <sup>1</sup>

It is of interest to compare to compare the performance of (43) – (45) with the H∞ smoother

1 1

and (22). Substituting (54) and its differential into the first row of (53) together with (21)

= 0. Substituting

= 0. Thus, the Hamiltonian form (53) – (54) can be realised by calculating the filtered

−25

smoother (43) – (45).

−20

−15

MSE, dB

−10

−5

0

*xt Atxt Kt zt Ctxt* ˆ ˆ ( ) ( ) ( ) ( )( ( ) ( ) ( )) ˆ , (55)

*xt T Atxt Gt xt T xt* ˆ( | ) ( ) ( ) ( )( ( | ) ( )) ˆ ˆˆ , (56)

<sup>2</sup> () () () () *<sup>T</sup> BtQtB tP t* , which reverts to the maximum-likelihood smoother at

−10 −5 <sup>0</sup> <sup>5</sup> <sup>10</sup> −30

SNR, dB

Figure 8. Fixed-interval smoother performance comparison for sinusoidal process noise: (i) Kalman filter; (ii) Maximum likelihood smoother; (iii) Minimum-variance smoother; (iv) H∞ filter; (v) H∞ smoother [10], [13], [16]; and (vi) H∞

*z t*

( )*t* <sup>1</sup> *P t xt T* ( )( ( | ) ˆ *x t* ˆ( ))

(53)

(54)

(v) (ii) (i)

(iv) (iii)

(vi)

( ) ( ) ( ) ( ) '( ) ( ) () () ( ) *T T T*

*xt T xt Pt t* ˆ ˆ ( | ) () () ()

(i) (ii) (iii)

(iv)

(vi)

(v)

estimate (55) and then obtaining the smoothed estimate from (56).

,

*C tR tCt A t t C tR t*

ˆ(| ) () () () () ˆ(| ) 0

*xt T At BtQtB t xt T*

*For the above system to be in ∞, from Lemma 4, it is required that there exists a solution to (46) for which the existence of a positive definite solution implies <sup>H</sup> uj ∞. The claim ei ∞ follows from Lemma 4. �*

The H∞ solution can be derived as a solution to a two-point boundary value problem, which involves a trade-off between causal and noncausal processes (see [10], [15], [21]). This suggests that the H∞ performance of the above smoother would not improve on that of the filter. Indeed, from Fig. 6, *e* = *f* + *u* and the triangle rule yields <sup>2</sup> *e* ≤ <sup>2</sup> *f* + <sup>2</sup> *u* , where *f* is the H∞ filter error. That is, the error upper bound for the H∞ fixed-interval smoother (43) – (45) is greater than that for the H∞ filter (19) – (20). It is observed below that compared to the minimum-variance case, the H∞ solution exhibits an increased mean-square error.

*Lemma 6 [33]: For the output estimation problem (14) – (18), in which C2(t) = C1(t) = C(t), the smoother solution (43) – (45) results in* 

$$\left\| \mathcal{R}\_{\acute{a}} \mathcal{R}\_{\acute{a}}^{H} \right\|\_{2,\boldsymbol{\gamma}^{-2},\boldsymbol{\gamma}\_{\boldsymbol{0}}} > \left\| \mathcal{R}\_{\acute{a}} \mathcal{R}\_{\acute{a}}^{H} \right\|\_{2,\boldsymbol{\gamma}^{-2},\boldsymbol{ \gamma}\_{\boldsymbol{0}}}.\tag{50}$$

*Proof: By expanding <sup>H</sup>* *ei ei and completing the squares, it can be shown that <sup>H</sup>* *ei ei =* 1 1 *H* *ei ei +*  2 2 *H* *ei ei , where* 2 2 *H* *ei ei =* 1 1 ( ) *<sup>H</sup>*  *Q t* <sup>1</sup> 1 1 1 1 ( ) ( ) *H H <sup>H</sup> Q t Q t*   *and ei*1 *=*  1 1 ( ) *H H Q t*   *=* [  *I +* <sup>1</sup> ( )( ) ] *<sup>H</sup> R t . Substituting = I*  ˆ ˆ <sup>1</sup> ( )( ) *<sup>H</sup> R t into ei*1 *yields*  1 1 <sup>1</sup> ˆ ˆ ( )[( ) ( ) ] *<sup>H</sup> <sup>H</sup> ei R t* **�**, (51)

*which suggests* ˆ  *=* 1/ 2 <sup>0</sup> *Ct KtR t* () () ()  *+* 1/ 2 *R t*( ) *, where* 0 *denotes an operator having the statespace realization* ( ) 0 *A t I I . Constructing* ˆ ˆ *<sup>H</sup> =* <sup>0</sup> () [ () () () *<sup>T</sup> Ct KtRtK t* () () *<sup>T</sup> PtA t* <sup>0</sup> ( ) ( )] ( ) *H T AtPt C t + R(t) and using (22) yields* ˆ ˆ *<sup>H</sup> =* <sup>0</sup> () [ () () () *<sup>T</sup> Ct BtQtB t P t*( )  *+*  2 <sup>0</sup> ( ) ( ) ( ) ( )] ( ) *<sup>T</sup> H T PtC tCtPt C t + R(t). Comparison with <sup>H</sup> =* <sup>0</sup> <sup>0</sup> () () () () () *T HT Ct BtQtB t C t + R(t) leads to* ˆ ˆ *<sup>H</sup> = <sup>H</sup>* <sup>0</sup> *Ct Pt* () ( ()  *+* <sup>2</sup> <sup>0</sup> ( ) ( ) ( ) ( )) ( ) *<sup>T</sup> H T PtC tCtPt C t . Substituting for*  ˆ ˆ *<sup>H</sup> into (51) yields* 

$$\mathcal{R}\_{\text{i1}} = \mathcal{R}(t)[(\Delta \boldsymbol{\Lambda}^{\boldsymbol{H}})^{-1} - (\Delta \boldsymbol{\Lambda}^{\boldsymbol{H}} - \boldsymbol{\mathsf{C}}(t))\mathcal{G}\_{0}(\dot{\mathsf{P}}(t) - \boldsymbol{\mathsf{y}}^{-2}\boldsymbol{\mathsf{P}}(t)\boldsymbol{\mathsf{C}}^{\top}(t)\boldsymbol{\mathsf{C}}(t)\boldsymbol{\mathsf{P}}(t))\mathcal{G}\_{0}^{\boldsymbol{H}}\boldsymbol{\mathsf{C}}^{\top}(t))^{-1}]\boldsymbol{\mathsf{A}}\,. \tag{52}$$
  $\text{The observation (50) follows by inspection of (52).}$ 

Thus, the cost of designing for worst case input conditions is a deterioration in the mean performance. Note that the best possible average performance 2 2 <sup>2</sup> <sup>2</sup> *<sup>H</sup> <sup>H</sup> RR R R ei ei ei ei* can be attained in problems where there are no uncertainties present, <sup>2</sup> 0 and the Riccati equation solution has converged, that is, *P t*() 0 , in which case ˆ ˆ *<sup>H</sup>* = *<sup>H</sup>* and *ei*1 is a zero matrix.

<sup>&</sup>quot;We know accurately only when we know little, with knowledge doubt increases." *Johann Wolfgang von Goethe*

#### **9.2.8Performance Comparison**

Smoothing, Filtering and Prediction:

*ei* 

*∞ follows* 

<sup>224</sup> Estimating the Past, Present and Future

*from Lemma 4. �* The H∞ solution can be derived as a solution to a two-point boundary value problem, which involves a trade-off between causal and noncausal processes (see [10], [15], [21]). This suggests that the H∞ performance of the above smoother would not improve on that of the filter. Indeed, from Fig. 6, *e* = *f* + *u* and the triangle rule yields <sup>2</sup> *e* ≤ <sup>2</sup> *f* + <sup>2</sup> *u* , where *f* is the H∞ filter error. That is, the error upper bound for the H∞ fixed-interval smoother (43) – (45) is greater than that for the H∞ filter (19) – (20). It is observed below that compared to the

*Lemma 6 [33]: For the output estimation problem (14) – (18), in which C2(t) = C1(t) = C(t), the* 

*ei ei and completing the squares, it can be shown that <sup>H</sup>*

1 1 1 1 ( ) ( ) *H H <sup>H</sup> Q t Q t*

 *+ R(t). Comparison with <sup>H</sup> =* <sup>0</sup> <sup>0</sup> () () () () () *T HT Ct BtQtB t C t*

*PtC tCtPt C t*

 *and* 

, (51)

<sup>0</sup> ( ) ( ) ( ) ( )) ( ) *<sup>T</sup> H T*

<sup>2</sup> <sup>2</sup> 2, 0 2, 0

<sup>1</sup> ˆ ˆ ( )[( ) ( ) ] *<sup>H</sup> <sup>H</sup> ei R t*

*Q t* <sup>1</sup>

1 1

*. Constructing* ˆ ˆ *<sup>H</sup> =* <sup>0</sup> () [ () () () *<sup>T</sup> Ct KtRtK t*

 *+ R(t) and using (22) yields* ˆ ˆ *<sup>H</sup> =* <sup>0</sup> () [ () () () *<sup>T</sup> Ct BtQtB t*

1 2 1

"We know accurately only when we know little, with knowledge doubt increases." *Johann Wolfgang von Goethe*

*The observation (50) follows by inspection of (52). �* Thus, the cost of designing for worst case input conditions is a deterioration in the mean

 *+* 1/ 2 *R t*( ) *, where* 

minimum-variance case, the H∞ solution exhibits an increased mean-square error.

*H H ei ei ei ei* 

*∞, from Lemma 4, it is required that there exists a solution to (46) for* 

*∞. The claim* 

. (50)

0 *denotes an operator having the state-*

() () *<sup>T</sup> PtA t*

*P t*( )  *+* 

*. Substituting for* 

. (52)

*<sup>H</sup> <sup>H</sup> RR R R ei ei ei ei* can be

*ei*1 is a zero matrix.

0 and the Riccati equation

 *+* 

 *= I*  ˆ ˆ <sup>1</sup> ( )( ) *<sup>H</sup> R t into* 

*ei*1 *=* 

*ei ei =* 1 1

*ei ei +* 

*ei*1 *yields* 

*H*

*uj* 

*For the above system to be in* 

*smoother solution (43) – (45) results in* 

<sup>0</sup> ( ) ( ) ( ) ( )] ( ) *<sup>T</sup> H T*

*R(t) leads to* ˆ ˆ *<sup>H</sup> = <sup>H</sup>* <sup>0</sup> *Ct Pt* () ( ()

*PtC tCtPt C t*

*H*

*Proof: By expanding <sup>H</sup>*

*ei ei , where* 2 2

2 2 *H*

1 1 ( ) *H H Q t*

*which suggests* ˆ

<sup>0</sup> ( ) ( )] ( ) *H T AtPt C t* 

ˆ ˆ *<sup>H</sup> into (51) yields* 

*space realization* ( )

 *=* [

2

*which the existence of a positive definite solution implies <sup>H</sup>*

*ei ei =* 1 1 ( ) *<sup>H</sup>* 

**�**

 *=* 1/ 2 <sup>0</sup> *Ct KtR t* () () () 

> 0 *A t I I*

 *I +* <sup>1</sup> ( )( ) ] *<sup>H</sup> R t . Substituting* 

attained in problems where there are no uncertainties present, <sup>2</sup>

solution has converged, that is, *P t*() 0 , in which case ˆ ˆ *<sup>H</sup>* = *<sup>H</sup>* and

 *+* <sup>2</sup>

<sup>1</sup> <sup>0</sup> <sup>0</sup> ( )[( ) ( ( ) ( ( ) ( ) ( ) ( ) ( )) ( )) ] *<sup>H</sup> <sup>H</sup> <sup>T</sup> H T ei R t Ct Pt PtC tCtPt C t*

performance. Note that the best possible average performance 2 2 <sup>2</sup> <sup>2</sup>

It is of interest to compare to compare the performance of (43) – (45) with the H∞ smoother described in [10], [13], [16], namely,

$$
\begin{bmatrix}
\dot{\hat{\mathbf{x}}}(t|T) \\
\dot{\tilde{\mathbf{y}}}(t)
\end{bmatrix} = \begin{bmatrix}
A(t) & B(t)Q(t)\mathbf{B}^{\top}(t) \\
\mathbf{C}^{\top}(t)\mathbf{R}^{-1}(t)\mathbf{C}(t) & -A^{\dagger}(t)
\end{bmatrix} \begin{bmatrix}
\hat{\mathbf{x}}(t|T) \\
\boldsymbol{\xi}(t)
\end{bmatrix} + \begin{bmatrix}
0 \\
\end{bmatrix} \mathbf{z}(t) \tag{53}
$$

$$
\hat{\mathbf{x}}(t \mid T) = \hat{\mathbf{x}}(t) + P(t)\xi(t) \tag{54}
$$

and (22). Substituting (54) and its differential into the first row of (53) together with (21) yields

$$
\dot{\hat{\mathbf{x}}}(t) = A(t)\hat{\mathbf{x}}(t) + K(t)(z(t) - \mathbb{C}(t)\hat{\mathbf{x}}(t)) \, , \tag{55}
$$

which reverts to the Kalman filter at <sup>2</sup> = 0. Substituting ( )*t* <sup>1</sup> *P t xt T* ( )( ( | ) ˆ *x t* ˆ( )) into the second row of (53) yields 

$$
\dot{\hat{\mathbf{x}}}(t|T) = A(t)\hat{\mathbf{x}}(t) + G(t)(\hat{\mathbf{x}}(t|T) - \hat{\mathbf{x}}(t)) \, , \tag{56}
$$

where *G*(*t*) <sup>1</sup> <sup>2</sup> () () () () *<sup>T</sup> BtQtB tP t* , which reverts to the maximum-likelihood smoother at 2 = 0. Thus, the Hamiltonian form (53) – (54) can be realised by calculating the filtered estimate (55) and then obtaining the smoothed estimate from (56).

Figure 7. Fixed-interval smoother performance comparison for Gaussian process noise: (i) Kalman filter; (ii) Maximum likelihood smoother; (iii) Minimum-variance smoother; (iv) H∞ filter; (v) H∞ smoother [10], [13], [16]; and (vi) H∞ smoother (43) – (45).

Figure 8. Fixed-interval smoother performance comparison for sinusoidal process noise: (i) Kalman filter; (ii) Maximum likelihood smoother; (iii) Minimum-variance smoother; (iv) H∞ filter; (v) H∞ smoother [10], [13], [16]; and (vi) H∞ smoother (43) – (45).

<sup>&</sup>quot;Inquiry is fatal to certainty." *William James Durant*

that is,

Assuming that *x*0 = 0,

*It is easily verified that* 

 <sup>1</sup> *<sup>T</sup> <sup>T</sup> P AP A CC k kk k k k*

*Qk jk* 

*that the Riccati difference equation* 

The Bounded Real Lemma [18] states that *w* <sup>2</sup> implies *y* <sup>2</sup> if

for a *γ* . Summing (61) from *k* = 0 to *k* = *N* – 1 yields the objective

0 00

*N T k k*

*k*

1 0

2 2

*w w*

Conditions for achieving the above objectives are established below.

*with PT = 0, has a positive definite symmetric solution on [0, N]. Then* 

 

The above lemma relies on the simplifying assumption { }*<sup>T</sup> Ewwj <sup>k</sup>* = *jk I*

<sup>2</sup> <sup>2</sup> <sup>2</sup> <sup>1</sup>

 

0 00

*Proof: From the approach of Xie et al [18], define* 

equation within the above lemma becomes

learning doth make thee mad." *Acts 26: 24*

<sup>111</sup> 0 *<sup>T</sup> <sup>T</sup> <sup>T</sup> <sup>T</sup> k k k k kk k k k k x P x x Px y y w w*

> 1 1 2

 *<sup>N</sup> <sup>N</sup> <sup>T</sup> <sup>T</sup> <sup>T</sup>*

0 0 

1

 

2 2 0

*Lemma 7: The discrete-time Bounded Real Lemma [18]: In respect of the above system* 

2 2 1 <sup>1</sup> <sup>1</sup> <sup>1</sup> <sup>1</sup> ( ) *<sup>T</sup> <sup>T</sup> <sup>T</sup> <sup>T</sup> <sup>T</sup> P AP A AP B I BP B BP A CC k kk k kk k kk k kk k k k*

> 2 2 1 <sup>1</sup> <sup>1</sup> ( ) *<sup>T</sup> <sup>T</sup> k k k k k k k kk p w I B P B B P Ax*

<sup>111</sup> <sup>1</sup> <sup>1</sup> ( ) *<sup>T</sup> <sup>T</sup> <sup>T</sup> <sup>T</sup> <sup>T</sup> <sup>T</sup> T T k k k k kk k k k k <sup>k</sup> k k k k k k k kk x P x x Px y y*

<sup>1</sup> <sup>1</sup> <sup>1</sup> ( )( ) ( ) *<sup>T</sup> <sup>T</sup> <sup>T</sup> <sup>T</sup> <sup>T</sup> <sup>T</sup> AP B CD I BP B DD BP A DC kk k k k kk k k k kk k k k*

*which implies (61) – (62) and (63) under the assumption x0 = 0. �*

*,* 

, the scaled matrix *Bk* = 1/2 *B Qk k* may be used in place of *Bk* above. In the case where

possesses a direct feedthrough matrix, namely, *yk* = *Ckxk* + *Dkwk*, the Riccati difference

. (67)

"And as he thus spake for himself, Festus said with a loud voice, Paul, thou art beside thyself; much

*y y <sup>y</sup> <sup>w</sup>*

*k k*

*N T T*

*x Px y y*

*k*

*w w*

*k k k k*

0 2

*k k*

2

(61)

*x Px y y w w* , (62)

. (63)

*, suppose* 

<sup>2</sup> *.* 

. When { }*<sup>T</sup> Ewwj <sup>k</sup>* =

0

1

 

*N T k k*

*k <sup>N</sup> <sup>T</sup> k k*

1

*w w*

, (65)

2 2 2 1

. (66)

*w w p I BP B p x A P Ax*

*≤ γ for any w* 

. (64)

0

*k*

$$\text{Example: 2 [35]. Let } A = \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix}, B = C = Q = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, D = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} \text{ and } R = \begin{bmatrix} \sigma\_v^2 & 0 \\ 0 & \sigma\_v^2 \end{bmatrix}$$

denote time-invariant parameters for an output estimation problem. Simulations were conducted for the case of *T = 100* seconds, d*t = 1* millisecond, using 500 realizations of zeromean, Gaussian process noise and measurement noise. The resulting mean-square-error (MSE) versus signal-to-noise ratio (SNR) are shown in Fig. 7. The H∞ solutions were calculated using *a priori* designs of <sup>2</sup> <sup>2</sup> *v* within (22). It can be seen from trace (vi) of Fig. 7 that the H∞ smoothers exhibit poor performance when the exogenous inputs are in fact Gaussian, which illustrates Lemma 6. The figure demonstrates that the minimumvariance smoother out-performs the maximum-likelihood smoother. However, at high SNR, the difference in smoother performance is inconsequential. Intermediate values for 2 may be selected to realise a smoother design that achieves a trade-off between minimum-variance performance (trace (iii)) and H∞ performance (trace (v)).

*Example 3 [35].* Consider the non-Gaussian process noise signal *w*(*t*) = <sup>1</sup> sin( ) sin( ) *<sup>t</sup> t* , where <sup>2</sup> sin( )*t*

denotes the sample variance of sin(*t*). The results of a simulation study appear in Fig. 8. It can be seen that the H∞ solutions, which accommodate input uncertainty, perform better than those relying on Gaussian noise assumptions. In this example, the developed H∞ smoother (43) – (45) exhibits the best mean-square-error performance.

#### **9.3 Robust Discrete-time Estimation**

#### **9.3.1 Discrete-Time Bounded Real Lemma**

The development of discrete-time H∞ filters and smoothers proceeds analogously to the continuous-time case. From Lyapunov stability theory [36], for the unforced system

$$\mathbf{x}\_{k+1} = A\_k \mathbf{x}\_k \tag{57}$$

*Ak n n* , to be asymptotically stable over the interval *k* [1, *N*], a Lyapunov function, *Vk*(*xk*), is required to satisfy ()0 *V x k k* , where Δ*Vk*(*xk*) = *Vk+1*(*xk*) – *Vk*(*xk*) denotes the first backward difference of *Vk*(*xk*). Consider the candidate Lyapunov function ( ) *<sup>T</sup> V x x Px k k k kk* , where *Pk* = *<sup>T</sup> Pk n n* is positive definite. To guarantee *xk* <sup>2</sup> , it is required that

$$
\Delta V\_k(\mathbf{x}\_k) = \mathbf{x}\_{k+1}^T P\_{k+1} \mathbf{x}\_{k+1} - \mathbf{x}\_k^T P\_k \mathbf{x}\_k < \mathbf{0} \,. \tag{58}
$$

Now let *y* = *w* denote the output of the system

$$\mathbf{x}\_{k+1} = A\_k \mathbf{x}\_k + B\_k \mathbf{w}\_k \tag{59}$$

$$\mathbf{y}\_k = \mathbf{C}\_k \mathbf{x}\_k \tag{60}$$

where *wk <sup>m</sup>* , *Bk n m* and *Ck <sup>p</sup> <sup>n</sup>* .

<sup>&</sup>quot;Education is the path from cocky ignorance to miserable uncertainty." *Samuel Langhorne Clemens aka. Mark Twain*

The Bounded Real Lemma [18] states that *w* <sup>2</sup> implies *y* <sup>2</sup> if

$$\mathbf{x}\_{k+1}^T P\_{k+1} \mathbf{x}\_{k+1} - \mathbf{x}\_k^T P\_k \mathbf{x}\_k + y\_k^T y\_k - \gamma^2 w\_k^T w\_k < 0 \tag{61}$$

for a *γ* . Summing (61) from *k* = 0 to *k* = *N* – 1 yields the objective

$$-\varepsilon\_n^T P\_n \mathbf{v}\_n + \sum\_{k=0}^{N-1} \mathbf{v}\_k^T \mathbf{v}\_k - \gamma^2 \sum\_{k=0}^{N-1} \mathbf{u}\_k^T \mathbf{w}\_k < 0,\tag{62}$$

that is,

Smoothing, Filtering and Prediction:

and *R* =

2

 

0 *v*

2 0

*v*

<sup>226</sup> Estimating the Past, Present and Future

denote time-invariant parameters for an output estimation problem. Simulations were conducted for the case of *T = 100* seconds, d*t = 1* millisecond, using 500 realizations of zeromean, Gaussian process noise and measurement noise. The resulting mean-square-error (MSE) versus signal-to-noise ratio (SNR) are shown in Fig. 7. The H∞ solutions were

Fig. 7 that the H∞ smoothers exhibit poor performance when the exogenous inputs are in fact Gaussian, which illustrates Lemma 6. The figure demonstrates that the minimumvariance smoother out-performs the maximum-likelihood smoother. However, at high SNR, the difference in smoother performance is inconsequential. Intermediate values for

may be selected to realise a smoother design that achieves a trade-off between

denotes the sample variance of sin(*t*). The results of a simulation study appear in Fig. 8. It can be seen that the H∞ solutions, which accommodate input uncertainty, perform better than those relying on Gaussian noise assumptions. In this example, the developed H∞ smoother (43)

The development of discrete-time H∞ filters and smoothers proceeds analogously to the

*Ak n n* , to be asymptotically stable over the interval *k* [1, *N*], a Lyapunov function, *Vk*(*xk*), is required to satisfy ()0 *V x k k* , where Δ*Vk*(*xk*) = *Vk+1*(*xk*) – *Vk*(*xk*) denotes the first backward difference of *Vk*(*xk*). Consider the candidate Lyapunov function ( ) *<sup>T</sup> V x x Px k k k kk* ,

continuous-time case. From Lyapunov stability theory [36], for the unforced system

where *Pk* = *<sup>T</sup> Pk n n* is positive definite. To guarantee *xk* <sup>2</sup> , it is required that

*k kk k k* <sup>1</sup> *x Ax Bw* ,

*k kk y C x* ,

"Education is the path from cocky ignorance to miserable uncertainty." *Samuel Langhorne Clemens aka.* 

*w* denote the output of the system

1 0 0 1 

, *D* =

0 0 0 0 

within (22). It can be seen from trace (vi) of

sin( ) sin( ) *<sup>t</sup> t* 

*k kk* <sup>1</sup> *x Ax* , (57)

<sup>111</sup> ( ) 0 *<sup>T</sup> <sup>T</sup> V x x P x x Px k k k k k k kk* . (58)

, where <sup>2</sup>

sin( )*t*

> (59) (60)

, *B* = *C* = *Q* =

 <sup>2</sup> *v*

1 0 0 1 

minimum-variance performance (trace (iii)) and H∞ performance (trace (v)). *Example 3 [35].* Consider the non-Gaussian process noise signal *w*(*t*) = <sup>1</sup>

– (45) exhibits the best mean-square-error performance.

*Example: 2 [35].* Let *A* =

2 

Now let *y* =

*Mark Twain*

where *wk <sup>m</sup>* , *Bk n m* and *Ck <sup>p</sup> <sup>n</sup>* .

calculated using *a priori* designs of <sup>2</sup>

**9.3 Robust Discrete-time Estimation** 

**9.3.1 Discrete-Time Bounded Real Lemma** 

$$\frac{-\mathbf{x}\_0^T P\_0 \mathbf{x}\_0 + \sum\_{k=0}^{N-1} \mathbf{y}\_k^T \mathbf{y}\_k}{\sum\_{k=0}^{N-1} \mathbf{w}\_k^T \mathbf{w}\_k} < \boldsymbol{\gamma}^2. \tag{63}$$

Assuming that *x*0 = 0,

$$\left\|\left\|\mathcal{G}\right\|\right\|\_{\circ} = \frac{\left\|\mathcal{Y}\right\|\_{2}}{\left\|w\right\|\_{2}} = \frac{\left\|\mathcal{G}w\right\|\_{2}}{\left\|w\right\|\_{2}} = \frac{\sqrt{\sum\_{k=0}^{N-1} \mathcal{Y}\_{k}^{T} \mathcal{Y}\_{k}}}{\sqrt{\sum\_{k=0}^{N-1} w\_{k}^{T} w\_{k}}} < \mathcal{Y} \,. \tag{64}$$

Conditions for achieving the above objectives are established below.

*Lemma 7: The discrete-time Bounded Real Lemma [18]: In respect of the above system , suppose that the Riccati difference equation* 

$$P\_k = A\_k^T P\_{k+1} A\_k + \gamma^{-2} A\_k^T P\_{k+1} B\_k (I - \gamma^{-2} B\_k^T P\_{k+1} B\_k)^{-1} B\_k^T P\_{k+1} A\_k + \mathcal{C}\_k^T \mathcal{C}\_k \, \, \, \, \tag{65}$$

*with PT = 0, has a positive definite symmetric solution on [0, N]. Then ≤ γ for any w* <sup>2</sup> *.* 

*Proof: From the approach of Xie et al [18], define* 

$$p\_k = w\_k - \gamma^{-2} (I - \gamma^{-2} B\_k^T P\_{k+1} B\_k)^{-1} B\_k^T P\_{k+1} A\_k \mathbf{x}\_k \,. \tag{66}$$

*It is easily verified that* 

$$\mathbf{x}\_{k+1}^{\top}\mathbf{P}\_{k+1}\mathbf{x}\_{k+1} - \mathbf{x}\_{k}^{\top}\mathbf{P}\_{k}\mathbf{x}\_{k} + y\_{k}^{\top}y\_{k} - \boldsymbol{\gamma}^{2}\boldsymbol{w}\_{k}^{\top}\mathbf{w}\_{k} = -\boldsymbol{\gamma}^{-2}p\_{k}^{\top}(\mathbf{I} - \boldsymbol{\gamma}^{-2}\mathbf{B}\_{k}^{\top}\mathbf{P}\_{k+1}\mathbf{B}\_{k})^{-1}\boldsymbol{p}\_{k} - \mathbf{x}\_{k}^{\top}A\_{k}^{\top}\mathbf{P}\_{k+1}A\_{k}\mathbf{x}\_{k+1}$$
  $\text{which implies (61) -- (62) and (63) under the assumption }\mathbf{x}\_{0} = \boldsymbol{0}.$ 

The above lemma relies on the simplifying assumption { }*<sup>T</sup> Ewwj <sup>k</sup>* = *jk I* . When { }*<sup>T</sup> Ewwj <sup>k</sup>* = *Qk jk* , the scaled matrix *Bk* = 1/2 *B Qk k* may be used in place of *Bk* above. In the case where possesses a direct feedthrough matrix, namely, *yk* = *Ckxk* + *Dkwk*, the Riccati difference equation within the above lemma becomes

$$\begin{split} P\_k &= A\_k^\top P\_{k+1} A\_k + \mathbf{C}\_k^\top \mathbf{C}\_k \\ &+ \boldsymbol{\gamma}^{-2} (A\_k^\top P\_{k+1} B\_k + \mathbf{C}\_k^\top D\_k) (I - \boldsymbol{\gamma}^{-2} B\_k^\top P\_{k+1} B\_k - \boldsymbol{\gamma}^{-2} D\_k^\top D\_k)^{-1} (B\_k^\top P\_{k+1} A\_k + D\_k^\top \mathbf{C}\_k) \ . \end{split} \tag{67}$$

<sup>&</sup>quot;And as he thus spake for himself, Festus said with a loud voice, Paul, thou art beside thyself; much learning doth make thee mad." *Acts 26: 24*

where

such that

is the one-step-ahead predictor gain,

*priori* filter within [11], [13], [30].

(70), the predictor error system is

*ei*

where *k k*/ 1 *x* = *<sup>k</sup> x* – / 1 ˆ *k k x* ,

*performance objective* 

 

**9.3.2.3Performance** 

1 1, 2,

2 1, 1, 1, 2, 2, 1, 2, 2,

*C MC I C MC C MC R C MC*

 

*T T kkk kk k*

> 1/ 2, / 1 1,

*k k k*

*i* ,

/1/1

 *–* 

*required that there exists a positive define symmetric solution to* 

*e e* 

*Proof: By applying the Bounded Real Lemma to <sup>H</sup>*

sterile truth for yourself." *Vilfredo Federico Damaso Pareto*

*e C*

1

*N T kk kk*

*k*

0

*k k k kk k k*

*ei* = 2, 1,

that the prediction error satisfies the desired performance objective.

*x A KC K B*

*T T kkk k kk k* 1

(75)

, (76)

*T T*

*C MC I C MC C*

<sup>1</sup> <sup>2</sup> 1, 1, 1, 2, 1,

. The above predictor is also known as an *a* 

(78)

. It is shown below

*ei , it is* 

2, 1, 2, 2, 2,

*C MC R C MC C*

*kkk k kk k k*

/ 1 2, 2, / 1 2, ( ) *<sup>T</sup> <sup>T</sup> K AP C C P C R k k kk k k kk k k*

/ 1 1, 1 1, 1 ( ) *<sup>T</sup> P M CC k k <sup>k</sup> k k* 

and *Mk* = *<sup>T</sup> Mk* > 0 satisfies the Riccati differential equation

1 2 1

*k k k k k kk k k k k T T k k*

(77)

0

Following the approach in the continuous-time case, by subtracting (73) – (74) from (68),

[ ] [0 0]

, <sup>0</sup> *<sup>x</sup>* 0,

*k*

*C*

1 2 0

 *< 0.* 

*N T k k k i i* 

/ 1

*k*

*k*

*k k*

*v*

*w*

[ ] [0 0]

and *i* =

*ei and taking the adjoint to address* 

*v w* 

*k kk k k*

*Lemma 8 [11], [13], [30]: In respect of the H∞ prediction problem (68) – (72), the existence of Mk = <sup>T</sup> Mk > 0 for the Riccati differential equation (77) ensures that the solution (73) – (74) achieves the* 

"Give me a fruitful error any time, full of seeds bursting with its own corrections. You can keep your

*A KC K B*

*x*

*M AM A BQB AM C C M A*

*T T T T kkk kk k k T*

A verification is requested in the problems. It will be shown that predictors, filters and smoothers satisfy a H∞ performance objective if there exist solutions to Riccati difference equations arising from the application of Lemma 7 to the corresponding error systems. A summary of the discrete-time results from [5], [11], [13] and the further details described in [21], [30], is presented below.

#### **9.3.2Discrete-Time H∞ Prediction**

#### **9.3.2.1Problem Definition**

Consider a nominal system 

$$\mathbf{x}\_{k+1} = A\_k \mathbf{x}\_k + B\_k \mathbf{z}v\_k,\tag{68}$$

$$\mathbf{y}\_{2,k} = \mathbf{C}\_{2,k} \mathbf{x}\_{k,\prime} \tag{69}$$

together with a fictitious reference system realised by (68) and

$$\mathbf{y}\_{1,k} = \mathbf{C}\_{1,k} \mathbf{x}\_k \tag{70}$$

where *Ak*, *Bk*, *C2,k* and *C1,k* are of appropriate dimensions. The problem of interest is to find a solution that produces one-step-ahead predictions, 1, / 1 ˆ *k k y* , given measurements

$$z\_k = y\_{2,k} + \upsilon\_k \tag{71}$$

at time *k* – 1. The prediction error is defined as

$$e\_{k/k=1} = y\_{1,k} - \hat{y}\_{1,k/k=1} \,. \tag{72}$$

The error sequence (72) is generated by *e* = *ei i*, where *ei* = 2 1 [ ] , *i* = *v w* and 1 *N* 1 *N* 

the objective is to achieve /1/1 0 *T kk kk k e e* – 2 0 *T k k k i i* < 0, for some *<sup>γ</sup>* . For convenience, it is assumed that *wk <sup>m</sup>* , { } *E wk* = 0, { }*<sup>T</sup> Ewwj <sup>k</sup>* = *Qk jk* , *vk <sup>p</sup>* , { } *E vk* = 0, { }*<sup>T</sup> Evvj <sup>k</sup>* = *Rk jk* and { }*<sup>T</sup> Ewvj <sup>k</sup>* = 0.

#### **9.3.2.2H∞ Solution**

The H∞ predictor has the same structure as the optimum minimum-variance (or Kalman) predictor. It is given by

$$
\hat{\mathbf{x}}\_{k+1/k} = \left(\mathbf{A}\_k - \mathbf{K}\_k \mathbf{C}\_{2,k}\right) \hat{\mathbf{x}}\_{k/k-1} + \mathbf{K}\_k \mathbf{z}\_k,\tag{73}
$$

$$
\hat{\mathbf{y}}\_{1,k/k-1} = \mathbf{C}\_{1,k} \hat{\mathbf{x}}\_{k/k-1},\tag{74}
$$

<sup>&</sup>quot;Why waste time learning when ignorance is instantaneous?" *William Boyd Watterson II*

where

Smoothing, Filtering and Prediction:

(68) (69)

A

<sup>228</sup> Estimating the Past, Present and Future

A verification is requested in the problems. It will be shown that predictors, filters and smoothers satisfy a H∞ performance objective if there exist solutions to Riccati difference equations arising from the application of Lemma 7 to the corresponding error systems. A summary of the discrete-time results from [5], [11], [13] and the further details described in

<sup>1</sup> , *k kk k k x Ax Bw*

2, 2, *k kk y C x* ,

that produces one-step-ahead predictions, 1, / 1

where *Ak*, *Bk*, *C2,k* and *C1,k* are of appropriate dimensions. The problem of interest is to find a

*N T k k k i i* 

1 2 0

The H∞ predictor has the same structure as the optimum minimum-variance (or Kalman)

1/ 2, / 1 ˆ ˆ , *k k k k k kk k k x A KC x Kz*

1, / 1 1, / 1 ˆ ˆ , *kk k kk y Cx*

"Why waste time learning when ignorance is instantaneous?" *William Boyd Watterson II*

*ei i*, where

realised by (68) and

1, 1, *k kk y C x* , (70)

*k kk* 2, *z y v* (71)

/ 1 1, 1, / 1 ˆ *kk k kk e yy* . (72)

*ei* = 2 1 [

< 0, for some *<sup>γ</sup>* . For convenience, it

, *vk <sup>p</sup>* , { } *E vk* = 0, { }*<sup>T</sup> Evvj <sup>k</sup>* = *Rk jk*

] , *i* =

*v w* 

and

(73) (74)

ˆ *k k y* , given measurements

[21], [30], is presented below.

**9.3.2.1Problem Definition**  Consider a nominal system

solution

the objective is to achieve

and { }*<sup>T</sup> Ewvj <sup>k</sup>* = 0.

**9.3.2.2H∞ Solution** 

predictor. It is given by

**9.3.2Discrete-Time H∞ Prediction** 

together with a fictitious reference system

at time *k* – 1. The prediction error is defined as

The error sequence (72) is generated by *e* =

1

*N T kk kk*

*k*

is assumed that *wk <sup>m</sup>* , { } *E wk* = 0, { }*<sup>T</sup> Ewwj <sup>k</sup>* = *Qk jk*

0

/1/1

–

*e e* 

$$K\_k = A\_k P\_{k \nmid k-1} \mathbb{C}\_{2,k}^T (\mathbb{C}\_{2,k} P\_{k \nmid k-1} \mathbb{C}\_{2,k}^T + R\_k)^{-1} \tag{75}$$

is the one-step-ahead predictor gain,

$$P\_{k/k-1} = (M\_k^{-1} - \gamma^{-2} \mathbf{C}\_{1,k+1} \mathbf{^T C}\_{1,k+1})^{-1} \, \tag{76}$$

and *Mk* = *<sup>T</sup> Mk* > 0 satisfies the Riccati differential equation

$$M\_{k+1} = A\_k M\_k A\_k^T + B\_k Q\_k B\_k^T - A\_k M\_k \begin{bmatrix} \mathbf{C}\_{1,k}^T & \mathbf{C}\_{2,k}^T \end{bmatrix} \begin{bmatrix} \mathbf{C}\_{1,k} M\_k \mathbf{C}\_{1,k}^T - \gamma^2 I & \mathbf{C}\_{1,k} M\_k \mathbf{C}\_{2,k}^T \\ \mathbf{C}\_{2,k} M\_k \mathbf{C}\_{1,k}^T & R\_k + \mathbf{C}\_{2,k} M\_k \mathbf{C}\_{2,k}^T \end{bmatrix}^{-1} \begin{bmatrix} \mathbf{C}\_{1,k} \\ \mathbf{C}\_{2,k} \end{bmatrix} M\_k A\_k^T \tag{77}$$

such that 2 1, 1, 1, 2, 2, 1, 2, 2, 0 *T T kkk kk k T T kkk k kk k C MC I C MC C MC R C MC* . The above predictor is also known as an *a priori* filter within [11], [13], [30]. 

**9.3.2.3Performance** 

Following the approach in the continuous-time case, by subtracting (73) – (74) from (68), (70), the predictor error system is

$$
\begin{bmatrix}
\tilde{\boldsymbol{x}}\_{k+1/k} \\
\boldsymbol{e}\_{k/k-1}
\end{bmatrix} = \begin{bmatrix}
\boldsymbol{A}\_{k} - \boldsymbol{K}\_{k}\mathbf{C}\_{2,k} & \begin{bmatrix}
\end{bmatrix}
\end{bmatrix} \begin{bmatrix}
\tilde{\boldsymbol{\mathcal{X}}}\_{k/k-1} \\
\boldsymbol{v}\_{k}
\end{bmatrix} \begin{bmatrix}
\boldsymbol{v}\_{k} \\
\boldsymbol{v}\_{k}
\end{bmatrix}, \quad \tilde{\boldsymbol{x}}\_{0} = \mathbf{0},
\tag{78}
$$

$$=\mathcal{R}\_i i\_{\mu}$$

where *k k*/ 1 *x* = *<sup>k</sup> x* – / 1 ˆ *k k x* , *ei* = 2, 1, [ ] [0 0] *k kk k k k A KC K B C* and *i* = *v w* . It is shown below

that the prediction error satisfies the desired performance objective.

*Lemma 8 [11], [13], [30]: In respect of the H∞ prediction problem (68) – (72), the existence of Mk = <sup>T</sup> Mk > 0 for the Riccati differential equation (77) ensures that the solution (73) – (74) achieves the performance objective*  1 /1/1 0 *N T kk kk k e e –*  1 2 0 *N T k k k i i < 0.* 

*Proof: By applying the Bounded Real Lemma to <sup>H</sup> ei and taking the adjoint to address ei , it is required that there exists a positive define symmetric solution to* 

<sup>&</sup>quot;Give me a fruitful error any time, full of seeds bursting with its own corrections. You can keep your sterile truth for yourself." *Vilfredo Federico Damaso Pareto*

is generated by *e* =

objective is to achieve

**9.3.3.2H∞ Solution** 

*N T kk kk*

*k*

[30]. Output estimates are obtained from

The filter gain is calculated as

such that

**9.3.3.3Performance** 

error system may be written as

happen on my watch." *George Walker Bush*

1/ 1 1,

1

*ei i*, where

/ / 0

As explained in Chapter 4, filtered states can be evolved from

where *Mk* = *<sup>T</sup> Mk* > 0 satisfies the Riccati differential equation

2 1, 1 1 1, 1 1, 1 1 2, 1 2, 1 1 1, 1 1 2, 1 1 2, 1

*C MC I C MC C MC R C MC*

 

*T T kkk kk k*

2, 1 1/ 1 ˆ *LC A x k kk k k* + 2, 1 1 ( ( *LC A x k kkk* + 1 1) *B w k k* + )*<sup>k</sup> v* . Denote *ik* =

–

*e e* 

1 2 0

<sup>1</sup> ( ) *<sup>T</sup> <sup>T</sup> L MC CMC R k kk k kk k*

*T T kkk k kk k*

1 1 1 1 1 1 1 1 1, 1 2, 1 *<sup>T</sup> <sup>T</sup> T T M AM A BQB AM C C k k k k k kk k k k k*

*T T*

*C MC I C MC C*

1, 1 1 1, 1 1, 1 1 2, 1 1, 1

*C MC R C MC C*

Subtracting from (83) from (68) gives *k k*/ *x* = 1 1/ 1 ˆ *A x k kk* + *B w k k* 1 1 – 1 1/ 1 ˆ *A x k kk* +

[0 0]

"I believe the most solemn duty of the American president is to protect the American people. If America shows uncertainty and weakness in this decade, the world will drift toward tragedy. This will not

/ 2, 1 2, 1 1/ 1

( ) [ ( )]

*k k kkk k k kk k k k k k k x I LC A L I LC B x e C i*

 

> *ei i* ,

2, 1 1 1, 1 1 2, 1 1 2, 1 2, 1

 

*kkk k kk k k*

<sup>1</sup> <sup>2</sup>

 

*kkk kk k k T*

*T T k k*

0

.

*N T k k k i i* 

*ei* = 2 1 [

where *Lk n p* is a filter gain. The above recursion is called an *a posteriori* filter in [11], [13],

] , *i* = *<sup>v</sup>*

/ 1 1/ 1 2, 1 1/ 1 ˆ ˆ ( ˆ ), *kk k k k k k k k k k x A x Lz C A x* (83)

1, 1/ 1 1, 1 1/ 1 ˆ ˆ *kk k kk y Cx* . (84)

, (85)

< 0, for some *<sup>γ</sup>* .

*w* 

. The H∞ performance

1 1

, then the filtered

(86)

(87)

*M A*

1 *k k v w* 

<sup>1</sup> 2, 2, ( )( )*<sup>T</sup> P A KC P A KC <sup>k</sup> k k kk k k k* <sup>2</sup> <sup>2</sup> <sup>1</sup> 2, 1, 1, 1, 2, 2, ( )( )( ) *<sup>T</sup> <sup>T</sup> <sup>T</sup> <sup>T</sup> <sup>T</sup> Ak k kk k K C PC I C PC C P A K C K R K BQ B kk k kk k k k k k k k kk* <sup>2</sup> <sup>2</sup> <sup>1</sup> 2, 1, 1, 1, 1, 2, ( )( ( ) )( ) *<sup>T</sup> <sup>T</sup> <sup>T</sup> <sup>T</sup> <sup>T</sup> A K C P PC I C PC C P A K C K R K BQ B k kk k kk kk k kk k k k k k k k kk* 1 2 <sup>1</sup> 2, 1, 1, 2, ( )( )( ) *<sup>T</sup> <sup>T</sup> <sup>T</sup> <sup>T</sup> Ak kk k KC P C C A KC K RK BQB k k k k k k k k k kk* (79)

*in which use was made of the Matrix Inversion Lemma. Defining Pk k*/ 1 *=* <sup>1</sup> (*Pk +* <sup>2</sup> <sup>1</sup> 1, 1, ) *<sup>T</sup> C Ck k leads to*

$$\begin{split} \left(P\_{k+1/k}^{-1} + \boldsymbol{\mathcal{Y}}^{-2} \mathbf{C}\_{1,k+1} \mathbf{C}\_{1,k+1}\right)^{-1} &= \left(A\_{k} - K\_{k} \mathbf{C}\_{2,k}\right) P\_{k/k-1} \left(A\_{k} - K\_{k} \mathbf{C}\_{2,k}\right)^{T} + K\_{k} R\_{k} K\_{k}^{T} + B\_{k} Q\_{k} B\_{k}^{T} \\ &= A\_{k} P\_{k/k-1} A\_{k}^{T} + B\_{k} Q\_{k} B\_{k}^{T} - A\_{k} P\_{k/k-1} \mathbf{C}\_{2,k}^{T} \left(\mathbf{R} + \mathbf{C}\_{k} P\_{k/k-1} \mathbf{C}\_{2,k}^{T}\right)^{-1} \mathbf{C}\_{2,k} P\_{k/k-1} A\_{k}^{T} .\end{split}$$

*and applying the Matrix Inversion Lemma gives* 

$$\left(\boldsymbol{P}\_{k+1/k}^{-1} + \boldsymbol{\mathcal{V}}^{-2}\boldsymbol{\mathsf{C}}\_{1,k+1}^{\top}\boldsymbol{\mathsf{C}}\_{1,k+1}\right)^{-1} = \boldsymbol{A}\_{k}\boldsymbol{P}\_{k/k-1}\boldsymbol{A}\_{k}^{\top} + \boldsymbol{B}\_{k}\boldsymbol{Q}\_{k}\boldsymbol{B}\_{k}^{\top} - \boldsymbol{A}\_{k}\boldsymbol{P}\_{k/k-1}\boldsymbol{\mathsf{C}}\_{2,k}^{\top}\left(\boldsymbol{R} + \boldsymbol{\mathsf{C}}\_{2,k}\boldsymbol{P}\_{k/k-1}\boldsymbol{\mathsf{C}}\_{2,k}^{\top}\right)^{-1}\boldsymbol{\mathsf{C}}\_{2,k}\boldsymbol{P}\_{k/k-1}\boldsymbol{A}\_{k}^{\top}\boldsymbol{\mathsf{C}}\_{2,k}^{\top}\left(\boldsymbol{R} + \boldsymbol{\mathsf{C}}\_{2,k}\boldsymbol{P}\_{k/k-1}\boldsymbol{\mathsf{C}}\_{2,k}^{\top}\right)^{-1}\boldsymbol{A}\_{k}^{\top}\boldsymbol{\mathsf{C}}\_{3,k}\boldsymbol{P}\_{k/k-1}\boldsymbol{A}\_{k}^{\top}\boldsymbol{\mathsf{C}}\_{4,k}\boldsymbol{A}\_{k}^{\top}\boldsymbol{A}\_{k}^{\top}\boldsymbol{A}\_{k}^{\top}$$

$$= \boldsymbol{A}\_{k}(\boldsymbol{P}\_{k/k-1} + \boldsymbol{\mathsf{C}}\_{2,k}\boldsymbol{\mathsf{R}}\_{k}^{\top}\boldsymbol{\mathsf{C}}\_{2,k})^{-1}\boldsymbol{A}\_{k}^{\top} + \boldsymbol{B}\_{k}\boldsymbol{Q}\_{k}\boldsymbol{B}\_{k}^{\top}\boldsymbol{A}\_{k}^{\top}.$$

*The change of variable (76), namely,* <sup>1</sup> *Pk k*/ 1 = <sup>1</sup> *Mk –* <sup>2</sup> 1, 1, *<sup>T</sup> C Ck k* , *results in* 

$$\begin{split} M\_{k+1} &= A\_k (M\_k^{-1} + \mathbf{C}\_{2,k} \, ^T R\_k^{-1} \mathbf{C}\_{2,k} - \boldsymbol{\gamma} \, ^{-2} \mathbf{C}\_{1,k}^T \mathbf{C}\_{1,k})^{-1} A\_k^T + B\_k \mathbf{Q}\_k B\_k^T \\ &= A\_k (M\_k^{-1} + \overline{\mathbf{C}}\_k \overline{R}\_k \overline{\mathbf{C}}\_k^T)^{-1} A\_k^T + B\_k \mathbf{Q}\_k B\_k^T \,. \end{split} \tag{80}$$

*where* 1, 2, *k k k C <sup>C</sup> <sup>C</sup> and*  <sup>2</sup> 0 0 *<sup>k</sup> k <sup>I</sup> <sup>R</sup> R . Applying the Matrix Inversion Lemma within (80) gives* 

$$M\_{k+1} = A\_k M A\_k^T - A\_k M\_k \overline{\mathcal{C}}\_k^T (\overline{R}\_k + \overline{\mathcal{C}}\_k M\_k \overline{\mathcal{C}}\_k^T)^{-1} \overline{\mathcal{C}}\_k M\_k A\_k^T + B\_k Q\_k B\_k^T \tag{81}$$

*Expanding (81) yields (77). The existence of Mk > 0 for the above Riccati differential equation implies Pk > 0 for (79). Thus, it follows from Lemma 7 that the stated performance objective is achieved. �*

#### **9.3.3Discrete-Time H∞ Filtering**

#### **9.3.3.1Problem Definition**

Consider again the configuration of Fig. 1. Assume that the systems and have the realisations (68) – (69) and (68), (70), respectively. It is desired to find a solution that operates on the measurements (71) and produces the filtered estimates 1, / ˆ *k k y* . The filtered error sequence,

$$\mathbf{e}\_{k \times k} = \mathbf{y}\_{1,k} - \hat{\mathbf{y}}\_{1,k \times k} \tag{82}$$

<sup>&</sup>quot;Never interrupt your enemy when he is making a mistake." *Napoléon Bonaparte*

is generated by *e* = *ei i*, where *ei* = 2 1 [ ] , *i* = *<sup>v</sup> w* . The H∞ performance objective is to achieve 1 / / 0 *N T kk kk k e e* – 1 2 0 *N T k k k i i* < 0, for some *<sup>γ</sup>* .

#### **9.3.3.2H∞ Solution**

Smoothing, Filtering and Prediction:

 *+* <sup>2</sup> <sup>1</sup> 1, 1, ) *<sup>T</sup> C Ck k*

(80)

have the

that operates

<sup>230</sup> Estimating the Past, Present and Future

2, 1, 1, 1, 2, 2, ( )( )( ) *<sup>T</sup> <sup>T</sup> <sup>T</sup> <sup>T</sup> <sup>T</sup> Ak k kk k K C PC I C PC C P A K C K R K BQ B kk k kk k k k k k k k kk*

2, 1, 1, 1, 1, 2, ( )( ( ) )( ) *<sup>T</sup> <sup>T</sup> <sup>T</sup> <sup>T</sup> <sup>T</sup> A K C P PC I C PC C P A K C K R K BQ B k kk k kk kk k kk k k k k k k k kk*

(79)

/ 1 / 1 2, / 1 2, 2, / 1 ( ) . *<sup>T</sup> <sup>T</sup> <sup>T</sup> <sup>T</sup> <sup>T</sup> AP A BQB AP C R C P C C P A k kk k k k k k kk k k kk k k kk k* 

> 1, 1, *<sup>T</sup> C Ck k*

, *results in* 

*. Applying the Matrix Inversion Lemma within (80) gives* 

/ 1, 1, / ˆ *kk k kk e yy* , (82)

 and 

<sup>1</sup> 2, 2, ( )( )*<sup>T</sup> P A KC P A KC <sup>k</sup> k k kk k k k*

1 2 <sup>1</sup>

1 2 1

*where* 1,

*k*

2, *k*

*C <sup>C</sup> <sup>C</sup>* 

*k*

**9.3.3Discrete-Time H∞ Filtering** 

**9.3.3.1Problem Definition** 

*leads to*

<sup>2</sup> <sup>2</sup> <sup>1</sup>

<sup>2</sup> <sup>2</sup> <sup>1</sup>

*and applying the Matrix Inversion Lemma gives* 

*The change of variable (76), namely,* <sup>1</sup> *Pk k*/ 1

 *and* 

1 1

<sup>1</sup> <sup>1</sup> ( ) *T T <sup>T</sup> Ak k k k k k k kk M C RC A BQB* .

0 *<sup>k</sup>*

*<sup>I</sup> <sup>R</sup>*

 2, 1, 1, 2, ( )( )( ) *<sup>T</sup> <sup>T</sup> <sup>T</sup> <sup>T</sup> Ak kk k KC P C C A KC K RK BQB k k k k k k k k k kk*

*in which use was made of the Matrix Inversion Lemma. Defining Pk k*/ 1 *=* <sup>1</sup> (*Pk*

1/ 1, 1 1, 1 2, / 1 2, ( )( ) ( ) *<sup>T</sup> <sup>T</sup> <sup>T</sup> <sup>T</sup> P C C A KC P A KC K RK BQB k k k k k k k kk k k k k k k k k k*

<sup>1</sup>

1 2 1 1 1/ 1, 1 1, 1 / 1 / 1 2, 2, / 1 2, 2, / 1 ( ) ( ) *<sup>T</sup> <sup>T</sup> <sup>T</sup> <sup>T</sup> <sup>T</sup> <sup>T</sup> P C C AP A BQB AP C R C P C C P A k k k k k kk k k k k k kk k k kk k k kk k*

> / 1 2, 2, ( ) *<sup>T</sup> <sup>T</sup> <sup>T</sup> Ak kk P C R C A BQB k k k k k kk* .

> > *–* <sup>2</sup>

1

, (81)

= <sup>1</sup> *Mk*

1 1 2 1 <sup>1</sup> 2, 2, 1, 1, ( ) *<sup>T</sup> <sup>T</sup> <sup>T</sup> <sup>T</sup> Mk k k kk k A M C R C C C A BQB k k k k kk* 

 

<sup>2</sup> 0

*k*

<sup>1</sup> ( ) *<sup>T</sup> <sup>T</sup> <sup>T</sup> <sup>T</sup> <sup>T</sup> Mk k k k k k k k k k k k k k kk A MA A M C R C M C C M A B Q B*

*Expanding (81) yields (77). The existence of Mk > 0 for the above Riccati differential equation implies Pk > 0 for (79). Thus, it follows from Lemma 7 that the stated performance objective is achieved. �*

on the measurements (71) and produces the filtered estimates 1, / ˆ *k k y* . The filtered error sequence,

*R*

Consider again the configuration of Fig. 1. Assume that the systems

"Never interrupt your enemy when he is making a mistake." *Napoléon Bonaparte*

realisations (68) – (69) and (68), (70), respectively. It is desired to find a solution

As explained in Chapter 4, filtered states can be evolved from

$$
\hat{\mathbf{x}}\_{k/k} = A\_{k-1}\hat{\mathbf{x}}\_{k-1/k-1} + L\_k(\mathbf{z}\_k - \mathbf{C}\_{2,k}A\_{k-1}\hat{\mathbf{x}}\_{k-1/k-1}),\tag{83}
$$

where *Lk n p* is a filter gain. The above recursion is called an *a posteriori* filter in [11], [13], [30]. Output estimates are obtained from

$$
\hat{y}\_{1,k-1/k-1} = \mathbb{C}\_{1,k-1} \hat{\mathbf{x}}\_{k-1/k-1} \,. \tag{84}
$$

The filter gain is calculated as

$$L\_k = M\_k \mathbf{C}\_k^\top \left(\mathbf{C}\_k M\_k \mathbf{C}\_k^\top + R\_k\right)^{-1} \,\prime \tag{85}$$

where *Mk* = *<sup>T</sup> Mk* > 0 satisfies the Riccati differential equation

1 1 1 1 1 1 1 1 1, 1 2, 1 *<sup>T</sup> <sup>T</sup> T T M AM A BQB AM C C k k k k k kk k k k k* <sup>1</sup> <sup>2</sup> 1, 1 1 1, 1 1, 1 1 2, 1 1, 1 1 1 2, 1 1 1, 1 1 2, 1 1 2, 1 2, 1 *T T kkk kk k k T T T k k kkk k kk k k C MC I C MC C M A C MC R C MC C* (86) 2 1, 1 1 1, 1 1, 1 1 2, 1 2, 1 1 1, 1 1 2, 1 1 2, 1 0 *T T kkk kk k T T kkk k kk k C MC I C MC C MC R C MC* .

such that

#### **9.3.3.3Performance**

Subtracting from (83) from (68) gives *k k*/ *x* = 1 1/ 1 ˆ *A x k kk* + *B w k k* 1 1 – 1 1/ 1 ˆ *A x k kk* + 2, 1 1/ 1 ˆ *LC A x k kk k k* + 2, 1 1 ( ( *LC A x k kkk* + 1 1) *B w k k* + )*<sup>k</sup> v* . Denote *ik* = 1 *k k v w* , then the filtered

error system may be written as

$$
\begin{bmatrix}
\tilde{\mathbf{x}}\_{k\,\,k} \\
\mathbf{e}\_{k-1\,\,k-1}
\end{bmatrix} = \begin{bmatrix}
(I - L\_k \mathbf{C}\_{2,k}) A\_{k-1} & \begin{bmatrix}
\end{bmatrix}
\end{bmatrix} \begin{bmatrix}
\tilde{\mathbf{x}}\_{k-1\,\,k-1} \\
\dot{\mathbf{e}}\_{k}
\end{bmatrix} \tag{87}
$$

$$
= \mathcal{R}\_{\mathbf{d}} \mathbf{i}\_{\,\,k}
$$

<sup>&</sup>quot;I believe the most solemn duty of the American president is to protect the American people. If America shows uncertainty and weakness in this decade, the world will drift toward tragedy. This will not happen on my watch." *George Walker Bush*

1 1

*achieved. �*

*k k k k k k k k kk k k k k k*

<sup>1</sup> ( )( ) ( ) *<sup>T</sup> <sup>T</sup> <sup>T</sup> <sup>T</sup> <sup>T</sup> T T AMC BJD CMC DJD AMC BJD k k k kk k k k k kk k k k k kk k*

*<sup>C</sup>*2,1,*<sup>k</sup>* = *C*2,*<sup>k</sup>* , *D*1,1,*<sup>k</sup>* = 1, <sup>0</sup> *<sup>D</sup> <sup>k</sup>* , *D*1,2,*<sup>k</sup>* = *I* and *D*2,1,*<sup>k</sup>* = 2,*<sup>k</sup> I D* into (95) yields

1, 1, 1, 1, 1, 2, 1, 2, 2, 1, 2, 1, 2, 2, 2, 2,

*<sup>T</sup> <sup>T</sup> <sup>T</sup> <sup>T</sup> C MA D QB C MA D QB k k k k k k k k k k kk <sup>T</sup> BQB k kk* .

1/ / 1 2,1, / 1 ˆ ˆ ( ˆ ) *k k k kk k k k kk x Ax K z C x* ,

1, / 1,1, / 1 2,1, / 1 ˆ ˆ ( ˆ ) *kk k kk k k k kk y C x Lz C x* ,

"If we knew what it is we were doing, it would not be called research, would it?" *Albert Einstein*

*C MC D QD I C MC D QD C MC D QD R C MC D QD*

*T T T T k k k kk k k k k kk k*

*T T T T k k k kk k k k k k kk k*

2, 2,

*k k k kk k*

*T T*

*T k k k kk k*

 

*x AB x e C DD i z CD y*

1 1,1, / 1,1, 1,1, 1,2,

> *C C*

2,1, *k k*

, *Ck* = 1,1,

**9.3.4Solution to the General Filtering Problem** 

immediately from their results. Consider

*<sup>T</sup> <sup>T</sup> Mk k k k kk k AMA BJB*

Suppose in a general filtering problem that

1, 1,

*k k kk T T*

1, 1, 1, 2, 2,

2, ) *<sup>T</sup> BQD kk k k*

*AMC BQD M AMA AMC BQD* 

Let *<sup>k</sup> J* = <sup>2</sup>

1

The filter solution is given by

where *Kk* = 2, ( *<sup>T</sup> Akk k M C* + <sup>1</sup>

+ 2, 2, *<sup>T</sup> D QD kk k* + *Rk*.

<sup>1</sup>

{}0 0 *<sup>T</sup> Eiik k*

*which is the same as (86). The existence of Mk > 0 for the above Riccati difference equation implies the existence of a Pk > 0 for (88). Thus, it follows from Lemma 7 that the stated performance objective is* 

Limebeer, Green and Walker express Riccati difference equations such as (86) in a compact form using J-factorisation [5], [21]. The solutions for the general filtering problem follow

0

0 ˆ

, *Dk* = 1,1, 1,2,

*D*

2,1, 0 *k k*

. (95)

<sup>1</sup> <sup>2</sup>

, *Lk* = 1, 2, ( *<sup>T</sup> C MC kk k* + <sup>1</sup>

1, 2, ) *<sup>T</sup> D QD kk k k*

*k D D* . (94)

and *Bk* = 1,1, <sup>0</sup> *<sup>B</sup> <sup>k</sup>* . From the

(96)

(97) (98)

*<sup>T</sup> C MC kk k*

and Ω*k* = 2, 2,

is realised by (68), 2,*<sup>k</sup> y* = *C x* 2,*k k* + *D w*2,*k k* ,

2,1, 2,1, 1, /

approach of [5], [21], the Riccati difference equation corresponding to the H∞ problem (94) is

is realised by (68) and 1,*<sup>k</sup> y* = *C x* 1,*k k* + *D w*1,*k k* . Then substituting *B*1,1,*<sup>k</sup>* = 0 *Bk* , *C*1,1,*<sup>k</sup>* = *C*1,*<sup>k</sup>* ,

1 1 1 1 1 1 1 11 ( ) *T T <sup>T</sup> Mk k k k k k k k kk A M C RC A BQB* , (93)

with <sup>0</sup> *x* = 0, where *ei* = 2, 1 2, 1 1, ( ) [ ( )] [0 0] *kkk k k kk k I LC A L I LC B C* . It is shown below that the

filtered error satisfies the desired performance objective.

*Lemma 9 [11], [13], [30]: In respect of the H∞ problem (68) – (70), (82), the solution (83) – (84) achieves the performance*  1 / / 0 *N T kk kk k e e –*  1 2 0 *N T k k k i i < 0.* 

*Proof: By applying the Bounded Real Lemma to <sup>H</sup> ei and taking the adjoint to address ei , it is required that there exists a positive define symmetric solution to*

$$\begin{aligned} P\_{k+1} &= (I - L\_k \mathbf{C}\_{2,k}) A\_{k-1} P\_k A\_{k-1}^T (I - \mathbf{C}\_{2,k}^T \mathbf{L}\_k^T) \\ &+ \boldsymbol{\gamma}^{-2} (I - L\_k \mathbf{C}\_{2,k}) A\_{k-1} P\_k \mathbf{C}\_{1,k}^T (I - \boldsymbol{\gamma}^{-2} \mathbf{C}\_{1,k} \mathbf{P}\_k \mathbf{C}\_{1,k}^T)^{-1} \mathbf{C}\_{1,k} P\_k A\_{k-1}^T (I - L\_k \mathbf{C}\_{2,k})^T \\ &+ (I - L\_k \mathbf{C}\_{2,k}) B\_{k-1} Q\_{k-1} B\_{k-1} (I - \mathbf{C}\_{2,k}^T \mathbf{L}\_k^T) + L\_k R\_k \mathbf{L}\_k^T \end{aligned} \tag{88}$$
 
$$= (I - L\_k \mathbf{C}\_{2,k}) A\_{k-1} (P\_k^{-1} - \mathbf{\gamma}^{-2} \mathbf{C}\_{1,k}^T \mathbf{C}\_{1,k}) A\_{k-1}^T (I - \mathbf{C}\_{2,k}^T \mathbf{L}\_k^T)$$
 
$$\quad + (I - L\_k \mathbf{C}\_{2,k}) B\_{k-1} Q\_{k-1} B\_{k-1} (I - \mathbf{C}\_{2,k}^T \mathbf{L}\_k^T) + L\_k R\_k \mathbf{L}\_k^T,$$

in which use was made of the Matrix Inversion Lemma. Defining

$$P\_{k/k-1}^{-1} = P\_k^{-1} - \gamma^{-2} \mathbf{C}\_{1,k}^{-T} \mathbf{C}\_{1,k} \, \prime \tag{89}$$

using (85) and applying the Matrix Inversion Lemma leads to

$$(P\_{k+1|k}^{-1} + \gamma^{-2} \mathbf{C}\_{1,k}^{T} \mathbf{C}\_{1,k}) = (I - L\_k \mathbf{C}\_{2,k})(A\_{k-1} R\_{k'k-1} A\_{k-1}^T + B\_{k-1} Q\_{k-1} B\_{k-1}^T)(I - \mathbf{C}\_{2,k}^{T} L\_k^T) + L\_k R\_k L\_k^T$$

$$= (I - L\_k \mathbf{C}\_{2,k}) M\_k (I - \mathbf{C}\_{2,k}^{T} L\_k^T) + L\_k R\_k L\_k^T$$

$$= M\_k - M\_k \mathbf{C}\_{2,k}^T (R\_k + \mathbf{C}\_{2,k} M\_k \mathbf{C}\_{2,k}^T)^{-1} \mathbf{C}\_{2,k} M\_k$$

$$= (M\_k^{-1} + \mathbf{C}\_{2,k}^T R\_k^{-1} \mathbf{C}\_{2,k})^{-1},$$

*where* 

$$M\_k = A\_{k-1} P\_{k/k-1} A\_{k-1}^T + B\_{k-1} Q\_{k-1} B\_{k-1}^T. \tag{91}$$

*It follows from (90) that* <sup>1</sup> *Pk k* 1/ + <sup>2</sup> 1, 1, *<sup>T</sup> C Ck k* = <sup>1</sup> *Mk* + <sup>1</sup> 2, 2, *<sup>T</sup> C RC kk k and*

$$\begin{split} P\_{k/k-1}^{-1} &= M\_{k-1}^{-1} + \mathbf{C}\_{2,k-1}^{T} R\_{k-1}^{-1} \mathbf{C}\_{2,k-1} - \boldsymbol{\mathcal{T}}^{-1} \mathbf{C}\_{1,k-1}^{T} \mathbf{C}\_{1,k-1} \\ &= M\_{k-1}^{-1} + \overline{\mathbf{C}}\_{k-1}^{T} \overline{R}\_{k-1}^{-1} \overline{\mathbf{C}}\_{k-1} \end{split} \tag{92}$$

*where* 1, 2, *k k k C <sup>C</sup> <sup>C</sup> and*  <sup>2</sup> 0 0 *<sup>k</sup> k <sup>I</sup> <sup>R</sup> R . Substituting (92) into (91) yields* 

<sup>&</sup>quot;Hell, there are no rules here – we're trying to accomplish something." *Thomas Alva Edison*

Smoothing, Filtering and Prediction:

*ei and taking the adjoint to address* 

*,* (89)

*<sup>T</sup> <sup>T</sup> Mk k kk k k k k AP A BQB* . (91)

 *and*

 + <sup>1</sup> 2, 2, *<sup>T</sup> C RC kk k*

*. Substituting (92) into (91) yields* 

. It is shown below that the

*ei , it is* 

(88)

(90)

(92)

<sup>232</sup> Estimating the Past, Present and Future

( ) [ ( )]

*I LC A L I LC B*

*kkk k k kk*

*Lemma 9 [11], [13], [30]: In respect of the H∞ problem (68) – (70), (82), the solution (83) – (84)* 

 *< 0.* 

[0 0]

*ei* = 2, 1 2, 1

1 2 0

2, 1 1, 1, 1, 1, 1 2, () ( ) () *<sup>T</sup> <sup>T</sup> <sup>T</sup> <sup>T</sup> kkkkk kk k kk k k k*

 *I LC A PC I C PC C PA I LC* 

> 1 1 2 / 1 1, 1, *<sup>T</sup> P P CC kk k k k*

1/ 1, 1, 2, 1 / 1 1 1 1 1 2, ( ) ( )( )( ) *<sup>T</sup> <sup>T</sup> <sup>T</sup> T T <sup>T</sup> P C C I LC A P A B Q B I C L LRL k k k k k k k kk k k k k kk k kk*

2, 2, 2, 2, ( ) *<sup>T</sup> <sup>T</sup> Mk k k k kk k kk MC R C MC C M*

1, 1, *<sup>T</sup> C Ck k*

1 1 1 2 / 1 1 2, 1 1 2, 1 1, 1 1, 1 *<sup>T</sup> <sup>T</sup> P M C RC C C kk k kk k k k*

> 1 111 *<sup>T</sup> Mk kkk C RC* ,

> > *k*

"Hell, there are no rules here – we're trying to accomplish something." *Thomas Alva Edison*

*R*

<sup>2</sup> 0

= <sup>1</sup> *Mk*

1 /1 1 1 1 1

*kk k kk k kk I LC M I C L LRL*

*N T k k k i i* 

1,

*C*

filtered error satisfies the desired performance objective.

/ / 0

 *–* 

*required that there exists a positive define symmetric solution to*

2, 1 1, 1, 1 2, ( )( )( ) *<sup>T</sup> <sup>T</sup> T T kkk k kk k k k I LC A P C C A I C L* 

in which use was made of the Matrix Inversion Lemma. Defining

using (85) and applying the Matrix Inversion Lemma leads to

2, 2, ( ) *<sup>T</sup> M C RC k kk k ,*

+ <sup>2</sup>

0 *<sup>k</sup>*

*<sup>I</sup> <sup>R</sup>*

2, 2, ( )( ) *T T <sup>T</sup>*

<sup>1</sup>

<sup>1</sup> 1 1

<sup>1</sup> <sup>1</sup>

 *and* 

*e e* 

1

*N T kk kk*

*k*

<sup>1</sup> 2, 1 1 2, () () *<sup>T</sup> T T P I LC A PA I C L <sup>k</sup> k k k kk k k*

1 2

*Proof: By applying the Bounded Real Lemma to <sup>H</sup>*

<sup>2</sup> <sup>2</sup> <sup>1</sup>

 2, 1 1 1 2, ( ) ( ) *T TT <sup>T</sup> k kk k k kk k kk I LC B Q B I C L LRL*

 2, 1 1 1 2, ( ) ( ) *T TT <sup>T</sup> k kk k k kk k kk I LC B Q B I C L LRL* ,

*k*

with <sup>0</sup> *x* = 0, where

*achieves the performance* 

1 2

*where* 

*where* 1,

*k*

*It follows from (90) that* <sup>1</sup> *Pk k* 1/

2, *k*

*C <sup>C</sup> <sup>C</sup>* 

*k*

$$M\_k = A\_{k-1}(M\_{k-1}^{-1} + \overline{\mathbf{C}}\_{k-1} \overline{\mathbf{R}}\_{k-1} \overline{\mathbf{C}}\_{k-1}^T)^{-1} A\_{k-1}^T + B\_{k-1} Q\_{k-1} B\_{k-1}^T \tag{93}$$

*which is the same as (86). The existence of Mk > 0 for the above Riccati difference equation implies the existence of a Pk > 0 for (88). Thus, it follows from Lemma 7 that the stated performance objective is achieved. �*

#### **9.3.4Solution to the General Filtering Problem**

Limebeer, Green and Walker express Riccati difference equations such as (86) in a compact form using J-factorisation [5], [21]. The solutions for the general filtering problem follow immediately from their results. Consider

$$
\begin{bmatrix} \mathbf{x}\_{k+1} \\ \mathbf{e}\_{k/k} \\ \mathbf{z}\_k \end{bmatrix} = \begin{bmatrix} A\_k & B\_{1,1,k} & \mathbf{0} \\ \mathbf{C}\_{1,1,k} & D\_{1,1,k} & D\_{1,2,k} \\ \mathbf{C}\_{2,1,k} & D\_{2,1,k} & \mathbf{0} \end{bmatrix} \begin{bmatrix} \mathbf{x}\_k \\ \dot{\mathbf{z}}\_k \\ -\hat{\mathbf{y}}\_{1,k/k} \end{bmatrix}. \tag{94}
$$

Let *<sup>k</sup> J* = <sup>2</sup> {}0 0 *<sup>T</sup> Eiik k* , *Ck* = 1,1, 2,1, *k k C C* , *Dk* = 1,1, 1,2, 2,1, 0 *k k k D D D* and *Bk* = 1,1, <sup>0</sup> *<sup>B</sup> <sup>k</sup>* . From the

approach of [5], [21], the Riccati difference equation corresponding to the H∞ problem (94) is

$$\begin{split} M\_{k+1} &= A\_k M\_k A\_k^T + \overline{B}\_k l\_k \overline{B}\_k^T \\ &- (A\_k M\_k \overline{C}\_k^T + \overline{B}\_k l\_k \overline{D}\_k^T) (\overline{C}\_k M\_k \overline{C}\_k^T + \overline{D}\_k l\_k \overline{D}\_k^T)^{-1} (A\_k M\_k \overline{C}\_k^T + \overline{B}\_k l\_k \overline{D}\_k^T)^T. \end{split} \tag{95}$$

Suppose in a general filtering problem that is realised by (68), 2,*<sup>k</sup> y* = *C x* 2,*k k* + *D w*2,*k k* , is realised by (68) and 1,*<sup>k</sup> y* = *C x* 1,*k k* + *D w*1,*k k* . Then substituting *B*1,1,*<sup>k</sup>* = 0 *Bk* , *C*1,1,*<sup>k</sup>* = *C*1,*<sup>k</sup>* , *<sup>C</sup>*2,1,*<sup>k</sup>* = *C*2,*<sup>k</sup>* , *D*1,1,*<sup>k</sup>* = 1, <sup>0</sup> *<sup>D</sup> <sup>k</sup>* , *D*1,2,*<sup>k</sup>* = *I* and *D*2,1,*<sup>k</sup>* = 2,*<sup>k</sup> I D* into (95) yields

 1, 1, 1 2, 2, *T T T k k k kk k k k kk T T k k k kk k AMC BQD M AMA AMC BQD* <sup>1</sup> <sup>2</sup> 1, 1, 1, 1, 1, 2, 1, 2, 2, 1, 2, 1, 2, 2, 2, 2, *T T T T k k k kk k k k k kk k T T T T k k k kk k k k k k kk k C MC D QD I C MC D QD C MC D QD R C MC D QD* 1, 1, 1, 2, 2, *<sup>T</sup> <sup>T</sup> <sup>T</sup> <sup>T</sup> C MA D QB C MA D QB k k k k k k k k k k kk <sup>T</sup> BQB k kk* . (96) 

The filter solution is given by

$$
\hat{\mathbf{x}}\_{k \leftrightarrow 1/k} = \mathbf{A}\_k \hat{\mathbf{x}}\_{k/k-1} + \mathbf{K}\_k (\mathbf{z}\_k - \mathbf{C}\_{2,1,k} \hat{\mathbf{x}}\_{k/k-1}) \,\tag{97}
$$

$$
\hat{y}\_{1,k/k} = \mathbf{C}\_{1,1,k}\hat{\mathbf{x}}\_{k/k-1} + \mathbf{L}\_k(\mathbf{z}\_k - \mathbf{C}\_{2,1,k}\hat{\mathbf{x}}\_{k/k-1}) \, , \tag{98}
$$

where *Kk* = 2, ( *<sup>T</sup> Akk k M C* + <sup>1</sup> 2, ) *<sup>T</sup> BQD kk k k* , *Lk* = 1, 2, ( *<sup>T</sup> C MC kk k* + <sup>1</sup> 1, 2, ) *<sup>T</sup> D QD kk k k* and Ω*k* = 2, 2, *<sup>T</sup> C MC kk k* + 2, 2, *<sup>T</sup> D QD kk k* + *Rk*.

<sup>&</sup>quot;If we knew what it is we were doing, it would not be called research, would it?" *Albert Einstein*

*solutions to (77) for* 

<sup>2</sup> *.* 

> 

*a linear combination of x ,* 

algorithm, see Chapter 8.

*implies e* 

*Therefore,* 

*ei* =

 *and i* 

*Outline of Proof: From Lemma 8, x*

**9.3.5.4Performance Comparison** 

where *A* , 0 < *A* < 1. Estimates for <sup>2</sup>

−5

MSE [dB]

0

5

2,

2,

to 5 dB. The speech sample is modelled as a first-order autoregressive process

1 1 2, 2, 2, 2,

 

*k kk k k T T TT T k k k k kk k k T k k k kk k k*

0 0

> *ei*

<sup>2</sup> *, since it evolves within the predictor error system.* 

<sup>2</sup> *.* 

*<sup>k</sup>* <sup>1</sup> *k k x Ax w* , (105)

*<sup>w</sup>* and *A* were calculated at 20 dB SNR using an EM

(iv)

.

*, that is, i* 

<sup>2</sup>

<sup>2</sup> *, since it is* 

0

*Lemma 10: In respect of the smoother error system (104), if there exists a symmetric positive definite* 

 *> 0, then the smoother (101) – (103) achieves* 

<sup>2</sup> *, since it evolves within the adjoint predictor error system. Then e* 

*Example 4 [28].* A voiced speech utterance "a e i o u" was sampled at 8 kHz for the purpose of comparing smoother performance. Simulations were conducted with the zero-mean, unity-variance speech sample interpolated to a 16 kHz sample rate, to which 200 realizations of Gaussian measurement noise were added and the signal to noise ratio was varied from -5

−5 <sup>0</sup> <sup>5</sup> −10

Fig. 9. Speech estimate performance comparison: (i) data (crosses), (ii) Kalman filter (dotted line), (iii) H filter (dashed line), (iv) minimum-variance smoother (dot-dashed line) and (v) H smoother (solid line).

"If I have seen further it is only by standing on the shoulders of giants." *Isaac Newton*

SNR

(i)

(ii)

(iii)

(v)

*C C A CK C*

*A KC K B*

1 1

*R C RK R I*

#### **9.3.5 Discrete-Time H∞ Smoothing**

#### **9.3.5.1 Problem Definition**

Suppose that measurements (72) of a system (68) – (69) are available over an interval *k* [1, *N*]. The problem of interest is to calculate smoothed estimates / ˆ *k N y* of *<sup>k</sup> y* such that the error sequence

$$e\_{k/N} = y\_k - \hat{y}\_{k/N} \tag{99}$$

is in <sup>2</sup> .

#### **9.3.5.2 H∞ Solution**

The following fixed-interval smoother for output estimation [28] employs the gain for the H∞ predictor,

$$K\_k = A\_k P\_{k/k-1} \mathbf{C}\_{2,k}^T \mathbf{Q}\_k^{-1} \tag{100}$$

where Ω*k* = 2, / 1 2, *<sup>T</sup> CP C k kk k* + *Rk*, in which *Pk k*/ 1 is obtained from (76) and (77). The gain (100) is used in the minimum-variance smoother structure described in Chapter 7, *viz.*,

$$
\begin{bmatrix}
\hat{\mathbf{x}}\_{k+1/k} \\
\boldsymbol{\alpha}\_{k}
\end{bmatrix} = \begin{bmatrix}
\boldsymbol{A}\_{k} - \boldsymbol{K}\_{k}\mathbf{C}\_{2,k} & \mathbf{K}\_{k} \\
\end{bmatrix} \begin{bmatrix}
\hat{\mathbf{x}}\_{k/k-1} \\
\boldsymbol{z}\_{k}
\end{bmatrix} \tag{101}
$$

$$
\begin{bmatrix} \tilde{\boldsymbol{\xi}}\_{k-1} \\ \boldsymbol{\mathcal{B}}\_{k} \end{bmatrix} = \begin{bmatrix} \boldsymbol{A}\_{k}^{\mathrm{T}} - \mathbf{C}\_{2,k}^{\mathrm{T}} \mathbf{K}\_{k}^{\mathrm{T}} & \mathbf{C}\_{2,k}^{\mathrm{T}} \boldsymbol{\Omega}\_{k}^{-1/2} \\\ -\mathbf{K}\_{k}^{\mathrm{T}} & \boldsymbol{\Omega}\_{k}^{-1/2} \end{bmatrix} \begin{bmatrix} \boldsymbol{\xi}\_{k} \\ \boldsymbol{\mathcal{Q}}\_{k} \end{bmatrix}, \boldsymbol{\xi}\_{\mathcal{N}} = \mathbf{0} \end{bmatrix}, \tag{102}
$$

$$
\hat{y}\_{k \wedge N} = \mathbf{z}\_k - \mathbf{R}\_k \boldsymbol{\beta}\_k \,. \tag{103}
$$

It is argued below that this smoother meets the desired H∞ performance objective.

#### **9.3.5.3H∞ Performance**

*ei*

It is easily shown that the smoother error system is

$$
\begin{bmatrix}
\tilde{\mathbf{x}}\_{k+1/k} \\
\tilde{\mathbf{z}}\_{k-1} \\
\mathbf{z}\_{k/N}
\end{bmatrix} = \begin{bmatrix}
\begin{bmatrix}
A\_{k} - \mathbf{K}\_{k} \mathbf{C}\_{2,k} & \mathbf{0} \\
\mathbf{C}\_{2,k}^{T} \mathbf{Q}\_{k}^{-1} \mathbf{C}\_{2,k} & A\_{k}^{T} - \mathbf{C}\_{2,k}^{T} \mathbf{K}\_{k}^{T} \\
\mathbf{C}\_{2,k} \mathbf{Q}\_{k}^{-1} \mathbf{C}\_{2,k} & -\mathbf{R}\_{k} \mathbf{K}\_{k}^{T}
\end{bmatrix} & \begin{bmatrix}
\mathbf{C}\_{2,k}^{T} \mathbf{Q}\_{k}^{-1} & \mathbf{0} \\
\mathbf{R}\_{k} \mathbf{Q}\_{k}^{-1} - I & \mathbf{0}
\end{bmatrix}
\begin{bmatrix}
\tilde{\mathbf{x}}\_{k/k-1} \\
\tilde{\mathbf{z}}\_{k} \\
\mathbf{w}\_{k} \\
\mathbf{w}\_{k}
\end{bmatrix},
\end{bmatrix},
\tag{104}
$$

with <sup>0</sup> *x* 0, where *k k*/ 1 *x* = *<sup>k</sup> x* – / 1 ˆ *k k x* , *i* = *v w* and

*i* ,

<sup>&</sup>quot;I have had my results for a long time: but I do not yet know how I am to arrive at them." *Karl Friedrich Gauss*

Smoothing, Filtering and Prediction:

(101)

(102)

(103)

(104)

<sup>234</sup> Estimating the Past, Present and Future

Suppose that measurements (72) of a system (68) – (69) are available over an interval *k* [1, *N*]. The problem of interest is to calculate smoothed estimates / ˆ *k N y* of *<sup>k</sup> y* such that the error

The following fixed-interval smoother for output estimation [28] employs the gain for the

/ 1 2, *<sup>T</sup> K AP C k k kk k k*

is used in the minimum-variance smoother structure described in Chapter 7, *viz.*,

*x A KC K x*

1 2, 2,

2, 2, 2, 2, <sup>1</sup> <sup>1</sup> <sup>1</sup> / 2,

*e R C RK R I*

*x A KC K B C C A CK C*

*T TT T k k kk k k k*

*A CK C K*

*k k k k*

 *C z* 

 

/ ˆ *kN k k k y zR*

It is argued below that this smoother meets the desired H∞ performance objective.

/ 1 2, 1/ <sup>1</sup> <sup>1</sup>

0

*k k k kk k k k k <sup>T</sup> T TT T <sup>k</sup> k k k k kk k k <sup>k</sup>*

*<sup>k</sup> <sup>T</sup> k N k k k kk k k*

 

> *v w*

"I have had my results for a long time: but I do not yet know how I am to arrive at them." *Karl Friedrich* 

and

It is easily shown that the smoother error system is

1/ 2, / 1 1/ 2 1/ 2 2, ˆ ˆ *k k k kk k k k k kkk k*

1

1/ 2

*T N*

.

*<sup>T</sup> CP C k kk k* + *Rk*, in which *Pk k*/ 1 is obtained from (76) and (77). The gain (100)

1/ 2 , 0

/ / ˆ *kN k kN e yy* (99)

, (100)

,

0 0

*k*

*v*

,

*w*

*x*

,

**9.3.5 Discrete-Time H∞ Smoothing** 

**9.3.5.1 Problem Definition** 

sequence

is in <sup>2</sup> .

**9.3.5.2 H∞ Solution** 

where Ω*k* = 2, / 1 2,

**9.3.5.3H∞ Performance** 

*ei*

*Gauss*

*i* ,

with <sup>0</sup> *x* 0, where *k k*/ 1 *x* = *<sup>k</sup> x* – / 1 ˆ *k k x* , *i* =

H∞ predictor,

$$
\mathbf{Z}\_{k} = \begin{bmatrix}
\begin{bmatrix}
\boldsymbol{A}\_{k} - \boldsymbol{\mathsf{K}}\_{k} \boldsymbol{\mathsf{C}}\_{2,k} & \mathbf{0} \\
\boldsymbol{\mathsf{C}}\_{2,k}^{\top} \boldsymbol{\mathsf{Q}}\_{k}^{-1} \boldsymbol{\mathsf{C}}\_{2,k} & \boldsymbol{A}\_{k}^{\top} - \boldsymbol{\mathsf{C}}\_{2,k}^{\top} \boldsymbol{\mathsf{K}}\_{k}^{\top}
\end{bmatrix} & \begin{bmatrix}
\boldsymbol{\mathsf{C}}\_{2,k}^{\top} \boldsymbol{\mathsf{Q}}\_{k}^{-1} & \mathbf{0}
\end{bmatrix} \\
\begin{bmatrix}
\boldsymbol{R}\_{k} \boldsymbol{\mathsf{Q}}\_{k}^{-1} \mathbf{C}\_{2,k} & -\boldsymbol{R}\_{k} \boldsymbol{\mathsf{K}}\_{k}^{\top}
\end{bmatrix} & \begin{bmatrix}
\boldsymbol{R}\_{k} \boldsymbol{\mathsf{Q}}\_{k}^{-1} - \boldsymbol{I} & \mathbf{0}
\end{bmatrix}.
\end{bmatrix}.
$$

*Lemma 10: In respect of the smoother error system (104), if there exists a symmetric positive definite solutions to (77) for > 0, then the smoother (101) – (103) achieves ei , that is, i*  <sup>2</sup> *implies e* <sup>2</sup> *.* 

*Outline of Proof: From Lemma 8, x* <sup>2</sup> *, since it evolves within the predictor error system. Therefore,*  <sup>2</sup> *, since it evolves within the adjoint predictor error system. Then e*  <sup>2</sup> *, since it is a linear combination of x , and i* <sup>2</sup> *.* 

#### **9.3.5.4Performance Comparison**

*Example 4 [28].* A voiced speech utterance "a e i o u" was sampled at 8 kHz for the purpose of comparing smoother performance. Simulations were conducted with the zero-mean, unity-variance speech sample interpolated to a 16 kHz sample rate, to which 200 realizations of Gaussian measurement noise were added and the signal to noise ratio was varied from -5 to 5 dB. The speech sample is modelled as a first-order autoregressive process

$$\mathbf{x}\_{k+1} = A\mathbf{x}\_k + \mathbf{w}\_k \tag{105}$$

where *A* , 0 < *A* < 1. Estimates for <sup>2</sup> *<sup>w</sup>* and *A* were calculated at 20 dB SNR using an EM algorithm, see Chapter 8.

Fig. 9. Speech estimate performance comparison: (i) data (crosses), (ii) Kalman filter (dotted line), (iii) H filter (dashed line), (iv) minimum-variance smoother (dot-dashed line) and (v) H smoother (solid line).

<sup>&</sup>quot;If I have seen further it is only by standing on the shoulders of giants." *Isaac Newton* is

*Proposition 1 [28]: In the above output estimation problem:* 

2

*v*

2 2

*v v*

0 0 { ,} { ,}

2

*v*

2 2

*v v*

 

2 2

*v v*

*observation (107) follows from* ( ) 2 2 () 2 1

 

 

*(ii) Observation (108) follows from* ( ) (2)

*(v) Observation (111) follows from* ( ) (2)

"All programmers are optimists." *Frederick P. Brooks, Jr*

weighting to the data. Since <sup>2</sup> <sup>0</sup>

*(iv) Observation (110) follows from* <sup>2</sup>

(108) and (111), as <sup>2</sup>

 *Rei*<sup>&</sup>lt;

*Outline of Proof: (i) Let* ( )

 

0 { ,}

0 0 { ,} { ,} lim sup ( ) lim sup ( ) .

0 { ,}

0 0 { ,} { ,} lim sup ( ) lim sup ( ) .

*(iii) Observation (109) follows immediately from the application of (107) in (106).* 

*v*

lim *<sup>v</sup> Rei* →

, it follows from (109) that an *a priori* design estimate is

( )

*<sup>j</sup> H e <sup>F</sup>* 

(2) ( )

 

lim sup ( ) ( ) lim sup ( ) ( ) .

( )

*<sup>j</sup> H e <sup>S</sup>* 

(2) ( )

 

(1,1) 1( ) *L cp* 

> (1,1) <sup>0</sup> lim ( )

smoothers asymptotically approach a short circuit (or zero impedance) when <sup>2</sup>

*<sup>j</sup> <sup>j</sup> H e <sup>S</sup> H e <sup>S</sup>*

 *v v*

(1,1) (1,1) *<sup>p</sup> <sup>p</sup> , which results in* <sup>2</sup>

2 2 () 2 1

(1,1) (1,1) *<sup>p</sup> <sup>p</sup> which results in* <sup>2</sup>

An interpretation of (107) and (110) is that the maximum magnitudes of the filters and

short circuit asymptote closer than the optimal minimum-variance solutions. That is, for low measurement noise, the robust solutions accommodate some uncertainty by giving greater

 

 *= 0.* 

*<sup>v</sup> <sup>v</sup> c p*

 *which implies* <sup>2</sup>

lim sup ( ) 1.

 

*<sup>j</sup> <sup>j</sup> H e <sup>F</sup> H e <sup>F</sup>*

() () () () 2

*<sup>H</sup> <sup>j</sup> <sup>H</sup> <sup>j</sup> RR e ei ei HH e F F <sup>v</sup>*

(107)

(108)

(109)

(111)

( ) 0 lim *v L* 

( ) <sup>3</sup> <sup>0</sup> lim *v*

 *>* <sup>2</sup>

*<sup>v</sup>* and the H∞ filter achieves the performance

 = *<sup>v</sup>* .

*<sup>v</sup>* → 0, the maximum magnitudes of the H∞ solutions approach the

 *>* <sup>2</sup>

 

(110)

( ) 0 lim *v L* 

 *= 1.* 

(2) 0 lim *v L* 

(2) 0 lim *<sup>v</sup>*

*. �*

*<sup>v</sup>* → 0. From

*.* 

 

(1,1) *p denote the (1,1) component of* ( ) *P . The low measurement noise* 

lim sup ( ) 1.

(i)

(ii)

(iii)

(iv)

(v)

 

Simulations were conducted in which a minimum-variance filter and a fixed-interval smoother were employed to recover the speech message from noisy measurements. The results are provided in Fig. 9. As expected, the smoother out-performs the filter. Searches were conducted for minimum values of *γ* such that solutions to the design Riccati difference equations were positive definite for each noise realisation. The performance of the resulting H∞ filter and smoother are indicated by the dashed line and solid line of the figure. It can be seen for this example that the H∞ filter out-performs the Kalman filter. The figure also indicates that the robust smoother provides the best performance and exhibits about 4 dB reduction in mean-square-error compared to the Kalman filter at 0 dB SNR. This performance benefit needs to be reconciled against the extra calculation cost of combining robust forward and backward state predictors within (101) – (103).

#### **9.3.5.5High SNR and Low SNR Asymptotes**

An understanding of why robust solutions are beneficial in the presence of uncertainties can be gleaned by examining single-input-single-output filtering and equalisation. Consider a time-invariant plant having the canonical form

$$A = \begin{bmatrix} 0 & 1 & \cdots & 0 \\ 0 & & \ddots & \\ \vdots & & & \\ -a\_0 & -a\_1 & & -a\_{n-1} \end{bmatrix}, \ B = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix}, \ C\_2 = \begin{bmatrix} c & 0 & \cdots & 0 \end{bmatrix},$$

*a0*, ... *an-1*, *c* . Since the plant is time-invariant, the transfer function exists and is denoted by *G*(*z*). Some notation is defined prior to stating some observations for output estimation problems. Suppose that an H∞ filter has been constructed for the above plant. Let the H∞ algebraic Riccati equation solution, predictor gain, filter gain, predictor, filter and smoother transfer function matrices be denoted by ( ) *P* , *K*( ) , ( ) *L* , ( ) ( ) *H z <sup>P</sup>* , ( ) ( ) *H z <sup>F</sup>* and ( ) ( ) *H z <sup>S</sup>* respectively. The H filter transfer function matrix may be written as ( ) ( ) *H z <sup>F</sup>* = ( ) *L* + (*I* – () () ) () *LH z <sup>P</sup>* where ( ) *L* = *I* – () 1 *R*( ) . The transfer function matrix of the map from the inputs to the filter output estimation error is

$$R\_{\circ}^{(\circ)}(z) = -\left[H\_{\mathrm{F}}^{(\circ)}(z)\sigma\_{\circ} \quad (H\_{\mathrm{F}}^{(\circ)}(z) - I)G(z)\sigma\_{\circ}\right].\tag{106}$$

The H∞ smoother transfer function matrix can be written as ( ) ( ) *H z <sup>S</sup>* = *I* – ( ) () 1 () ( ( ( )) )( ) ( ( )) *<sup>H</sup> RI H z <sup>P</sup> <sup>P</sup> IH z* . Similarly, let (2) *P* , (2) *K* , (2) *L* , (2) ( ) *H z <sup>F</sup>* and (2) ( ) *H z <sup>S</sup>* denote the minimum-variance algebraic Riccati equation solution, predictor gain, filter gain, filter and smoother transfer function matrices respectively.

<sup>&</sup>quot;In computer science, we stand on each other's feet." *Brian K. Reid*

#### *Proposition 1 [28]: In the above output estimation problem:*

(i)

Smoothing, Filtering and Prediction:

<sup>236</sup> Estimating the Past, Present and Future

Simulations were conducted in which a minimum-variance filter and a fixed-interval smoother were employed to recover the speech message from noisy measurements. The results are provided in Fig. 9. As expected, the smoother out-performs the filter. Searches were conducted for minimum values of *γ* such that solutions to the design Riccati difference equations were positive definite for each noise realisation. The performance of the resulting H∞ filter and smoother are indicated by the dashed line and solid line of the figure. It can be seen for this example that the H∞ filter out-performs the Kalman filter. The figure also indicates that the robust smoother provides the best performance and exhibits about 4 dB reduction in mean-square-error compared to the Kalman filter at 0 dB SNR. This performance benefit needs to be reconciled against the extra calculation cost of combining

An understanding of why robust solutions are beneficial in the presence of uncertainties can be gleaned by examining single-input-single-output filtering and equalisation. Consider a

*B*

*a0*, ... *an-1*, *c* . Since the plant is time-invariant, the transfer function exists and is denoted by *G*(*z*). Some notation is defined prior to stating some observations for output estimation problems. Suppose that an H∞ filter has been constructed for the above plant. Let the H∞ algebraic Riccati equation solution, predictor gain, filter gain, predictor, filter and smoother

where ( ) *L* = *I* – () 1 *R*( ) . The transfer function matrix of the map from the

( ) () 1 () ( ( ( )) )( ) ( ( )) *<sup>H</sup> RI H z <sup>P</sup> <sup>P</sup> IH z* . Similarly, let (2) *P* , (2) *K* , (2) *L* , (2) ( ) *H z <sup>F</sup>* and (2) ( ) *H z <sup>S</sup>* denote the minimum-variance algebraic Riccati equation solution, predictor gain, filter gain,

0

 

> 0 1

, <sup>2</sup> *C c* 0 0 ,

 *w* . (106)

, ( ) ( ) *H z <sup>F</sup>*

and ( ) ( ) *H z <sup>S</sup>*

= ( ) *L* + (*I* –

= *I* –

robust forward and backward state predictors within (101) – (103).

0 1 1

transfer function matrices be denoted by ( ) *P* , *K*( ) , ( ) *L* , ( ) ( ) *H z <sup>P</sup>*

filter and smoother transfer function matrices respectively.

"In computer science, we stand on each other's feet." *Brian K. Reid*

*aa a*

*n*

respectively. The H filter transfer function matrix may be written as ( ) ( ) *H z <sup>F</sup>*

( ) ( ) ( ) () [ () ( () ) () ] *R z H z H z IGz ei F vF* 

The H∞ smoother transfer function matrix can be written as ( ) ( ) *H z <sup>S</sup>*

01 0 0 0

 ,

**9.3.5.5High SNR and Low SNR Asymptotes** 

time-invariant plant having the canonical form

*A*

inputs to the filter output estimation error is

() () ) () *LH z <sup>P</sup>*

$$\lim\_{\sigma\_{\nu}^{2}\to 0} \sup\_{\boldsymbol{\alpha}\in\{-\boldsymbol{\pi},\boldsymbol{\pi}\}} \left| H\_{\boldsymbol{F}}^{(\boldsymbol{\alpha})}(\boldsymbol{e}^{\boldsymbol{\alpha}\boldsymbol{\pi}}) \right| = 1. \tag{107}$$

(ii)

$$\lim\_{\sigma\_s^2 \to 0} \sup\_{a \in \{-\pi, \pi\}} \left| H\_{\mathbb{F}}^{(2)}(e^{i\nu}) \right| < \lim\_{\sigma\_s^2 \to 0} \sup\_{a \in \{-\pi, \pi\}} \left| H\_{\mathbb{F}}^{(e)}(e^{i\nu}) \right|. \tag{108}$$

$$\begin{aligned} \text{(iii)}\\ \lim\_{\sigma\_v^2 \to 0} \sup\_{a \sim \{-x, x\}} \left| R\_{cl}^{(\alpha)} (R\_{cl}^{(\alpha)})^H (e^{/a}) \right| &= \lim\_{\sigma\_v^2 \to 0} \sup\_{a \sim \{-x, x\}} \left| H\_F^{(\alpha)} (H\_F^{(\alpha)})^H (e^{/a}) \right| \sigma\_v^2. \end{aligned} \tag{109}$$

(iv)

$$\lim\_{\sigma\_s^2 \to 0} \sup\_{\boldsymbol{\alpha} \in \{-\pi, \pi\}} \left| H\_S^{(\boldsymbol{\alpha})} (\boldsymbol{\varepsilon}^{\boldsymbol{\alpha}}) \right| = 1. \tag{110}$$

(v)

2 2 (2) ( ) 0 0 { ,} { ,} lim sup ( ) lim sup ( ) . *v v <sup>j</sup> <sup>j</sup> H e <sup>S</sup> H e <sup>S</sup>* (111) 

*Outline of Proof: (i) Let* ( ) (1,1) *p denote the (1,1) component of* ( ) *P . The low measurement noise observation (107) follows from* ( ) 2 2 () 2 1 (1,1) 1( ) *L cp v v which implies* <sup>2</sup> ( ) 0 lim *v L = 1.* 

*(ii) Observation (108) follows from* ( ) (2) (1,1) (1,1) *<sup>p</sup> <sup>p</sup> , which results in* <sup>2</sup> ( ) 0 lim *v L >* <sup>2</sup> (2) 0 lim *v L .* 


An interpretation of (107) and (110) is that the maximum magnitudes of the filters and smoothers asymptotically approach a short circuit (or zero impedance) when <sup>2</sup> *<sup>v</sup>* → 0. From (108) and (111), as <sup>2</sup> *<sup>v</sup>* → 0, the maximum magnitudes of the H∞ solutions approach the short circuit asymptote closer than the optimal minimum-variance solutions. That is, for low measurement noise, the robust solutions accommodate some uncertainty by giving greater weighting to the data. Since <sup>2</sup> <sup>0</sup> lim *<sup>v</sup> Rei* → *<sup>v</sup>* and the H∞ filter achieves the performance *Rei*<sup>&</sup>lt; , it follows from (109) that an *a priori* design estimate is = *<sup>v</sup>* .

<sup>&</sup>quot;All programmers are optimists." *Frederick P. Brooks, Jr*

and (118), as <sup>2</sup>

performance *Rei*

**9.4 Conclusion** 

1

*v*

by giving less weighting to the data. Since <sup>2</sup> <sup>0</sup>

input energy of an error system, that is, minimise

maximum magnitude of the error power spectrum density.

by giving less weighting to the data.

*(iii) The observation (116) follows immediately from the application of (114) in (113). �* An interpretation of (114) and (117) is that the maximum magnitudes of the equalisers

circuit asymptote closer than that of the optimum minimum-variance solution. That is, under high measurement noise conditions, robust solutions accommodate some uncertainty

> lim *v*

Proposition 1 follows intuitively. Indeed, the short circuit asymptote is sometimes referred to as the singular filter. Proposition 2 may appear counter-intuitive and warrants further explanation. When the plant is minimum phase and the measurement noise is negligible, the equaliser inverts the plant. Conversely, when the equalisation problem is dominated by measurement noise, the solution is a low gain filter; that is, the estimation error is minimised

Uncertainties are invariably present within the specification of practical problems. Consequently, robust solutions have arisen to accommodate uncertain inputs and plant models. The H∞ performance objective is to minimise the ratio of the output energy to the

2 2

continuous time output estimation, the error covariance is found from the solution of

<sup>0</sup> <sup>0</sup> <sup>2</sup> sup sup *ei <sup>i</sup> <sup>i</sup>*

for some *γ* . In the time-invariant case, the objective is equivalent to minimising the

Predictors, filters and smoothers that satisfy the above performance objective are found by applying the Bounded Real Lemma. The standard solution structures are retained but larger design error covariances are employed to account for the presence of uncertainty. In

<sup>1</sup> <sup>2</sup> ( ) ( ) ( ) ( ) ( ) ( )( ( ) ( ) ( ) ( ) ( )) ( ) ( ) ( ) ( ) *<sup>T</sup> <sup>T</sup> <sup>T</sup> <sup>T</sup> Pt AtPt Pt A t Pt C tR tC t C tC t Pt BtQtB t*

Discrete-time predictors, filters and smoothers for output estimation rely on the solution of

*k kk k kk k k T T k k k kk*

*P APA AP C C PA BQB C PC R C PC <sup>C</sup>* 

"Your most unhappy customers are your greatest source of learning." *William Henry (Bill) Gates III*

.

2

<sup>1</sup> <sup>2</sup>

*kk k k kk k k*

*T T T T T kk k kk k k T T*

*C PC I C PC C*

*e i*

*Rei*

, it follows from (116) than an *a priori* design estimate is *<sup>w</sup>*

<sup>→</sup>

→ 0, the maximum magnitude of the H∞ solution approaches the open

*v*

*<sup>w</sup>* , the H∞ solution achieves the

 .

.

→ 0. From (115)

asymptotically approach an open circuit (or infinite impedance) when <sup>2</sup>

Suppose now that a time-invariant plant has the transfer function <sup>1</sup> *G z C zI A B D* () ( ) , where *A*, *B* and *C* are defined above together with *D* . Consider an input estimation (or equalisation) problem in which the transfer function matrix of the causal H∞ solution that estimates the input of the plant is

$$H\_F^{(\ast \ast)}(\mathbf{z}) = \mathbf{Q} \mathbf{D}^{\mathrm{T}} \left( \mathbf{Q}^{(\ast \ast)} \right)^{-1} - \mathbf{Q} \mathbf{D}^{\mathrm{T}} \left( \mathbf{Q}^{(\ast \ast)} \right)^{-1} H\_p^{(\ast \ast)}(\mathbf{z}) \,. \tag{112}$$

The transfer function matrix of the map from the inputs to the input estimation error is

$$R\_{\boldsymbol{d}}^{(e)}(\boldsymbol{z}) = -[H\_{\boldsymbol{F}}^{(e)}(\boldsymbol{z})\boldsymbol{\sigma}\_{\boldsymbol{\nu}} \quad (H\_{\boldsymbol{F}}^{(e)}\mathbb{G}(\boldsymbol{z}) - \boldsymbol{I})\boldsymbol{\sigma}\_{\boldsymbol{\nu}}].\tag{113}$$

The noncausal H transfer function matrix of the input estimator can be written as ( ) ( ) *H z <sup>S</sup>* = ( ) () 1 () <sup>3</sup> ( )( ( ( )) )( ) ( ( )) *<sup>H</sup> <sup>H</sup> QG z I H z <sup>P</sup> <sup>P</sup> IH z* .

*Proposition 2 [28]: For the above input estimation problem:* 

(i)

$$\lim\_{\sigma\_{\nu}^{-2} \to 0} \sup\_{\alpha \in \{-\pi, \pi\}} \left| H\_F^{(\alpha)}(e^{\cdot \alpha}) \right| = 0. \tag{114}$$

(ii)

$$\lim\_{\sigma\_{\nu}^{-1} \to 0} \sup\_{a \in \{-\pi, \pi\}} \left| H\_{F}^{(2)}(e^{\langle \nu \rangle}) \right| > \lim\_{\sigma\_{\nu}^{-1} \to 0} \sup\_{a \in \{-\pi, \pi\}} \left| H\_{F}^{(e)}(e^{\langle \nu \rangle}) \right|. \tag{115}$$

(iii)

$$\begin{split} \lim\_{\sigma\_{\boldsymbol{\sigma}}^{-1} \to 0} \sup\_{o \sim \{-\pi, \boldsymbol{\pi}\}} \left| R\_{\boldsymbol{\alpha}}^{(o)} (R\_{\boldsymbol{\alpha}}^{(o)})^{\mathcal{H}} (\boldsymbol{e}^{\boldsymbol{\alpha}}) \right| \\ &= \lim\_{\sigma\_{\boldsymbol{\sigma}}^{-1} \to 0} \sup\_{o \sim \{-\pi, \boldsymbol{\pi}\}} \left| (H\_{\boldsymbol{\mathcal{F}}}^{(o)} \mathbf{G}^{(o)} (\boldsymbol{e}^{\boldsymbol{\alpha}}) - \mathbf{l}) (H\_{\boldsymbol{\mathcal{F}}}^{(o)} \mathbf{G}^{(o)} (\boldsymbol{e}^{\boldsymbol{\alpha}}) - \mathbf{l})^{\mathcal{H}} \right| \sigma\_{\boldsymbol{\omega}}^{2}. \end{split} \tag{116}$$

(iv)

$$\lim\_{\sigma\_{\nu}^{-2} \to 0} \sup\_{\alpha \in \{-\pi, \pi\}} \left| H\_{\mathcal{S}}^{(\alpha)}(\mathcal{e}^{\vee \nu}) \right| = 0. \tag{117}$$

(v)

$$\lim\_{\sigma\_{\nu}^{-1} \to 0} \sup\_{\alpha \in \{-\pi, \pi\}} \left| H\_{\mathcal{S}}^{(2)}(e^{\langle \nu \rangle}) \right| > \lim\_{\sigma\_{\nu}^{-1} \to 0} \sup\_{\alpha \in \{-\pi, \pi\}} \left| H\_{\mathcal{S}}^{(\alpha)}(e^{\langle \nu \rangle}) \right|. \tag{118}$$

*Outline of Proof: (i) and (iv) The high measurement noise observations (114) and (117) follow from*  () 2 () 2 2 (1,1) *w v cp D which implies* <sup>2</sup> () 1 <sup>3</sup> <sup>0</sup> lim ( ) *v = 0.* 

*(ii) and (v) The observations (115) and (118) follow from* ( ) (2) (1,1) (1,1) *<sup>p</sup> <sup>p</sup> , which results in* <sup>2</sup> ( ) <sup>3</sup> <sup>0</sup> lim *v >*  2 (2) 0 lim *<sup>v</sup> .* 

<sup>&</sup>quot;Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live." *Damian Conway*

Smoothing, Filtering and Prediction:

(114)

(117)

. (115)

. (118)

(1,1) (1,1) *<sup>p</sup> <sup>p</sup> , which results in* <sup>2</sup>

(116)

( ) <sup>3</sup> <sup>0</sup> lim *v*

 *>* 

<sup>238</sup> Estimating the Past, Present and Future

Suppose now that a time-invariant plant has the transfer function <sup>1</sup> *G z C zI A B D* () ( ) , where *A*, *B* and *C* are defined above together with *D* . Consider an input estimation (or equalisation) problem in which the transfer function matrix of the causal H∞ solution that

. (112)

 *w* . (113)

() () () () 2

*<sup>j</sup> <sup>j</sup> <sup>H</sup> HG e HG e <sup>F</sup> <sup>F</sup> <sup>w</sup>*

lim sup ( ( ) 1)( ( ) 1) .

( )

*<sup>j</sup> H e <sup>S</sup>* 

(2) ( )

*<sup>j</sup> <sup>j</sup> H e <sup>S</sup> H e <sup>S</sup>*

 

*Outline of Proof: (i) and (iv) The high measurement noise observations (114) and (117) follow from* 

"Always code as if the guy who ends up maintaining your code will be a violent psychopath who

() 1 <sup>3</sup> <sup>0</sup> lim ( )

 *= 0.* 

lim sup ( ) 0.

<sup>1</sup> <sup>1</sup> ( ) ( ) () () ( ) ( ) *<sup>T</sup> <sup>T</sup> H z QD QD H z <sup>F</sup> <sup>P</sup>*

( ) ( ) ( ) () [ () ( () ) ] *R z H z H Gz I ei F vF* 

The transfer function matrix of the map from the inputs to the input estimation error is

( )

*<sup>j</sup> H e <sup>F</sup>* 

(2) ( )

*<sup>j</sup> <sup>j</sup> H e <sup>F</sup> H e <sup>F</sup>*

 

lim sup ( ) 0.

The noncausal H transfer function matrix of the input estimator can be written as ( ) ( ) *H z <sup>S</sup>*

estimates the input of the plant is

= ( ) () 1 () <sup>3</sup> ( )( ( ( )) )( ) ( ( )) *<sup>H</sup> <sup>H</sup> QG z I H z <sup>P</sup> <sup>P</sup> IH z* .

0 { ,}

<sup>2</sup>

 

() 2 () 2 2 (1,1) *w v cp D*

 

() ()

*<sup>H</sup> <sup>j</sup> RR e ei ei*

*v*

lim sup ( ) ( )

knows where you live." *Damian Conway*

*which implies* <sup>2</sup>

 

(i)

(ii)

(iii)

(iv)

(v)

2

(2) 0 lim *<sup>v</sup>*

*.* 

<sup>2</sup>

*v*

*Proposition 2 [28]: For the above input estimation problem:* 

*R z* 

2

*v*

 0 { ,}

0 0 { ,} { ,} lim sup ( ) lim sup ( )

 

2 2

0 { ,}

0 { ,}

0 0 { ,} { ,} lim sup ( ) lim sup ( )

> *v*

 

2 2

*v v*

*(ii) and (v) The observations (115) and (118) follow from* ( ) (2)

 

2

*v*

*v v*

*(iii) The observation (116) follows immediately from the application of (114) in (113). �*

An interpretation of (114) and (117) is that the maximum magnitudes of the equalisers asymptotically approach an open circuit (or infinite impedance) when <sup>2</sup> *v* → 0. From (115) and (118), as <sup>2</sup> *v* → 0, the maximum magnitude of the H∞ solution approaches the open circuit asymptote closer than that of the optimum minimum-variance solution. That is, under high measurement noise conditions, robust solutions accommodate some uncertainty by giving less weighting to the data. Since <sup>2</sup> <sup>0</sup> lim *v Rei* <sup>→</sup> *<sup>w</sup>* , the H∞ solution achieves the

performance *Rei* , it follows from (116) than an *a priori* design estimate is *<sup>w</sup>* .

Proposition 1 follows intuitively. Indeed, the short circuit asymptote is sometimes referred to as the singular filter. Proposition 2 may appear counter-intuitive and warrants further explanation. When the plant is minimum phase and the measurement noise is negligible, the equaliser inverts the plant. Conversely, when the equalisation problem is dominated by measurement noise, the solution is a low gain filter; that is, the estimation error is minimised by giving less weighting to the data.

#### **9.4 Conclusion**

Uncertainties are invariably present within the specification of practical problems. Consequently, robust solutions have arisen to accommodate uncertain inputs and plant models. The H∞ performance objective is to minimise the ratio of the output energy to the input energy of an error system, that is, minimise

$$\sup\_{\|\underline{\mathbf{h}}\|\_{\ast^{\*0}}} \left\| \mathcal{R}\_{\varepsilon} \right\|\_{\varkappa} = \sup\_{\|\underline{\mathbf{h}}\|\_{\ast^{\*0}}} \frac{\left\| e \right\|\_{\mathbb{Z}}}{\left\| \underline{i} \right\|\_{\mathbb{Z}}} \le \mathcal{Y}$$

for some *γ* . In the time-invariant case, the objective is equivalent to minimising the maximum magnitude of the error power spectrum density.

Predictors, filters and smoothers that satisfy the above performance objective are found by applying the Bounded Real Lemma. The standard solution structures are retained but larger design error covariances are employed to account for the presence of uncertainty. In continuous time output estimation, the error covariance is found from the solution of

$$\dot{P}(t) = A(t)P(t) + P(t)A^\top(t) - P(t)(\mathbf{C}^\top(t)\mathbf{R}^{-1}(t)\mathbf{C}^\top(t) - \boldsymbol{\gamma}^\neg \mathbf{C}^\top(t)\mathbf{C}^\top(t))P(t) + B(t)Q(t)\mathbf{B}^\top(t)\text{ .}$$

Discrete-time predictors, filters and smoothers for output estimation rely on the solution of

$$P\_{k+1} = A\_k P\_k A\_k^T - A\_k P\_k \begin{bmatrix} \mathbf{C}\_k^T & \mathbf{C}\_k^T \end{bmatrix} \begin{bmatrix} \mathbf{C}\_k P\_k \mathbf{C}\_k^T - \boldsymbol{\gamma}^T \boldsymbol{I} & \mathbf{C}\_k P\_k \mathbf{C}\_k^T \\\ \mathbf{C}\_k P\_k \mathbf{C}\_k^T & \mathbf{R}\_k + \mathbf{C}\_k P\_k \mathbf{C}\_k^T \end{bmatrix}^{-1} \begin{bmatrix} \mathbf{C}\_k \\\ \mathbf{C}\_k \end{bmatrix} P\_k A\_k^T + B\_k Q\_k B\_k^T \ \mathbf{A}\_k$$

<sup>&</sup>quot;Your most unhappy customers are your greatest source of learning." *William Henry (Bill) Gates III*

**9.5 Problems Problem 1 [**31**].**

(i) Consider a system

2

**Problem 2.** Consider a system

smoother error *et T* (| ) is

*k k k k kk x P x x Px* and show that

*T T T T k k k k kk k k k k x P x x Px y y w w*

**Problem 4 [18].**

(i) For a

111 *T T*

111

 

*T T T A P PA C C PB B P*

<sup>0</sup> () () *<sup>T</sup> <sup>T</sup>*

*w t w t dt* .

 

*D*(*t*)*w*(*t*). Suppose that the Riccati differential equation

*I*

(ii) Generalise (i) for the case where *y*(*t*) = *Cx*(*t*) + *Dw*(*t*).

<sup>2</sup> <sup>1</sup> <sup>1</sup> ( ) ( ) ( ) ( )( ( ) ( ) ( )) ( ) *T T <sup>T</sup>*

define *V*(*x*(*t*)) = *xT*(*t*)*P*(*t*) *x*(*t*) and show that *Vxt* ( ( )) + () () *<sup>T</sup> y tyt* – <sup>2</sup> () () *<sup>T</sup>*

*M*(*t*) = *γ*2*I – DT*(*t*)*D*(*t*) *>* 0, has a solution on [0, *T*]. Show that

*B*(*t*)*w*(*t*), *y*(*t*) = *C*(*t*)*x*(*t*), show that the map from the inputs *i* =

*et T C t RtK t*

<sup>2</sup> 0

<sup>1</sup> ( ) ( )( ( ) ( ) ( ) ( ) ( )) *<sup>T</sup> Pt Pt At Bt M tD tCt* <sup>1</sup> ( ( ) ( ) ( ) ( ) ( )) ( ) *<sup>T</sup> <sup>T</sup> At BtM tD tCt Pt*

**Problem 3.** For measurements *z*(*t*) = *y*(*t*) + *v*(*t*) of a system realised by *x t* ( ) = *A*(*t*)*x*(*t*) +

1 1 ( ) () () () () <sup>0</sup> () () ( ) ( ) ) () () () () () 0 () () ( ) (| ) ( ) () () 0 0 ( )

*T*

2 2 1 <sup>1</sup> <sup>1</sup> <sup>1</sup> <sup>1</sup> ( ) *<sup>T</sup> <sup>T</sup> <sup>T</sup> <sup>T</sup> <sup>T</sup> P AP A AP B I BP B BP A CC k kk k kk k kk k kk k k k*

> <sup>111</sup> 0 *<sup>T</sup> <sup>T</sup> <sup>T</sup> <sup>T</sup> k kk k k k k k k k x Px x P x y y w w*

*xt At KtCt Bt Kt*

of a solution to the Riccati difference equation

2

"A computer once beat me at chess, but it was no match for me at kick boxing." *Emo Philips*

<sup>2</sup> <sup>2</sup> <sup>1</sup>

is sufficient for <sup>2</sup>

*t C tR tCt A t C tK t C tR t*

( . *<sup>T</sup> T TT <sup>T</sup>*

modelled by *xk*+1 = *Akxk* + *Bkwk*, *yk* = *Ckxk Dkwk*, show that the existence

 

. Hint: construct

<sup>1</sup> <sup>1</sup> ( ) *<sup>T</sup> <sup>T</sup> T T k k k k k k k k kk*

*p I B P B p x A P Ax* ,

*BtM tB t C t I DtM tD t Ct* ,

having the state-space representation *x t* ( ) = *Ax*(*t*) +

modelled by *x t* ( ) = *A*(*t*)*x*(*t*) + *B*(*t*)*w*(*t*), *y*(*t*) = *C*(*t*)*x*(*t*) +

*v w* 

≤ *γ* for any *w* 2. (Hint:

*x t*

*w t*

*v t*

*t*

to the H∞ fixed-interval

*w twt* < 0.)

then () () *<sup>T</sup> x T Px T* – (0) (0) *<sup>T</sup> x Px* + <sup>0</sup> () () *<sup>T</sup> <sup>T</sup> y t y t dt* <sup>≤</sup>

*Bw*(*t*), *y*(*t*) = *Cx*(*t*). Show that if there exists a matrix *P* = *PT* > 0 such that

It follows that the H∞ designs revert to the optimum minimum-variance solutions as *γ*-2 → 0. Since robust solutions are conservative, the art of design involves finding satisfactory tradeoffs between average and worst-case performance criteria, namely, tweaking the *γ*.

A summary of suggested approaches for different linear estimation problem conditions is presented in Table 1. When the problem parameters are known precisely then the optimum minimum-variance solutions cannot be improved upon. However, when the inputs or the models are uncertain, robust solutions may provide improved mean-square-error performance. In the case of low measurement noise output-estimation, the benefit arises because greater weighting is given to the data. Conversely, for high measurement noise input estimation, robust solutions accommodate uncertainty by giving less weighting to the data.


Table 1. Suggested approaches for different linear estimation problem conditions.

<sup>&</sup>quot;A computer lets you make more mistakes than almost any invention in history, with the possible exceptions of tequila and hand guns." *Mitch Ratcliffe*

#### **9.5 Problems**

Smoothing, Filtering and Prediction:

<sup>240</sup> Estimating the Past, Present and Future

It follows that the H∞ designs revert to the optimum minimum-variance solutions as *γ*-2 → 0. Since robust solutions are conservative, the art of design involves finding satisfactory trade-

A summary of suggested approaches for different linear estimation problem conditions is presented in Table 1. When the problem parameters are known precisely then the optimum minimum-variance solutions cannot be improved upon. However, when the inputs or the models are uncertain, robust solutions may provide improved mean-square-error performance. In the case of low measurement noise output-estimation, the benefit arises because greater weighting is given to the data. Conversely, for high measurement noise input estimation, robust solutions accommodate uncertainty by giving less weighting to the data.

(see Example 3 of Chapter 7).

Chapter 6 and Example 2 of Chapter 7).

Gaussian noise assumptions.

1. Robust filter (see Example 4). 2. Robust smoother (see Example 4).

"A computer lets you make more mistakes than almost any invention in history, with the possible

Chapter 6).

Example 3).

Table 1. Suggested approaches for different linear estimation problem conditions.

exceptions of tequila and hand guns." *Mitch Ratcliffe*

1. Optimal minimum-variance (or Kalman) filter.

large smoothing lag results in optimal performance

2. Fixed-lag smoothers, which improve on filter performance (see Lemma 3 and Example 1 of Chapter 7). They suit on-line applications and have low additional complexity. A sufficiently

3. Maximum-likelihood (or Rauch-Tung-Striebel) smoothers, which also improve on filter performance (see Lemma 6 of Chapter 6 and Lemma 4 of Chapter 7). They can provide close to optimal performance (see Example 5 of Chapter 6). 4. The minimum-variance smoother provides the best performance (see Lemma 12 of Chapter 6 and Lemma 8 of Chapter 7) at the cost of increased complexity (see Example 5 of

1. Optimal minimum-variance filter, which does not rely on

2. Optimal minimum-variance smoother, which similarly does not rely on Gaussian noise assumptions (see Example 6 of

3. Robust filter which trades off H∞ performance (see Lemmas 2, 9) and mean-square-error performance (see Example 3). 4. Robust smoother which trades off H∞ performance (see Lemmas 5, 10) and mean-square-error performance (see

3. Robust filter or smoother with scaled inputs (see Lemma 3).

offs between average and worst-case performance criteria, namely, tweaking the *γ*.

PROBLEM CONDITIONS SUGGESTED APPROACHES

2nd-

Gaussian process and

model parameters.

Uncertain process and

Uncertain processes and measurement noises. Uncertain system model parameters.

model parameters.

measurement noises, known 2ndorder statistics. Known system

measurement noises, known 2ndorder statistics. Known system

**Problem 1 [**31**].**


**Problem 2.** Consider a system modelled by *x t* ( ) = *A*(*t*)*x*(*t*) + *B*(*t*)*w*(*t*), *y*(*t*) = *C*(*t*)*x*(*t*) + *D*(*t*)*w*(*t*). Suppose that the Riccati differential equation

$$-\dot{P}(t) = P(t)(A(t) + B(t)M^{-1}(t)D^{\top}(t)\mathbb{C}(t)) \ + (A(t) + B(t)M^{-1}(t)D^{\top}(t)\mathbb{C}(t))^{\top}P(t),$$

$$+\gamma^{-2}B(t)M^{-1}(t)B^{\top}(t) + \mathcal{C}^{\top}(t)(I + D(t)M^{-1}(t)\mathcal{D}^{\top}(t))\mathbb{C}(t),$$

*M*(*t*) = *γ*2*I – DT*(*t*)*D*(*t*) *>* 0, has a solution on [0, *T*]. Show that ≤ *γ* for any *w* 2. (Hint: define *V*(*x*(*t*)) = *xT*(*t*)*P*(*t*) *x*(*t*) and show that *Vxt* ( ( )) + () () *<sup>T</sup> y tyt* – <sup>2</sup> () () *<sup>T</sup> w twt* < 0.)

**Problem 3.** For measurements *z*(*t*) = *y*(*t*) + *v*(*t*) of a system realised by *x t* ( ) = *A*(*t*)*x*(*t*) + *B*(*t*)*w*(*t*), *y*(*t*) = *C*(*t*)*x*(*t*), show that the map from the inputs *i* = *v w* to the H∞ fixed-interval smoother error *et T* (| ) is

$$
\begin{bmatrix}
\dot{\tilde{\mathbf{x}}}(t) \\
\mathbf{e}(t\mid T)
\end{bmatrix} = \begin{bmatrix}
A(t) - K(t)\mathbf{C}(t) & \mathbf{0} & B(t) & -K(t) \\
\mathbf{C}(t) & R(t)\mathbf{K}^{\top}(t) & \mathbf{0} & \mathbf{0}
\end{bmatrix} \begin{bmatrix}
\dot{\tilde{\mathbf{x}}}(t) \\
\dot{\xi}(t) \\
w(t) \\
v(t)
\end{bmatrix}.
$$

#### **Problem 4 [18].**

(i) For a modelled by *xk*+1 = *Akxk* + *Bkwk*, *yk* = *Ckxk Dkwk*, show that the existence of a solution to the Riccati difference equation

$$P\_k = A\_k^\top P\_{k+1} A\_k + \gamma^{-2} A\_k^\top P\_{k+1} B\_k (I - \gamma^{-2} B\_k^\top P\_{k+1} B\_k)^{-1} B\_k^\top P\_{k+1} A\_k + \mathcal{C}\_k^\top C\_k A\_k$$

is sufficient for <sup>2</sup> <sup>111</sup> 0 *<sup>T</sup> <sup>T</sup> <sup>T</sup> <sup>T</sup> k kk k k k k k k k x Px x P x y y w w* . Hint: construct 111 *T T k k k k kk x P x x Px* and show that Uncertain processes and 1. Robust filter (see Example 4). Table 1. Suggested approaches for different linear estimation problem conditions.

$$\mathbf{x}\_{k+1}^T P\_{k+1} \mathbf{x}\_{k+1} - \mathbf{x}\_k^T P\_k \mathbf{x}\_k + y\_k^T y\_k - \boldsymbol{\gamma}^2 \mathbf{w}\_k^T \mathbf{w}\_k = -\boldsymbol{\gamma}^{-2} p\_k^T (\mathbf{I} - \boldsymbol{\gamma}^{-2} \mathbf{B}\_k^T P\_{k+1} \mathbf{B}\_k)^{-1} \mathbf{p}\_k - \mathbf{x}\_k^T \mathbf{A}\_k^T P\_{k+1} \mathbf{A}\_k \mathbf{x}\_{k+1}$$

<sup>&</sup>quot;A computer once beat me at chess, but it was no match for me at kick boxing." *Emo Philips*

no. 8, pp. 831 – 847, Aug. 198

Birkhäuser, Boston, 1991.

1993.

1994.

1486 – 1495, Jun. 1994.

Cliffs, New Jersey, 1995.

*of the MTNS*, pp. 261 – 267, Jun. 1991.

Tucson, Arizona, pp. 2305 – 2310, Dec. 1992.

*Systems and Control Letters*, vol. 16, pp. 309 – 317, 198

Bound", *Systems and Control Letters*, vol. 12, pp. 9 – 16, 198

Prentice Hall International (UK) Ltd, Hertfordshire, 1992.

*and Optimization*, vol. 30, no. 2, pp. 262 – 283, Mar. 1992.

[4] J. C. Doyle, K. Glover, P. P. Khargonekar and B. A. Francis, "State-Space Solutions to Standard H2 and H∞ Control Problems", *IEEE Transactions on Automatic Control*, vol. 34,

[5] D. J. N. Limebeer, M. Green and D. Walker, "Discrete-Time H∞ Control, *Proceedings 28th IEEE Conference on Decision and Control*, Tampa, Florida, pp. 392 – 396, Dec. 198 [6] T. Basar, "Optimum performance levels for minimax filters, predictors and smoothers"

[7] D. S. Bernstein and W. M. Haddad, "Steady State Kalman Filtering with an H∞ Error

[8] U. Shaked, "H∞-Minimum Error State Estimation of Linear Stationary Processes", *IEEE* 

[9] T. Başar and P. Bernhard, *H∞-Optimal Control and Related Minimax Design Problems: A Dynamic Game Approach*, Series in Systems & Control: Foundations & Applications,

[10] K. M. Nagpal and P. P. Khargonekar, "Filtering and Smoothing in an H∞ Setting", *IEEE* 

[11] I. Yaesh and U. Shaked, "H∞-Optimal Estimation – The Discrete Time Case", *Proceedings* 

[12] A. Stoorvogel, *The H∞ Control Problem*, Series in Systems and Control Engineering,

[13] U. Shaked and Y. Theodor, "H∞ Optimal Estimation: A Tutorial", *Proceedings 31st IEEE Conference on Decision and Control*, pp. 2278 – 2286, Tucson, Arizona, Dec. 1992. [14] C. E. de Souza, U. Shaked and M. Fu, "Robust H∞ Filtering with Parametric Uncertainty and Deterministic Input Signal", *Proceedings 31st IEEE Conference on Decision and Control*,

[15] D. J. N. Limebeer, B. D. O. Anderson, P. P. Khargonekar and M. Green, "A Game Theoretic Approach to H∞ Control for Time-Varying Systems", *SIAM Journal on Control* 

[16] I. Yaesh and Shaked, "Game Theory Approach to Finite-Time Horizon Optimal Estimation", *IEEE Transactions on Automatic Control*, vol. 38, no. 6, pp. 957 – 963, Jun.

[17] B. van Keulen, *H∞ Control for Distributed Parameter Systems: A State-Space Approach*, Series in Systems & Control: Foundations & Applications, Birkhäuser, Boston, 1993. [18] L. Xie, C. E. De Souza and Y. Wang, "Robust Control of Discrete Time Uncertain

[19] Y. Theodore, U. Shaked and C. E. de Souza, "A Game Theory Approach To Robust Discrete-Time H∞-Estimation", *IEEE Transactions on Signal Processing*, vol. 42, no. 6, pp.

[20] S. Boyd, L. El Ghaoui, E. Feron and V. Balakrishnan, *Linear Matrix Inequalities in System and Control Theory*, SIAM Studies in Applied Mathematics, vol. 15, SIAM, Philadelphia,

[21] M. Green and D. J. N. Limebeer, *Linear Robust Control*, Prentice-Hall Inc., Englewood

Dynamical Systems", *Automatica*, vol. 29, no. 4, pp. 1133 – 1137, 1993.

"Never trust a computer you can't throw out a window." *Stephen Gary Wozniak*

*Transactions on Automatic Control*, vol. 35, no. 5, pp. 554 – 558, May, 1990.

*Transactions on Automatic Control*, vol. 36, no. 2, pp 152 – 166, Feb. 1991.

where 2 2 <sup>1</sup> <sup>1</sup> <sup>1</sup> ( ) *<sup>T</sup> <sup>T</sup> k k k k k k k kk p w I B P B B P Ax* .

$$\text{(ii)}\qquad\text{Show that }\left-\mathbf{x}\_{0}^{T}P\_{0}\mathbf{x}\_{0} + \sum\_{k=0}^{N-1}y\_{k}^{T}y\_{k} - \boldsymbol{\gamma}^{z}\sum\_{k=0}^{N-1}w\_{k}^{T}w\_{k} < \mathbf{0}\text{ .}$$

**Problem 5.** Now consider the model *xk*+1 = *Akxk* + *Bkwk*, *yk* = *Ckxk* + *Dkwk* and show that the existence of a solution to the Riccati difference equation

$$P\_k = \mathbf{A}\_k^T P\_{k+1} \mathbf{A}\_k + \boldsymbol{\gamma}^{-2} (\mathbf{A}\_k^T P\_{k+1} \mathbf{B}\_k + \mathbf{C}\_k^T \mathbf{D}\_k) (\mathbf{I} - \boldsymbol{\gamma}^{-2} \mathbf{B}\_k^T P\_{k+1} \mathbf{B}\_k - \boldsymbol{\gamma}^{-2} \mathbf{D}\_k^T \mathbf{D}\_k)^{-1} (\mathbf{B}\_k^T P\_{k+1} \mathbf{A}\_k + \mathbf{D}\_k^T \mathbf{C}\_k) + \mathbf{C}\_k^T \mathbf{C}\_k$$

is sufficient for <sup>2</sup> <sup>111</sup> 0 *<sup>T</sup> <sup>T</sup> <sup>T</sup> <sup>T</sup> k kk k k k k k k k x Px x P x y y w w* . Hint: define

$$p\_k = w\_k - \gamma^{-2} (I - \gamma^{-2} B\_k^T P\_{k+1} B\_k - \gamma^{-2} D\_k^T D\_k)^{-1} (B\_k^T P\_{k+1} A\_k + D\_k^T C\_k) \ .$$

**Problem 6.** Suppose that a predictor attains a H∞ performance objective, that is, the conditions of Lemma 8 are satisfied. Show that using the predicted states to construct filtered output estimates / ˆ *k k y* results in *k k*/ *y* = *y –* / ˆ *k k y* <sup>2</sup> .

#### **9.6 Glossary**


#### **9.7 References**


<sup>&</sup>quot;On two occasions I have been asked, 'If you put into the machine wrong figures, will the right answers come out?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question." *Charles Babbage*

Smoothing, Filtering and Prediction:

<sup>242</sup> Estimating the Past, Present and Future

*k k k k*

**Problem 5.** Now consider the model *xk*+1 = *Akxk* + *Bkwk*, *yk* = *Ckxk* + *Dkwk* and show that the

0

<sup>2</sup> .

*ei* from the inputs *i*(*t*) to the estimation error *e*(*t*) satisfies

*ei* from the inputs *ik* to the estimation error *ek* satisfies

*i t i t dt* < 0. Therefore, *<sup>i</sup>* 2 implies *<sup>e</sup>* 2.

<sup>1</sup> <sup>1</sup> ( ) ( ) *<sup>T</sup> <sup>T</sup> <sup>T</sup> <sup>T</sup>*

1 1 2

 .

*N N T T T*

*k k x Px y y w w* 

0 0

2 2 2 1 <sup>1</sup> <sup>1</sup> <sup>1</sup> <sup>1</sup> ( )( ) ( ) *<sup>T</sup> <sup>T</sup> <sup>T</sup> <sup>T</sup> <sup>T</sup> <sup>T</sup> <sup>T</sup> <sup>T</sup> P AP A AP B CD I BP B DD BP A DC CC k kk k kk k k k kk k k k kk k k k k k*

*k k kk k k k kk k k k p w I BP B DD BP A DC*

.

**Problem 6.** Suppose that a predictor attains a H∞ performance objective, that is, the conditions of Lemma 8 are satisfied. Show that using the predicted states to construct

<sup>∞</sup> The Lebesgue ∞-space defined as the set of continuous-time systems

<sup>0</sup> () () *<sup>T</sup> <sup>T</sup>*

. Therefore, *<sup>i</sup>* <sup>2</sup> implies *<sup>e</sup>* <sup>2</sup> .

The Lebesgue ∞-space defined as the set of discrete-time systems

[1] J. Stelling, U. Sauer, Z. Szallasi, F. J. Doyle and J. Doyle, "Robustness of Cellular

[2] P. P. Vaidyanathan, "The Discrete-Time Bounded-Real Lemma in Digital Filtering", *IEEE Transactions on Circuits and Systems*, vol. 32, no. 9, pp. 918 – 924, Sep. 1985. [3] I. Petersen, "A Riccati equation approach to the design of stabilizing controllers and observers for a class of uncertain linear systems", *IEEE Transactions on Automatic* 

"On two occasions I have been asked, 'If you put into the machine wrong figures, will the right answers come out?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a

0

. Hint: define

2 2 2 1

where 2 2 <sup>1</sup>

 

0 00

existence of a solution to the Riccati difference equation

filtered output estimates / ˆ *k k y* results in *k k*/ *y* = *y –* / ˆ *k k y*

*ei* <sup>∞</sup> The map

*ei* The map

is sufficient for <sup>2</sup>

(ii) Show that

**9.6 Glossary** 

**9.7 References** 

question." *Charles Babbage*

<sup>1</sup> <sup>1</sup> ( ) *<sup>T</sup> <sup>T</sup> k k k k k k k kk p w I B P B B P Ax*

> <sup>111</sup> 0 *<sup>T</sup> <sup>T</sup> <sup>T</sup> <sup>T</sup> k kk k k k k k k k x Px x P x y y w w*

> >

having finite ∞-norm.

having finite ∞-norm.

1 1 2 0 0

 

*N N T T k k k k k k ee ii* 

Functions", *Cell*, vol. 118, pp. 675 – 685, Dep. 2004.

*Control*, vol. 30, iss. 9, pp. 904 – 907, 1985.

<sup>0</sup> () () *<sup>T</sup> <sup>T</sup> e t e t dt* – <sup>2</sup>

.

	- [15] D. J. N. Limebeer, B. D. O. Anderson, P. P. Khargonekar and M. Green, "A Game Theoretic Approach to H∞ Control for Time-Varying Systems", *SIAM Journal on Control and Optimization*, vol. 30, no. 2, pp. 262 – 283, Mar. 1992.
	- [16] I. Yaesh and Shaked, "Game Theory Approach to Finite-Time Horizon Optimal Estimation", *IEEE Transactions on Automatic Control*, vol. 38, no. 6, pp. 957 – 963, Jun. 1993.
	- [17] B. van Keulen, *H∞ Control for Distributed Parameter Systems: A State-Space Approach*, Series in Systems & Control: Foundations & Applications, Birkhäuser, Boston, 1993.
	- [18] L. Xie, C. E. De Souza and Y. Wang, "Robust Control of Discrete Time Uncertain Dynamical Systems", *Automatica*, vol. 29, no. 4, pp. 1133 – 1137, 1993.
	- [19] Y. Theodore, U. Shaked and C. E. de Souza, "A Game Theory Approach To Robust Discrete-Time H∞-Estimation", *IEEE Transactions on Signal Processing*, vol. 42, no. 6, pp. 1486 – 1495, Jun. 1994.
	- [20] S. Boyd, L. El Ghaoui, E. Feron and V. Balakrishnan, *Linear Matrix Inequalities in System and Control Theory*, SIAM Studies in Applied Mathematics, vol. 15, SIAM, Philadelphia, 1994.
	- [21] M. Green and D. J. N. Limebeer, *Linear Robust Control*, Prentice-Hall Inc., Englewood Cliffs, New Jersey, 1995.

<sup>&</sup>quot;Never trust a computer you can't throw out a window." *Stephen Gary Wozniak*


<sup>&</sup>quot;The most amazing achievement of the computer software industry is its continuing cancellation of the steady and staggering gains made by the computer hardware industry." *Henry Petroski*

Chapter title

Author Name

### **Nonlinear Prediction, Filtering and Smoothing**

Nonlinear Prediction, Filtering and Smoothing 245

#### **10.1 Introduction**

Smoothing, Filtering and Prediction:

<sup>244</sup> Estimating the Past, Present and Future

[22] S. O. R. Moheimani, A. V. Savkin and I. R. Petersen, "Robust Filtering, Prediction, Smoothing and Observability of Uncertain Systems", *IEEE Transactions on Circuits and* 

[23] J. C. Geromel, "Optimal Linear Filtering Under Parameter Uncertainty", *IEEE* 

[24] T. Başar and G. J. Oldsder, *Dynamic Noncooperative Game Theory*, Second Edition, SIAM,

[25] B. Hassibi, A. H. Sayed and T. Kailath, *Indefinite-Quadratic Estimation and Control: A* 

[26] G. A. Einicke and L. B. White, "Robust Extended Kalman Filtering", *IEEE Transactions on* 

[27] Z. Wang and B. Huang, "Robust H2/H∞ Filtering for Linear Systems with Error Variance Constraints", *IEEE Transactions on Signal Processing,* vol. 48, no. 9, pp. 2463 –

[28] G. A. Einicke, "Optimal and Robust Noncausal Filter Formulations", *IEEE Transactions* 

[29] A. Saberi, A. A. Stoorvogel and P. Sannuti, *Filtering Theory With Applications to Fault* 

[30] F. L. Lewis, L. Xie and D. Popa, *Optimal and Robust Estimation: With an Introduction to Stochastic Control Theory*, Second Edition, Series in Automation and Control

[31] J.-C. Lo and M.-L. Lin, "Observer-Based Robust H∞ Control for Fuzzy Systems Using Two-Step Procedure", *IEEE Transactions on Fuzzy Systems*, vol. 12, no. 3, pp. 350 – 359,

[32] E. Blanco, P. Neveux and H. Thomas, "The H∞ Fixed-Interval Smoothing Problem for Continuous Systems", *IEEE Transactions on Signal Processing,* vol. 54, no. 11, pp. 4085 –

[33] G. A. Einicke, "A Solution to the Continuous-Time H-infinity Fixed-Interval Smoother Problem", *IEEE Transactions on Automatic Control,* vol. 54, no. 12, pp. 2904 – 2908, Dec.

[34] G. A. Einicke, "Asymptotic Optimality of the Minimum-Variance Fixed-Interval Smoother", *IEEE Transactions on Signal Processing,* vol. 55, no. 4, pp. 1543 – 1547, Apr.

[35] G. A. Einicke, J. C. Ralston, C. O. Hargrave, D. C. Reid and D. W. Hainsworth, "Longwall Mining Automation, An Application of Minimum-Variance Smoothing",

[36] K. Ogata, *Discrete-time Control Systems*, Prentice-Hall, Inc., Englewood Cliffs, New

"The most amazing achievement of the computer software industry is its continuing cancellation of the

*IEEE Control Systems Magazine*, vol. 28, no. 6, pp. 28 – 37, Dec. 2008.

steady and staggering gains made by the computer hardware industry." *Henry Petroski*

*Transactions on Signal Processing,* vol. 47, no. 1, pp. 168 – 175, Jan. 199

*Unified Approach to H∞ and H2 Theories*, SIAM, Philadelphia, 199

*Signal Processing*, vol. 47, no. 9, pp. 2596 – 2599, Sep., 199

*on Signal Processing,* vol. 54, no. 3, pp. 1069 - 1077, Mar. 2006.

*Detection, Isolation, and Estimation*, Birkhäuser, Boston, 2007.

Engineering, Taylor & Francis Group, LLC, 2008.

*Systems*, vol. 45, no. 4, pp. 446 – 457, Apr. 1998.

Philadelphia, 199

2467, Aug. 2000.

Jun. 2004.

200

2007.

Jersey, 1987.

4090, Nov. 2006.

The Kalman filter is widely used for linear estimation problems where its behaviour is wellunderstood. Under prescribed conditions, the estimated states are unbiased and stability is guaranteed. Many real-world problems are nonlinear which requires amendments to linear solutions. If the nonlinear models can be expressed in a state-space setting then the Kalman filter may find utility by applying linearisations at each time step. Linearising means finding tangents to the curves of interest about the current estimates, so that the standard filter recursions can be employed in tandem to produce predictions for the next step. This approach is known as extended Kalman filtering – see [1] – [5].

Extended Kalman filters (EKFs) revert to optimal Kalman filters when the problems become linear. Thus, EKFs can yield approximate minimum-variance estimates. However, there are no accompanying performance guarantees and they fall into the try-at-your-own-risk category. Indeed, Anderson and Moore [3] caution that the EKF "can be satisfactory on occasions". A number of compounding factors can cause performance degradation. The approximate linearisations may be crude and are carried out about estimated states (as opposed to true states). Observability problems occur when the variables do not map onto each other, giving rise to discontinuities within estimated state trajectories. Singularities within functions can result in non-positive solutions to the design Riccati equations and lead to instabilities.

The discussion includes suggestions for performance improvement and is organised as follows. The next section begins with Taylor series expansions, which are prerequisites for linearisation. First, second and third-order EKFs are then derived. EKFs tend be prone to instability and a way of enforcing stability is to masquerade the design Riccati equation by a faux version. This faux algebraic Riccati equation technique [6] – [10] is presented in Section 10.3. In Section 10.4, the higher order terms discarded by an EKF are treated as uncertainties. It is shown that a robust EKF arises by solving a scaled H∞ problem in lieu of one possessing uncertainties. Nonlinear smoother procedures can be designed similarly. The use of fixedlag and Rauch-Tung-Striebel smoothers may be preferable from a complexity perspective. However, the approximate minimum-variance and robust smoothers, which are presented in Section 10.5, revert to optimal solutions when the nonlinearities and uncertainties diminish. Another way of guaranteeing stability is to by imposing constraints and one such approach is discussed in Section 10.6.

<sup>&</sup>quot;It is the mark of an instructed mind to rest satisfied with the degree of precision to which the nature of the subject admits and not to seek exactness when only an approximation of the truth is possible." *Aristotle*

and

respectively.

approximated by

where *Ak*, *Bk*, *Ck*,

*Anthony Maraboli*

0 0

0 0

0 0

0 0

0 0

0 0

*x x x x*

*x x x x*

*x x*

*k n k*

*x n x*

,

*c c*

*x x x x*

*x x x x*

*k n k*

*x n x*

,

*b b*

2 *k k*

*c c*

*x x*

*x x x x*

*x x x x*

*x x*

*k n k*

*x n x*

.

*a a*

0 0 0 2

0 3 0 <sup>1</sup> <sup>1</sup> ( ) ( ) <sup>6</sup> !

0 0 0 2

*x x x x*

*x x x x*

2 *k k*

2 *k k*

0 3 0 <sup>1</sup> <sup>1</sup> ( ) ( ) <sup>6</sup> !

0 0 0 2

0 3 0 <sup>1</sup> <sup>1</sup> ( ) ( ) <sup>6</sup> !

where *vk* is a measurement noise sequence. A first-order EKF for the above problem is developed below. Following the approach within [3], the nonlinear system (2) – (3) is

> ,

> > ,

"You will always define events in a manner which will validate your agreement with reality." *Steve* 

,

*<sup>k</sup>* and *πk* are found from suitable truncations of the Taylor series for each

,

*k kk k k k* <sup>1</sup> *x Ax Bw*

*k kk k y Cx*

/ /1 / 1 ˆ ˆ ( ˆ ) *kk kk k k k kk k x x L z Cx*

1/ / ˆ ˆ *k k k kk k x Ax*

nonlinearity. From Chapter 4, a filter for the above model is given by

*x x x x*

*a a*

<sup>1</sup> () ( ) ( ) ( )

3 3

Similarly, Taylor series for ( ): *<sup>k</sup> b x* and ( ): *<sup>k</sup> c x* about *x* = *x*0 are

<sup>1</sup> () ( ) ( ) ( )

*<sup>b</sup> <sup>b</sup> bx bx x x x x*

3 3

<sup>1</sup> () ( ) ( ) ( )

3 3

*cx cx x x x x*

Suppose that filtered estimates / ˆ *k k x* of *xk* are desired given observations

*ax ax x x x x*

*k k*

*k k*

*k k*

**10.2.3 First-Order Extended Kalman Filter** 

2 2

2 2

*n*

*n*

*n*

2 2

*n*

( ) *k kk k z cx v* , (7)

*n*

*n*

(4)

(5)

(6)

(8) (9)

(10) (11)

#### **10.2 Extended Kalman Filtering**

#### **10.2.1 Taylor Series Expansion**

A nonlinear function *a x <sup>k</sup>* ( ): *<sup>n</sup>* having *n* continuous derivatives may be expanded as a Taylor series about a point x0

$$\begin{aligned} a\_k(\mathbf{x}) &= a\_k(\mathbf{x}\_0) + \frac{1}{1!} (\mathbf{x} - \mathbf{x}\_0)^\top \nabla a\_k(\mathbf{x}\_0) \\\\ &+ \frac{1}{2!} (\mathbf{x} - \mathbf{x}\_0)^\top \nabla^\top \nabla a\_k(\mathbf{x}\_0) (\mathbf{x} - \mathbf{x}\_0) \\\\ &+ \frac{1}{3!} (\mathbf{x} - \mathbf{x}\_0)^\top \nabla^\top \nabla (\mathbf{x} - \mathbf{x}\_0) \nabla a\_k(\mathbf{x}\_0) (\mathbf{x} - \mathbf{x}\_0) \\\\ &+ \frac{1}{4!} (\mathbf{x} - \mathbf{x}\_0)^\top \nabla^\top \nabla (\mathbf{x} - \mathbf{x}\_0) \nabla (\mathbf{x} - \mathbf{x}\_0) \nabla a\_k(\mathbf{x}\_0) (\mathbf{x} - \mathbf{x}\_0) + \dots, \end{aligned} \tag{1}$$

where 1 *k k k n a a a x x* is known as the gradient of *ak*(.) and

$$
\nabla^{\top} \nabla a\_{k} = \begin{bmatrix}
\frac{\widehat{\mathcal{O}}^{2} a\_{k}}{\widehat{\mathcal{O}} \mathbf{x}\_{1}^{2}} & \frac{\widehat{\mathcal{O}}^{2} a\_{k}}{\widehat{\mathcal{O}} \mathbf{x}\_{1} \widehat{\mathcal{O}} \mathbf{x}\_{2}} & \cdots & \frac{\widehat{\mathcal{O}}^{2} a\_{k}}{\widehat{\mathcal{O}} \mathbf{x}\_{1} \widehat{\mathcal{O}} \mathbf{x}\_{n}} \\
\frac{\widehat{\mathcal{O}}^{2} a\_{k}}{\widehat{\mathcal{O}} \mathbf{x}\_{2} \widehat{\mathcal{O}} \mathbf{x}\_{1}} & \frac{\widehat{\mathcal{O}}^{2} a\_{k}}{\widehat{\mathcal{O}} \mathbf{x}\_{2}^{2}} & \cdots & \frac{\widehat{\mathcal{O}}^{2} a\_{k}}{\widehat{\mathcal{O}} \mathbf{x}\_{2} \widehat{\mathcal{O}} \mathbf{x}\_{n}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\widehat{\mathcal{O}}^{2} a\_{k}}{\widehat{\mathcal{O}} \mathbf{x}\_{n} \widehat{\mathcal{O}} \mathbf{x}\_{1}} & \frac{\widehat{\mathcal{O}}^{2} a\_{k}}{\widehat{\mathcal{O}} \mathbf{x}\_{n} \widehat{\mathcal{O}} \mathbf{x}\_{2}} & \cdots & \frac{\widehat{\mathcal{O}}^{2} a\_{k}}{\widehat{\mathcal{O}} \mathbf{x}\_{n}^{2}}
\end{bmatrix}
$$

is called a Hessian matrix.

#### **10.2.2 Nonlinear Signal Models**

Consider nonlinear systems having state-space representations of the form

$$\mathbf{x}\_{k+1} = a\_k(\mathbf{x}\_k) + b\_k(\mathbf{x}\_k)w\_{k\prime} \tag{2}$$

( ) *k kk y c x* , (3)

where *ak*(.), *bk*(.) and *ck*(.) are continuous differentiable functions. For a scalar function, ( ): *<sup>k</sup> a x* , its Taylor series about *x* = *x*0 may be written as

<sup>&</sup>quot;In the real world, nothing happens at the right place at the right time. It is the job of journalists and historians to correct that." *Samuel Langhorne Clemens aka. Mark Twain*

$$\begin{split} a\_{k}(\mathbf{x}) &= a\_{k}(\mathbf{x}\_{0}) + (\mathbf{x} - \mathbf{x}\_{0}) \frac{\partial \mathbf{z}\_{k}}{\partial \mathbf{x}} \bigg|\_{\mathbf{x} = \mathbf{x}\_{0}} + \frac{1}{2} (\mathbf{x} - \mathbf{x}\_{0})^{2} \frac{\partial^{2} \mathbf{z}\_{k}}{\partial \mathbf{x}^{2}} \bigg|\_{\mathbf{x} = \mathbf{x}\_{0}} \\ &+ \frac{1}{6} (\mathbf{x} - \mathbf{x}\_{0})^{3} \frac{\partial^{3} \mathbf{z}\_{k}}{\partial \mathbf{x}^{3}} \bigg|\_{\mathbf{x} = \mathbf{x}\_{0}} + \dots + \frac{1}{n!} (\mathbf{x} - \mathbf{x}\_{0})^{n} \frac{\partial^{n} \mathbf{z}\_{k}}{\partial \mathbf{x}^{n}} \bigg|\_{\mathbf{x} = \mathbf{x}\_{0}}. \end{split} \tag{4}$$

Similarly, Taylor series for ( ): *<sup>k</sup> b x* and ( ): *<sup>k</sup> c x* about *x* = *x*0 are

$$\begin{aligned} b\_k(\mathbf{x}) &= b\_k(\mathbf{x}\_0) + (\mathbf{x} - \mathbf{x}\_0) \frac{\partial b\_k}{\partial \mathbf{x}} \bigg|\_{\mathbf{x} = \mathbf{x}\_0} + \frac{1}{2} (\mathbf{x} - \mathbf{x}\_0)^2 \left. \frac{\partial^2 b\_k}{\partial \mathbf{x}^2} \right|\_{\mathbf{x} = \mathbf{x}\_0} \\ &+ \frac{1}{6} (\mathbf{x} - \mathbf{x}\_0)^3 \left. \frac{\partial^3 b\_k}{\partial \mathbf{x}^3} \right|\_{\mathbf{x} = \mathbf{x}\_0} + \dots + \frac{1}{n!} (\mathbf{x} - \mathbf{x}\_0)^n \left. \frac{\partial^n b\_k}{\partial \mathbf{x}^n} \right|\_{\mathbf{x} = \mathbf{x}\_0} \end{aligned} \tag{5}$$

and

Smoothing, Filtering and Prediction:

(1)

(2) (3)

<sup>246</sup> Estimating the Past, Present and Future

A nonlinear function *a x <sup>k</sup>* ( ): *<sup>n</sup>* having *n* continuous derivatives may be expanded as a

**10.2 Extended Kalman Filtering 10.2.1 Taylor Series Expansion** 

<sup>0</sup> <sup>0</sup> <sup>0</sup>

<sup>0</sup> 0 0

<sup>1</sup> () ( ) ( ) ( ) 1!

<sup>0</sup> 0 00

*n*

*T*

*T T*

*T T*

<sup>0</sup> <sup>0</sup> 0 00

*k k <sup>k</sup> ax ax x x ax*

*T*

<sup>1</sup> ( ) ( ) ( )( ) 3!

<sup>1</sup> ( ) ( ) ( ) ( )( ) , 4!

*<sup>k</sup> x x x x x x ax x x*

is known as the gradient of *ak*(.) and

2

Consider nonlinear systems having state-space representations of the form

( ): *<sup>k</sup> a x* , its Taylor series about *x* = *x*0 may be written as

historians to correct that." *Samuel Langhorne Clemens aka. Mark Twain*

<sup>1</sup> () () *k kk kk k x ax bxw* ,

( ) *k kk y c x* ,

2 2 2

1 1 2 1 2 2 2 2 2 1 2 2

*k k k*

*a a a x xx xx a a a*

*k k k*

*k k k n n n*

*a a a xx xx x*

2 2 2

*k n*

*a xx x xx*

1 2

where *ak*(.), *bk*(.) and *ck*(.) are continuous differentiable functions. For a scalar function,

"In the real world, nothing happens at the right place at the right time. It is the job of journalists and

2

*n*

*<sup>k</sup> x x x x ax x x*

<sup>1</sup> ( ) ( )( ) 2! *T T <sup>k</sup> x x ax x x*

Taylor series about a point x0

1

*k k*

*a a*

*x x* 

where

*k*

is called a Hessian matrix.

**10.2.2 Nonlinear Signal Models** 

*a*

$$\begin{split} c\_{k}(\mathbf{x}) &= c\_{k}(\mathbf{x}\_{0}) + (\mathbf{x} - \mathbf{x}\_{0}) \frac{\partial \mathbf{c}\_{k}}{\partial \mathbf{x}} \bigg|\_{\mathbf{x} = \mathbf{x}\_{0}} + \frac{1}{2} (\mathbf{x} - \mathbf{x}\_{0})^{2} \frac{\partial^{2} \mathbf{c}\_{k}}{\partial \mathbf{x}^{2}} \bigg|\_{\mathbf{x} = \mathbf{x}\_{0}} \\ &+ \frac{1}{6} (\mathbf{x} - \mathbf{x}\_{0})^{3} \left. \frac{\partial^{3} \mathbf{c}\_{k}}{\partial \mathbf{x}^{3}} \right|\_{\mathbf{x} = \mathbf{x}\_{0}} + \dots + \frac{1}{n!} (\mathbf{x} - \mathbf{x}\_{0})^{n} \left. \frac{\partial^{n} \mathbf{c}\_{k}}{\partial \mathbf{x}^{n}} \right|\_{\mathbf{x} = \mathbf{x}\_{0}}, \end{split} \tag{6}$$

respectively.

#### **10.2.3 First-Order Extended Kalman Filter**

Suppose that filtered estimates / ˆ *k k x* of *xk* are desired given observations

$$\mathbf{z}\_k = \mathbf{c}\_k(\mathbf{x}\_k) + \mathbf{v}\_k \tag{7}$$

where *vk* is a measurement noise sequence. A first-order EKF for the above problem is developed below. Following the approach within [3], the nonlinear system (2) – (3) is approximated by

$$\mathbf{x}\_{k+1} = A\_k \mathbf{x}\_k + B\_k \mathbf{z} \mathbf{w}\_k + \boldsymbol{\mu}\_k \tag{8}$$

$$\mathbf{y}\_k = \mathbf{C}\_k \mathbf{x}\_k + \boldsymbol{\pi}\_k \,\tag{9}$$

where *Ak*, *Bk*, *Ck*, *<sup>k</sup>* and *πk* are found from suitable truncations of the Taylor series for each nonlinearity. From Chapter 4, a filter for the above model is given by

$$
\hat{\mathbf{x}}\_{k/k} = \hat{\mathbf{x}}\_{k/k-1} + L\_k (\mathbf{z}\_k - \mathbf{C}\_k \hat{\mathbf{x}}\_{k/k-1} - \boldsymbol{\pi}\_k) \, , \tag{10}
$$

$$
\hat{\mathbf{x}}\_{k\*1/k} = \mathbf{A}\_k \hat{\mathbf{x}}\_{k/k} + \boldsymbol{\mu}\_k \tag{11}
$$

<sup>&</sup>quot;You will always define events in a manner which will validate your agreement with reality." *Steve Anthony Maraboli*

where

where *Ak* = / <sup>ˆ</sup> ( ) *k k <sup>k</sup> x x a x* and

system output,

*C xkk k*

where *Ck* = / 1 <sup>ˆ</sup> ( ) 

 / / <sup>ˆ</sup> *k k T*

*T*

<sup>1</sup> ( )( ) <sup>ˆ</sup> <sup>ˆ</sup>

/ 1 / 1 <sup>ˆ</sup> () ( ) ( ) <sup>ˆ</sup> <sup>ˆ</sup> *k k*

*k k kk k kk k x x cx cx x x c*

,

*kk k x x tr P a* and / 1

third-order terms within (1) results in

*A xkk k*

**10.2.5 Third-Order Extended Kalman Filter** 

/ 1

*T T*

*k k <sup>k</sup> x xc x* and *πk* = / 1 ( ) <sup>ˆ</sup> *k kk c x* – / 1 <sup>ˆ</sup> *C xk kk* + / 1

*<sup>k</sup>* and *πk* into the filtering and prediction recursions (10) – (11) yields the second-order EKF

*T*

/ 1

/ 1 / 1 / 1 ˆ ˆ

/ /1 / 1 / 1 ˆ <sup>1</sup> ˆ ˆ ( ) <sup>ˆ</sup> <sup>2</sup> *k k*

> 1/ / / ˆ <sup>1</sup> ˆ ˆ( ) <sup>2</sup> *k k*

The above form is described in [2]. The further simplifications

/ 1 <sup>ˆ</sup> *k k T kk k x xP c*

<sup>1</sup> () ( ) ( ) ˆ ˆ

*T*

,

*k k kk kk k kk k x xx x ax ax x x a P a*

*kk k k kk k x x x x P a xx a*

<sup>1</sup> ( )

just to see what other people are doing with the same set of facts." *William E. Vaughan*

/ / ˆ ˆ <sup>1</sup> ( ) <sup>ˆ</sup> <sup>6</sup> *k k k k*

*k k k kk kk k x x x ax P a* .

<sup>≈</sup> / 1

Higher order EKFs can be realised just as elegantly as its predecessors. Retaining up to

/ / / ˆ ˆ

/ /

/ / / ˆ ˆ

*k k* 6 *k k T*

"It might be a good idea if the various countries of the world would occasionally swap history books,

*kk kk k k k kk kk k x x x x L z cx P c* 

*k kk k kk k kk k x x x x cx x x c P c* 

*<sup>k</sup>* = / ( ) <sup>ˆ</sup> *k kk a x* – / <sup>ˆ</sup> *A xk kk* + /

/ 1 / 1

*k k* 2 *k k*

1

1

*T*

/

/ 1 <sup>ˆ</sup>*k k T kk k x x tr P c*

/ /

*k k kk k x xx x A ax P a* (21)

*k k* 2 *k k T*

/ ˆ

/ 1 ˆ

,

are assumed in [4], [5].

2 *k k T kk k x x P c*

/ 1

*kk k x x P a* . Similarly for the

. Substituting for

(17)

(18)

(19)

(20)

/ / <sup>ˆ</sup> *k k T kk k x x P a* <sup>≈</sup>

2 *k k T*

/ 1 / 1 ˆ <sup>1</sup> ( ) <sup>ˆ</sup> ( ) <sup>ˆ</sup> <sup>2</sup> *k k T T k kk <sup>k</sup> k kk x x xx c xx* 

where *Lk* = <sup>1</sup> / 1 *<sup>T</sup> P C kk k k* is the filter gain, in which *k* = / 1 *<sup>T</sup> CP C k kk k* + *Rk* , *Pk k*/ = *Pk k*/ 1 – / 1 / 1 ( *<sup>T</sup> <sup>T</sup> P C CP C kk k k kk k* + <sup>1</sup> / 1 ) *R CP k k kk* and *Pk k* 1/ = / *<sup>T</sup> AP A k kk k* + *<sup>T</sup> BQB k kk* . It is common practice (see [1] – [5]) to linearise about the current conditional mean estimate, retain up to first order terms within the corresponding Taylor series and assume *Bk* = / ( ) ˆ *k kk b x* . This leads to

$$\begin{aligned} a\_k(\mathbf{x}) & \approx a\_k(\hat{\mathbf{x}}\_{k \mid k}) + (\mathbf{x} - \hat{\mathbf{x}}\_{k \mid k})^\top \nabla a\_k \Big|\_{\mathbf{x} = \hat{\mathbf{x}}\_{k \mid k}} \\ &= A\_k \mathbf{x}\_k + \mu\_k \end{aligned} \tag{12}$$

and

$$\begin{aligned} \mathbf{c}\_{k}(\mathbf{x}\_{k}) & \approx \mathbf{c}\_{k}(\hat{\mathbf{x}}\_{k \mid k-1}) + (\mathbf{x} - \hat{\mathbf{x}}\_{k \mid k-1})^{T} \nabla \mathbf{c}\_{k} \Big|\_{\mathbf{x} = \hat{\mathbf{x}}\_{k \mid k-1}} \\\\ &= \mathbf{C}\_{k} \mathbf{x}\_{k} + \boldsymbol{\pi}\_{k \mid k} \end{aligned} \tag{13}$$

where *Ak* = / <sup>ˆ</sup> ( ) *k k <sup>k</sup> x xa x* , *Ck* = / 1 <sup>ˆ</sup> ( ) *k k <sup>k</sup> x x c x* , *<sup>k</sup>* = / ( ) ˆ *k kk a x* – / ˆ *A xk kk* and *πk* = / 1 ( ) ˆ *k kk c x* – / 1 ˆ *C xk kk* . Substituting for *<sup>k</sup>* and *πk* into (10) – (11) gives

$$
\hat{\mathfrak{X}}\_{k/k} = \hat{\mathfrak{X}}\_{k/k-1} + L\_k(\mathbf{z}\_k - \mathbf{c}\_k(\hat{\mathfrak{X}}\_{k/k-1})) \, , \tag{14}
$$

$$
\hat{\mathfrak{x}}\_{k\*1/k} = \mathfrak{a}\_k(\hat{\mathfrak{x}}\_{k/k})\,. \tag{15}
$$

Note that nonlinearities enter into the state correction (14) and prediction (15), whereas linearised matrices *Ak*, *Bk* and *Ck* are employed in the Riccati equation and gain calculations.

In the case of scalar states, the linearisations are *Ak* = / ˆ *k k k x x a x* and *Ck* = / 1 ˆ *k k k x x c x* . In texts

on optimal filtering, the recursions (14) – (15) are either called a first-order EKF or simply an EKF, see [1] – [5]. Two higher order versions are developed below.

#### **10.2.4 Second-Order Extended Kalman Filter**

Truncating the series (1) after the second-order term and observing that / ( ) ˆ *T T k k x x* is a scalar yields

$$a\_k(\mathbf{x}) \approx a\_k(\hat{\mathbf{x}}\_{k/k}) + (\mathbf{x} - \hat{\mathbf{x}}\_{k/k})^T \nabla a\_k \Big|\_{\mathbf{x} = \hat{\mathbf{x}}\_{k/k}} + \frac{1}{2} (\mathbf{x} - \hat{\mathbf{x}}\_{k/k})^T \nabla^T \nabla a\_k \Big|\_{\mathbf{x} = \hat{\mathbf{x}}\_{k/k}} \text{ ( $\mathbf{x} - \hat{\mathbf{x}}\_{k/k}$ )},$$

$$= a\_k(\hat{\mathbf{x}}\_{k/k}) + (\mathbf{x} - \hat{\mathbf{x}}\_{k/k})^T \nabla a\_k \Big|\_{\mathbf{x} = \hat{\mathbf{x}}\_{k/k}} + \frac{1}{2} \nabla P\_{k/k} \nabla^T a\_k \Big|\_{\mathbf{x} = \hat{\mathbf{x}}\_{k/k}} \tag{16}$$

$$= A\_k \mathbf{x}\_k + \mu\_k \, ,$$

<sup>&</sup>quot;People take the longest possible paths, digress to numerous dead ends, and make all kinds of mistakes. Then historians come along and write summaries of this messy, nonlinear process and make it appear like a simple straight line." *Dean L. Kamen*

Smoothing, Filtering and Prediction:

*<sup>T</sup> CP C k kk k* + *Rk* , *Pk k*/ = *Pk k*/ 1 –

(12)

(13)

(14) (15)

. In texts

(16)

*<sup>T</sup> AP A k kk k* + *<sup>T</sup> BQB k kk* . It is common practice

/

*T*

/ 1

/ ˆ *k k*

and *Ck* =

/ 1 ˆ *k k*

*k k x x* is a

*x*

*k x x*

*c*

 

*k x x*

*a x* 

/ /

/ /

*k k* 2 *k k*

*k k* 2 *k k*

*<sup>k</sup>* = / ( ) ˆ *k kk a x* – / ˆ *A xk kk* and *πk* = / 1 ( ) ˆ *k kk c x* –

<sup>248</sup> Estimating the Past, Present and Future

(see [1] – [5]) to linearise about the current conditional mean estimate, retain up to first order

/ 1 / 1 <sup>ˆ</sup> () ( )( ) ˆ ˆ *k k*

Note that nonlinearities enter into the state correction (14) and prediction (15), whereas linearised matrices *Ak*, *Bk* and *Ck* are employed in the Riccati equation and gain calculations.

on optimal filtering, the recursions (14) – (15) are either called a first-order EKF or simply an

Truncating the series (1) after the second-order term and observing that / ( ) ˆ *T T*

/ / / / ˆ ˆ <sup>1</sup> () ( ) ( ) ˆ ˆ ( ) <sup>ˆ</sup> ( ) <sup>ˆ</sup>

"People take the longest possible paths, digress to numerous dead ends, and make all kinds of mistakes. Then historians come along and write summaries of this messy, nonlinear process and make it appear

*T T T k k kk kk k k k <sup>k</sup> k k x x x x ax ax x x a x x a x x* ,

*T T*

/ / / ˆ ˆ

*k kk kk k kk k x x x x ax x x a P a*

*k k k kk kk k x x cx cx x x c* 

*T*

is the filter gain, in which *k* = / 1

and *Pk k* 1/ = /

terms within the corresponding Taylor series and assume *Bk* = / ( ) ˆ *k kk b x* . This leads to

*k k kk kk k x x ax ax x x a*

,

,

*<sup>k</sup>* and *πk* into (10) – (11) gives

/ /1 / 1 ˆ ˆ ( ( )) ˆ *kk kk k k k kk x x Lz cx* ,

1/ / ˆ ˆ( ) *k k k kk x ax* .

/ / <sup>ˆ</sup> () ( ) ( ) ˆ ˆ *k k*

/ 1 ) *R CP k k kk* 

*A xkk k*

*C xkk k*

where *Ak* = / <sup>ˆ</sup> ( ) *k k <sup>k</sup> x xa x* , *Ck* = / 1 <sup>ˆ</sup> ( ) *k k <sup>k</sup> x xc x*

In the case of scalar states, the linearisations are *Ak* =

**10.2.4 Second-Order Extended Kalman Filter** 

,

like a simple straight line." *Dean L. Kamen*

EKF, see [1] – [5]. Two higher order versions are developed below.

<sup>1</sup> ( )( ) ˆ ˆ

where *Lk* = <sup>1</sup> / 1 *<sup>T</sup> P C kk k k* 

and

/ 1 / 1 ( *<sup>T</sup> <sup>T</sup> P C CP C kk k k kk k* + <sup>1</sup>

/ 1 ˆ *C xk kk* . Substituting for

scalar yields

*A xkk k*

where *Ak* = / <sup>ˆ</sup> ( ) *k k <sup>k</sup> x x a x* and *<sup>k</sup>* = / ( ) <sup>ˆ</sup> *k kk a x* – / <sup>ˆ</sup> *A xk kk* + / / ˆ 1 2 *k k T kk k x x P a* . Similarly for the system output,

$$\begin{split} c\_{k}(\mathbf{x}) &\approx c\_{k}(\hat{\mathbf{x}}\_{k^{\ell/k-1}}) + (\mathbf{x}\_{k} - \hat{\mathbf{x}}\_{k^{\ell/k-1}})^{\top} \nabla c\_{k} \Big|\_{\mathbf{x}=\hat{\mathbf{x}}\_{k^{\ell/k-1}}} + \frac{1}{2} (\mathbf{x}\_{k} - \hat{\mathbf{x}}\_{k^{\ell}/k-1})^{\top} \nabla^{\top} \nabla c\_{k} \Big|\_{\mathbf{x}=\hat{\mathbf{x}}\_{k^{\ell/k-1}}} (\mathbf{x}\_{k} - \hat{\mathbf{x}}\_{k^{\ell/k-1}}) \\ &= c\_{k}(\hat{\mathbf{x}}\_{k^{\ell/k-1}}) + (\mathbf{x}\_{k} - \hat{\mathbf{x}}\_{k^{\ell/k-1}})^{\top} \nabla c\_{k} \Big|\_{\mathbf{x}=\hat{\mathbf{x}}\_{k^{\ell/k-1}}} + \frac{1}{2} \nabla P\_{k^{\ell/k}} \nabla^{\top} c\_{k} \Big|\_{\mathbf{x}=\hat{\mathbf{x}}\_{k^{\ell/k-1}}} \\ &= \mathbf{C}\_{k} \mathbf{x}\_{k} + \pi\_{k^{\ell/k}} \end{split} \tag{17}$$

where *Ck* = / 1 <sup>ˆ</sup> ( ) *k k <sup>k</sup> x x c x* and *πk* = / 1 ( ) <sup>ˆ</sup> *k kk c x* – / 1 <sup>ˆ</sup> *C xk kk* + / 1 / 1 ˆ 1 2 *k k T kk k x x P c* . Substituting for *<sup>k</sup>* and *πk* into the filtering and prediction recursions (10) – (11) yields the second-order EKF

$$\hat{\mathbf{x}}\_{k/k} = \hat{\mathbf{x}}\_{k/k-1} + L\_k \left( \mathbf{z}\_k - \mathbf{c}\_k(\hat{\mathbf{x}}\_{k/k-1}) - \frac{1}{2} \nabla P\_{k/k-1} \nabla^T \mathbf{c}\_k \Big|\_{\mathbf{x} = \hat{\mathbf{x}}\_{k/k-1}} \right) \tag{18}$$

$$
\hat{\mathbf{x}}\_{k+1/k} = \mathbf{a}\_k(\hat{\mathbf{x}}\_{k/k}) + \frac{1}{2} \nabla P\_{k/k} \nabla^T \mathbf{a}\_k \Big|\_{\mathbf{x} = \hat{\mathbf{x}}\_{k/k}}.\tag{19}
$$

The above form is described in [2]. The further simplifications / / <sup>ˆ</sup> *k k T kk k x x P a* <sup>≈</sup> / / <sup>ˆ</sup> *k k T kk k x xtr P a* and / 1 / 1 <sup>ˆ</sup> *k k T kk k x xP c* <sup>≈</sup> / 1 / 1 <sup>ˆ</sup>*k k T kk k x x tr P c* are assumed in [4], [5].

#### **10.2.5 Third-Order Extended Kalman Filter**

Higher order EKFs can be realised just as elegantly as its predecessors. Retaining up to third-order terms within (1) results in

$$\begin{aligned} a\_k(\mathbf{x}) & \approx a\_k(\hat{\mathbf{x}}\_{k \times k}) + (\mathbf{x} - \hat{\mathbf{x}}\_{k \times k}) \nabla a\_k \Big|\_{\mathbf{x} = \hat{\mathbf{x}}\_{k \times k}} + \frac{1}{2} \nabla P\_{k \times k} \nabla^T a\_k \Big|\_{\mathbf{x} = \hat{\mathbf{x}}\_{k \times k}} \\ & + \frac{1}{6} \nabla P\_{k \times k} \nabla^T a\_k \Big|\_{\mathbf{x} = \hat{\mathbf{x}}\_{k \times k}} (\mathbf{x}\_k - \hat{\mathbf{x}}\_{k \times k}) \nabla a\_k \Big|\_{\mathbf{x} = \hat{\mathbf{x}}\_{k \times k}} \\ & = A\_k \mathbf{x}\_k + \mu\_k \end{aligned} \tag{20}$$

where

$$A\_k = \nabla a\_k(\mathbf{x})\big|\_{\mathbf{x}=\hat{\mathbf{x}}\_{k\times k}} + \frac{1}{6}\nabla P\_{k\times k}\nabla^T a\_k\big|\_{\mathbf{x}=\mathbf{x}\_{k\times k}}\tag{21}$$

<sup>&</sup>quot;It might be a good idea if the various countries of the world would occasionally swap history books, just to see what other people are doing with the same set of facts." *William E. Vaughan*

<sup>20</sup> <sup>25</sup> <sup>30</sup> <sup>35</sup> <sup>40</sup> −34

Figure 1. Mean-square-error (MSE) versus signal-to-noise-ratio (SNR) for Example 1: first-order EKF

The previously-described Extended-Kalman filters arise by linearising the signal model about the current state estimate and using the linear Kalman filter to predict the next estimate. This attempts to produce a locally optimal filter, however, it is not necessarily stable because the solutions of the underlying Riccati equations are not guaranteed to be positive definite. The faux algebraic Riccati technique [6] – [10] seeks to improve on EKF performance by trading off approximate optimality for stability. The familiar structure of the EKF is retained but stability is achieved by selecting a positive definite solution to a faux

Assume that data is generated by the following signal model comprising a stable, linear

where the components of *ck*(.) are assumed to be continuous differentiable functions. Suppose that it is desired to calculate estimates of the states from the measurements. A

where *gk*(.) is a gain function to be designed. From (24) – (26), the state prediction error is

"The observer, when he seems to himself to be observing a stone, is really, if physics is to be believed,

observing the effects of the stone upon himself." *Bertrand Arthur William Russell*

1/ / 1 ˆ ˆ ( ( )) ˆ *k k k k k kk x Ax g z c x* , (26)

(24) (25)

*<sup>k</sup>* <sup>1</sup> *k k x Ax Bw* ,

( ) *k kk k z cx v* ,

(solid line), second-order EKF (dashed line) and third-order EKF (dotted-crossed line).

SNR, dB

−33 −32 −31 −30 −29 −28 −27

**10.3 The Faux Algebraic Riccati Equation Technique** 

state evolution together with a nonlinear output mapping

nonlinear observer may be constructed having the form

**10.3.1 A Nonlinear Observer** 

Riccati equation for the gain design.

given by

MSE, dB

and *<sup>k</sup>* = / ( ) <sup>ˆ</sup> *k kk a x* – / <sup>ˆ</sup> *A xk kk* + / / ˆ 1 2 *k k T kk k x x P a* . Similarly, for the output nonlinearity it is assumed that

$$\begin{split} \mathbf{c}\_{k}(\mathbf{x}\_{k}) &\approx \mathbf{c}\_{k}(\hat{\mathbf{x}}\_{k^{\prime}/k-1}) + (\mathbf{x} - \hat{\mathbf{x}}\_{k^{\prime}/k-1}) \nabla c\_{k} \Big|\_{\mathbf{x} = \hat{\mathbf{x}}\_{k^{\prime}/k-1}} + \frac{1}{2} \nabla P\_{k^{\prime}/k-1} \nabla^{T} \mathbf{c}\_{k} \Big|\_{\mathbf{x} = \hat{\mathbf{x}}\_{k^{\prime}/k-1}} \\ &\quad + \frac{1}{6} \nabla P\_{k^{\prime}/k-1} \nabla^{T} \mathbf{c}\_{k} \Big|\_{\mathbf{x} = \hat{\mathbf{x}}\_{k^{\prime}/k-1}} (\mathbf{x}\_{k} - \hat{\mathbf{x}}\_{k^{\prime}/k-1}) \nabla c\_{k} \Big|\_{\mathbf{x} = \hat{\mathbf{x}}\_{k^{\prime}/k-1}} \\ &= \mathbf{C}\_{k} \mathbf{x}\_{k} + \boldsymbol{\pi}\_{k^{\prime}} \end{split} \tag{22}$$

where

$$\mathbf{C}\_{k} = \nabla c\_{k}(\mathbf{x})\big|\_{\mathbf{x}=\hat{\mathbf{x}}\_{k\times k-l}} + \frac{1}{6}\nabla P\_{k\times k-l}\nabla^{T}c\_{k}\big|\_{\mathbf{x}=\hat{\mathbf{x}}\_{k\times k-l}}\tag{23}$$

and *πk* = / 1 ( ) <sup>ˆ</sup> *k kk c x* – / 1 <sup>ˆ</sup> *C xk kk* + / 1 / 1 ˆ 1 6 *k k T kk k x x P c* . The resulting third-order EKF is defined by (18) – (19) in which the gain is now calculated using (21) and (23).

*Example 1***.** Consider a linear state evolution *xk*+1 = *Axk* + *wk*, with *A* = 0.5, *wk* , *Q* = 0.05, a nonlinear output mapping *yk* = sin(*xk*) and noisy observations *zk* = *yk* + *vk*, *vk* . The firstorder EKF for this problem is given by

$$
\hat{\mathfrak{X}}\_{k/k} = \hat{\mathfrak{X}}\_{k/k-1} + L\_k(z\_k - \sin(\hat{\mathfrak{X}}\_{k/k-1})) \text{ .}
$$

$$
\hat{\mathfrak{X}}\_{k+1/k} = A \hat{\mathfrak{X}}\_{k/k}.
$$

where *Lk* = <sup>1</sup> / 1 *<sup>T</sup> P C kk k k* , *k* = / 1 *<sup>T</sup> CP C k kk k* + *Rk* , *Ck* = / 1 cos( ) ˆ *k k x* , *Pk k*/ = *Pk k*/ 1 – / 1 / 1 ( *<sup>T</sup> <sup>T</sup> P C CP C kk k k kk k* + <sup>1</sup> / 1 ) *R CP k k kk* and *Pk k* 1/ = / *<sup>T</sup> AP A k kk k* + *Qk* . The filtering step within the second-order EKF is amended to

$$
\hat{\mathfrak{X}}\_{k/k} = \hat{\mathfrak{X}}\_{k/k-1} + L\_k(z\_k - \sin(\hat{\mathfrak{X}}\_{k/k-1}) + \sin(\hat{\mathfrak{X}}\_{k/k-1})P\_{k/k-1}/2) \dots
$$

The modified output linearisation for the third-order EKF is

$$\mathbf{C}\_{k} = \cos(\hat{\mathbf{x}}\_{k/k-1}) + \sin(\hat{\mathbf{x}}\_{k/k-1}) P\_{k/k-1} / \mathbf{6} \ . $$

Simulations were conducted in which the signal-to-noise-ratio was varied from 20 dB to 40 dB for *N* = 200,000 realisations of Gaussian noise sequences. The mean-square-errors exhibited by the first, second and third-order EKFs are plotted in Fig. 1. The figure demonstrates that including higher-order Taylor series terms within the filter can provide small performance improvements but the benefit diminishes with increasing measurement noise.

<sup>&</sup>quot;No two people see the external world in exactly the same way. To every separate person a thing is what he thinks it is – in other words, not a thing, but a think." *Penelope Fitzgerald*

Smoothing, Filtering and Prediction:

(22)

*kk k x x P a* . Similarly, for the output nonlinearity it is

/ 1 / 1

*T*

*k k* 2 *k k*

<sup>250</sup> Estimating the Past, Present and Future

/ 1 / 1 / 1 ˆ ˆ

/ 1 / 1

(23)

. The resulting third-order EKF is defined

*<sup>T</sup> CP C k kk k* + *Rk* , *Ck* = / 1 cos( ) ˆ *k k x* , *Pk k*/ = *Pk k*/ 1 –

*<sup>T</sup> AP A k kk k* + *Qk* . The filtering step within the

*k k k kk kk k kk k x x x x cx cx x x c P c* 

> / 1 / 1 ˆ ˆ <sup>1</sup> ( ) <sup>ˆ</sup> <sup>6</sup> *k k k k*

> > / 1 / 1 <sup>ˆ</sup> / 1 <sup>ˆ</sup>

*T*

*k k* 6 *k k*

*Example 1***.** Consider a linear state evolution *xk*+1 = *Axk* + *wk*, with *A* = 0.5, *wk* , *Q* = 0.05, a nonlinear output mapping *yk* = sin(*xk*) and noisy observations *zk* = *yk* + *vk*, *vk* . The first-

/ /1 / 1 ˆ ˆ ( sin( )) ˆ *kk kk k k k k x x Lz x* ,

1/ / ˆ ˆ *k k kk x Ax ,* 

/ /1 / 1 /1 /1 ˆ ˆ ( sin( ) sin( ) / 2) ˆ ˆ *kk kk k k k k kk kk x x Lz x x P* .

/ 1 /1 /1 cos( ) sin( ) / 6 ˆ ˆ *C x xP <sup>k</sup> k k kk kk* .

Simulations were conducted in which the signal-to-noise-ratio was varied from 20 dB to 40 dB for *N* = 200,000 realisations of Gaussian noise sequences. The mean-square-errors exhibited by the first, second and third-order EKFs are plotted in Fig. 1. The figure demonstrates that including higher-order Taylor series terms within the filter can provide small performance

"No two people see the external world in exactly the same way. To every separate person a thing is

improvements but the benefit diminishes with increasing measurement noise.

what he thinks it is – in other words, not a thing, but a think." *Penelope Fitzgerald*

and *Pk k* 1/ = /

*kk k k kk k x x x xP c xx c* 

/ ˆ

<sup>1</sup> () ( )( ) ˆ ˆ

2 *k k T*

*T*

,

<sup>1</sup> ( )

1

, *k* = / 1

The modified output linearisation for the third-order EKF is

/ 1 ) *R CP k k kk* 

by (18) – (19) in which the gain is now calculated using (21) and (23).

*k k kk k x x x x C cx P c*

/ 1 ˆ

6 *k k T kk k x x P c*

*<sup>k</sup>* = / ( ) <sup>ˆ</sup> *k kk a x* – / <sup>ˆ</sup> *A xk kk* + /

*C xkk k*

order EKF for this problem is given by

/ 1 *<sup>T</sup> P C kk k k* 

second-order EKF is amended to

where *Lk* = <sup>1</sup>

/ 1 / 1 ( *<sup>T</sup> <sup>T</sup> P C CP C kk k k kk k* + <sup>1</sup>

and *πk* = / 1 ( ) <sup>ˆ</sup> *k kk c x* – / 1 <sup>ˆ</sup> *C xk kk* + / 1

1

and 

where

assumed that

Figure 1. Mean-square-error (MSE) versus signal-to-noise-ratio (SNR) for Example 1: first-order EKF (solid line), second-order EKF (dashed line) and third-order EKF (dotted-crossed line).

#### **10.3 The Faux Algebraic Riccati Equation Technique**

#### **10.3.1 A Nonlinear Observer**

The previously-described Extended-Kalman filters arise by linearising the signal model about the current state estimate and using the linear Kalman filter to predict the next estimate. This attempts to produce a locally optimal filter, however, it is not necessarily stable because the solutions of the underlying Riccati equations are not guaranteed to be positive definite. The faux algebraic Riccati technique [6] – [10] seeks to improve on EKF performance by trading off approximate optimality for stability. The familiar structure of the EKF is retained but stability is achieved by selecting a positive definite solution to a faux Riccati equation for the gain design.

Assume that data is generated by the following signal model comprising a stable, linear state evolution together with a nonlinear output mapping

$$\mathbf{x}\_{k+1} = A\mathbf{x}\_k + Bw\_k.\tag{24}$$

$$
\omega\_k = \mathfrak{c}\_k(\mathfrak{x}\_k) + \mathfrak{v}\_{k'} \tag{25}
$$

where the components of *ck*(.) are assumed to be continuous differentiable functions. Suppose that it is desired to calculate estimates of the states from the measurements. A nonlinear observer may be constructed having the form

$$
\hat{\mathbf{x}}\_{k+1/k} = A\hat{\mathbf{x}}\_k + \mathbf{g}\_k(\mathbf{z}\_k - \mathbf{c}(\hat{\mathbf{x}}\_{k/k-1})) \,. \tag{26}
$$

where *gk*(.) is a gain function to be designed. From (24) – (26), the state prediction error is given by

<sup>&</sup>quot;The observer, when he seems to himself to be observing a stone, is really, if physics is to be believed, observing the effects of the stone upon himself." *Bertrand Arthur William Russell*

diag( (1) 2 *w* 

Let

*Wheeler*

 ,…, (6) 2 *w* 

**10.3.3 Tracking Multiple Signals** 

modelled by equation (34), where (1)

1

 

1

 1

1

1

1

). The states ( )*<sup>i</sup>*

denote the complex baseband observations, where (1)

*C*

*D*

prediction error to linear terms yields *Ck* = (1) [*Ck*

This form suggests the choice

uncorrelated, white processes with covariance *R* = (1)

*k*

*<sup>D</sup> <sup>D</sup> D* 

That is, rather than solve (30), an arbitrary positive definite solution *k* is assumed instead and then the gain at each time *k* is calculated from (31) – (32) using *k* in place of *Pk*/*k*-1.

Consider the problem of tracking two frequency or phase modulated signals which may be

and (1) *wk* , … (6) *wk* are zero-mean, uncorrelated, white processes with covariance *Q* =

*k* and ( )*<sup>i</sup>* 

*<sup>k</sup>* , *i* = 1, 2, represent the signals'

. (34)

*<sup>k</sup> v* are zero-mean,

. Expanding the

(35)

*w w*

) (4) (5) (6)

*k k k*

*w w w*

*<sup>k</sup> v* , …, (4)

,…, (4)

.

.

<sup>2</sup> ) *<sup>v</sup>* 

<sup>2</sup> ( *<sup>v</sup> diag* 

(2) ] *Ck* , where

( ) ( ) ( ) ( ) / 1 /1 /1 ( ) ( ) ( ) / 1 /1 /1

*a*

 

*i i i i k k kk kk <sup>k</sup> <sup>i</sup> <sup>i</sup> <sup>i</sup>*

*k k kk kk*

( ) ( )

*i i*

() () () () /1 /1 /1 /1

*kk kk kk kk*

*a a*

 

*a*

ˆ ˆ cos 0 ˆ sin ˆ 0 ˆ sin ˆ cos

, where

( ) / 1 / 1

*i k k k k <sup>k</sup> i i i i*

ˆ ˆ cos sin ˆ ˆ sin / cos / ˆ ˆ

"If you haven't found something strange during the day, it hasn't been much of a day." *John Archibald* 

*<sup>k</sup> a* , (1) *<sup>k</sup>* , (2) *<sup>k</sup>* , (1) *<sup>k</sup>* , (2) *<sup>k</sup>* , (1) *<sup>a</sup>* , (2) *<sup>a</sup>* , (1) , (2) 

*<sup>k</sup> a* , (2)

*<sup>k</sup> a* , ( )*<sup>i</sup>* 

(1) (1) (1) (1)

00 00 0 0 0 00 0 0 11 00 0 0 00 00 0 0 0 0 0 0 00 11 0

*k a k k k k k k k k*

*a a w*

(1) (1) (1) (2)

(1) (1) (3

(1) (1) (1) (1) (2) (1) (1) (2) (3) (2) (2) (3) (4) (2) (2) (4)

cos sin cos sin

*z a v z a v z a v z a v*

*k k k k k k k k k k k k k k k k*

> (1) (2) *k*

*k*

 instantaneous amplitude, frequency and phase components, respectively.

(2) (2) (2)

*k a k k k k k*

*a a*

(2) (2) (2)

(2) (2)

$$
\tilde{\mathbf{x}}\_{k+1/k} = A \tilde{\mathbf{x}}\_{k/k-1} - \mathbf{g}\_k(\mathbf{z}\_k) + \mathbf{w}\_{k,\prime} \tag{27}
$$

where *<sup>k</sup> x* = *xk* – / 1 ˆ *k k x* and *εk* = *zk* – / 1 ( ) ˆ *k k c x* . The Taylor series expansion of *ck*(.) to first order terms leads to *εk* <sup>≈</sup> *C xk kk*/ 1 + *vk*, where *Ck* = / 1 <sup>ˆ</sup> ( ) *k k <sup>k</sup> x xc x* . The objective here is to design *gk*(*εk*) to be a linear function of *k k*/ 1 *x* to first order terms. It will be shown that for certain classes of problems, this objective can be achieved by a suitable choice of a linear bounded matrix function of the states *Dk*, resulting in the time-varying gain function *gk*(*εk*) = *KkDkεk*, where *Kk* is a gain matrix of appropriate dimension. For example, consider *xk <sup>n</sup>* and *zk <sup>m</sup>* , which yield *εk <sup>m</sup>* and *Ck m n* . Suppose that a linearisation *Dk <sup>p</sup> <sup>m</sup>* can be found so that *Ck* = *DkCk <sup>p</sup> <sup>m</sup>* possesses approximately constant terms. Then the locally linearised error (27) may be written as

$$
\tilde{\mathbf{x}}\_{k+1/k} = (A - K\_k \overline{\mathbf{C}}\_k) \tilde{\mathbf{x}}\_{k/k-1} - K\_k D\_k \mathbf{v}\_k + w\_k \,. \tag{28}
$$

If ( ) *<sup>i</sup> A* < 1, *i* = 1 … *n*, and if the pair (, ) *A Ck* is completely observable, then the asymptotic stability of (28) can be guaranteed by selecting the gain such that ( ) *<sup>i</sup> A K Ck k* < 1. A method for selecting the gain is described below.

#### **10.3.2 Gain Selection**

From (28), an approximate equation for the error covariance *Pk*/*k*-1 = /1 /1 { } *<sup>T</sup> Ex x kk kk* is

$$P\_{k+1/k} = (A - K\_k \overline{C}\_k) P\_{k/k-1} (A - K\_k \overline{C}\_k)^T + K\_k D\_k R D\_k^T K\_k^T + Q\_{\prime\prime} \tag{29}$$

which can be written as

$$P\_{k/k} = P\_{k/k-1} - P\_{k/k-1} \overline{\mathbf{C}}\_k^T (\overline{\mathbf{C}}\_k P\_{k/k-1} \overline{\mathbf{C}}\_k^T + D\_k R D\_k^T)^{-1} \overline{\mathbf{C}}\_k P\_{k/k-1} \tag{30}$$

$$P\_{k+1/k} = A P\_{k/k} A + Q \,. \tag{31}$$

In an EKF for the above problem, the gain is obtained by solving the above Riccati difference equation and calculating

$$\mathbf{K}\_k = P\_{k/k-1} \overline{\mathbf{C}}\_k^T \left( \overline{\mathbf{C}}\_k P\_{k/k-1} \overline{\mathbf{C}}\_k^T + D\_k R D\_k^T \right)^{-1}. \tag{32}$$

The faux algebraic Riccati equation approach [6] – [10] is motivated by connections between Riccati difference equation and algebraic Riccati equation solutions. Indeed, it is noted for some nonlinear problems that the gains can converge to a steady-state matrix [3]. This technique is also known as 'covariance setting'. Following the approach of [10], the Riccati difference equation (30) may be masqueraded by the faux algebraic Riccati equation

$$
\Sigma\_k = \Sigma\_k - \Sigma\_k \overline{\mathbf{C}}\_k^T (\overline{\mathbf{C}}\_k \Sigma\_k \overline{\mathbf{C}}\_k^T + D\_k R D\_k^T)^{-1} \overline{\mathbf{C}}\_k \Sigma\_k \tag{33}
$$

<sup>&</sup>quot;The universe as we know it is a joint product of the observer and the observed." *Pierre Teilhard De Chardin*

That is, rather than solve (30), an arbitrary positive definite solution *k* is assumed instead and then the gain at each time *k* is calculated from (31) – (32) using *k* in place of *Pk*/*k*-1.

#### **10.3.3 Tracking Multiple Signals**

Consider the problem of tracking two frequency or phase modulated signals which may be modelled by equation (34), where (1) *<sup>k</sup> a* , (2) *<sup>k</sup> a* , (1) *<sup>k</sup>* , (2) *<sup>k</sup>* , (1) *<sup>k</sup>* , (2) *<sup>k</sup>* , (1) *<sup>a</sup>* , (2) *<sup>a</sup>* , (1) , (2) and (1) *wk* , … (6) *wk* are zero-mean, uncorrelated, white processes with covariance *Q* = diag( (1) 2 *w* ,…, (6) 2 *w* ). The states ( )*<sup>i</sup> <sup>k</sup> a* , ( )*<sup>i</sup> k* and ( )*<sup>i</sup> <sup>k</sup>* , *i* = 1, 2, represent the signals' instantaneous amplitude, frequency and phase components, respectively.

$$
\begin{bmatrix}
\boldsymbol{a}\_{k+1}^{(1)} \\
\boldsymbol{a}\_{k+1}^{(1)} \\
\boldsymbol{\theta}\_{k+1}^{(1)} \\
\boldsymbol{a}\_{k+1}^{(2)} \\
\boldsymbol{\theta}\_{k+1}^{(2)} \\
\boldsymbol{\theta}\_{k+1}^{(2)} \\
\boldsymbol{\theta}\_{k+1}^{(2)}
\end{bmatrix} = \begin{bmatrix}
\mu\_{a}^{(1)} & \boldsymbol{0} & \boldsymbol{0} & \boldsymbol{0} & \boldsymbol{0} & \boldsymbol{0} \\
\boldsymbol{0} & \mu\_{a}^{(1)} & \boldsymbol{0} & \boldsymbol{0} & \boldsymbol{0} & \boldsymbol{0} \\
\boldsymbol{0} & \boldsymbol{1} & \boldsymbol{1} & \boldsymbol{0} & \boldsymbol{0} & \boldsymbol{0} \\
\boldsymbol{0} & \boldsymbol{0} & \boldsymbol{0} & \mu\_{a}^{(2)} & \boldsymbol{0} & \boldsymbol{0} \\
\boldsymbol{0} & \boldsymbol{0} & \boldsymbol{0} & \boldsymbol{0} & \mu\_{a}^{(2)} & \boldsymbol{0} \\
\boldsymbol{0} & \boldsymbol{0} & \boldsymbol{0} & \boldsymbol{0} & \boldsymbol{1} & \boldsymbol{1}
\end{bmatrix} \begin{bmatrix}
\boldsymbol{w}\_{k}^{(1)} \\
\boldsymbol{w}\_{k}^{(1)} \\
\boldsymbol{w}\_{k}^{(2)} \\
\boldsymbol{w}\_{k}^{(2)} \\
\boldsymbol{w}\_{k}^{(2)} \\
\boldsymbol{w}\_{k}^{(2)}
\end{bmatrix}.
\tag{34}$$

Let

Smoothing, Filtering and Prediction:

*<sup>i</sup> A K Ck k* <

(30) (31)

<sup>252</sup> Estimating the Past, Present and Future

where *<sup>k</sup> x* = *xk* – / 1 ˆ *k k x* and *εk* = *zk* – / 1 ( ) ˆ *k k c x* . The Taylor series expansion of *ck*(.) to first order

to be a linear function of *k k*/ 1 *x* to first order terms. It will be shown that for certain classes of problems, this objective can be achieved by a suitable choice of a linear bounded matrix function of the states *Dk*, resulting in the time-varying gain function *gk*(*εk*) = *KkDkεk*, where *Kk* is a gain matrix of appropriate dimension. For example, consider *xk <sup>n</sup>* and *zk <sup>m</sup>* , which yield *εk <sup>m</sup>* and *Ck m n* . Suppose that a linearisation *Dk <sup>p</sup> <sup>m</sup>* can be found so that *Ck* = *DkCk <sup>p</sup> <sup>m</sup>* possesses approximately constant terms. Then the locally linearised

*<sup>i</sup> A* < 1, *i* = 1 … *n*, and if the pair (, ) *A Ck* is completely observable, then the

1/ / 1 ( )( )*<sup>T</sup> T T P A K C P A K C K D RD K Q k k k k kk kk k k k k* , (29)

1

1

*k k kk kkk k k kk C C C D RD C* . (33)

. (32)

asymptotic stability of (28) can be guaranteed by selecting the gain such that ( )

From (28), an approximate equation for the error covariance *Pk*/*k*-1 = /1 /1 { } *<sup>T</sup> Ex x kk kk* is

/ /1 /1 / 1 / 1 ( ) *<sup>T</sup> <sup>T</sup> <sup>T</sup> P P P C C P C D RD C P kk kk kk k k kk k k k k kk* ,

*P AP A Q k k kk* 1/ / .

/ 1 / 1 ( ) *<sup>T</sup> <sup>T</sup> <sup>T</sup> K P C C P C D RD k kk k k kk k k k*

difference equation (30) may be masqueraded by the faux algebraic Riccati equation

<sup>1</sup> ( ) *<sup>T</sup> <sup>T</sup> <sup>T</sup>*

"The universe as we know it is a joint product of the observer and the observed." *Pierre Teilhard De* 

In an EKF for the above problem, the gain is obtained by solving the above Riccati difference

The faux algebraic Riccati equation approach [6] – [10] is motivated by connections between Riccati difference equation and algebraic Riccati equation solutions. Indeed, it is noted for some nonlinear problems that the gains can converge to a steady-state matrix [3]. This technique is also known as 'covariance setting'. Following the approach of [10], the Riccati

, (27)

1/ / 1 ( ) *k k k k kk k k k k x A KC x KDv w* . (28)

. The objective here is to design *gk*(*εk*)

1/ / 1 ( ) *k k kk k k k x Ax g w*

terms leads to *εk* <sup>≈</sup> *C xk kk*/ 1 + *vk*, where *Ck* = / 1 <sup>ˆ</sup> ( ) *k k <sup>k</sup> x xc x*

1. A method for selecting the gain is described below.

error (27) may be written as

**10.3.2 Gain Selection** 

which can be written as

equation and calculating

*Chardin*

If ( ) 

(1) (1) (1) (1) (2) (1) (1) (2) (3) (2) (2) (3) (4) (2) (2) (4) cos sin cos sin *k k k k k k k k k k k k k k k k z a v z a v z a v z a v* (35) 

denote the complex baseband observations, where (1) *<sup>k</sup> v* , …, (4) *<sup>k</sup> v* are zero-mean, uncorrelated, white processes with covariance *R* = (1) <sup>2</sup> ( *<sup>v</sup> diag* ,…, (4) <sup>2</sup> ) *<sup>v</sup>* . Expanding the prediction error to linear terms yields *Ck* = (1) [*Ck* (2) ] *Ck* , where

$$\mathbf{C}\_{k}^{(i)} = \begin{bmatrix} \cos \hat{\phi}\_{k/k-1}^{(i)} & \mathbf{0} & -\hat{a}\_{k/k-1}^{(i)} \sin \hat{\phi}\_{k/k-1}^{(i)} \\ \sin \hat{\phi}\_{k/k-1}^{(i)} & \mathbf{0} & \hat{a}\_{k/k-1}^{(i)} \cos \hat{\phi}\_{k/k-1}^{(i)} \end{bmatrix}.$$

This form suggests the choice (1) (2) *k k k <sup>D</sup> <sup>D</sup> D* , where

$$D\_k^{(i)} = \begin{bmatrix} \cos \hat{\phi}\_{k/k-1}^{(i)} & \sin \hat{\phi}\_{k/k-1}^{(i)} \\ -\sin \hat{\phi}\_{k/k-1}^{(i)} / \hat{a}\_{k/k-1}^{(i)} & \cos \hat{\phi}\_{k/k-1}^{(i)} / \hat{a}\_{k/k-1}^{(i)} \end{bmatrix}.$$

<sup>&</sup>quot;If you haven't found something strange during the day, it hasn't been much of a day." *John Archibald Wheeler*

*, that is,* 

*for all* ( )*<sup>i</sup> <sup>k</sup> e ,* ( )*<sup>i</sup>*

*for all* ( )*<sup>i</sup> <sup>k</sup> e* 

 *-1γ(e)+ e – I*

*nonlinearities,* 

*Wolfgang von Goethe*

*– I*

*Lemma 1 [10]: Consider the system (36), where w, e* 

γ

∑

*\_* 

*e* 

*w* 

γ *(e)*  *<sup>k</sup> e ≠ 0. Assume that* 

*. Under these conditions w* 

 *-1γ(e) + f q e . Then* 

 ( ), *e e =* () () 1

*i e e*

*m*

*Consider the first term on the right hand side of (39). Since the* 

( ),

*i i*

*Proof: From (36), <sup>f</sup> w = f e +* () () *fGz e*

*k* 

0 ≤ () () ( )/ *i i k k*

*e e* ≤

Figure 2. Nonlinear error system configuration. Figure 3. Stable gain space for Example 2.

<sup>1</sup> ( () () ), , *G z <sup>f</sup> q Gz I ee ee* 

<sup>1</sup> , () ( ), ( ) ), ( ) *w <sup>f</sup> <sup>f</sup> q w e eI e e*

 

*Using the approach of [11] together with the sector conditions on the identical noninteracting nonlinearities (37), it can be shown that expanding out the second term of (39) yields* , () *<sup>f</sup> e e*

"The intelligent man finds almost everything ridiculous, the sensible man hardly anything." *Johann* 

 

<sup>1</sup> ( ( ) ( ) ) ( ), ( ) *G z <sup>f</sup> q Gz I e e*

 *and* <sup>1</sup> *eI e e*

*noninteracting nonlinearities, with* ( ) ( ) *<sup>i</sup>*

*matrix. Suppose that for some q > 0, q* 

0 1 2 3 4

K1

(37)

 *is a causal, stable, finite-gain, time-invariant map <sup>m</sup>*

*, such that* 

(38)

 *e* <sup>2</sup> .

 *and w + f q w = (G(z) +* ( ) *<sup>f</sup> q G z + I*

 

1 *m i*

*i e* 

*k* 

*(.) consists of m identical,* 

( )*e consists of noninteracting* 

 *eI e ≥ 0.* 

> *≥*

 *–* () 1 () () , *<sup>i</sup> <sup>i</sup>* 

 *-1) γ(e) + e* 

(39)

*], ≥ 0,*  

0

*<sup>m</sup> , having a z-transform G(z), which is bounded on the unit circle. Let I denote an m m identity* 

*, there exists a* 

*<sup>m</sup> . Suppose that* 

*e monotonically increasing in the sector [0,*

 

*q e e*

.

 ( ), ( )  *=* ( )

 

<sup>2</sup>  *implies e,* ( ) ( ) *<sup>i</sup>*

1

2

K2

3

4

.

*a*

In the multiple signal case, the linearization *Ck* = *DkCk* does not result in perfect decoupling. While the diagonal blocks reduce to (,) *i i Ck* = 100 001 , the off-diagonal blocks are

$$
\overline{\mathbf{C}}\_{k}^{(i,j)} = \begin{bmatrix}
\cos(\hat{\phi}\_{k/k-1}^{(i)} - \hat{\phi}\_{k/k-1}^{(j)}) & \mathbf{0} & \hat{a}\_{k/k-1}^{(i)}\sin(\hat{\phi}\_{k}^{(i)} - \hat{\phi}\_{k/k-1}^{(j)}) \\
\end{bmatrix}
$$

Assuming a symmetric positive definite solution to (33) of the form k = 0 0 0 0 *k k k k k* ,

with *<sup>a</sup> <sup>k</sup>* , *<sup>k</sup>* , *<sup>k</sup>* , *<sup>k</sup>* and choosing the gains according to (32) yields *Kk* = 0 0 0 *a k k k K K K* ,

where *<sup>a</sup> Kk* = ( *a a k k* + 2 1 ) *v* , *Kk* = ( *k k* + 22 1 / ˆ ) *v kk a* and *Kk* = ( *k k* + 22 1 / ˆ ) *v kk a* . The nonlinear observer then becomes

$$
\hat{a}\_{k/k}^{(i)} = \hat{a}\_{k/k-1}^{(i)} + \Sigma\_k^a (\boldsymbol{z}\_k^{(1)} \cos \hat{\phi}\_{k/k-1}^{(i)} + \boldsymbol{z}\_k^{(2)} \sin \hat{\phi}\_{k/k-1}^{(i)}) (\Sigma\_k^a + \sigma\_v^2)^{-1},
$$

$$
\hat{a}\_{k/k}^{(i)} = \hat{a}\_{k/k-1}^{(i)} + \Sigma\_k^{a\phi} (\boldsymbol{z}\_k^{(1)} \cos \hat{\phi}\_{k/k-1}^{(i)} + \boldsymbol{z}\_k^{(2)} \sin \hat{\phi}\_{k/k-1}^{(i)}) (\hat{a}\_{k/k-1}^{(i)} \Sigma\_k^\phi + \sigma\_v^2 / \boldsymbol{a}\_{k/k-1}^{(i)})^{-1},
$$

$$
\hat{\phi}\_{k/k}^{(i)} = \hat{\phi}\_{k/k-1}^{(i)} + \Sigma\_k^\phi (\boldsymbol{z}\_k^{(1)} \cos \hat{\phi}\_{k/k-1}^{(i)} + \boldsymbol{z}\_k^{(2)} \sin \hat{\phi}\_{k/k-1}^{(i)}) (\hat{a}\_{k/k-1}^{(i)} \Sigma\_k^\phi + \sigma\_v^2 / \boldsymbol{a}\_{k/k-1}^{(i)})^{-1}.
$$

#### **10.3.4 Stability Conditions**

In order to establish conditions for the error system (28) to be asymptotically stable, the

problem is recast in a passivity framework as follows. Let *w* = (1) (2) ( ) *k k m k w w w* , *e*<sup>=</sup> (1) (2) ( ) *k k m k e e e <sup>m</sup>* .

Consider the configuration of Fig. 2, in which there is a cascade of a stable linear system and a nonlinear function matrix *γ*(.) acting on *e*. It follows from the figure that

$$e = w - \mathcal{G}\,\gamma(e). \tag{36}$$

Let *f* denote a forward difference operator with *<sup>f</sup> <sup>k</sup> e* = ( )*<sup>i</sup> <sup>k</sup> e* – ( ) 1 *i <sup>k</sup> e* . It is assumed that (.) satisfies some sector conditions which may be interpreted as bounds existing on the slope of the components of (.); see Theorem 14, p. 7 of [11].

<sup>&</sup>quot;Discovery consists of seeing what everyone has seen and thinking what nobody has thought." *Albert Szent-Görgyi*

Smoothing, Filtering and Prediction:

.

0 0

*a k*

0 0

0 0

/ ˆ ) *v kk* 

> (1) (2)

*k k*

 

*e e*

( )

*e*

*<sup>k</sup> e* . It is assumed that

*m k*

*<sup>m</sup>* .

(.)

*K*

*a k*

*k k k k*

 

*a* . The

,

,

 

0

*k k*

*K K* 

, the off-diagonal blocks are

 

> 

 

(1) (2)

*k k*

 , *e*<sup>=</sup>

*w w*

( )

*m k*

*w*

*γ*(*e*). (36)

*<sup>k</sup> e* – ( ) 1 *i*

  = ( *k k* + 22 1

<sup>254</sup> Estimating the Past, Present and Future

In the multiple signal case, the linearization *Ck* = *DkCk* does not result in perfect decoupling.

/ 1 / 1

*k k k k*

 

*a a*

 

Assuming a symmetric positive definite solution to (33) of the form k =

 = ( *k k* + 22 1

/ /1 / 1 / 1 ˆ ˆ ˆ ˆ ( cos sin )( ) *ii a <sup>i</sup> i a kk kk k k kk k kk k v aa z z*

100 001 

( ) ( ) ( ) ( ) ( ) /1 /1 /1 /1 /1 ( ) ( ) ( ) / 1 ( ) ( ) ( ) /1 /1 ( ) /1 /1

*a a*

 

*j j i j i kk kk kk kk kk j i j k k i j i kk kk i kk kk*

ˆ ˆ ˆ ˆ cos( ) <sup>ˆ</sup> sin( ) <sup>0</sup>

<sup>1</sup> <sup>ˆ</sup> ˆ ˆ <sup>0</sup> ˆ ˆ cos( ) cos( ) <sup>ˆ</sup> <sup>ˆ</sup>

and choosing the gains according to (32) yields *Kk* =

() () (1) ( ) ( 2) ( ) 2 1

() () (1) ( ) (2) ( ) ( ) 2 () 1 / /1 / 1 /1 /1 / 1 ˆ ˆ ˆ ˆ ( cos sin )(ˆ / ) *i i <sup>i</sup> i i <sup>i</sup> kk kk k k kk k kk kk k v kk z z a a*

() () (1) ( ) ( 2) ( ) ( ) 2 () 1 / /1 / 1 /1 /1 / 1 ˆ ˆ ˆ ˆ ( cos sin )(ˆ / ) *i i <sup>i</sup> i i <sup>i</sup> kk kk k k kk k kk kk k v kk z z a a*

In order to establish conditions for the error system (28) to be asymptotically stable, the

Consider the configuration of Fig. 2, in which there is a cascade of a stable linear system

satisfies some sector conditions which may be interpreted as bounds existing on the slope of

"Discovery consists of seeing what everyone has seen and thinking what nobody has thought." *Albert* 

and a nonlinear function matrix *γ*(.) acting on *e*. It follows from the figure that

*e* = *w* – 

(.); see Theorem 14, p. 7 of [11].

,

,

.

/ ˆ ) *v kk* 

*a* and *Kk*

While the diagonal blocks reduce to (,) *i i Ck* =

(, ) *<sup>i</sup> <sup>j</sup> Ck* =

*a a k k* + 2 1

 

> 

nonlinear observer then becomes

**10.3.4 Stability Conditions** 

the components of

*Szent-Görgyi*

) *v*

 , *Kk* 

problem is recast in a passivity framework as follows. Let *w* =

Let *f* denote a forward difference operator with *<sup>f</sup> <sup>k</sup> e* = ( )*<sup>i</sup>*

with *<sup>a</sup> <sup>k</sup>* , *<sup>k</sup>*

where *<sup>a</sup> Kk* = (

 , *<sup>k</sup>* , *<sup>k</sup>* 

Figure 2. Nonlinear error system configuration. Figure 3. Stable gain space for Example 2.

*Lemma 1 [10]: Consider the system (36), where w, e <sup>m</sup> . Suppose that (.) consists of m identical, noninteracting nonlinearities, with* ( ) ( ) *<sup>i</sup> k e monotonically increasing in the sector [0,], ≥ 0, , that is,* 

$$0 \le \gamma(e\_k^{(i)}) / e\_k^{(i)} \le \beta \tag{37}$$

*for all* ( )*<sup>i</sup> <sup>k</sup> e ,* ( )*<sup>i</sup> <sup>k</sup> e ≠ 0. Assume that is a causal, stable, finite-gain, time-invariant map <sup>m</sup> <sup>m</sup> , having a z-transform G(z), which is bounded on the unit circle. Let I denote an m m identity matrix. Suppose that for some q > 0, q , there exists a , such that* 

$$\left\langle (G(z) + q\Delta\_f G(z) + I\beta^{-1})e, e \right\rangle \ge \delta \left\langle e, e \right\rangle \tag{38}$$

*for all* ( )*<sup>i</sup> <sup>k</sup> e . Under these conditions w*  <sup>2</sup>  *implies e,* ( ) ( ) *<sup>i</sup> k e* <sup>2</sup> .

*Proof: From (36), <sup>f</sup> w = f e +* () () *fGz e and w + f q w = (G(z) +* ( ) *<sup>f</sup> q G z + I -1) γ(e) + e – I -1γ(e)+ e – I -1γ(e) + f q e . Then* 

$$\begin{split} \left\langle w + q\Delta\_{\mathcal{I}}w, \gamma(e) \right\rangle &\geq \left\langle e - I\beta^{-1}\gamma(e), \gamma(e) \right\rangle + \left\langle q\Delta\_{\mathcal{I}}e \right\rangle, \gamma(e) \right\rangle \\ &+ \left\langle (\mathcal{G}(z) + q\Delta\_{\mathcal{I}}\mathcal{G}(z) + I\beta^{-1})\gamma(e), \gamma(e) \right\rangle. \end{split} \tag{39}$$

*Consider the first term on the right hand side of (39). Since the*  ( )*e consists of noninteracting nonlinearities,*  ( ), *e e =* () () 1 ( ), *m i i i e e and* <sup>1</sup> *eI e e* ( ), ( )  *=* ( ) 1 *m i i e –* () 1 () () , *<sup>i</sup> <sup>i</sup> eI e ≥ 0. Using the approach of [11] together with the sector conditions on the identical noninteracting nonlinearities (37), it can be shown that expanding out the second term of (39) yields* , () *<sup>f</sup> e e ≥* anything." gain

<sup>&</sup>quot;The intelligent man finds almost everything ridiculous, the sensible man hardly anything." *Johann Wolfgang von Goethe*

−16

−14

−12

(i)

MSE, dB

−10

−8

−6

−5 <sup>0</sup> <sup>5</sup> −18

Figure 4. Demodulation performance for Example

demodulator error system in the form (36) is given by

1

 

written as *G*(*z*) = *C zI* ( – (*A* – <sup>1</sup> )) *KC K k k*

where *A* = diag(*A*(1), *A*(1)), *A*(1) =

1

1

1

(1) (1)

*k k*

(2) (2)

"To avoid criticism, do nothing, say nothing, be nothing." *Elbert Hubbard*

*k k*

2: (i) EKF and (ii) Nonlinear observer.

the above model with <sup>2</sup>

parameter choices were k =

degraded high-SNR performance.

SNR, dB

(ii)

<sup>0</sup> <sup>10</sup> <sup>20</sup> <sup>30</sup> −20

. The nonlinear observer gains were censored at

3: (i) EKF and (ii) Nonlinear observer.

*<sup>w</sup>* = 0.02. In a nonlinear observer design it was found that suitable

Figure 5. Demodulation performance for Example

SNR, dB

(ii)

, (44)

. The linear part of (44) may be

−15

−10

(i)

MSE, dB

A speech utterance, namely, the phrase "Matlab is number one", was sampled at 8 kHz and used to synthesize a unity-amplitude FM signal. An EKF demodulator was constructed for

each time *k* according to the stable gain space of Fig. 3. The results of a simulation study using 100 realisations of Gaussian measurement noise sequences are shown in Fig. 4. The figure demonstrates that enforcing stability can be beneficial at low SNR, at the cost of

*Example 3 [10]***.** Suppose that there are two superimposed FM signals present in the same frequency channel. Neglecting observation noise, a suitable approximation of the

sin( )

0100 0001 

 

 

. Two 8-kHz speech utterances, "Matlab is number

(1) (1) (1) (1)

(2) (2) (2) (2)

, *C* =

one" and "Number one is Matlab", centred at ±0.25 rad/s, were used to synthesize two superimposed unity-amplitude FM signals. Simulations were conducted using 100 realisations of Gaussian measurement noise sequences. The test condition (41) was

sin( ) ( )

*k k k k k k k k k k*

 

0 1 1 

*A KC K*

0.001 0.08 0.08 0.7  −5

0

*0. Using* <sup>2</sup> *<sup>f</sup> <sup>w</sup> ≤ 2* <sup>2</sup> *w (from p. 192 of [11]), the Schwartz inequality and the triangle inequality, it can be shown that* 

$$\left\langle w + q \nabla w, \gamma(e) \right\rangle \le (1 + 2q) \left\| w \right\|\_2. \tag{40}$$

*It follows from (38) – (40) that* <sup>2</sup> 2 ( )*e ≤ (1 +* <sup>1</sup> <sup>2</sup> 2 ) *q w ; hence* ( ) ( ) *<sup>i</sup> k e* <sup>2</sup> *. Since the gain of G(z) is finite, it also follows that* ( ) ()( ) *<sup>i</sup> Gz ek* <sup>2</sup> *. �*

If *G*(*z*) is stable and bounded on the unit circle, then the test condition (38) becomes

$$\mathcal{A}\_{\min} \{ I + q(I - z^{-1}I)(G(z) + G^H(z)) + \mathcal{J}^{-1} \} \ge \delta \tag{41}$$

see pp. 175 and 194 of [11].

#### **10.3.5 Applications**

*Example 2 [10]***.** Consider a unity-amplitude frequency modulated (FM) signal modelled as *k*+1 = *k* + *wk*, *k*+1 = *k* + *k*, (1) *<sup>k</sup> z* = cos(*k*) + (1) *<sup>k</sup> v* and (2) *<sup>k</sup> z* = sin(*k*) + (2) *<sup>k</sup> v* . The error system for an FM demodulator may be written as

$$
\begin{bmatrix}
\tilde{\boldsymbol{\sigma}}\_{k+1} \\
\tilde{\boldsymbol{\phi}}\_{k+1}
\end{bmatrix} = \begin{bmatrix}
\boldsymbol{\mu}\_{\boldsymbol{\alpha}} & \boldsymbol{0} \\
\boldsymbol{1} & \boldsymbol{1}
\end{bmatrix} \begin{bmatrix}
\tilde{\boldsymbol{\alpha}}\_{k} \\
\tilde{\boldsymbol{\phi}}\_{k}
\end{bmatrix} - \begin{bmatrix}
\boldsymbol{K}\_{1} \\
\boldsymbol{K}\_{2}
\end{bmatrix} \sin(\tilde{\boldsymbol{\phi}}\_{k}) + \boldsymbol{w}\_{k} \tag{42}
$$

for gains *K*1, *K*<sup>2</sup> to be designed. In view of the form (36), the above error system is reformatted as

$$
\begin{bmatrix}
\tilde{\boldsymbol{\theta}}\_{k+1} \\
\tilde{\boldsymbol{\phi}}\_{k+1}
\end{bmatrix} = \begin{bmatrix}
\boldsymbol{\mu}\_{o} & -\boldsymbol{\mathcal{K}}\_{1} \\
\boldsymbol{1} & \boldsymbol{1} - \boldsymbol{\mathcal{K}}\_{2}
\end{bmatrix} \begin{bmatrix}
\tilde{\boldsymbol{\phi}}\_{k} \\
\tilde{\boldsymbol{\phi}}\_{k}
\end{bmatrix} + \mathcal{\boldsymbol{\gamma}} \begin{bmatrix}
\boldsymbol{0} & \boldsymbol{1} \\
\boldsymbol{0} & \boldsymbol{1}
\end{bmatrix} \begin{bmatrix}
\tilde{\boldsymbol{\phi}}\_{k} \\
\tilde{\boldsymbol{\phi}}\_{k}
\end{bmatrix} + \boldsymbol{w}\_{k} \tag{43}
$$

where *γ*(*x*) = *x* – sin(*x*). The z-transform of the linear part of (43) is *G*(*z*) = (*K*2*z* + *K*2 + *K*1 ) (*z*2 + (*K*2 – 1 – )*z* + *K*1 + 1 – *K*2)-1. The nonlinearity satisfies the sector condition (37) for = 1.22. Candidate gains may be assessed by checking that *G*(*z*) is stable and the test condition (41). The stable gain space calculated for the case of = 0.9 is plotted in Fig. 3. The gains are required to lie within the shaded region of the plot for the error system (42) to be asymptotically stable.

<sup>&</sup>quot;He that does not offend cannot be honest." *Thomas Paine*

Smoothing, Filtering and Prediction:

*e* <sup>2</sup> *. Since the gain of G(z)* 

*<sup>k</sup> v* . The error system

*K*1 )

= 0.9 is plotted in Fig. 3.

≤ <sup>2</sup> (1 2 ) *q w* . (40)

*k* 

<sup>2</sup> *. �*

 

<sup>256</sup> Estimating the Past, Present and Future

*0. Using* <sup>2</sup> *<sup>f</sup> <sup>w</sup> ≤ 2* <sup>2</sup> *w (from p. 192 of [11]), the Schwartz inequality and the triangle inequality,* 

<sup>2</sup> 2 ) *q* 

*w ; hence* ( ) ( ) *<sup>i</sup>*

*<sup>k</sup> v* and (2)

 , (41)

*<sup>k</sup> z* = sin(

*k*

*w*

*K*2)-1. The nonlinearity satisfies the sector condition (37) for

*k k*

, (43)

*w*

(42)

*k*) + (2)

*w qw e* , () 

> 2

*I qI z I Gz G z*

*<sup>k</sup> z* = cos(

1 1 1 2 0

*K*

*K*

*k k k*

 

*k k k*

*k k*

*k k*

*k* + *k*, (1)

1 1 1 2

1 1

condition (41). The stable gain space calculated for the case of

( )*e ≤ (1 +* <sup>1</sup>

min{ ( )( ( ) ( )) } *<sup>H</sup>*

If *G*(*z*) is stable and bounded on the unit circle, then the test condition (38) becomes

*k*) + (1)

sin( ) 1 1

1 1

*Example 2 [10]***.** Consider a unity-amplitude frequency modulated (FM) signal modelled as

*K*

*K*

for gains *K*1, *K*<sup>2</sup> to be designed. In view of the form (36), the above error system is

= 1.22. Candidate gains may be assessed by checking that *G*(*z*) is stable and the test

The gains are required to lie within the shaded region of the plot for the error system (42) to

where *γ*(*x*) = *x* – sin(*x*). The z-transform of the linear part of (43) is *G*(*z*) = (*K*2*z* + *K*2 +

0 1

*it can be shown that* 

*It follows from (38) – (40) that* <sup>2</sup>

*is finite, it also follows that* ( ) ()( ) *<sup>i</sup> Gz ek*

*k*+1 = 

for an FM demodulator may be written as

 

)*z* + *K*1 + 1 –

"He that does not offend cannot be honest." *Thomas Paine*

be asymptotically stable.

 

see pp. 175 and 194 of [11].

**10.3.5 Applications** 

reformatted as

(*z*2 + (*K*2 – 1 –

*k*+1 = *k* + *wk*, Figure 4. Demodulation performance for Example 2: (i) EKF and (ii) Nonlinear observer.

Figure 5. Demodulation performance for Example 3: (i) EKF and (ii) Nonlinear observer.

A speech utterance, namely, the phrase "Matlab is number one", was sampled at 8 kHz and used to synthesize a unity-amplitude FM signal. An EKF demodulator was constructed for the above model with <sup>2</sup> *<sup>w</sup>* = 0.02. In a nonlinear observer design it was found that suitable

parameter choices were k = 0.001 0.08 0.08 0.7 . The nonlinear observer gains were censored at

each time *k* according to the stable gain space of Fig. 3. The results of a simulation study using 100 realisations of Gaussian measurement noise sequences are shown in Fig. 4. The figure demonstrates that enforcing stability can be beneficial at low SNR, at the cost of degraded high-SNR performance.

*Example 3 [10]***.** Suppose that there are two superimposed FM signals present in the same frequency channel. Neglecting observation noise, a suitable approximation of the demodulator error system in the form (36) is given by

$$
\begin{bmatrix}
\tilde{\boldsymbol{\phi}}\_{k+1}^{(1)} \\
\tilde{\boldsymbol{\phi}}\_{k+1}^{(1)} \\
\tilde{\boldsymbol{\phi}}\_{k+1}^{(2)} \\
\tilde{\boldsymbol{\phi}}\_{k+1}^{(2)}
\end{bmatrix} = (\boldsymbol{A} - \boldsymbol{K}\_{k} \overline{\mathbf{C}}) \begin{bmatrix}
\tilde{\boldsymbol{\phi}}\_{k}^{(1)} \\
\tilde{\boldsymbol{\phi}}\_{k}^{(1)} \\
\tilde{\boldsymbol{\phi}}\_{k}^{(2)} \\
\tilde{\boldsymbol{\phi}}\_{k}^{(2)}
\end{bmatrix} - \boldsymbol{K}\_{k} \begin{bmatrix}
\sin(\tilde{\boldsymbol{\phi}}\_{k}^{(1)}) - \tilde{\boldsymbol{\phi}}\_{k}^{(1)} \\
\sin(\tilde{\boldsymbol{\phi}}\_{k}^{(2)}) - \tilde{\boldsymbol{\phi}}\_{k}^{(2)}
\end{bmatrix},\tag{44}
$$

where *A* = diag(*A*(1), *A*(1)), *A*(1) = 0 1 1 , *C* = 0100 0001 . The linear part of (44) may be

written as *G*(*z*) = *C zI* ( – (*A* – <sup>1</sup> )) *KC K k k* . Two 8-kHz speech utterances, "Matlab is number one" and "Number one is Matlab", centred at ±0.25 rad/s, were used to synthesize two superimposed unity-amplitude FM signals. Simulations were conducted using 100 realisations of Gaussian measurement noise sequences. The test condition (41) was

<sup>&</sup>quot;To avoid criticism, do nothing, say nothing, be nothing." *Elbert Hubbard*

/ 1 ˆ *C xk kk* .

<sup>2</sup> (.) and <sup>3</sup> (.) to zero as

**10.4.2 Robust Solution** 

"Fight the good fight." *Timothy 4:7*

The Taylor series expansions of the nonlinear functions *ak*(.), *bk*(.) and *ck*(.) about filtered and

where <sup>1</sup> (.) , <sup>2</sup> (.) , <sup>3</sup> (.) are uncertainties that account for the higher order terms, *k k*/ *x* = *xk* – / ˆ *k k x* and *k k*/ 1 *x* = *xk* – / 1 ˆ *k k x* . It is assumed that <sup>1</sup> (.) , <sup>2</sup> (.) and <sup>3</sup> (.) are continuous

,

Note that the first-order EKF for the above system arises by setting the uncertainties <sup>1</sup> (.) ,

,

1

1

*<sup>k</sup>* = / ( ) ˆ *k kk a x* – / ˆ *A xk kk* and *πk* = / 1 ( ) ˆ *k kk c x* –

(45)

(46)

(47)

(48)

(49)

(50)

(51)

(52)

(53)

(54)

(55)

(56)

/ / / 1/ ( ) ( ) ( )( ) ( ) ˆ ˆˆ *k k k kk k kk k kk k k ax ax ax x x x* ,

/ 2/ () ( ) ( ) ˆ *k k k kk k k bx bx x* ,

/ 1 / 1 /1 3 /1 ( ) ( ) ( )( ) ( ) ˆ ˆ ˆ *k k k kk k kk k kk k k cx cx cx x x x* ,

operators mapping 2 2 , with H∞ norms bounded by *δ*1, *δ*2 and *δ*3, respectively.

Substituting (45) – (47) into the nonlinear system (2), (7) gives the linearised system <sup>1</sup> 1/ 2/ () () *k kk k k k k k kk k x Ax Bw x x w* 

> 3 /1 ( ) *k kk k kk k z Cx x v*

/ /1 / 1 ˆ ˆ ( ( )) ˆ *kk kk k k k kk x x Lz cx* ,

1/ / ˆ ˆ( ) *k k k kk x ax* ,

/ 1 / 1 ( ) *<sup>T</sup> <sup>T</sup> L P C CP C R k kk k k kk k k* ,

/ /1 /1 / 1 / 1 ( ) *<sup>T</sup> <sup>T</sup> P P P C CP C R P C kk kk kk k k kk k k kk k* ,

*<sup>T</sup> <sup>T</sup> P AP A BQB k k k kk k k k k* .

*k kk k k k k* <sup>1</sup> *x Ax Bw s*

*k kk k k k z Cx v t* 

Following the approach in Chapter 9, instead of addressing the problem (48) – (49) which

,

,

1/ / 1

possesses uncertainties, an auxiliary H∞ problem is defined as

,

predicted estimates / ˆ *k k x* and / 1 ˆ *k k x* may be written as

where *Ak* = / <sup>ˆ</sup> ( ) *k k <sup>k</sup> x x a x* , *Ck* = / 1 <sup>ˆ</sup> ( ) *k k <sup>k</sup> x x c x*

evaluated at each time *k* for the above parameter values with *β* = 1.2, *q* = 0.001, *δ* = 0.82 and used to censor the gains. The resulting co-channel demodulation performance is shown in Fig. 5. It can be seen that the nonlinear observer significantly outperforms the EKF at high SNR.

Two mechanisms have been observed for occurrence of outliers or faults within the cochannel demodulators. Firstly errors can occur in the state attribution, that is, there is correct tracking of some component speech message segments but the tracks are inconsistently associated with the individual signals. This is illustrated by the example frequency estimate tracks shown in Figs. 6 and 7. The solid and dashed lines in the figures indicate two sample co-channel frequency tracks. Secondly, the phase unwrapping can be erroneous so that the frequency tracks bear no resemblance to the underlying messages. These faults can occur without any significant deterioration in the error residual.

Figure 6. Sample EKF frequency tracks for Example 3. Figure 7. Sample Nonlinear observer frequency

tracks for Example 3.

The EKF demodulator is observed to be increasingly fault prone at higher SNR. This arises because lower SNR designs possess narrower bandwidths and so are less sensitive to nearby frequency components. The figures also illustrate the trade-off between stability and optimality. In particular, it can be seen from Fig. 6, that the sample EKF speech estimates exhibit faults in the state attribution. This contrasts with Fig. 7, where the nonlinear observer's estimates exhibit stable state attribution at the cost of degraded speech fidelity.

#### **10.4 Robust Extended Kalman Filtering**

#### **10.4.1 Nonlinear Problem Statement**

Consider again the nonlinear, discrete-time signal model (2), (7). It is shown below that the H∞ techniques of Chapter 9 can be used to recast nonlinear filtering problems into a model uncertainty setting. The following discussion attends to state estimation, that is, *C*1,*<sup>k</sup>* = *I* is assumed within the problem and solution presented in Section 9.3.2.

<sup>&</sup>quot;You have enemies? Good. That means you've stood up for something, sometime in your life." *Winston Churchill*

Smoothing, Filtering and Prediction:

2.7 2.8 2.9 <sup>3</sup> −1

time, ms

<sup>258</sup> Estimating the Past, Present and Future

evaluated at each time *k* for the above parameter values with *β* = 1.2, *q* = 0.001, *δ* = 0.82 and used to censor the gains. The resulting co-channel demodulation performance is shown in Fig. 5. It can be seen that the nonlinear observer significantly outperforms the EKF at high

Two mechanisms have been observed for occurrence of outliers or faults within the cochannel demodulators. Firstly errors can occur in the state attribution, that is, there is correct tracking of some component speech message segments but the tracks are inconsistently associated with the individual signals. This is illustrated by the example frequency estimate tracks shown in Figs. 6 and 7. The solid and dashed lines in the figures indicate two sample co-channel frequency tracks. Secondly, the phase unwrapping can be erroneous so that the frequency tracks bear no resemblance to the underlying messages. These faults can occur

−0.5

tracks for Example 3.

0

frequency, rads/sec

Figure 6. Sample EKF frequency tracks for Example 3. Figure 7. Sample Nonlinear observer frequency

The EKF demodulator is observed to be increasingly fault prone at higher SNR. This arises because lower SNR designs possess narrower bandwidths and so are less sensitive to nearby frequency components. The figures also illustrate the trade-off between stability and optimality. In particular, it can be seen from Fig. 6, that the sample EKF speech estimates exhibit faults in the state attribution. This contrasts with Fig. 7, where the nonlinear observer's estimates exhibit stable state attribution at the cost of degraded speech fidelity.

Consider again the nonlinear, discrete-time signal model (2), (7). It is shown below that the H∞ techniques of Chapter 9 can be used to recast nonlinear filtering problems into a model uncertainty setting. The following discussion attends to state estimation, that is, *C*1,*<sup>k</sup>* = *I* is

"You have enemies? Good. That means you've stood up for something, sometime in your life." *Winston* 

assumed within the problem and solution presented in Section 9.3.2.

0.5

1

without any significant deterioration in the error residual.

2.7 2.8 2.9 <sup>3</sup> −1

**10.4 Robust Extended Kalman Filtering** 

**10.4.1 Nonlinear Problem Statement** 

time, ms

SNR.

−0.5

0

frequency, rads/sec

*Churchill*

0.5

1

The Taylor series expansions of the nonlinear functions *ak*(.), *bk*(.) and *ck*(.) about filtered and predicted estimates / ˆ *k k x* and / 1 ˆ *k k x* may be written as

$$a\_k(\mathbf{x}\_k) = a\_k(\hat{\mathbf{x}}\_{k/k}) + \nabla a\_k(\hat{\mathbf{x}}\_{k/k})(\mathbf{x}\_k - \hat{\mathbf{x}}\_{k/k}) + \Delta\_l(\tilde{\mathbf{x}}\_{k/k})\,. \tag{45}$$

$$b\_k(\mathbf{x}\_k) = b\_k(\hat{\mathbf{x}}\_{k/k}) + \Delta\_2(\tilde{\mathbf{x}}\_{k/k}) \, , \tag{46}$$

$$\mathbf{c}\_{k}(\mathbf{x}\_{k}) = \mathbf{c}\_{k}(\hat{\mathbf{x}}\_{k/k-1}) + \nabla \mathbf{c}\_{k}(\hat{\mathbf{x}}\_{k/k-1})(\mathbf{x}\_{k} - \hat{\mathbf{x}}\_{k/k-1}) + \Delta\_{3}(\tilde{\mathbf{x}}\_{k/k-1})\,\,\,\tag{47}$$

where <sup>1</sup> (.) , <sup>2</sup> (.) , <sup>3</sup> (.) are uncertainties that account for the higher order terms, *k k*/ *x* = *xk* – / ˆ *k k x* and *k k*/ 1 *x* = *xk* – / 1 ˆ *k k x* . It is assumed that <sup>1</sup> (.) , <sup>2</sup> (.) and <sup>3</sup> (.) are continuous operators mapping 2 2 , with H∞ norms bounded by *δ*1, *δ*2 and *δ*3, respectively.

Substituting (45) – (47) into the nonlinear system (2), (7) gives the linearised system

$$\mathbf{x}\_{k+1} = A\_k \mathbf{x}\_k + B\_k \mathbf{w}\_k + \mu\_k + \Delta\_1(\tilde{\mathbf{x}}\_{k/k}) + \Delta\_2(\tilde{\mathbf{x}}\_{k/k}) \mathbf{w}\_{k'} \tag{48}$$

$$\mathbf{z}\_k = \mathbf{C}\_k \mathbf{x}\_k + \boldsymbol{\pi}\_k + \boldsymbol{\Delta}\_3(\tilde{\mathbf{x}}\_{k/k-1}) + \boldsymbol{\upsilon}\_{k \text{ \textquotedblleft}} \tag{49}$$

where *Ak* = / <sup>ˆ</sup> ( ) *k k <sup>k</sup> x x a x* , *Ck* = / 1 <sup>ˆ</sup> ( ) *k k <sup>k</sup> x x c x* , *<sup>k</sup>* = / ( ) ˆ *k kk a x* – / ˆ *A xk kk* and *πk* = / 1 ( ) ˆ *k kk c x* – / 1 ˆ *C xk kk* .

Note that the first-order EKF for the above system arises by setting the uncertainties <sup>1</sup> (.) , <sup>2</sup> (.) and <sup>3</sup> (.) to zero as

$$
\hat{\mathfrak{X}}\_{k/k} = \hat{\mathfrak{X}}\_{k/k-1} + L\_k(\mathbf{z}\_k - \mathbf{c}\_k(\hat{\mathfrak{X}}\_{k/k-1})) \, \tag{50}
$$

$$
\hat{\mathfrak{X}}\_{k+1/k} = a\_k(\hat{\mathfrak{X}}\_{k/k})\,\,\,\,\,\,\,\tag{51}
$$

$$L\_k = P\_{k/k=1} \mathbb{C}\_k^T (\mathbb{C}\_k P\_{k/k=1} \mathbb{C}\_k^T + R\_k)^{-1},\tag{52}$$

$$P\_{k/k} = P\_{k/k-1} - P\_{k/k-1} \mathbf{C}\_k^T (\mathbf{C}\_k P\_{k/k-1} \mathbf{C}\_k^T + R\_k)^{-1} P\_{k/k-1} \mathbf{C}\_k \tag{53}$$

$$P\_{k+1/k} = A\_k P\_{k/k-1} A\_k^T + B\_k Q\_k B\_k^T \,. \tag{54}$$

#### **10.4.2 Robust Solution**

Following the approach in Chapter 9, instead of addressing the problem (48) – (49) which possesses uncertainties, an auxiliary H∞ problem is defined as

$$\mathbf{x}\_{k+1} = A\_k \mathbf{x}\_k + B\_k \mathbf{w}\_k + \boldsymbol{\mu}\_k + \mathbf{s}\_k \tag{55}$$

$$
\pi\_k = \mathbf{C}\_k \mathbf{x}\_k + \pi\_k + \boldsymbol{\upsilon}\_k + \mathbf{t}\_k \tag{56}
$$

<sup>&</sup>quot;Fight the good fight." *Timothy 4:7*

*and* 

2 / 1 / 1

below.

*k k kk k*

*P I PC CP R CP C*

/ 1 / 1

 

*k kk k k kk k*

*T*

*T*

0

**Example 4 [12].** Suppose that an FM signal is generated by17

*<sup>k</sup>*

respectively. Simulations were conducted with *μω* = 0.9, *μ*

17"Happy is he who gets to know the reasons for things." *Virgil*

*<sup>k</sup>*<sup>1</sup> *k k w* ,

10

20

Cummulative Frequency

and first-order robust EKF (dotted line).

was found for <sup>2</sup>

30

40

2 2 2 2 2 2 / <sup>2</sup> <sup>2</sup> <sup>2</sup> ( ) *k k w k vk x cw cv*

/ 1 / 1 / /1 /1 / 1

 

*T k k kk k kk kk kk k T k k*

/ 1 / 1

. *�*

*T*

> 0 and *Pk*/*<sup>k</sup>*-1 > 0 over *k* [1, *N*]. An illustration is provided

(i)

(65)

(66)

(67)

(68)

*<sup>k</sup> z* ,

*<sup>k</sup> z* and (2)

= 0.001. It

<sup>1</sup> <sup>2</sup>

*k kk k k kk k k P I PC I*

*CP R CP C C*

The robust first-order extended Kalman filter for state estimation is given by (50) – (52),

*P P P IC P*

(ii)

and (54). As discussed in Chapter 9, a search is required for a minimum *γ* such that

−20 −10 0 10

Figure 8. Histogram of demodulator mean-square-error for Example 4: (i) first-order EKF (solid line)

<sup>1</sup> arctan( )

(1) (1) cos( ) *<sup>k</sup> k k z v* ,

(2) (2) sin( ) *<sup>k</sup> k k z v* .

message *ωk* from the noisy in-phase and quadrature measurements (1)

 *k k* ,

The objective is to construct an FM demodulator that produces estimates of the frequency

*<sup>w</sup>* < 0.1, where the state behaviour is almost linear, a robust EKF does not

= 0.99 and (1)

2 *v* 

 = (2) 2 *v* 

MSE, dB

$$
\tilde{\mathfrak{X}}\_{k/k} = \mathfrak{X}\_k - \hat{\mathfrak{X}}\_{k/k} \tag{57}
$$

where *sk* = 1 / ( ) *k k x* + 2 / ( ) *kk k x w* and *tk* = 3 / *k k x* ≈ 3 /1 *k k x* are additional exogenous inputs satisfying

$$\left\|\boldsymbol{s}\_{k}\right\|\_{2}^{2} \leq \left\delta\_{1}^{2} \left\|\widetilde{\boldsymbol{x}}\_{k/k}\right\|\_{2}^{2} + \left\|\delta\_{2}^{2}\right\|\boldsymbol{w}\_{k}\right\|\_{2}^{2} \tag{58}$$

$$\left\|\boldsymbol{t}\_{k}\right\|\_{2}^{2} \leq \delta\_{3}^{2} \left\|\widetilde{\boldsymbol{x}}\_{k/k}\right\|\_{2}^{2} \leq \delta\_{3}^{2} \left\|\widetilde{\boldsymbol{x}}\_{k/k-1}\right\|\_{2}^{2}.\tag{59}$$

A sufficient solution to the auxiliary H∞ problem (55) – (57) can be obtained by solving another problem in which *wk* and *vk* are scaled in lieu of the additional inputs *sk* and *rk*. The scaled H∞ problem is defined by

$$\mathbf{x}\_{k+1} = A\_k \mathbf{x}\_k + B\_k \mathbf{c}\_w w\_k + \boldsymbol{\mu}\_k \tag{60}$$

$$\mathbf{z}\_k = \mathbf{C}\_k \mathbf{x}\_k + \mathbf{c}\_v \mathbf{v}\_k + \boldsymbol{\pi}\_k \tag{61}$$

$$
\tilde{\mathbf{x}}\_{k/k} = \mathbf{x}\_k - \hat{\mathbf{x}}\_{k/k} \tag{62}
$$

where *cw*, *cv* are to be found.

*Lemma 2 [12]: The solution of the H∞ problem (60) – (62), where vk is scaled by* 

$$
\sigma\_v^2 = 1 - \gamma^2 \delta\_1^2 - \gamma^2 \delta\_3^2 \tag{63}
$$

*and wk is scaled by* 

$$\text{c}\_{\text{w}}^{2} = \text{c}\_{\text{v}}^{2} (\text{l} + \text{\textdegree}\_{2}^{2})^{-1} \text{ \textdegree} \tag{64}$$

*is sufficient for the solution of the auxiliary H∞ problem (55) – (57).* 

*Proof: If the H∞ problem (50) – (52) has been solved then there exists a* 0 *such that* 

$$\begin{aligned} \left\|\tilde{\mathbf{x}}\_{k/k}\right\|\_{2}^{2} &\leq \gamma^{2} \left(\left\|\boldsymbol{w}\_{k}\right\|\_{2}^{2} + \left\|\boldsymbol{s}\_{k}\right\|\_{2}^{2} + \left\|\boldsymbol{t}\_{k}\right\|\_{2}^{2} + \left\|\boldsymbol{w}\_{k}\right\|\_{2}^{2}\right) \\ &\leq \gamma^{2} \left(\left\|\boldsymbol{w}\_{k}\right\|\_{2}^{2} + \delta\_{1}^{2} \left\|\tilde{\boldsymbol{x}}\_{k/k}\right\|\_{2}^{2} + \delta\_{2}^{2} \left\|\boldsymbol{w}\_{k}\right\|\_{2}^{2} + \delta\_{3}^{2} \left\|\tilde{\boldsymbol{x}}\_{k/k}\right\|\_{2}^{2} + \left\|\boldsymbol{v}\_{k}\right\|\_{2}^{2}\right), \end{aligned}$$

*which implies* 

$$\left\| \left( 1 - \mathcal{Y}^2 \delta\_1^2 - \mathcal{Y}^2 \delta\_3^2 \right) \right\| \tilde{\mathbf{x}}\_{k/k} \right\|\_2^2 \le \mathcal{Y}^2 (\left( 1 + \delta\_2^2 \right) \left\| \leftvec{w}\_k \right\|\_2^2 + \left\| \leftvec{v}\_k \right\|\_2^2) $$

<sup>&</sup>quot;You can't wait for inspiration. You have to go after it with a club." *Jack London*

*and* 

Smoothing, Filtering and Prediction:

(58) (59)

(60) (61) (62)

/ / ˆ *kk k kk x xx* , (57)

, (63)

0 *such that* 

, (64)

1/ 2 3 / <sup>2</sup> <sup>2</sup> <sup>2</sup> 2 2 ( ) *w x w xv <sup>k</sup> k k <sup>k</sup> kk k*

,

<sup>260</sup> Estimating the Past, Present and Future

where *sk* = 1 / ( ) *k k x* + 2 / ( ) *kk k x w* and *tk* = 3 / *k k x* ≈ 3 /1 *k k x* are additional exogenous inputs

<sup>2</sup> <sup>2</sup> *wk* ,

3 /1 2 *k k*

,

,

2 222 2 2 / <sup>2</sup> 222 2 ( ) *k k kkk k x wstv*

> 

2 2 2 22 22 2 2 1 3/ <sup>2</sup> <sup>2</sup> 2 2 (1 ) ((1 ) ) *k k k k*

 *x w v*

 

*x* + <sup>2</sup> <sup>2</sup>

*x* ≤ <sup>2</sup> <sup>2</sup>

*k k k kw k k* <sup>1</sup> *x Ax Bc w*

*k kk vk k z Cx cv*

/ / ˆ *kk k kk x xx* ,

2 22 22 1 3 1 *<sup>v</sup> c* 

2 2 21 <sup>2</sup> (1 ) *w v c c* 

<sup>2</sup> <sup>2</sup> <sup>2</sup> 2 2 <sup>2</sup> <sup>2</sup> <sup>2</sup> <sup>2</sup>

*Lemma 2 [12]: The solution of the H∞ problem (60) – (62), where vk is scaled by* 

*is sufficient for the solution of the auxiliary H∞ problem (55) – (57).* 

 

"You can't wait for inspiration. You have to go after it with a club." *Jack London*

*Proof: If the H∞ problem (50) – (52) has been solved then there exists a* 

*x* .

A sufficient solution to the auxiliary H∞ problem (55) – (57) can be obtained by solving another problem in which *wk* and *vk* are scaled in lieu of the additional inputs *sk* and *rk*. The

2 <sup>2</sup> *<sup>k</sup> <sup>s</sup>* <sup>≤</sup> <sup>2</sup> <sup>2</sup> 1 / 2 *k k* 

2

scaled H∞ problem is defined by

where *cw*, *cv* are to be found.

*and wk is scaled by* 

*which implies* 

<sup>2</sup> *<sup>k</sup> <sup>t</sup>* <sup>≤</sup> <sup>2</sup> <sup>2</sup> 3 / 2 *k k* 

satisfying

$$\left\|\tilde{\boldsymbol{x}}\_{k/k}\right\|\_{2}^{2} \leq \gamma^{2} \left(\boldsymbol{c}\_{w}^{-2} \left\|\boldsymbol{w}\_{k}\right\|\_{2}^{2} + \boldsymbol{c}\_{v}^{-2} \left\|\boldsymbol{v}\_{k}\right\|\_{2}^{2}\right) \,. \tag{1}$$

The robust first-order extended Kalman filter for state estimation is given by (50) – (52),

$$P\_{k/k} = P\_{k/k-1} - P\_{k/k-1} \begin{bmatrix} I & \mathbf{C}\_k^T \end{bmatrix} \begin{bmatrix} P\_{k/k-1} - \boldsymbol{\mathcal{Y}}^2 I & P\_{k/k-1} \mathbf{C}\_k^T \\ \mathbf{C}\_k P\_{k/k-1} & \mathbf{R}\_k + \mathbf{C}\_k P\_{k/k-1} \mathbf{C}\_k^T \end{bmatrix}^{-1} \begin{bmatrix} I \\ \mathbf{C}\_k \end{bmatrix} P\_{k/k-1}$$

and (54). As discussed in Chapter 9, a search is required for a minimum *γ* such that 2 / 1 / 1 / 1 / 1 *T k k kk k T k kk k k kk k P I PC CP R CP C* > 0 and *Pk*/*<sup>k</sup>*-1 > 0 over *k* [1, *N*]. An illustration is provided below.

Figure 8. Histogram of demodulator mean-square-error for Example 4: (i) first-order EKF (solid line) and first-order robust EKF (dotted line).

**Example 4 [12].** Suppose that an FM signal is generated by17

$$
\mu \alpha\_{k+1} = \mu\_{\alpha} \alpha\_{k} + w\_{k} \tag{65}
$$

$$\phi\_{k+1} = \arctan(\mu\_{\phi}\phi\_k + o\_k) \, , \tag{66}$$

$$\begin{aligned} \varphi\_k^{(l)} &= \cos(\phi\_k) + \upsilon\_k^{(l)} \end{aligned} \tag{67}$$

$$
\omega\_k^{(2)} = \sin(\phi\_k) + \upsilon\_k^{(2)}.\tag{68}
$$

The objective is to construct an FM demodulator that produces estimates of the frequency message *ωk* from the noisy in-phase and quadrature measurements (1) *<sup>k</sup> z* and (2) *<sup>k</sup> z* , respectively. Simulations were conducted with *μω* = 0.9, *μ* = 0.99 and (1) 2 *v* = (2) 2 *v* = 0.001. It was found for <sup>2</sup> *<sup>w</sup>* < 0.1, where the state behaviour is almost linear, a robust EKF does not

<sup>17&</sup>quot;Happy is he who gets to know the reasons for things." *Virgil*

reversed transpose of the result to obtain *βk*.

**10.5.2 Robust Smoother** 

2 / 1 / 1 / 1 / 1

 

**10.5.3 Application** 

0

1 *<sup>w</sup> A*

*μ*

*CP C I CP C CP C R CP C*

*T T k kk k k kk k*

*T T k kk k k k kk k*

sampled at 8 kHz, for which estimates ˆ

The innovations within Steps 1 and 2 are given by

Step 3. Calculate the smoothed output estimate from

be realised by replacing the error covariance correction (72) by

, *B* = 1 0 , (1) (1) cos( ) *<sup>k</sup> k k z*

Step 2. Operate (69) – (71) on the time-reversed transpose of *αk*. Then take the time-

From the arguments within Chapter 9, a smoother that is robust to uncertain *wk* and *vk* can

/ /1 /1 / 1

*kk kk kk k k T T k k*

within Procedure 1. As discussed in Chapter 9, a search for a minimum *γ* such that

Returning to the problem of demodulating a unity-amplitude FM signal, let *xk* = *<sup>k</sup>*

denote the instantaneous frequency message, instantaneous phase, complex observations and measurement noise respectively. A zero-mean voiced speech utterance "a e i o u" was

serves as a benchmark and as an auxiliary frequency measurement for the above smoother.

respectively. A unity-amplitude FM signal was synthesized using

"The farther the experiment is from theory, the closer it is to the Nobel Prize." *Irène Joliot-Curie*

 = 0.99 and the SNR was varied in 1.5 dB steps from 3 dB to 15 dB. The mean-square errors were calculated over 200 realisations of Gaussian measurement noise and are shown in Fig. 9. It can be seen from the figure, that at 7.5 dB SNR, the first-order EKF improves on the FM discriminator MSE by about 12 dB. The improvement arises because the EKF

 (2) (1) <sup>1</sup> (3) (1) (2) (1) 2 (1) 2 () () *<sup>k</sup> <sup>k</sup> k k k k k dz dz zz z z z dt dt*

 

expectation maximization algorithm. An FM discriminator output [13],

*P P P CC P*

. (73)

*k* 

*<sup>k</sup>*, *zk* and *vk*

*<sup>w</sup>* = 0.053 were obtained using an

and

, (74)

(1) (2) (2) (2) (3) (1)

*k k k k k k*

cos( ) ˆ sin( ) ˆ ˆ

*x x x*

 ,

<sup>1</sup> <sup>2</sup>

/ 1 / 1

 

/ 1 / 1

> 0 and *Pk*/*<sup>k</sup>*-1 > 0 over *k* [1, *N*] is desired.

*v* , (2) (2) sin( ) *<sup>k</sup> k k z*

= 0.97 and <sup>2</sup> ˆ

(1) (2) (2) (2) (3) (1)

*k k k k k k*

*z x z x z x*

cos( ) ˆ sin( ) ˆ ˆ

*v* , where *ωk*,

*k kk k k k kk k k*

*CP C R CP C C*

*T T T T k kk k k kk k k*

*CP C I CP C C*

/ ˆ *kN k k k y zR*

improve on the EKF. However, when <sup>2</sup> *<sup>w</sup>* = 1, the problem is substantially nonlinear and a performance benefit can be observed. A robust EKF demodulator was designed with

$$A\_k = \begin{bmatrix} \phi\_k \\ o\_k \end{bmatrix}, A\_k = \left\lfloor \frac{\mu\_\phi}{(\mu\_\phi \hat{\phi}\_{k/k} + \hat{o}\_{k/k})^2 + \mathbf{l}} \cdot \frac{\mu\_\phi}{(\mu\_\phi \hat{\phi}\_{k/k} + \hat{o}\_{k/k})^2 + \mathbf{l}} \right\rfloor, C\_k = \begin{bmatrix} -\sin(\hat{\phi}\_{k/k-1}) & \mathbf{0} \\ \cos(\hat{\phi}\_{k/k-1}) & \mathbf{0} \end{bmatrix},$$

*δ*1 = 0.1, *δ*2 = 4.5 and *δ*3 = 0.001. It was found that *γ* = 1.38 was sufficient for *Pk*/*<sup>k</sup>*-1 of the above Riccati difference equation to always be positive definite. A histogram of the observed frequency estimation error is shown in Fig. 8, which demonstrates that the robust demodulator provides improved mean-square-error performance. For sufficiently large <sup>2</sup> *<sup>w</sup>* , the output of the above model will resemble a digital signal, in which case a detector may outperform a demodulator.

#### **10.5 Nonlinear Smoothing**

#### **10.5.1 Approximate Minimum-Variance Smoother**

Consider again a nonlinear estimation problem where *xk*+1 = *ak*(*xk*) + *Bkwk*, *zk* = *ck*(*xk*) + *vk*, with *xk* , in which the nonlinearities *ak*(.), *ck*(.) are assumed to be smooth, differentiable functions of appropriate dimension. The linearisations akin to Extended Kalman filtering may be applied within the smoothers described in Chapter 7 in the pursuit of performance improvement. The fixed-lag, Fraser-Potter and Rauch-Tung-Striebel smoother recursions are easier to apply as they are less complex. The application of the minimum-variance smoother can yield approximately optimal estimates when the problem becomes linear, provided that the underlying assumptions are correct.

*Procedure 1.* An approximate minimum-variance smoother for output estimation can be implemented via the following three-step procedure.

Step 1. Operate

$$\alpha\_k = -\Omega\_k^{1/2} \left( z\_k - c\_k(\hat{\mathbf{x}}\_{k/k-1}) \right) \, \tag{69}$$

$$
\hat{\mathfrak{X}}\_{k/k} = \hat{\mathfrak{x}}\_{k/k-1} + L\_k(\mathbf{z}\_k - \mathbf{c}\_k(\hat{\mathfrak{x}}\_{k/k-1})) \, \, \, \, \tag{70}
$$

$$
\hat{\mathfrak{X}}\_{k+1/k} = a\_k(\hat{\mathfrak{x}}\_{k/k})\,\,\,\,\,\,\,\,\tag{71}
$$

on the measurement *zk*, where *Lk* = <sup>1</sup> / 1 *<sup>T</sup> P C kk k k* ,

$$\begin{array}{c} \boldsymbol{\Omega}\_{k} = \boldsymbol{C}\_{k} \boldsymbol{P}\_{k/k-1} \boldsymbol{C}\_{k}^{\top} + \boldsymbol{R}\_{k} \boldsymbol{\prime} \\\\ \boldsymbol{P}\_{k/k} = \boldsymbol{P}\_{k/k-1} - \boldsymbol{P}\_{k/k-1} \boldsymbol{C}\_{k}^{\top} \boldsymbol{\Omega}\_{k}^{-1} \boldsymbol{C}\_{k} \boldsymbol{P}\_{k/k-1} \boldsymbol{\prime} \\\\ \boldsymbol{P}\_{k+1/k} = \boldsymbol{A}\_{k} \boldsymbol{P}\_{k/k} \boldsymbol{A}\_{k}^{\top} + \boldsymbol{B}\_{k} \boldsymbol{Q}\_{k} \boldsymbol{B}\_{k}^{\top} \boldsymbol{\prime} \\\\ \boldsymbol{A}\_{k} = \frac{\boldsymbol{\partial}a\_{k}}{\boldsymbol{\gamma}} \bigg| \qquad \text{and } \boldsymbol{C}\_{k} = \frac{\boldsymbol{\partial}c\_{k}}{\boldsymbol{\gamma}} \bigg| \qquad . \end{array} \tag{72}$$

$$A\_k = \left. \frac{\partial a\_k}{\partial \mathbf{x}} \right|\_{\mathbf{x} = \mathbb{x}\_{k \times k}} \quad \text{and} \ C\_k = \left. \frac{\partial c\_k}{\partial \mathbf{x}} \right|\_{\mathbf{x} = \mathbb{x}\_{k \times k-l}}.$$

<sup>&</sup>quot;You can recognize a pioneer by the arrows in his back" *Beverly Rubik*

Step 2. Operate (69) – (71) on the time-reversed transpose of *αk*. Then take the timereversed transpose of the result to obtain *βk*.

Step 3. Calculate the smoothed output estimate from

$$
\hat{\boldsymbol{y}}\_{k \wedge \mathcal{N}} = \boldsymbol{z}\_k - \boldsymbol{R}\_k \boldsymbol{\mathcal{B}}\_k \,. \tag{73}
$$

#### **10.5.2 Robust Smoother**

Smoothing, Filtering and Prediction:

/ 1

ˆ sin( ) 0 ˆ cos( ) 0 *k k k k*

 

,

*<sup>w</sup>* ,

(69) (70) (71)

(72)

*<sup>w</sup>* = 1, the problem is substantially nonlinear and a

, *Ck* = / 1

<sup>262</sup> Estimating the Past, Present and Future

 

, *Ak* = <sup>2</sup> <sup>2</sup>

0

 

**10.5.1 Approximate Minimum-Variance Smoother** 

implemented via the following three-step procedure.

on the measurement *zk*, where *Lk* = <sup>1</sup>

*Ak* =

the underlying assumptions are correct.

performance benefit can be observed. A robust EKF demodulator was designed with

/ / / / <sup>ˆ</sup> <sup>ˆ</sup> ( <sup>ˆ</sup> ) 1( <sup>ˆ</sup> ) 1

*δ*1 = 0.1, *δ*2 = 4.5 and *δ*3 = 0.001. It was found that *γ* = 1.38 was sufficient for *Pk*/*<sup>k</sup>*-1 of the above Riccati difference equation to always be positive definite. A histogram of the observed frequency estimation error is shown in Fig. 8, which demonstrates that the robust demodulator provides improved mean-square-error performance. For sufficiently large <sup>2</sup>

the output of the above model will resemble a digital signal, in which case a detector may

Consider again a nonlinear estimation problem where *xk*+1 = *ak*(*xk*) + *Bkwk*, *zk* = *ck*(*xk*) + *vk*, with *xk* , in which the nonlinearities *ak*(.), *ck*(.) are assumed to be smooth, differentiable functions of appropriate dimension. The linearisations akin to Extended Kalman filtering may be applied within the smoothers described in Chapter 7 in the pursuit of performance improvement. The fixed-lag, Fraser-Potter and Rauch-Tung-Striebel smoother recursions are easier to apply as they are less complex. The application of the minimum-variance smoother can yield approximately optimal estimates when the problem becomes linear, provided that

*Procedure 1.* An approximate minimum-variance smoother for output estimation can be

1/2 / 1 ( ( )) ˆ *k k k k kk*

> / 1 *<sup>T</sup> P C kk k k* ,

*<sup>T</sup> CP C k kk k* + *Rk* ,

*<sup>T</sup> AP A k kk k* + *<sup>T</sup> BQB k kk* ,

and *Ck* =

/ 1 / 1 *<sup>T</sup> P C CP kk k k k kk* ,

> *k x x*

*c*

/ 1 ˆ *k k*

*x*

.

*k* = / 1

*Pk k*/ = *Pk k*/ 1 – <sup>1</sup>

/ ˆ *k k*

*Pk k* 1/ = /

*k x x*

"You can recognize a pioneer by the arrows in his back" *Beverly Rubik*

*a x* 

 *z cx* , / /1 / 1 ˆ ˆ ( ( )) ˆ *kk kk k k k kk x x Lz cx* , 1/ / ˆ ˆ( ) *k k k kk x ax* ,

 

*kk kk kk kk*

improve on the EKF. However, when <sup>2</sup>

*xk* = *<sup>k</sup> k* 

outperform a demodulator.

**10.5 Nonlinear Smoothing** 

Step 1. Operate

 

> From the arguments within Chapter 9, a smoother that is robust to uncertain *wk* and *vk* can be realised by replacing the error covariance correction (72) by

$$\boldsymbol{P}\_{k'k} = \boldsymbol{P}\_{k'k-1} - \boldsymbol{P}\_{k'k-1} \begin{bmatrix} \boldsymbol{\mathsf{C}}\_{k}^{\mathrm{T}} & \boldsymbol{\mathsf{C}}\_{k}^{\mathrm{T}} \end{bmatrix} \begin{bmatrix} \boldsymbol{\mathsf{C}}\_{k}\boldsymbol{P}\_{k'k-1}\boldsymbol{\mathsf{C}}\_{k}^{\mathrm{T}} - \boldsymbol{\mathsf{y}}^{\mathrm{T}}\boldsymbol{I} & \boldsymbol{\mathsf{C}}\_{k}\boldsymbol{P}\_{k'k-1}\boldsymbol{\mathsf{C}}\_{k}^{\mathrm{T}} \\\ \boldsymbol{\mathsf{C}}\_{k}\boldsymbol{P}\_{k'k-1}\boldsymbol{\mathsf{C}}\_{k}^{\mathrm{T}} & \boldsymbol{\mathsf{R}}\_{k} + \boldsymbol{\mathsf{C}}\_{k}\boldsymbol{P}\_{k'k-1}\boldsymbol{\mathsf{C}}\_{k}^{\mathrm{T}} \end{bmatrix}^{-1} \begin{bmatrix} \boldsymbol{\mathsf{C}}\_{k} \\ \boldsymbol{\mathsf{C}}\_{k} \end{bmatrix} \boldsymbol{P}\_{k'k-1}$$

within Procedure 1. As discussed in Chapter 9, a search for a minimum *γ* such that

$$
\begin{bmatrix}
\mathbf{C}\_{k}\mathbf{P}\_{k/k-1}\mathbf{C}\_{k}^{\top} - \boldsymbol{\gamma}^{2}\mathbf{I} & \mathbf{C}\_{k}\mathbf{P}\_{k/k-1}\mathbf{C}\_{k}^{\top} \\
\mathbf{C}\_{k}\mathbf{P}\_{k/k-1}\mathbf{C}\_{k}^{\top} & \mathbf{R}\_{k} + \mathbf{C}\_{k}\mathbf{P}\_{k/k-1}\mathbf{C}\_{k}^{\top}
\end{bmatrix} \succ 0 \text{ and } \mathbf{P}\_{k/k-1} \succ 0 \text{ over } k \in [1, N] \text{ is desired.}
$$

#### **10.5.3 Application**

Returning to the problem of demodulating a unity-amplitude FM signal, let *xk* = *<sup>k</sup> k* ,

$$A = \begin{bmatrix} \mu\_w & 0\\ 1 & \mu\_\phi \end{bmatrix}, \ B = \begin{bmatrix} 1 & 0 \end{bmatrix}, \ z\_k^{(l)} = \cos(\phi\_k) + v\_k^{(l)}, \ z\_k^{(2)} = \sin(\phi\_k) + v\_k^{(2)}, \ \text{where } \ a\iota\_\nu \text{ } \phi\_\nu \text{ } z\_k \text{ and } v\_k^{(2)} \text{ are the same as the vectors of the form } \begin{bmatrix} \mu\_w & 0\\ 0 & \mu\_\phi \end{bmatrix}$$

denote the instantaneous frequency message, instantaneous phase, complex observations and measurement noise respectively. A zero-mean voiced speech utterance "a e i o u" was sampled at 8 kHz, for which estimates ˆ = 0.97 and <sup>2</sup> ˆ *<sup>w</sup>* = 0.053 were obtained using an expectation maximization algorithm. An FM discriminator output [13],

$$z z\_k^{(3)} = \left( z\_k^{(1)} \frac{d z\_k^{(2)}}{dt} - z\_k^{(2)} \frac{d z\_k^{(1)}}{dt} \right) \left( (z\_k^{(1)})^2 + (z\_k^{(1)})^2 \right)^{-1},\tag{74}$$

serves as a benchmark and as an auxiliary frequency measurement for the above smoother.

The innovations within Steps 1 and 2 are given by (1) (2) (2) (2) (3) (1) cos( ) ˆ sin( ) ˆ ˆ *k k k k k k z x z x z x* and (1) (2) (2) (2) (3) (1) cos( ) ˆ sin( ) ˆ ˆ *k k k k k k x x x* 

respectively. A unity-amplitude FM signal was synthesized using *μ* = 0.99 and the SNR was varied in 1.5 dB steps from 3 dB to 15 dB. The mean-square errors were calculated over 200 realisations of Gaussian measurement noise and are shown in Fig. 9. It can be seen from the figure, that at 7.5 dB SNR, the first-order EKF improves on the FM discriminator MSE by about 12 dB. The improvement arises because the EKF

<sup>&</sup>quot;The farther the experiment is from theory, the closer it is to the Nobel Prize." *Irène Joliot-Curie*

form of recursive least squares.

performance criterion. <sup>22</sup>

where

**10.6.2 Problem Statement** 

Constraints can be applied to state estimates, see [16], where a positivity constraint is used within a Kalman filter and a fixed-lag smoother. Three different state equality constraint approaches, namely, maximum-probability, mean-square and projection methods are described in [17]. Under prescribed conditions, the perfect-measurement and projection approaches are equivalent [5], [18], which is identical to applying linear constraints within a

In the state equality constrained methods [5], [16] – [18], a constrained estimate can be calculated from a Kalman filter's unconstrained estimate at each time step. Constraint information could also be embedded within nonlinear models for use with EKFs. A simpler, low-computation-cost technique that avoids EKF stablity problems and suits real-time implementation is described in [19]. In particular, an on-line procedure is proposed that involves using nonlinear functions to censor the measurements and subsequently applying the minimum-variance filter recursions. An off-line procedure for retrospective analyses is also described, where the minimum-variance fixed-interval smoother recursions are applied to the censored measurements. In contrast to the afore-mentioned techniques, which employ constraint matrices and vectors, here constraint information is represented by an exogenous input process. This approach uses the Bounded Real Lemma which enables the nonlinearities to be designed so that the filtered and smoothed estimates satisfy a

The ensuing discussion concerns odd and even functions which are defined as follows. A function *go* of *X* is said to be odd if *go*(– *X*) = – go(*X*). A function fe of *X* is said to be even if *fe*(– *X*) = *fe*(*X*). The product of *go* and *fe* is an odd function since *go*(– *X*) *fe*(– *X*) = – *go*(*X*) *fe*(*X*). Problems are considered where stochastic random variables are subjected to inequality constraints. Therefore, nonlinear censoring functions are introduced whose outputs are

constraint vector and an odd function of a random variable *X <sup>p</sup>* about its expected value

 

if { }

*X EX*

if { }

*X EX*

*<sup>o</sup> g* → *<sup>p</sup>* denote a

. (76)

, (75)

constrained to lie within prescribed bounds. Let *β <sup>p</sup>* and : *<sup>p</sup>*

( ) {} (, ) *<sup>o</sup> gX EX g X*

( , ) { } if { }

*g X X EX X EX*

22"If at first, the idea is not absurd, then there is no hope for it." *Albert Einstein*

*E*{*X*}, respectively. Define the censoring function

*o*

demodulator exploits the signal model whereas the FM discriminator does not. The figure shows that the approximate minimum-variance smoother further reduces the MSE by about 2 dB, which illustrates the advantage of exploiting all the data in the time interval. In the robust designs, searches for minimum values of *γ* were conducted such that the corresponding Riccati difference equation solutions were positive definite over each noise realisation. It can be seen from the figure at 7.5 dB SNR that the robust EKF provides about a 1 dB performance improvement compared to the EKF, whereas the approximate minimumvariance smoother and the robust smoother performance are indistinguishable.

This nonlinear example illustrates once again that smoothers can outperform filters. Since a first-order speech model is used and the Taylor series are truncated after the first-order terms, some model uncertainty is present, and so the robust designs demonstrate a marginal improvement over the EKF.

Figure 9. FM demodulation performance comparison: (i) FM discriminator (crosses), (ii) first-order EKF (dotted line), (iii) Robust EKF (dashed line), (iv) approximate minimum-variance smoother and robust smoother (solid line).21

#### **10.6 Constrained Filtering and Smoothing**

#### **10.6.1 Background**

Constraints often appear within navigation problems. For example, vehicle trajectories are typically constrained by road, tunnel and bridge boundaries. Similarly, indoor pedestrian trajectories are constrained by walls and doors. However, as constraints are not easily described within state-space frameworks, many techniques for constrained filtering and smoothing are reported in the literature. An early technique for constrained filtering involves augmenting the measurement vector with perfect observations [14]. The application of the perfect-measurement approach to filtering and fixed-interval smoothing is described in [15].

<sup>21&</sup>quot;They thought I was crazy, absolutely mad." *Barbara McClintock*

Smoothing, Filtering and Prediction:

<sup>264</sup> Estimating the Past, Present and Future

demodulator exploits the signal model whereas the FM discriminator does not. The figure shows that the approximate minimum-variance smoother further reduces the MSE by about 2 dB, which illustrates the advantage of exploiting all the data in the time interval. In the robust designs, searches for minimum values of *γ* were conducted such that the corresponding Riccati difference equation solutions were positive definite over each noise realisation. It can be seen from the figure at 7.5 dB SNR that the robust EKF provides about a 1 dB performance improvement compared to the EKF, whereas the approximate minimum-

This nonlinear example illustrates once again that smoothers can outperform filters. Since a first-order speech model is used and the Taylor series are truncated after the first-order terms, some model uncertainty is present, and so the robust designs demonstrate a marginal

4 6 8 10 12 14

(ii), (iii)

(iv)

(i)

SNR, dB

Figure 9. FM demodulation performance comparison: (i) FM discriminator (crosses), (ii) first-order EKF (dotted line), (iii) Robust EKF (dashed line), (iv) approximate minimum-variance smoother and robust

Constraints often appear within navigation problems. For example, vehicle trajectories are typically constrained by road, tunnel and bridge boundaries. Similarly, indoor pedestrian trajectories are constrained by walls and doors. However, as constraints are not easily described within state-space frameworks, many techniques for constrained filtering and smoothing are reported in the literature. An early technique for constrained filtering involves augmenting the measurement vector with perfect observations [14]. The application of the perfect-measurement approach to filtering and fixed-interval smoothing is

variance smoother and the robust smoother performance are indistinguishable.

improvement over the EKF.

smoother (solid line).21

**10.6.1 Background** 

described in [15].

5

10

MSE, dB

**10.6 Constrained Filtering and Smoothing** 

21"They thought I was crazy, absolutely mad." *Barbara McClintock*

15

Constraints can be applied to state estimates, see [16], where a positivity constraint is used within a Kalman filter and a fixed-lag smoother. Three different state equality constraint approaches, namely, maximum-probability, mean-square and projection methods are described in [17]. Under prescribed conditions, the perfect-measurement and projection approaches are equivalent [5], [18], which is identical to applying linear constraints within a form of recursive least squares.

In the state equality constrained methods [5], [16] – [18], a constrained estimate can be calculated from a Kalman filter's unconstrained estimate at each time step. Constraint information could also be embedded within nonlinear models for use with EKFs. A simpler, low-computation-cost technique that avoids EKF stablity problems and suits real-time implementation is described in [19]. In particular, an on-line procedure is proposed that involves using nonlinear functions to censor the measurements and subsequently applying the minimum-variance filter recursions. An off-line procedure for retrospective analyses is also described, where the minimum-variance fixed-interval smoother recursions are applied to the censored measurements. In contrast to the afore-mentioned techniques, which employ constraint matrices and vectors, here constraint information is represented by an exogenous input process. This approach uses the Bounded Real Lemma which enables the nonlinearities to be designed so that the filtered and smoothed estimates satisfy a performance criterion. <sup>22</sup>

#### **10.6.2 Problem Statement**

The ensuing discussion concerns odd and even functions which are defined as follows. A function *go* of *X* is said to be odd if *go*(– *X*) = – go(*X*). A function fe of *X* is said to be even if *fe*(– *X*) = *fe*(*X*). The product of *go* and *fe* is an odd function since *go*(– *X*) *fe*(– *X*) = – *go*(*X*) *fe*(*X*).

Problems are considered where stochastic random variables are subjected to inequality constraints. Therefore, nonlinear censoring functions are introduced whose outputs are constrained to lie within prescribed bounds. Let *β <sup>p</sup>* and : *<sup>p</sup> <sup>o</sup> g* → *<sup>p</sup>* denote a constraint vector and an odd function of a random variable *X <sup>p</sup>* about its expected value *E*{*X*}, respectively. Define the censoring function

$$\mathcal{g}(X) = E\{X\} + \mathcal{g}\_{\boldsymbol{\nu}}(X, \boldsymbol{\beta}) \,. \tag{75}$$

where

$$\mathcal{g}\_{\circ}(X,\beta) = \begin{cases} \beta & \text{if } \beta \le X - E\{X\} \\ X - E\{X\} & \text{if } -\beta < X - E\{X\} < \beta \\ -\beta & \text{if } X - E\{X\} \le -\beta \end{cases} \tag{76}$$

<sup>22&</sup>quot;If at first, the idea is not absurd, then there is no hope for it." *Albert Einstein*

and road boundaries.

{} 0 *E vk* , , { }*<sup>T</sup> Evv R <sup>j</sup> k k <sup>j</sup> <sup>k</sup>*

<sup>1</sup> [ *<sup>T</sup> y* … ] *T T*

constraint process. 24

, ,,

*<sup>j</sup> k jk jk y*ˆ

**10.6.3 Constrained Filtering** 

obtain

which satisfy

where *z* <sup>1</sup> [ *<sup>T</sup> z* … ] *T T*

depicted in Fig. 10.

 … ] *T T* 

off minimum mean-square-error performance and achieve

*k*

*z*

24"Man's greatest asset is the unsettled mind." *Isaac Asimov*

*<sup>N</sup> y* and *θ* <sup>1</sup> [ *<sup>T</sup>*

Similarly, a vehicle trajectory constraint process could include information about building

Assume that observations *zk* = *yk* + *vk* are available, where *vk <sup>p</sup>* is a stochastic, white measurement noise process having an even probability density function, with {} 0 *E vk* ,

Thus, the energy of the system's output is bounded from above by the energy of the

The minimum-variance filter and smoother which produce estimates of a linear system's output, minimise the mean square error. Here, it is desired to calculate estimates that trade

Note that (80) implies (81) but the converse is not true. Although estimates , ˆ*<sup>j</sup> <sup>k</sup> y* of *<sup>j</sup>*,*<sup>k</sup> y* satisfying

A procedure is proposed in which a linear filter : *<sup>p</sup> <sup>p</sup>* is used to calculate estimates *y*ˆ from zero-mean measurements *zk* that are constrained using an odd censoring function to

1, 1, 1,

*z gz*

*z gz*

*k ok k*

, , ,

*pk o pk pk*

2 2 2 <sup>2</sup> <sup>2</sup> *z* 

1

(, )

 

(, )

 

1

are desirable, the procedures described below only ensure that (82) is satisfied.

 *<sup>N</sup>* . It follows that 2 2 <sup>2</sup> <sup>2</sup> *y* 

> 2 2 <sup>2</sup> <sup>2</sup> *<sup>y</sup>*<sup>ˆ</sup>

and { }0 *<sup>T</sup> Ewvj k* . It is convenient to define the stacked vectors *y*

. (81)

. (82)

, (83)

. (84)

*Nz* , for a positive *γ* to be designed. This design problem is

By inspection of (75) – (76), *g*(*X*) is constrained within *E*{*X*} ± *β*. Suppose that the probability density function of *X* about *E*{*X*} is even, that is, is symmetric about *E*{*X*}. Under these conditions, the expected value of *g*(*X*) is given by

$$E\{\mathbf{g}(X)\} = \int\_{-\alpha}^{\alpha} \mathbf{g}(\mathbf{x}) f\_{\epsilon}(\mathbf{x}) d\mathbf{x}$$

$$= E\{X\} \int\_{-\alpha}^{\alpha} f\_{\epsilon}(\mathbf{x}) d\mathbf{x} + \int\_{-\alpha}^{\alpha} \mathbf{g}\_{\alpha}(\mathbf{x}, \beta) f\_{\epsilon}(\mathbf{x}) d\mathbf{x} \tag{77}$$

$$= E\{X\} \,.$$

since ( ) *<sup>e</sup> f x dx* = 1 and the product (, ) () *<sup>o</sup> <sup>e</sup> <sup>g</sup> <sup>x</sup> f x* is odd.

Thus, a constraining process can be modelled by a nonlinear function. Equation (77) states that *g*(*X*) is unbiased, provided that *go*(*X*,*β*) and *fX*(*X*) are odd and even functions about *E*{*X*}, respectively. In the analysis and examples that follow, attention is confined to systems having zero-mean inputs, states and outputs, in which case the censoring functions are also centred on zero, that is, *E*{*X*} = 0.23

Let *wk* = 1, , *T w w <sup>k</sup> m k <sup>m</sup>* represent a stochastic white input process having an even probability density function, with {}0 *E wk* , { }*<sup>T</sup> Eww Q <sup>j</sup> k k jk* , in which *jk* denotes the Kronecker delta function. Suppose that the states of a system : *<sup>m</sup>* → *<sup>p</sup>* are realised by

$$\mathbf{x}\_{k+1} = A\_k \mathbf{x}\_k + B\_k \mathbf{w}\_{k\text{ \textquotedblleft}} \tag{78}$$

where *Ak n n* and *Bk n m* . Since *wk* is zero-mean, it follows that linear combinations of the states are also zero-mean. Suppose also that the system outputs, *yk*, are generated by

$$\mathbf{y}\_k = \begin{bmatrix} \mathbf{y}\_{1,k} \\ \vdots \\ \mathbf{y}\_{p,k} \end{bmatrix} = \begin{bmatrix} \mathbf{g}\_o(\mathbf{C}\_{1,k}\mathbf{x}\_k, \boldsymbol{\theta}\_{l,k}) \\ \vdots \\ \mathbf{g}\_o(\mathbf{C}\_{p,k}\mathbf{x}\_k, \boldsymbol{\theta}\_{p,k}) \end{bmatrix}' \tag{79}$$

where *Cj,k* is the *jth* row of *Ck <sup>p</sup> <sup>m</sup>* , *θk* = 1, [ *<sup>k</sup>* … , ] *T p k <sup>p</sup>* is an input constraint process and , , ( ,) *<sup>o</sup> <sup>j</sup> k k <sup>j</sup> <sup>k</sup> gC x* , *j* = 1, … *p,* is an odd censoring function centred on zero. The outputs *yj,k* are constrained to lie within *j k*, , that is,

$$-\theta\_{j,k} \le y\_{j,k} \le \theta\_{j,k} \,. \tag{80}$$

For example, if the system outputs represent the trajectories of pedestrians within a building then the constraint process could include knowledge about wall, floor and ceiling positions.

<sup>23 &</sup>quot;It was not easy for a person brought up in the ways of classical thermodynamics to come around to the idea that gain of entropy eventually is nothing more nor less than loss of information." *Gilbert Newton Lewis*

Smoothing, Filtering and Prediction:

(77)

266 Estimating the Past, Present and Future

By inspection of (75) – (76), *g*(*X*) is constrained within *E*{*X*} ± *β*. Suppose that the probability density function of *X* about *E*{*X*} is even, that is, is symmetric about *E*{*X*}. Under these

> *e*

*f x* is odd.

*w w <sup>k</sup> m k <sup>m</sup>* represent a stochastic white input process having an even

*k kk k k* <sup>1</sup> *x Ax Bw* , (78)

, (79)

, in which *jk*

*p k <sup>p</sup>* is an input constraint process

. (80)

denotes the

{ ( )} ( ) ( ) *E g X g x f x dx <sup>e</sup>* 

 

Thus, a constraining process can be modelled by a nonlinear function. Equation (77) states that *g*(*X*) is unbiased, provided that *go*(*X*,*β*) and *fX*(*X*) are odd and even functions about *E*{*X*}, respectively. In the analysis and examples that follow, attention is confined to systems having zero-mean inputs, states and outputs, in which case the censoring functions are also

Kronecker delta function. Suppose that the states of a system : *<sup>m</sup>* → *<sup>p</sup>* are realised by

where *Ak n n* and *Bk n m* . Since *wk* is zero-mean, it follows that linear combinations of the states are also zero-mean. Suppose also that the system outputs, *yk*, are generated by

( ,)

*<sup>k</sup>* … , ]

*T* 

, *j* = 1, … *p,* is an odd censoring function centred on zero. The outputs *yj,k*

( ,)

1, 1, 1,

*y gCx*

*y gC x*

*<sup>j</sup>*, ,, *<sup>k</sup> <sup>j</sup> <sup>k</sup> <sup>j</sup> <sup>k</sup>*

*k o kk k*

, , ,

For example, if the system outputs represent the trajectories of pedestrians within a building then the constraint process could include knowledge about wall, floor and ceiling positions.

23 "It was not easy for a person brought up in the ways of classical thermodynamics to come around to the idea that gain of entropy eventually is nothing more nor less than loss of information." *Gilbert* 

*pk o pk k pk*

{ } () (, ) () *E X f x dx g x f x dx <sup>e</sup> <sup>o</sup>*

= 1 and the product (, ) () *<sup>o</sup> <sup>e</sup> <sup>g</sup> <sup>x</sup>*

*T*

probability density function, with {}0 *E wk* , { }*<sup>T</sup> Eww Q <sup>j</sup> k k jk*

*k*

*j k*, , that is,

*y*

*y*

where *Cj,k* is the *jth* row of *Ck <sup>p</sup> <sup>m</sup>* , *θk* = 1, [

conditions, the expected value of *g*(*X*) is given by

*E X*{ } .

centred on zero, that is, *E*{*X*} = 0.23

Let *wk* = 1, ,

and , , ( ,) *<sup>o</sup> <sup>j</sup> k k <sup>j</sup> <sup>k</sup> gC x*

*Newton Lewis*

are constrained to lie within

since ( ) *<sup>e</sup> f x dx*

Similarly, a vehicle trajectory constraint process could include information about building and road boundaries.

Assume that observations *zk* = *yk* + *vk* are available, where *vk <sup>p</sup>* is a stochastic, white measurement noise process having an even probability density function, with {} 0 *E vk* , {} 0 *E vk* , , { }*<sup>T</sup> Evv R <sup>j</sup> k k <sup>j</sup> <sup>k</sup>* and { }0 *<sup>T</sup> Ewvj k* . It is convenient to define the stacked vectors *y* <sup>1</sup> [ *<sup>T</sup> y* … ] *T T <sup>N</sup> y* and *θ* <sup>1</sup> [ *<sup>T</sup>* … ] *T T <sup>N</sup>* . It follows that

$$\left\|\boldsymbol{y}\right\|\_{2}^{2} \leq \left\|\boldsymbol{\theta}\right\|\_{2}^{2}.\tag{81}$$

Thus, the energy of the system's output is bounded from above by the energy of the constraint process. 24

The minimum-variance filter and smoother which produce estimates of a linear system's output, minimise the mean square error. Here, it is desired to calculate estimates that trade off minimum mean-square-error performance and achieve

$$\left\|\hat{y}\right\|\_{2}^{2} \leq \left\|\theta\right\|\_{2}^{2} \,. \tag{82}$$

Note that (80) implies (81) but the converse is not true. Although estimates , ˆ*<sup>j</sup> <sup>k</sup> y* of *<sup>j</sup>*,*<sup>k</sup> y* satisfying , ,, *<sup>j</sup> k jk jk y*ˆ are desirable, the procedures described below only ensure that (82) is satisfied.

#### **10.6.3 Constrained Filtering**

A procedure is proposed in which a linear filter : *<sup>p</sup> <sup>p</sup>* is used to calculate estimates *y*ˆ from zero-mean measurements *zk* that are constrained using an odd censoring function to obtain

$$\underline{\mathbf{z}}\_{k} = \begin{bmatrix} \underline{\mathbf{z}}\_{1,k} \\ \vdots \\ \underline{\mathbf{z}}\_{p,k} \end{bmatrix} = \begin{bmatrix} \mathbf{g}\_{o}(\mathbf{z}\_{1,k}, \boldsymbol{\mathcal{y}}^{-1}\boldsymbol{\theta}\_{1,k}) \\ \vdots \\ \mathbf{g}\_{o}(\mathbf{z}\_{p,k}, \boldsymbol{\mathcal{y}}^{-1}\boldsymbol{\theta}\_{p,k}) \end{bmatrix} \tag{83}$$

which satisfy

$$\left\|\underline{\boldsymbol{z}}\right\|\_{2}^{2} \leq \boldsymbol{\gamma}^{-2} \left\|\boldsymbol{\theta}\right\|\_{2}^{2}.\tag{84}$$

where *z* <sup>1</sup> [ *<sup>T</sup> z* … ] *T T Nz* , for a positive *γ* to be designed. This design problem is depicted in Fig. 10.

<sup>24&</sup>quot;Man's greatest asset is the unsettled mind." *Isaac Asimov*

*argument. It follows from (85) that* 

*induction. Thus,* 1/ { } *E xk k = 0.* 

within (83) as described below.

*Lemma to the system k k*

*objective (82) being satisfied.* 

2 2 

is accepted as self-evident." *Arthur Schopenhauer*

*<sup>T</sup> x Mx ≤* <sup>2</sup> <sup>2</sup>

*−* 0 00

*k k A B C D* 

)*<sup>k</sup> v and therefore* 

*Proof: (i) Condition (iii) implies* 1/0 *E x*{ }  *= 0, which is the initialization step of an induction* 

*Subtracting (88) from (78) gives k k* 1/ *x =* (*Ak –* / 1 ) *KC x k k kk – B wk k – K vk k –* ( *K z k k – C xk k –* 

*From above assumptions, the second and third terms on the right-hand-side of (89) are zero. The property (77) implies* { } *E zk =* { } *E zk =* {*ECxk k +* }*<sup>k</sup> v and so* { } *Ez Cx v k kk k is zero. The first term on the right-hand-side of (89) pertains to the unconstrained Kalman filter and is zero by* 

*Substituting k x = A x k k* 1 1 *+ B w k k* 1 1 *into (90) yields k k*/ *x = (I −* 1 1/ 1 ) *LC A x kk k k k + (I −* 1 1 ) *LC B w kk k k − L vk k −* ( *L z k k − C xk k −* )*<sup>k</sup> v and* / { } *E xk k =* 1 1/ 1 ( ){ } *kk k k k I LC A Ex =* 

*(iii) Defining k k*/ *y = k y −* / ˆ *k k y = k y +* ( *C x k k −* / ˆ ) *k k x − C xk k = C xk kk*/  *+ k y − C xk k and using (77) leads to* / { } *E yk k =* / { } *CEx k kk +* { } *Ey Cx k kk =* / { } *CEx k kk = 0 under condition (iii). �*

Recall that the Bounded Real Lemma (see Lemma 7 of Chapter 9) specifies a bound for a ratio of a system's output and input energies. This lemma is used to find a design for *γ*

*Lemma 4 [19]: Consider the filter (85) – (87) which operates on the constrained measurements (83). Let Ak = A KC k kk , Bk = Kk , Ck =* ( ) *C I LC k kk and Dk = C Lk k denote the state-space parameters of the filter. Suppose for a given γ2 > 0, that a solution Mk = <sup>T</sup> Mk > 0 exists over k* 

*[1, N] for the Riccati Difference equation resulting from the application of the Bounded Real* 

*Proof: For the application of the Bounded Real Lemma to the filter (85) – (87), the existence of a* 

"All truth passes through three stages: First, it is ridiculed; Second, it is violently opposed; and Third, it

*z , which together with (84) leads to (82). �*

*solution Mk = <sup>T</sup> Mk > 0 for the associated Riccati difference equation ensures that* <sup>2</sup>

It is argued below that the proposed filtering procedure is asymptotically stable.

*. Then the design γ = γ2 within (83) results in the performance* 

<sup>2</sup> *<sup>y</sup>*<sup>ˆ</sup> *<sup>≤</sup>* <sup>2</sup> <sup>2</sup> 2 2 *z*

*(ii) Condition (iii) again serves as an induction assumption. It follows from (86) that* 

<sup>1</sup> ( ) *kk k I LC A …* 1 1 0 0/0 ( ) {} *I LC AE x . Hence,* / { } *E xk k = 0 by induction.* 

1/ / 1 { } ( ){ } { } { } { } *Ex A KC Ex BEw KEv KE z Cx v k k k k k kk k k k k k k k k k .* (89)

1/ / 1 ˆ ( ) ( )( ˆ ) *k k k k k kk k k k k k k k k k x A KC x K Cx v K z Cx v .* (88)

/ /1 / 1 ˆ ˆ ( ˆ ) ( ) *kk kk k k k k k kk k k k k k x x L Cx v Cx L z Cx v* . (90)

Figure 10. The constrained filtering design problem. The task is to design a scalar *γ* so that the outputs of a filter operating on the censored zero-mean measurements 1, [ *<sup>T</sup> <sup>k</sup> z* … , ] *T T p k z* produce output estimates 1, [ˆ*<sup>T</sup> <sup>k</sup> y* … , ˆ ] *T T p k <sup>y</sup>* , which trade off mean square error performance and achieve <sup>2</sup> <sup>2</sup> <sup>2</sup> <sup>2</sup> *<sup>y</sup>*<sup>ˆ</sup> .

Censoring the measurements is suggested as a low-implementation-cost approach to constrained filtering. Design constraints are sought for the measurement censoring functions so that the outputs of a subsequent filter satisfy the performance objective (82). The recursions akin to the minimum-variance filter are applied to calculate predicted and filtered state estimates from the constrained measurements *<sup>k</sup> z* at time *k*. That is, the output mapping *Ck* is retained within the linear filter design even though nonlinearities are present with (83). The predicted states, filtered states and output estimates are respectively obtained as

$$
\hat{\mathbf{x}}\_{k+1/k} = (\mathbf{A}\_k - \mathbf{K}\_k \mathbf{C}\_k) \hat{\mathbf{x}}\_{k/k-1} + \mathbf{K}\_k \underline{\mathbf{z}}\_{k \text{ } \prime} \tag{85}
$$

$$
\hat{\mathbf{x}}\_{k/k} = (I - L\_k \mathbf{C}\_k) \hat{\mathbf{x}}\_{k/k-1} + L\_k \underline{\mathbf{z}}\_{k''} \tag{86}
$$

$$
\hat{\mathbf{y}}\_{k/k} = \mathbf{C}\_k \hat{\mathbf{x}}\_{k/k} \tag{87}
$$

where *Lk* = / 1 / 1 ( *<sup>T</sup> <sup>T</sup> P C CP C kk k k kk k* + <sup>1</sup> ) *Rk* , *Kk* = *AkLk*, and *Pk k*/ 1 = / 1 *<sup>T</sup> Pk k* > 0 is obtained from *Pk k*/ = *Pk k*/ 1 – / 1 / 1 ( *<sup>T</sup> <sup>T</sup> P C CP C kk k k kk k* + <sup>1</sup> / 1 ) *R CP k k kk* , *Pk k* 1/ = / *<sup>T</sup> AP A k kk k* + *<sup>T</sup> BQB k kk* . Nonzeromean sequences can be accommodated using deterministic inputs as described in Chapter 4. Since a nonlinear system output (79) and a nonlinear measurement (83) are assumed, the estimates calculated from (85) – (87) are not optimal. Some properties that are exhibited by these estimates are described below.26

*Lemma 3 [19]: In respect of the filter (85) – (87) which operates on the constrained measurements (83), suppose the following:* 


*Then the following applies:* 


<sup>26&</sup>quot;A mind that is stretched by a new idea can never go back to its original dimensions." *Oliver Wendell Holmes*

Smoothing, Filtering and Prediction:

*<sup>k</sup> z* … , ] *T T*

*p k z* produce output

<sup>2</sup> <sup>2</sup> *<sup>y</sup>*<sup>ˆ</sup> .

*<sup>T</sup> Pk k* > 0 is obtained from

*<sup>T</sup> AP A k kk k* + *<sup>T</sup> BQB k kk* . Nonzero-

(85)

(86)

(87)

<sup>268</sup> Estimating the Past, Present and Future

Figure 10. The constrained filtering design problem. The task is to design a scalar *γ* so that the outputs

Censoring the measurements is suggested as a low-implementation-cost approach to constrained filtering. Design constraints are sought for the measurement censoring functions so that the outputs of a subsequent filter satisfy the performance objective (82). The recursions akin to the minimum-variance filter are applied to calculate predicted and filtered state estimates from the constrained measurements *<sup>k</sup> z* at time *k*. That is, the output mapping *Ck* is retained within the linear filter design even though nonlinearities are present with (83). The

predicted states, filtered states and output estimates are respectively obtained as

1/ / 1 ˆ ( )ˆ *k k k k k kk k k x A KC x Kz* ,

/ / 1 ˆ ( )ˆ *k k k k kk k k x I LC x L z* ,

/ / ˆ ˆ *kk k kk y C x* ,

*(i) the probability density functions associated with wk and vk are even;* 

*(ii) the nonlinear functions within (79) and (83) are odd; and* 

/ 1 ) *R CP k k kk* 

mean sequences can be accommodated using deterministic inputs as described in Chapter 4. Since a nonlinear system output (79) and a nonlinear measurement (83) are assumed, the estimates calculated from (85) – (87) are not optimal. Some properties that are

*Lemma 3 [19]: In respect of the filter (85) – (87) which operates on the constrained measurements* 

26"A mind that is stretched by a new idea can never go back to its original dimensions." *Oliver Wendell* 

*p k <sup>y</sup>* , which trade off mean square error performance and achieve <sup>2</sup> <sup>2</sup>

, *Kk* = *AkLk*, and *Pk k*/ 1 = / 1

, *Pk k* 1/ = /

of a filter operating on the censored zero-mean measurements 1, [ *<sup>T</sup>*

estimates 1, [ˆ*<sup>T</sup>*

*<sup>k</sup> y* … , ˆ ] *T T*

where *Lk* = / 1 / 1 ( *<sup>T</sup> <sup>T</sup> P C CP C kk k k kk k* + <sup>1</sup> ) *Rk*

*(83), suppose the following:* 

*Then the following applies:* 

*Holmes*

*Pk k*/ = *Pk k*/ 1 – / 1 / 1 ( *<sup>T</sup> <sup>T</sup> P C CP C kk k k kk k* + <sup>1</sup>

exhibited by these estimates are described below.26

*(iii) the filter is initialized with* 0/0 *x*ˆ *=* <sup>0</sup> *E x*{ } *.* 

*(iii) the output estimates,* / ˆ *k k y , are unbiased.* 

*(i) the predicted state estimates,* 1/ ˆ *k k x , are unbiased; (ii) the corrected state estimates,* / ˆ *k k x , are unbiased; and* 

$$
\hat{\mathbf{x}}\_{k+1/k} = \left(\mathbf{A}\_{k} - \mathbf{K}\_{k}\mathbf{C}\_{k}\right)\hat{\mathbf{x}}\_{k/k-1} + \mathbf{K}\_{k}\left(\mathbf{C}\_{k}\mathbf{x}\_{k} + \boldsymbol{\upsilon}\_{k}\right) + \mathbf{K}\_{k}\left(\underline{\mathbf{z}}\_{k} - \mathbf{C}\_{k}\mathbf{x}\_{k} - \boldsymbol{\upsilon}\_{k}\right).
\tag{88}
$$

*Subtracting (88) from (78) gives k k* 1/ *x =* (*Ak –* / 1 ) *KC x k k kk – B wk k – K vk k –* ( *K z k k – C xk k –*  )*<sup>k</sup> v and therefore* 

$$E\{\tilde{\mathbf{x}}\_{k+1/k}\} = (A\_k - K\_k C\_k) E\{\tilde{\mathbf{x}}\_{k/k-1}\} + B\_k E\{w\_k\} - K\_k E\{\upsilon\_k\} - K\_k E\{\underline{z}\_k - C\_k \mathbf{x}\_k - \upsilon\_k\} \,. \tag{89}$$

*From above assumptions, the second and third terms on the right-hand-side of (89) are zero. The property (77) implies* { } *E zk =* { } *E zk =* {*ECxk k +* }*<sup>k</sup> v and so* { } *Ez Cx v k kk k is zero. The first term on the right-hand-side of (89) pertains to the unconstrained Kalman filter and is zero by induction. Thus,* 1/ { } *E xk k = 0.* 

*(ii) Condition (iii) again serves as an induction assumption. It follows from (86) that* 

$$
\hat{\mathbf{x}}\_{k/k} = \hat{\mathbf{x}}\_{k/k-1} + L\_k(\mathbf{C}\_k \mathbf{x}\_k + \upsilon\_k - \mathbf{C}\_k \hat{\mathbf{x}}\_{k/k-1}) + L\_k(\underline{\mathbf{z}}\_k - \mathbf{C}\_k \mathbf{x}\_k - \upsilon\_k) \,. \tag{90}
$$

*Substituting k x = A x k k* 1 1 *+ B w k k* 1 1 *into (90) yields k k*/ *x = (I −* 1 1/ 1 ) *LC A x kk k k k + (I −* 1 1 ) *LC B w kk k k − L vk k −* ( *L z k k − C xk k −* )*<sup>k</sup> v and* / { } *E xk k =* 1 1/ 1 ( ){ } *kk k k k I LC A Ex =*  <sup>1</sup> ( ) *kk k I LC A …* 1 1 0 0/0 ( ) {} *I LC AE x . Hence,* / { } *E xk k = 0 by induction.* 

*(iii) Defining k k*/ *y = k y −* / ˆ *k k y = k y +* ( *C x k k −* / ˆ ) *k k x − C xk k = C xk kk*/  *+ k y − C xk k and using (77) leads to* / { } *E yk k =* / { } *CEx k kk +* { } *Ey Cx k kk =* / { } *CEx k kk = 0 under condition (iii). �*

Recall that the Bounded Real Lemma (see Lemma 7 of Chapter 9) specifies a bound for a ratio of a system's output and input energies. This lemma is used to find a design for *γ* within (83) as described below.

*Lemma 4 [19]: Consider the filter (85) – (87) which operates on the constrained measurements (83). Let Ak = A KC k kk , Bk = Kk , Ck =* ( ) *C I LC k kk and Dk = C Lk k denote the state-space parameters of the filter. Suppose for a given γ2 > 0, that a solution Mk = <sup>T</sup> Mk > 0 exists over k [1, N] for the Riccati Difference equation resulting from the application of the Bounded Real Lemma to the system k k k k A B C D . Then the design γ = γ2 within (83) results in the performance objective (82) being satisfied.*  

*Proof: For the application of the Bounded Real Lemma to the filter (85) – (87), the existence of a solution Mk = <sup>T</sup> Mk > 0 for the associated Riccati difference equation ensures that* <sup>2</sup> <sup>2</sup> *<sup>y</sup>*<sup>ˆ</sup> *<sup>≤</sup>* <sup>2</sup> <sup>2</sup> 2 2 *z −* 0 00 *<sup>T</sup> x Mx ≤* <sup>2</sup> <sup>2</sup> 2 2 *z , which together with (84) leads to (82). �*

It is argued below that the proposed filtering procedure is asymptotically stable.

<sup>&</sup>quot;All truth passes through three stages: First, it is ridiculed; Second, it is violently opposed; and Third, it is accepted as self-evident." *Arthur Schopenhauer*

white).

*Henrik David Bohr*

operating the minimum-variance filter recursions on the raw data *zk* = *yk* + *vk* are indicated by the outer black region of Fig. 11. It can be seen that the filter outputs do not satisfy the performance objective (82), which motivates the pursuit of constrained techniques. A minimum value of *γ*2 = 1.24 was found for the solutions of the Riccati difference equation mentioned specified within Lemma 4 to be positive definite. The filter (85) – (87) was

> 2, *k k z z* =

of the observed distribution of the constrained filter estimates are indicated by the inner white region of Fig. 11. The figure shows that the constrained filter estimates satisfy (82),

*Example 6 [19]***.** Measurements were similarly synthesized using the parameters of Example 5 to demonstrate constrained fixed-interval smoother performance. A minimum value of *γ*2 = 5.6 was found for the solutions of the Riccati difference equation mentioned within Lemma 4 to be positive definite. The superimposed distributions of the unconstrained and constrained smoothers are respectively indicated by the inner and outer black regions of Fig. 12. It can be seen by inspection of the figure that the constrained smoother estimates meets (80), where as those produced by the standard smoother do not. 30

30"An expert is a man who has made all the mistakes which can be made in a very narrow field." *Niels* 

white).

1 1, 1, 1 2, 2, (, ) (, ) *ok k ok k*

Figure 12. Superimposed distributions of smoothed estimates for Example 5: unconstrained smoother (outer black); and constrained smoother (middle

using (91). The limits

 

 

 

*g z g z*

applied to the censored measurements *<sup>k</sup> z* = 1,

Figure 11. Superimposed distributions of filtered estimates for Example 4: unconstrained filter (outer black); and constrained filter (middle

which illustrates Lemma 5.

*Lemma 5 [19]: Define the filter output estimation error as y = y y*ˆ *. Under the conditions of Lemma 4, y* <sup>2</sup> *.* 

*Proof: It follows from y = y <sup>y</sup>*ˆ *that* <sup>2</sup> *<sup>y</sup>* <sup>2</sup> *y +* <sup>2</sup> *<sup>y</sup>*<sup>ˆ</sup> *, which together with (10) and the result of Lemma 4 yields* <sup>2</sup> *<sup>y</sup>* <sup>2</sup> <sup>2</sup> *, thus the claim follows. �*

#### **10.6.4 Constrained Smoothing**

In the sequel, it is proposed that the minimum-variance fixed-interval smoother recursions operate on the censored measurements *<sup>k</sup> z* to produce output estimates / ˆ *k N y* of *yk* .

*Lemma 6 [19]: In respect of the minimum-variance smoother recursions that operate on the censored measurements k z , under the conditions of Lemma 3, the smoothed estimates,* / ˆ *k N y , are unbiased.* 

The proof follows *mutatis mutandis* from the approach within the proofs of Lemma 5 of Chapter 7 and Lemma 3. An analogous result to Lemma 5 is now stated.

*Lemma 7 [19]: Define the smoother output estimation error as y = y y*ˆ *. Under the conditions of Lemma 3, y* <sup>2</sup> *.* 

The proof follows *mutatis mutandis* from that of Lemma 5. Two illustrative examples are set out below. A GPS and inertial navigation system integration application is detailed in [19].

*Example 5 [19]***.** Consider the saturating nonlinearity29

$$\mathcal{g}\_{\circ}(X,\mathcal{J}) = 2\beta\pi^{-1}\arctan\left(\pi X(\mathcal{D}\mathcal{J})^{-1}\right). \tag{91}$$

2, /

*k k*

which is a continuous approximation of (76) that satisfies (,) *<sup>o</sup> g X* ≤ and (,) *<sup>o</sup> dg X dX* =

1 + <sup>1</sup> 2 2 ( ) (2 ) *<sup>X</sup>* ≈ 1 when 2 2 ( ) (2 ) *X* << 1. Data was generated from (78), (79), (91), where *A* = 0.9 0 0 0.9 , *B = C* = 1 0 0 1 , Gaussian, white, zero-mean processes with *Q* = *R* = 0.01 0 0 0.01 . The constraint vector within (80) was chosen to be fixed, namely, *θk* = 0.5 0.5 , *k* [1, 105]. The limits of the observed distribution of estimates, / ˆ *k k y* = 1, / ˆ ˆ *k k y y* , arising by

<sup>29&</sup>quot;Everything we know is only some kind of approximation, because we know that we do not know all the laws yet. Therefore, things must be learned only to be unlearned again or, more likely, to be corrected." *Richard Phillips Feynman*

Smoothing, Filtering and Prediction:

<sup>270</sup> Estimating the Past, Present and Future

*Lemma 5 [19]: Define the filter output estimation error as y = y y*ˆ *. Under the conditions of* 

*Proof: It follows from y = y <sup>y</sup>*ˆ *that* <sup>2</sup> *<sup>y</sup>* <sup>2</sup> *y +* <sup>2</sup> *<sup>y</sup>*<sup>ˆ</sup> *, which together with (10) and the* 

In the sequel, it is proposed that the minimum-variance fixed-interval smoother recursions operate on the censored measurements *<sup>k</sup> z* to produce output estimates / ˆ *k N y* of

*Lemma 6 [19]: In respect of the minimum-variance smoother recursions that operate on the censored measurements k z , under the conditions of Lemma 3, the smoothed estimates,* / ˆ *k N y , are* 

The proof follows *mutatis mutandis* from the approach within the proofs of Lemma 5 of

*Lemma 7 [19]: Define the smoother output estimation error as y = y y*ˆ *. Under the conditions* 

The proof follows *mutatis mutandis* from that of Lemma 5. Two illustrative examples are set out below. A GPS and inertial navigation system integration application is detailed in

> 

. The constraint vector within (80) was chosen to be fixed, namely, *θk* =

29"Everything we know is only some kind of approximation, because we know that we do not know all the laws yet. Therefore, things must be learned only to be unlearned again or, more likely, to be

*X* . (91)

≤

<< 1. Data was generated from (78), (79), (91),

, Gaussian, white, zero-mean processes with *Q* = *R* =

2, /

ˆ ˆ *k k k k*

*y y* 

 and (,) *<sup>o</sup> dg X dX* =

> 0.5 0.5 ,

, arising by

*, thus the claim follows. �*

Chapter 7 and Lemma 3. An analogous result to Lemma 5 is now stated.

 <sup>1</sup> <sup>1</sup> ( , ) 2 arctan (2 ) *<sup>o</sup> g X* 

*k* [1, 105]. The limits of the observed distribution of estimates, / ˆ *k k y* = 1, /

which is a continuous approximation of (76) that satisfies (,) *<sup>o</sup> g X*

*X*

 

*Example 5 [19]***.** Consider the saturating nonlinearity29

*<sup>X</sup>* ≈ 1 when 2 2 ( ) (2 )

, *B = C* =

*Lemma 4, y*

*yk* .

*unbiased.* 

*of Lemma 3, y*

1 + <sup>1</sup> 2 2 ( ) (2 ) 

where *A* =

0.01 0 0 0.01 

 

0.9 0 0 0.9 

corrected." *Richard Phillips Feynman*

[19].

<sup>2</sup> *.* 

*result of Lemma 4 yields* <sup>2</sup> *<sup>y</sup>* <sup>2</sup> <sup>2</sup>

**10.6.4 Constrained Smoothing** 

<sup>2</sup> *.*  operating the minimum-variance filter recursions on the raw data *zk* = *yk* + *vk* are indicated by the outer black region of Fig. 11. It can be seen that the filter outputs do not satisfy the performance objective (82), which motivates the pursuit of constrained techniques. A minimum value of *γ*2 = 1.24 was found for the solutions of the Riccati difference equation mentioned specified within Lemma 4 to be positive definite. The filter (85) – (87) was

applied to the censored measurements *<sup>k</sup> z* = 1, 2, *k k z z* = 1 1, 1, 1 2, 2, (, ) (, ) *ok k ok k g z g z* using (91). The limits

of the observed distribution of the constrained filter estimates are indicated by the inner white region of Fig. 11. The figure shows that the constrained filter estimates satisfy (82), which illustrates Lemma 5.

*Example 6 [19]***.** Measurements were similarly synthesized using the parameters of Example 5 to demonstrate constrained fixed-interval smoother performance. A minimum value of *γ*2 = 5.6 was found for the solutions of the Riccati difference equation mentioned within Lemma 4 to be positive definite. The superimposed distributions of the unconstrained and constrained smoothers are respectively indicated by the inner and outer black regions of Fig. 12. It can be seen by inspection of the figure that the constrained smoother estimates meets (80), where as those produced by the standard smoother do not. 30

Figure 11. Superimposed distributions of filtered estimates for Example 4: unconstrained filter (outer black); and constrained filter (middle white).

Figure 12. Superimposed distributions of smoothed estimates for Example 5: unconstrained smoother (outer black); and constrained smoother (middle white).

<sup>30&</sup>quot;An expert is a man who has made all the mistakes which can be made in a very narrow field." *Niels Henrik David Bohr*

1st-order EKF

2nd-order EKF

3rd-order EKF

Linearisation Predictor-Corrector Form

/ /1 / 1 ˆ ˆ ( ( )) ˆ *kk kk k k k kk x x Lz cx*

1/ / ˆ ˆ( ) *k k k kk x ax*

/ /1 / 1 2

1/ / / 2

*<sup>c</sup> x x L z cx <sup>x</sup>*

*k k k kk kk*

*<sup>a</sup> x ax P <sup>x</sup>*

/ /1 / 1 2

*x xc x x L z cx <sup>x</sup>* 

1/ / / 2

<sup>1</sup> ˆ ˆ ( ) <sup>ˆ</sup> <sup>2</sup>

<sup>1</sup> ˆ ˆ( ) <sup>2</sup>

<sup>1</sup> ( ) ( )( ) 2! *T T x x fx x x*

*k k k kk kk*

*<sup>a</sup> x ax P <sup>x</sup>*

*kk kk k k k kk*

<sup>1</sup> ˆ ˆ ( ) <sup>ˆ</sup> <sup>2</sup>

<sup>1</sup> ˆ ˆ( ) <sup>2</sup>

*kk kk k k k kk*

/ 1

/ 1

*k k*

ˆ

*k k*

ˆ

*x x*

2

2

2

*k*

*k*

*k*

/

*k k*

ˆ

*x x*

2

*k*

/

*k k*

ˆ

*x x*

/ ˆ

*k k*

/ 1 ˆ

*k k*

/ ˆ

*x x*

*k k*

/ 1 ˆ

/ /

/ 1 / 1

<sup>6</sup> *k k k k*

**Problem 1.** Use the following Taylor series expansion of *f*(*x*)

<sup>0</sup> 00 0

<sup>6</sup> *k k k k*

2 / 2 ˆ ˆ

2 / 1 2 ˆ ˆ

Table 1. Summary of first, second and third-order EKFs for the case of *xk* .

0 0 0 <sup>1</sup> () ( ) ( ) ( ) 1!

<sup>0</sup> <sup>0</sup> 00 0

still be anaerobic bacteria and there would be no music." *Lewis Thomas*

*<sup>T</sup> f x fx x x fx* <sup>0</sup> 0 0

<sup>1</sup> ( ) ( ) ( ) ( )( ) , 4! *T T x x x x x x fx x x*

*"*The capacity to blunder slightly is the real marvel of DNA. Without this special attribute, we would

<sup>1</sup> ( ) ( ) ( )( ) 3! *T T x x x x fx x x*

*k k*

*x x*

*x x*

( )

( )

*k*

 

*c x x*

*Bk* = / ( ) ˆ *k kk b x*

*k*

 

*a x x*

( )

( )

*k*

 

( ) 1

( ) 1

 

*c x x*

*Bk* = / ( ) ˆ *k kk b x*

*k k k k x xx x a x <sup>a</sup> <sup>P</sup> <sup>x</sup> <sup>x</sup>* 

*k k k k x x x xc x <sup>c</sup> <sup>P</sup> <sup>x</sup> <sup>x</sup>* 

*Bk* = / ( ) ˆ *k kk b x*

*k x xa x x*

 

*Ak* =

*Ck* =

*Ak* =

*Ck* =

*Ak* =

*Ck* =

**10.8 Problems** 

The above examples involved searching for minimum value of *γ*2 for the existence of positive definite solutions for the Riccati equation alluded to within Lemma 4. The need for a search may not be apparent as stability is guaranteed whenever a positive definite solution for the associated Riccati equation exists. Searching for a minimum *γ*2 is advocated because the use of an excessively large value can lead to a nonlinearity design that is conservative and exhibits poor mean-square-error performance. If a design is still too conservative then an empirical value, namely, *γ*2 = <sup>1</sup> <sup>2</sup> <sup>2</sup> *<sup>y</sup>*<sup>ˆ</sup> *<sup>z</sup>* , may need to be considered instead.

#### **10.7 Conclusion**

In this chapter it is assumed that nonlinear systems are of the form *xk*+1 = *ak*(*xk*) + *bk*(*wk*), *yk* = *ck*(*xk*), where *ak*(.), *bk*(.) and *ck*(.) are continuous differentiable functions. The EKF arises by linearising the model about conditional mean estimates and applying the standard filter recursions. The first, second and third-order EKFs simplified for the case of *xk* are summarised in Table 1.

The EKF attempts to produce locally optimal estimates. However, it is not necessarily stable because the solutions of the underlying Riccati equations are not guaranteed to be positive definite. The faux algebraic Riccati technique trades off approximate optimality for stability. The familiar structure of the EKF is retained but stability is achieved by selecting a positive definite solution to a faux Riccati equation for the gain design.

H∞ techniques can be used to recast nonlinear filtering applications into a model uncertainty problem. It is demonstrated with the aid of an example that a robust EKF can reduce the mean square error when the problem is sufficiently nonlinear.

Linearised models may be applied within the previously-described smoothers in the pursuit of performance improvement. Nonlinear versions of the fixed-lag, Fraser-Potter and Rauch-Tung-Striebel smoothers are easier to implement as they are less complex. However, the application of the minimum-variance smoother can yield approximately optimal estimates when the problem becomes linear, provided that the underlying assumptions are correct. A smoother that is robust to input uncertainty is obtained by replacing the approximate error covariance correction with an H∞ version. The resulting robust nonlinear smoother can exhibit performance benefits when uncertainty is present.

In some applications, it may be possible to censor a system's inputs, states or outputs, rather than proceed with an EKF design. It has been shown that the use of a nonlinear censoring function to constrain input measurements leads to bounded filter and smoother estimation errors.

<sup>&</sup>quot;Most of what I learned as an entrepreneur was by trial and error." *Gordon Earl Moore*

Smoothing, Filtering and Prediction:

, may need to be

<sup>2</sup> <sup>2</sup> *<sup>y</sup>*<sup>ˆ</sup> *<sup>z</sup>*

<sup>272</sup> Estimating the Past, Present and Future

The above examples involved searching for minimum value of *γ*2 for the existence of positive definite solutions for the Riccati equation alluded to within Lemma 4. The need for a search may not be apparent as stability is guaranteed whenever a positive definite solution for the associated Riccati equation exists. Searching for a minimum *γ*2 is advocated because the use of an excessively large value can lead to a nonlinearity design that is conservative and exhibits poor mean-square-error performance. If a design is still

exists.

In this chapter it is assumed that nonlinear systems are of the form *xk*+1 = *ak*(*xk*) + *bk*(*wk*), *yk* = *ck*(*xk*), where *ak*(.), *bk*(.) and *ck*(.) are continuous differentiable functions. The EKF arises by linearising the model about conditional mean estimates and applying the standard filter recursions. The first, second and third-order EKFs simplified for the case of *xk*

The EKF attempts to produce locally optimal estimates. However, it is not necessarily stable because the solutions of the underlying Riccati equations are not guaranteed to be positive definite. The faux algebraic Riccati technique trades off approximate optimality for stability. The familiar structure of the EKF is retained but stability is achieved by

H∞ techniques can be used to recast nonlinear filtering applications into a model uncertainty problem. It is demonstrated with the aid of an example that a robust EKF can

Linearised models may be applied within the previously-described smoothers in the pursuit of performance improvement. Nonlinear versions of the fixed-lag, Fraser-Potter and Rauch-Tung-Striebel smoothers are easier to implement as they are less complex. However, the application of the minimum-variance smoother can yield approximately optimal estimates when the problem becomes linear, provided that the underlying assumptions are correct. A smoother that is robust to input uncertainty is obtained by replacing the approximate error covariance correction with an H∞ version. The resulting robust nonlinear smoother can exhibit performance benefits when uncertainty is present. In some applications, it may be possible to censor a system's inputs, states or outputs, rather than proceed with an EKF design. It has been shown that the use of a nonlinear censoring function to constrain input measurements leads to bounded filter and smoother

selecting a positive definite solution to a faux Riccati equation for the gain design.

reduce the mean square error when the problem is sufficiently nonlinear.

"Most of what I learned as an entrepreneur was by trial and error." *Gordon Earl Moore*

too conservative then an empirical value, namely, *γ*2 = <sup>1</sup>

considered instead.

**10.7 Conclusion** 

estimation errors.

are summarised in Table 1.


Table 1. Summary of first, second and third-order EKFs for the case of *xk* .

#### **10.8 Problems**

**Problem 1.** Use the following Taylor series expansion of *f*(*x*)

$$\begin{aligned} f(\mathbf{x}) &= f(\mathbf{x}\_0) + \frac{1}{1!} (\mathbf{x} - \mathbf{x}\_0)^T \nabla f(\mathbf{x}\_0) + \frac{1}{2!} (\mathbf{x} - \mathbf{x}\_0)^T \nabla^T \nabla f(\mathbf{x}\_0) (\mathbf{x} - \mathbf{x}\_0) \\\\ &+ \frac{1}{3!} (\mathbf{x} - \mathbf{x}\_0)^T \nabla^T \nabla (\mathbf{x} - \mathbf{x}\_0) \nabla f(\mathbf{x}\_0) (\mathbf{x} - \mathbf{x}\_0) \\\\ &+ \frac{1}{4!} (\mathbf{x} - \mathbf{x}\_0)^T \nabla^T \nabla (\mathbf{x} - \mathbf{x}\_0) \nabla (\mathbf{x} - \mathbf{x}\_0) \nabla f(\mathbf{x}\_0) (\mathbf{x} - \mathbf{x}\_0) + \dots, \end{aligned}$$

*<sup>&</sup>quot;*The capacity to blunder slightly is the real marvel of DNA. Without this special attribute, we would still be anaerobic bacteria and there would be no music." *Lewis Thomas*

*E*{*v*(*t*)*vT*(*t*)} = *R*(*t*).

sin( ( )) *<sup>g</sup> t*

**10.9 Glossary** 

**10.10 References** 

*Heisenberg*

Cliffs, New Jersey, 1979.

*K*(*t*)*C*(*t*)*P*(*t*) + *Q*(*t*), *A*(*t*) =

+ *w*(*t*) and *z*(*t*) = sin( ( ))

derivatives.

FM Frequency modulation.

constant, *w*(*t*) and *v*(*t*) are stochastic inputs.

**Problem 4. (Continuous-time EKF)** Assume that continuous-time signals may be modelled as *x t* ( ) = a(*x*(*t*)) + *w*(*t*), y(*t*) = *c*(*x*(*t*)), *z*(*t*) = *y*(*t*) + *v*(*t*), where *E*{*w*(*t*)*wT*(*t*)} = *Q*(*t*) and

(i) Show that approximate state estimates can be obtained from *x t* ˆ( ) = *axt* ( ( )) ˆ +

( )

(ii) Often signal models are described in the above continuous-time setting but

(i) Set out the pendulum's equations of motion in a state-space form and write

(ii) Use Euler's first-order integration formula to discretise the above model and

*f* The gradient of a function *f*, which is a row-vector of partial

*<sup>T</sup> f* The Hessian of a function *f*, which is a matrix of partial derivatives.

[1] A. P. Sage and J. L. Melsa, *Estimation Theory with Applications to Communications and* 

"What we observe is not nature itself, but nature exposed to our mode of questioning." *Werner* 

[2] A. Gelb, *Applied Optimal Estimation*, The Analytic Sciences Corporation, USA, 1974. [3] B. D. O. Anderson and J. B. Moore, *Optimal Filtering*, Prentice-Hall Inc, Englewood

tr(*Pk*) The trace of a matrix *Pk*, which is the sum of its diagonal terms.

down the continuous-time EKF for estimating *θ*(*t*) from *v*(*t*).

*x xt*

continuous-discrete version of the EKF in corrector-predictor form. **Problem 5.** Consider a pendulum of length that subtends an angle *θ*(*t*) with a vertical line through its pivot. The pendulum's angular acceleration and measurements of its

( )

*a x x*

 

instantaneous horizontal position (from the vertical) may be modelled as

then detail the corresponding discrete-time EKF.

*<sup>f</sup>* The forward difference operator with *<sup>f</sup> <sup>k</sup> e* = ( )*<sup>i</sup>*

*Control*, McGraw-Hill Book Company, New York, 1971.

*Kt zt* () () – *cxt* ( ( )) ˆ , where *K*(*t*) = *P*(*t*)*CT*(*t*)*R*-1(*t*), *P t*( ) = *A*(*t*)P(*t*) + P(*t*)*AT*(*t*) –

sampled measurements *zk* of *z*(*t*) are available. Write down a hybrid

and *C*(*t*) =

( )

2 2 ( ) ( ) *d t d t* 

=

*x xt*

.

( )

*t* + *v*(*t*), respectively, where *g* is the gravitational

*<sup>k</sup> e* – ( ) 1 *i <sup>k</sup> e* .

*c x x*

to find expressions for the coefficients *αi* within the functions below.

$$\text{(i)}\qquad f(\mathbf{x}) = \alpha\_0 + \alpha\_1(\mathbf{x} - \mathbf{x}\_0) + \alpha\_2(\mathbf{x} - \mathbf{x}\_0)^2 \dots$$

$$\text{(iii)}\qquad f(\mathbf{x}) = \alpha\_0 + \alpha\_1(\mathbf{x} - \mathbf{x}\_0) + \alpha\_2(\mathbf{x} - \mathbf{x}\_0)^2 + \alpha\_3(\mathbf{x} - \mathbf{x}\_0)^3.$$

 (iii) <sup>2</sup> 01 0 2 0 *f* (, ) ( ) ( ) *x y xx xx* . <sup>2</sup> 3 04 0 5 0 0 ( ) ( ) ( )( ) *yy yy x x y y* (iv) <sup>2</sup> <sup>3</sup> 01 0 2 0 3 0 *f* (, ) ( ) ( ) ( ) *x y xx xx xx* <sup>2</sup> <sup>3</sup> 4 05 0 6 0 ( )( ) ( ) *yy yy yy* <sup>2</sup> 7 0 08 0 <sup>0</sup> ( )( ) ( ) ( ) *x x y y x x y y* <sup>2</sup> 90 0 ( )( ) *x x y y* . (v) <sup>2</sup> <sup>3</sup> <sup>4</sup> 01 0 2 0 3 0 4 0 *f* (, ) ( ) ( ) ( ) ( ) *x y xx xx xx xx* <sup>2</sup> <sup>3</sup> <sup>4</sup> 5 06 0 7 0 8 0 ( )( ) ( ) ( ) *yy yy yy yy* <sup>2</sup> 9 0 0 10 0 <sup>0</sup> ( )( ) ( ) ( ) *x x y y x x y y* <sup>2</sup> <sup>3</sup> 11 0 0 12 0 <sup>0</sup> ( )( ) ( ) ( ) *x x y y x x y y* <sup>3</sup> <sup>2</sup> <sup>2</sup> 13 0 0 14 0 <sup>0</sup> ( )( ) ( ) ( ) *x x y y x x y y* .

**Problem 2.** Consider a state estimation problem, where *xk*+1 = *ak*(*xk*) + *Bkwk*, *yk* = *ck*(*xk*), *zk* = *yk* + *vk*, in which *wk*, *xk*, *yk*, *vk*, *ak*(.), *Bk*, *ck*(.) . Derive the


assuming the required derivatives exist.

**Problem 3.** Suppose that an FM signal is generated by *ak*+1 = *aak* + (1) *wk* , *k*+1 = *k* + (2) *wk* , *k*+1 = *k* + *k*, (1) *<sup>k</sup> z* = *ak*cos(*k*) + (1) *<sup>k</sup> v* and (2) *<sup>k</sup> z* = *ak*sin(*k*) + (2) *<sup>k</sup> v* . Write down the recursions for


EKF demodulators.

<sup>&</sup>quot;I am quite conscious that my speculations run quite beyond the bounds of true science." *Charles Robert Darwin*

Smoothing, Filtering and Prediction:

<sup>274</sup> Estimating the Past, Present and Future

 

 ( ) ( ) ( )( ) *yy yy x x y y*

> 

 

*x x y y*

 

*xx xx xx xx*

*x x y y*

*x x y y*

**Problem 2.** Consider a state estimation problem, where *xk*+1 = *ak*(*xk*) + *Bkwk*, *yk* = *ck*(*xk*), *zk* = *yk*

*<sup>k</sup> z* = *ak*sin(

"I am quite conscious that my speculations run quite beyond the bounds of true science." *Charles Robert* 

*k*) + (2)

*x x y y* .

5 06 0 7 0 8 0

 

( )( ) ( ) ( ) *yy yy yy yy*

 

*aak* + (1) *wk* , *k*+1 =

*<sup>k</sup> v* . Write down the recursions for

*k* + (2) *wk* ,

 

.

to find expressions for the coefficients *αi* within the functions below.

.

3 04 0 5 0 0

*xx xx xx*

 

 

 *xx xx* .

> 

4 05 0 6 0

 (v) <sup>2</sup> <sup>3</sup> <sup>4</sup> 01 0 2 0 3 0 4 0 *f* (, ) ( ) ( ) ( ) ( ) *x y*

9 0 0 10 0 <sup>0</sup>

 ( )( ) ( ) ( ) *x x y y* 

11 0 0 12 0 <sup>0</sup>

13 0 0 14 0 <sup>0</sup>

 ( )( ) ( ) ( ) *x x y y* 

 ( )( ) ( ) ( ) *x x y y* 

 

<sup>2</sup> <sup>3</sup> <sup>4</sup>

7 0 08 0 <sup>0</sup>

( )( ) ( ) *yy yy yy*

 ( )( ) ( ) ( ) *x x y y* 

 (ii) <sup>2</sup> <sup>3</sup> 01 0 2 0 3 0 *f* () ( ) ( ) ( ) *x xx xx xx*

 (iv) <sup>2</sup> <sup>3</sup> 01 0 2 0 3 0 *f* (, ) ( ) ( ) ( ) *x y*

<sup>2</sup> <sup>3</sup>

90 0

<sup>2</sup>

<sup>2</sup> <sup>3</sup>

<sup>3</sup> <sup>2</sup> <sup>2</sup>

+ *vk*, in which *wk*, *xk*, *yk*, *vk*, *ak*(.), *Bk*, *ck*(.) . Derive the

**Problem 3.** Suppose that an FM signal is generated by *ak*+1 =

*<sup>k</sup> v* and (2)

*k*) + (1)

( )( ) *x x y y* .

<sup>2</sup>

 (i) <sup>2</sup> 01 0 2 0 *f* () ( ) ( ) *x xx xx*

 (iii) <sup>2</sup> 01 0 2 0 *f* (, ) ( ) ( ) *x y*

<sup>2</sup>

*<sup>k</sup> z* = *ak*cos(

(i) first-order and (ii) second-order

(i) first-order, (ii) second-order, (iii) third-order and (iv) fourth-order EKFs, assuming the required derivatives exist.

*k* + *k*, (1)

EKF demodulators.

*k*+1 = 

*Darwin*

<sup>2</sup>

**Problem 4. (Continuous-time EKF)** Assume that continuous-time signals may be modelled as *x t* ( ) = a(*x*(*t*)) + *w*(*t*), y(*t*) = *c*(*x*(*t*)), *z*(*t*) = *y*(*t*) + *v*(*t*), where *E*{*w*(*t*)*wT*(*t*)} = *Q*(*t*) and *E*{*v*(*t*)*vT*(*t*)} = *R*(*t*).

(i) Show that approximate state estimates can be obtained from *x t* ˆ( ) = *axt* ( ( )) ˆ + *Kt zt* () () – *cxt* ( ( )) ˆ , where *K*(*t*) = *P*(*t*)*CT*(*t*)*R*-1(*t*), *P t*( ) = *A*(*t*)P(*t*) + P(*t*)*AT*(*t*) –

$$K(t)C(t)P(t) + Q(t)\_\prime \, A(t) = \left. \frac{\partial a(\infty)}{\partial \mathbf{x}} \right|\_{\mathbf{x} = \mathbf{x}(t)} \text{ and } C(t) = \left. \frac{\partial c(\infty)}{\partial \mathbf{x}} \right|\_{\mathbf{x} = \mathbf{x}(t)}.$$

(ii) Often signal models are described in the above continuous-time setting but sampled measurements *zk* of *z*(*t*) are available. Write down a hybrid continuous-discrete version of the EKF in corrector-predictor form.

**Problem 5.** Consider a pendulum of length that subtends an angle *θ*(*t*) with a vertical line through its pivot. The pendulum's angular acceleration and measurements of its instantaneous horizontal position (from the vertical) may be modelled as 2 2 ( ) ( ) *d t d t* =

sin( ( )) *<sup>g</sup> t* + *w*(*t*) and *z*(*t*) = sin( ( )) *t* + *v*(*t*), respectively, where *g* is the gravitational constant, *w*(*t*) and *v*(*t*) are stochastic inputs.

	- (ii) Use Euler's first-order integration formula to discretise the above model and then detail the corresponding discrete-time EKF.

#### **10.9 Glossary**


#### **10.10 References**


<sup>&</sup>quot;What we observe is not nature itself, but nature exposed to our mode of questioning." *Werner Heisenberg*


<sup>&</sup>quot;We know nothing in reality; for truth lies in an abyss." *Democritus*

Smoothing, Filtering and Prediction:

<sup>276</sup> Estimating the Past, Present and Future

[4] T. Söderström, *Discrete-time Stochastic Systems: Estimation and Control*, Springer-Verlag

[5] D. Simon, *Optimal State Estimation, Kalman H∞ and Nonlinear Approaches*, John Wiley &

[6] R. R. Bitmead, A.-C. Tsoi and P. J. Parker, "Kalman filtering approach to short time Fourier analysis", *IEEE Transactions on Acoustics, Speech and Signal Processing*, vol. 34, no.

[7] M.-A. Poubelle, R. R. Bitmead and M. Gevers, "Fake Algebraic Riccati Techniques and Stability", *IEEE Transactions on Automatic Control*, vol. 33, no. 4, pp. 379 – 381, Apr. 1988. [8] R. R. Bitmead, M. Gevers and V. Wertz, *Adaptive Optimal Control. The thinking Man's* 

[9] R. R. Bitmead and Michel Gevers, "Riccati Difference and Differential Equations: Convergence, Monotonicity and Stability", In S. Bittanti, A. J. Laub and J. C. Willems

[10] G. A. Einicke, L. B. White and R. R. Bitmead, "The Use of Fake Algebraic Riccati Equations for Co-channel Demodulation", *IEEE Transactions on Signal Processing*, vol. 51,

[11] C. A. Desoer and M. Vidyasagar, *Feedback Systems : Input Output Properties*, Academic

[12] G. A. Einicke and L. B. White, "Robust Extended Kalman Filtering", *IEEE Transactions on* 

[13] J. Aisbett, "Automatic Modulation Recognition Using Time Domain Parameters", *Signal* 

[14] P. S. Maybeck, *Stochastic models, estimation, and control*, Academic Press, New York, vol.

[15] H. E. Doran, "Constraining Kalman filter and smoothing estimates to satisfy timevarying restrictions", *Review of Economics and Statistics*, vol. 74, no. 3, pp. 568 – 572, 1992. [16] D. Massicotte, R. Z. Morawski and A. Barwicz, "Incorporation of a Positivity Constraint Into A Kalman-Filter-Based Algorithm for Correction of Spectrometric Data", *IEEE* 

[19] G. A. Einicke, G. Falco and J. T. Malos, "Bounded Constrained Filtering for GPS/INS

Integration", *IEEE Transactions on Automated Control*, 2012 (to appear).

"We know nothing in reality; for truth lies in an abyss." *Democritus*

*Transactions on Instrumentation and Measurement*, vol. 44, no. 1, pp. 2 – 7, 1995. [17] D. Simon and T. L. Chia, "Kalman Filtering with State Equality Constraints", *IEEE Transactions on Aerospace and Electronic Systems*, vol. 38, no. 1, pp. 128 – 136, 2002. [18] S. J. Julier and J. J. LaViola, "On Kalman Filtering Within Nonlinear Equality Constraints", *IEEE Transactions on Signal Processing*, vol. 55, no. 6, pp. 2774 – 2784, Jun.

London Ltd., 2002.

Sons, Inc., Hoboken, New Jersey, 2006.

*GPC*, Prentice Hall, New York, 1990.

no. 9, pp. 2288 – 2293, Sep., 2003.

*Processing*, vol. 13, pp. 311-323, 1987.

Press, NewYork, 1975.

1, 1979.

2007.

(Eds.), *The Riccati Equation*, Springer Verlag, 1991.

*Signal Processing*, vol. 47, no. 9, pp. 2596 – 2599, Sep., 1999.

6, pp. 1493 – 1501, Jun. 1986.

### *Authored by Garry A. Einicke*

This book describes the classical smoothing, filtering and prediction techniques together with some more recently developed embellishments for improving performance within applications. It aims to present the subject in an accessible way, so that it can serve as a practical guide for undergraduates and newcomers to the field. The material is organised as a ten-lecture course. The foundations are laid in Chapters 1 and 2, which explain minimummean-square-error solution construction and asymptotic behaviour. Chapters 3 and 4 introduce continuous-time and discrete-time minimum-variance filtering. Generalisations for missing data, deterministic inputs, correlated noises, direct feedthrough terms, output estimation and equalisation are described. Chapter 5 simplifies the minimumvariance filtering results for steady-state problems. Observability, Riccati equation solution convergence, asymptotic stability and Wiener filter equivalence are discussed. Chapters 6 and 7 cover the subject of continuous-time and discrete-time smoothing. The main fixed-lag, fixed-point and fixed-interval smoother results are derived. It is shown that the minimum-variance fixed-interval smoother attains the best performance. Chapter 8 attends to parameter estimation. As the above-mentioned approaches all rely on knowledge of the underlying model parameters, maximum-likelihood techniques within expectation-maximisation algorithms for joint state and parameter estimation are described. Chapter 9 is concerned with robust techniques that accommodate uncertainties within problem specifications. An extra term within Riccati equations enables designers to trade-off average error and peak error performance. Chapter 10 rounds off the course by applying the afore-mentioned linear techniques to nonlinear estimation problems. It is demonstrated that step-wise linearisations can be used within predictors, filters and smoothers, albeit by forsaking optimal performance guarantees.

Smoothing, Filtering and Prediction - Estimating The Past, Present and Future

Smoothing,

Filtering and Prediction

Estimating The Past, Present and Future

*Authored by Garry A. Einicke*

Photo by nadtytok / iStock