
## **Meet the author**

Garry A. Einicke received his bachelor's, master's, and doctoral degrees, all in electrical and electronic engineering, from the University of Adelaide, in 1979, 1991, and 1996, respectively. From 1980 to 1996 he was with the Defence Science and Technology Organisation, where he was a Senior Research Scientist in the signal analysis discipline. He has been with the Commonwealth Scientific and Industrial Research Organisation (CSIRO) since 1997, where he is a Principal Research Scientist and leads the signal processing and mine communications research team. Garry has received numerous science and inventor awards at the CSIRO. In his spare time, he supervises students at the University of Queensland and alternately chairs the signal processing, communications, and aerospace and electronic systems chapters of the Queensland section of the IEEE.



### Preface

Scientists, engineers and the like are a strange lot. Unperturbed by societal norms, they direct their energies to finding better alternatives to existing theories and concocting solutions to unsolved problems. Driven by an insatiable curiosity, they record their observations and crunch the numbers. This tome is about the science of crunching. It's about digging out something of value from the detritus that others tend to leave behind. The described approaches involve constructing models to process the available data. Smoothing entails revisiting historical records in an endeavour to understand something of the past. Filtering refers to estimating what is happening currently, whereas prediction is concerned with hazarding a guess about what might happen next.

The basics of smoothing, filtering and prediction were worked out by Norbert Wiener, Rudolf E. Kalman and Richard S. Bucy et al. over half a century ago. This book describes the classical techniques together with some more recently developed embellishments for improving performance within applications. Its aims are threefold. First, to present the subject in an accessible way, so that it can serve as a practical guide for undergraduates and newcomers to the field. Second, to differentiate between techniques that satisfy performance criteria and those relying on heuristics. Third, to draw attention to Wiener's approach for optimal non-causal filtering (or smoothing).

Optimal estimation is routinely taught at a post-graduate level without necessarily assuming familiarity with prerequisite material or a background in an engineering discipline. That is, the basics of estimation theory can be taught as a standalone subject. In the same way that a vehicle driver does not need to understand the workings of an internal combustion engine, or a computer user does not need to be acquainted with the machine's innards, implementing an optimal filter is hardly rocket science. Indeed, since the filter recursions are all known, its operation is no different to pushing a button on a calculator. The key to obtaining good estimator performance is developing intimacy with the application at hand, namely, exploiting any available insight, expertise and a priori knowledge to model the problem. If the measurement noise is negligible, any number of solutions may suffice. Conversely, if the observations are dominated by measurement noise, the problem may be too hard. Experienced practitioners are able to recognise those intermediate sweet spots where cost-benefits can be realised.

Systems employing optimal techniques pervade our lives. They are embedded within medical diagnosis equipment, communication networks, aircraft avionics, robotics and market forecasting, to name a few. When tasked with new problems, in which information is to be extracted from noisy measurements, one can be faced with a plethora of algorithms and techniques. Understanding the performance of candidate approaches may seem unwieldy and daunting to novices. Therefore, the philosophy here is to present the linear-quadratic-Gaussian results for smoothing, filtering and prediction with accompanying proofs that the stated performance is attained, wherever this is appropriate. Unfortunately, this does require some maths, which trades off accessibility. The treatment is a little repetitive and may seem trite, but hopefully it contributes an understanding of the conditions under which solutions can add value.

Science is an evolving process in which what we think we know is continuously updated with refashioned ideas. Although evidence suggests that Babylonian astronomers were able to predict planetary motion, a bewildering variety of Earth and universe models followed. According to lore, ancient Greek philosophers such as Aristotle assumed a geocentric model of the universe, and about two centuries later Aristarchus developed a heliocentric version. It is reported that Eratosthenes arrived at a good estimate of the Earth's circumference, yet there was a revival of flat-Earth beliefs during the Middle Ages. Not all ideas are welcomed; Galileo was famously incarcerated for knowing too much. Similarly, newly-appearing signal processing techniques compete with old favourites. An aspiration here is to publicise the oft-forgotten approach of Wiener which, in concert with Kalman's, leads to optimal smoothers. The ensuing results contrast with traditional solutions and may not sit well with more orthodox practitioners.

Kalman's optimal filter results were published in the early 1960s, and various techniques for smoothing in a state-space framework were developed shortly thereafter. Wiener's optimal smoother solution is less well known, perhaps because it was framed in the frequency domain and described in the archaic language of the day. His work of the 1940s was born of an analog world where filters were made exclusively of lumped circuit components. At that time, computers referred to people labouring with an abacus or an adding machine; Alan Turing's and John von Neumann's ideas had yet to be realised. In his book, *Extrapolation, Interpolation and Smoothing of Stationary Time Series*, Wiener wrote with little fanfare and dubbed the smoother "unrealisable". The use of the Wiener-Hopf factor allows this smoother to be expressed in a time-domain state-space setting and included alongside other techniques within the designer's toolbox.

A model-based approach is employed throughout, where estimation problems are defined in terms of state-space parameters. I recall attending Michael Green's robust control course, where he referred to a distillation column control problem competition in which a student's robust low-order solution outperformed a senior specialist's optimal high-order solution. It is hoped that this text will equip readers to do similarly, namely: make some simplifying assumptions, apply the standard solutions and back off from optimality if uncertainties degrade performance.

Both continuous-time and discrete-time techniques are presented. Sometimes the state dynamics and observations may be modelled exactly in continuous-time. In the majority of applications, some discrete-time approximations and processing of sampled data will be required. The material is organised as a ten-lecture course.

• Chapter 1 introduces some standard continuous-time fare such as the Laplace Transform, stability, adjoints and causality. A completing-the-square approach is then used to obtain the minimum-mean-square error (or Wiener) filtering solutions.



The foundations are laid in Chapters 1 – 2, which explain minimum-mean-square-error solution construction and asymptotic behaviour. In single-input-single-output cases, finding Wiener filter transfer functions may have appeal. In general, designing Kalman filters is more tractable, because solving a Riccati equation is easier than pole-zero cancellation. Kalman filters are needed if the signal models are time-varying. The filtered states can be updated via a one-line recursion, but the gain may need to be re-evaluated at each step in time. Extended Kalman filters are contenders if nonlinearities are present. Smoothers are advocated when better performance is desired and some calculation delays can be tolerated.

This book elaborates on ten articles published in IEEE journals, and I am grateful to the anonymous reviewers who have improved my efforts over the years. The great people at the CSIRO, such as David Hainsworth and George Poropat, generously make themselves available to anglicise my engineering jargon. Sometimes posing good questions is helpful; for example, Paul Malcolm once asked "is it stable?", which led down fruitful paths. During a seminar at HSU, Udo Zoelzer provided the impulse for me to undertake this project. My sources of inspiration include interactions at the CDC meetings; thanks particularly to Dennis Bernstein, whose passion for writing has motivated me along the way.

> **Garry Einicke**  CSIRO Australia


## **Continuous-Time Minimum-Mean-Square-Error Filtering**

#### **1.1 Introduction**


Optimal filtering is concerned with designing the best linear system for recovering data from noisy measurements. It is a model-based approach that requires knowledge of the signal generating system. The signal models, together with the noise statistics, are factored into the design in such a way as to satisfy an optimality criterion, namely, minimising the square of the error.


A prerequisite technique, the method of least-squares, has its origin in curve fitting. Amid some controversy, Kepler claimed in 1609 that the planets move around the Sun in elliptical orbits [1]. Carl Friedrich Gauss arrived at a better-performing method for fitting curves to astronomical observations and predicting planetary trajectories in 1799 [1]. He formally published a least-squares approximation method in 1809 [2], which had been developed independently by Adrien-Marie Legendre in 1806 [1]. The technique was famously used to track the asteroid Ceres, discovered by Giuseppe Piazzi, since a least-squares analysis was easier than solving Kepler's complicated nonlinear equations of planetary motion [1]. Andrey N. Kolmogorov refined Gauss's theory of least-squares and applied it to the prediction of discrete-time stationary stochastic processes in 1939 [3]. Norbert Wiener, a faculty member at MIT, independently solved analogous continuous-time estimation problems. He worked on defence applications during the Second World War and produced a report entitled *Extrapolation, Interpolation and Smoothing of Stationary Time Series* in 1943. The report was later published as a book in 1949 [4].

Wiener derived two important results, namely, the optimum (non-causal) minimum-mean-square-error solution and the optimum causal minimum-mean-square-error solution [4] – [6]. The optimum causal solution has since become known as the Wiener filter and, in the time-invariant case, is equivalent to the Kalman filter that was developed subsequently. Wiener pursued practical outcomes and applied the term "unrealisable filter" to the optimal non-causal solution because "it is not in fact realisable with a finite network of resistances, capacities, and inductances" [4]. Wiener's unrealisable filter is actually the optimum linear smoother.

The optimal Wiener filter is calculated in the frequency domain. Consequently, Section 1.2 touches on some frequency-domain concepts. In particular, the notions of spaces, state-space systems, transfer functions, canonical realisations, stability, causal systems, power spectral density and spectral factorisation are introduced. The Wiener filter is then derived by minimising the square of the error. Three cases are discussed in Section 1.3. First, the solution to the general estimation problem is stated. Second, the general estimation results are specialised to output estimation. The optimal input estimation (or equalisation) solution is then described. An example, demonstrating the recovery of a desired signal from noisy measurements, completes the chapter.


#### **1.2 Prerequisites**

#### **1.2.1 Signals**

Consider two continuous-time, real-valued stochastic (or random) signals $v(t) = [v_1(t), v_2(t), \dots, v_n(t)]^T$ and $w(t) = [w_1(t), w_2(t), \dots, w_n(t)]^T$, with $v_i(t), w_i(t) \in \mathbb{R}$, $i = 1, \dots, n$, which are said to belong to the space $\mathbb{R}^n$, or more concisely $v(t), w(t) \in \mathbb{R}^n$. Let $w$ denote the set of $w(t)$ over all time $t$, that is, $w = \{ w(t),\ t \in (-\infty, \infty) \}$.

#### **1.2.2 Elementary Functions Defined on Signals**

The inner product $\langle v, w \rangle$ of two continuous-time signals $v$ and $w$ is defined by

$$\langle v, w \rangle = \int_{-\infty}^{\infty} v^T w\, dt. \tag{1}$$

The 2-norm or Euclidean norm of a continuous-time signal $w$, $\|w\|_2$, is defined as $\|w\|_2 = \sqrt{\langle w, w \rangle} = \left( \int_{-\infty}^{\infty} w^T w\, dt \right)^{1/2}$. The square of the 2-norm, that is, $\|w\|_2^2 = \langle w, w \rangle = \int_{-\infty}^{\infty} w^T w\, dt$, is commonly known as the energy of the signal $w$.
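Since the energy definition above is an integral, it is straightforward to approximate numerically. The following is a minimal sketch (mine, not from the text; numpy is assumed) estimating $\|w\|_2^2$ for a sampled exponential:

```python
# A small sketch (not from the text): approximating the energy ||w||_2^2
# of a sampled scalar signal by numerical integration. numpy is assumed.
import numpy as np

dt = 0.001                          # sample spacing
t = np.arange(0.0, 10.0, dt)        # a finite window standing in for (-inf, inf)
w = np.exp(-t)                      # example signal, w(t) = e^{-t} for t >= 0

energy = np.trapz(w * w, dx=dt)     # approximates the integral of w^T w dt
print(energy)                       # ~0.5, the exact value of int_0^inf e^{-2t} dt
```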

#### **1.2.3 Spaces**

The Lebesgue 2-space, defined as the set of continuous-time signals having finite 2-norm, is denoted by $\mathcal{L}_2$. Thus, $w \in \mathcal{L}_2$ means that the energy of $w$ is bounded. The following properties hold for 2-norms.

(i) $\|v\|_2 \ge 0$, with $\|v\|_2 = 0$ if and only if $v = 0$.

(ii) $\|\alpha v\|_2 = |\alpha|\, \|v\|_2$ for any scalar $\alpha$.

(iii) $\|v + w\|_2 \le \|v\|_2 + \|w\|_2$, which is known as the triangle inequality.

(iv) $\|vw\|_2 \le \|v\|_2\, \|w\|_2$.

(v) $|\langle v, w \rangle| \le \|v\|_2\, \|w\|_2$, which is known as the Cauchy-Schwarz inequality.

See [8] for more detailed discussions of spaces and norms.


#### **1.2.4 Linear Systems**


A linear system is defined as having an output vector which is equal to the value of a linear operator applied to an input vector. That is, the relationships between the output and input vectors are described by linear equations, which may be algebraic, differential or integral. Linear time-domain systems are denoted by upper-case script fonts. Consider two linear systems $\mathcal{G}, \mathcal{H}: \mathbb{R}^p \to \mathbb{R}^q$, that is, they operate on an input $w \in \mathbb{R}^p$ and produce outputs $\mathcal{G}w, \mathcal{H}w \in \mathbb{R}^q$. The following properties hold.

$$(\mathcal{G} + \mathcal{H})\, w = \mathcal{G}w + \mathcal{H}w, \tag{2}$$

$$(\mathcal{G}\mathcal{H})\, w = \mathcal{G}(\mathcal{H}w), \tag{3}$$

$$(a\mathcal{G})\, w = a\, (\mathcal{G}w), \tag{4}$$

where $a \in \mathbb{R}$. An interpretation of (2) is that a parallel combination of $\mathcal{G}$ and $\mathcal{H}$ is equivalent to the system $\mathcal{G} + \mathcal{H}$. From (3), a series combination of $\mathcal{G}$ and $\mathcal{H}$ is equivalent to the system $\mathcal{G}\mathcal{H}$. Equation (4) states that scalar amplification of a system is equivalent to scalar amplification of a system's output.
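Properties (2) – (4) can be checked concretely when the linear operators are represented by matrices. A minimal sketch (not from the text; numpy assumed) verifies each identity numerically:

```python
# A minimal sketch (not from the text): checking properties (2) - (4) when the
# linear systems G and H are represented by matrices acting on R^p.
import numpy as np

rng = np.random.default_rng(0)
G = rng.standard_normal((3, 2))   # G : R^2 -> R^3
H = rng.standard_normal((3, 2))   # H : R^2 -> R^3
w = rng.standard_normal(2)

# (2): a parallel combination, (G + H)w = Gw + Hw
assert np.allclose((G + H) @ w, G @ w + H @ w)

# (3): a series combination needs compatible dimensions, so let H2 : R^2 -> R^2
H2 = rng.standard_normal((2, 2))
assert np.allclose((G @ H2) @ w, G @ (H2 @ w))

# (4): scalar amplification of a system equals amplification of its output
a = 3.7
assert np.allclose((a * G) @ w, a * (G @ w))
```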

#### **1.2.5 Polynomial Fraction Systems**

The Wiener filtering results [4] – [6] were originally developed for polynomial fraction descriptions of systems, which are described below. Consider an $n$th-order linear, time-invariant system $\mathcal{G}: \mathbb{R} \to \mathbb{R}$ that operates on an input $w(t)$ and produces an output $y(t)$. Suppose that the differential equation model for this system is

$$a_n \frac{d^n y(t)}{dt^n} + a_{n-1} \frac{d^{n-1} y(t)}{dt^{n-1}} + \dots + a_1 \frac{dy(t)}{dt} + a_0 y(t) = b_m \frac{d^m w(t)}{dt^m} + b_{m-1} \frac{d^{m-1} w(t)}{dt^{m-1}} + \dots + b_1 \frac{dw(t)}{dt} + b_0 w(t), \tag{5}$$

where $a_0, \dots, a_n$ and $b_0, \dots, b_m$ are real-valued constant coefficients, $a_n \neq 0$, with zero initial conditions. This differential equation can be written in the more compact form

$$\left( a_n \frac{d^n}{dt^n} + a_{n-1} \frac{d^{n-1}}{dt^{n-1}} + \dots + a_1 \frac{d}{dt} + a_0 \right) y(t) = \left( b_m \frac{d^m}{dt^m} + b_{m-1} \frac{d^{m-1}}{dt^{m-1}} + \dots + b_1 \frac{d}{dt} + b_0 \right) w(t). \tag{6}$$

#### **1.2.6 The Laplace Transform of a Signal**

The two-sided Laplace transform of a continuous-time signal *y*(*t*) is denoted by *Y*(*s*) and defined by

$$Y(s) = \int_{-\infty}^{\infty} y(t) e^{-st}\, dt, \tag{7}$$


where $s = \sigma + j\omega$ is the Laplace transform variable, in which $\sigma, \omega \in \mathbb{R}$ and $j = \sqrt{-1}$. Given a signal $y(t)$ with Laplace transform $Y(s)$, $y(t)$ can be calculated from $Y(s)$ by taking the inverse Laplace transform of $Y(s)$, which is defined by

$$y(t) = \frac{1}{2\pi j} \int_{\sigma - j\infty}^{\sigma + j\infty} Y(s) e^{st}\, ds. \tag{8}$$

*Theorem 1 Parseval's Theorem [7]:* 

$$\int_{-\infty}^{\infty} \left| y(t) \right|^2 dt = \frac{1}{2\pi j} \int_{-j\infty}^{j\infty} \left| Y(s) \right|^2 ds. \tag{9}$$

*Proof. Let $y^H(t) = \frac{1}{2\pi j} \int_{-j\infty}^{j\infty} Y^H(s) e^{-st}\, ds$ and $Y^H(s)$ denote the Hermitian transpose (or adjoint) of $y(t)$ and $Y(s)$, respectively. The left-hand side of (9) may be written as*

$$\begin{aligned} \int_{-\infty}^{\infty} \left| y(t) \right|^2 dt &= \int_{-\infty}^{\infty} y^H(t) y(t)\, dt \\ &= \int_{-\infty}^{\infty} \left( \frac{1}{2\pi j} \int_{-j\infty}^{j\infty} Y^H(s) e^{-st}\, ds \right) y(t)\, dt \\ &= \frac{1}{2\pi j} \int_{-j\infty}^{j\infty} \left( \int_{-\infty}^{\infty} y(t) e^{-st}\, dt \right) Y^H(s)\, ds \\ &= \frac{1}{2\pi j} \int_{-j\infty}^{j\infty} Y(s) Y^H(s)\, ds \\ &= \frac{1}{2\pi j} \int_{-j\infty}^{j\infty} \left| Y(s) \right|^2 ds. \quad \square \end{aligned}$$

The above theorem is attributed to Parseval, whose original work [7] concerned the sums of trigonometric series. An interpretation of (9) is that the energy in the time domain equals the energy in the frequency domain.
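As a hedged numerical illustration (mine, not from the text), the discrete-time analogue of (9), the DFT Parseval identity $\sum_k |y_k|^2 = \frac{1}{N} \sum_m |Y_m|^2$, can be checked with numpy:

```python
# A numerical analogue (not from the text) of Parseval's theorem (9): for the
# discrete Fourier transform, time-domain and frequency-domain energies match.
import numpy as np

rng = np.random.default_rng(1)
y = rng.standard_normal(1024)     # a sampled signal
Y = np.fft.fft(y)                 # its frequency-domain description

time_energy = np.sum(np.abs(y) ** 2)
freq_energy = np.sum(np.abs(Y) ** 2) / len(y)
assert np.allclose(time_energy, freq_energy)
```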

#### **1.2.7 Polynomial Fraction Transfer Functions**

The steady-state response $y(t) = Y(s)e^{st}$ can be found by applying the complex-exponential input $w(t) = W(s)e^{st}$ to the terms of (6), which results in

$$\left( a_n s^n + a_{n-1} s^{n-1} + \dots + a_1 s + a_0 \right) Y(s) e^{st} = \left( b_m s^m + b_{m-1} s^{m-1} + \dots + b_1 s + b_0 \right) W(s) e^{st}. \tag{10}$$

Therefore,

$$Y(s) = \left[ \frac{b_m s^m + b_{m-1} s^{m-1} + \dots + b_1 s + b_0}{a_n s^n + a_{n-1} s^{n-1} + \dots + a_1 s + a_0} \right] W(s) = G(s) W(s), \tag{11}$$


where


$$G(s) = \frac{b_m s^m + b_{m-1} s^{m-1} + \dots + b_1 s + b_0}{a_n s^n + a_{n-1} s^{n-1} + \dots + a_1 s + a_0} \tag{12}$$

is known as the transfer function of the system. It can be seen from (6) and (12) that the polynomial transfer function coefficients correspond to the system's differential equation coefficients. Thus, knowledge of a system's differential equation is sufficient to identify its transfer function.

#### **1.2.8 Poles and Zeros**

The numerator and denominator polynomials of (12) can be factored into *m* and *n* linear factors, respectively, to give

$$G(\mathbf{s}) = \frac{b\_m(\mathbf{s} - \boldsymbol{\beta}\_1)(\mathbf{s} - \boldsymbol{\beta}\_2)...(\mathbf{s} - \boldsymbol{\beta}\_m)}{a\_n(\mathbf{s} - \boldsymbol{\alpha}\_1)(\mathbf{s} - \boldsymbol{\alpha}\_2)...(\mathbf{s} - \boldsymbol{\alpha}\_n)}.\tag{13}$$

The numerator of $G(s)$ is zero when $s = \beta_i$, $i = 1, \dots, m$. These values of $s$ are called the zeros of $G(s)$. Zeros in the left-hand-plane are called minimum-phase, whereas zeros in the right-hand-plane are called non-minimum-phase. The denominator of $G(s)$ is zero when $s = \alpha_i$, $i = 1, \dots, n$. These values of $s$ are called the poles of $G(s)$.

*Example 1.* Consider a system described by the differential equation $\dot{y}(t) = -y(t) + w(t)$, in which $y(t)$ is the output arising from the input $w(t)$. From (6) and (12), it follows that the corresponding transfer function is given by $G(s) = (s + 1)^{-1}$, which possesses a pole at $s = -1$.
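A quick way to experiment with Example 1 is via scipy's transfer-function object; this sketch (mine, not from the text) recovers the pole at $s = -1$:

```python
# A sketch (not from the text) of Example 1 using scipy: the system
# dy/dt = -y + w has transfer function G(s) = 1/(s + 1) with a pole at s = -1.
from scipy import signal

G = signal.TransferFunction([1], [1, 1])   # numerator b = 1, denominator s + 1
print(G.poles)                             # [-1.]
print(G.zeros)                             # [] (no finite zeros)
```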

The system in Example 1 operates on a single input and produces a single output, and is known as a single-input-single-output (SISO) system. Systems operating on multiple inputs and producing multiple outputs, for example, $\mathcal{G}: \mathbb{R}^p \to \mathbb{R}^q$, are known as multiple-input-multiple-output (MIMO) systems. The corresponding transfer function matrices can be written as equation (14), where the components $G_{ij}(s)$ have the polynomial transfer function form within (12) or (13).

$$\mathbf{G}(\mathbf{s}) = \begin{bmatrix} \mathbf{G}\_{11}(\mathbf{s}) & \mathbf{G}\_{12}(\mathbf{s}) & \dots & \mathbf{G}\_{1p}(\mathbf{s}) \\ \mathbf{G}\_{21}(\mathbf{s}) & \mathbf{G}\_{22}(\mathbf{s}) \\ \vdots & & \ddots & \vdots \\ \mathbf{G}\_{q1}(\mathbf{s}) & & \dots & \mathbf{G}\_{qp}(\mathbf{s}) \end{bmatrix} \tag{14}$$

Figure 1. Continuous-time state-space system.


#### **1.2.9 State-Space Systems**

A system $\mathcal{G}: \mathbb{R}^p \to \mathbb{R}^q$ having a state-space realisation is written in the form

$$\dot{x}(t) = Ax(t) + Bw(t), \tag{15}$$

$$y(t) = Cx(t) + Dw(t), \tag{16}$$

where $A \in \mathbb{R}^{n \times n}$, $B \in \mathbb{R}^{n \times p}$, $C \in \mathbb{R}^{q \times n}$ and $D \in \mathbb{R}^{q \times p}$, in which $w \in \mathbb{R}^p$ is an input, $x \in \mathbb{R}^n$ is a state vector and $y \in \mathbb{R}^q$ is an output. $A$ is known as the state matrix and $D$ is known as the direct feed-through matrix. The matrices $B$ and $C$ are known as the input mapping and the output mapping, respectively. This system is depicted in Fig. 1.

#### **1.2.10 Euler's Method for Numerical Integration**

Differential equations of the form (15) could be implemented directly by analog circuits. Digital or software implementations require a method for numerical integration. A first-order numerical integration technique, known as Euler's method, is now derived. Suppose that $x(t)$ is infinitely differentiable and consider its Taylor series expansion in the neighbourhood of $t_0$

$$\mathbf{x}(t) = \mathbf{x}(t\_0) + \frac{(t - t\_0)}{1!} \frac{d\mathbf{x}(t\_0)}{dt} + \frac{(t - t\_0)^2}{2!} \frac{d^2 \mathbf{x}(t\_0)}{dt^2} + \frac{(t - t\_0)^3}{3!} \frac{d^3 \mathbf{x}(t\_0)}{dt^3} + \cdots \tag{17}$$

$$= \mathbf{x}(t\_0) + \frac{(t - t\_0)}{1!} \dot{\mathbf{x}}(t\_0) + \frac{(t - t\_0)^2}{2!} \ddot{\mathbf{x}}(t\_0) + \frac{(t - t\_0)^3}{3!} \dddot{\mathbf{x}}(t\_0) + \cdots$$

Truncating the series after the first-order term yields the approximation $x(t) \approx x(t_0) + (t - t_0)\dot{x}(t_0)$. Defining $t_k = t_{k-1} + \delta_t$ leads to

$$\begin{aligned} x(t_1) &= x(t_0) + \delta_t\, \dot{x}(t_0) \\ x(t_2) &= x(t_1) + \delta_t\, \dot{x}(t_1) \\ &\vdots \\ x(t_{k+1}) &= x(t_k) + \delta_t\, \dot{x}(t_k). \end{aligned} \tag{18}$$

Thus, the continuous-time linear system (15) could be approximated in discrete-time by iterating

$$\dot{x}(t_k) = Ax(t_k) + Bw(t_k), \tag{19}$$

and (18) provided that *δt* is chosen to be suitably small. Applications of (18) – (19) appear in [9] and in the following example.


*Example 2.* In respect of the continuous-time state evolution (15), consider $A = -1$, $B = 1$ together with the deterministic input $w(t) = \sin(t) + \cos(t)$. The states can be calculated from the known $w(t)$ using (19) and the difference equation (18). In this case, the state error is given by $e(t_k) = \sin(t_k) - x(t_k)$. In particular, root-mean-square errors of 0.34, 0.031, 0.0025 and 0.00024 were observed for $\delta_t$ = 1, 0.1, 0.01 and 0.001, respectively. This demonstrates that the first-order approximation (18) can be reasonable when $\delta_t$ is sufficiently small.
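A minimal sketch reproducing the flavour of Example 2 (my code, not the author's; numpy assumed) iterates (18) – (19) and reports the root-mean-square error against the exact response $x(t) = \sin(t)$:

```python
# A sketch (not from the text) of Example 2: Euler iteration (18) - (19) for
# dx/dt = -x + w with w(t) = sin(t) + cos(t), whose exact response from
# x(0) = 0 is x(t) = sin(t).
import numpy as np

def euler_rmse(dt, T=20.0):
    n = int(T / dt)
    x = 0.0
    errs = []
    for k in range(n):
        t = k * dt
        w = np.sin(t) + np.cos(t)
        x = x + dt * (-x + w)          # x(t_{k+1}) = x(t_k) + dt * xdot(t_k)
        errs.append(np.sin(t + dt) - x)
    return np.sqrt(np.mean(np.square(errs)))

for dt in (1.0, 0.1, 0.01, 0.001):
    print(dt, euler_rmse(dt))          # errors fall roughly in step with dt
```

The printed errors shrink roughly in proportion to $\delta_t$, which is the expected first-order behaviour.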

#### **1.2.11 State-Space Transfer Function Matrix**

The transfer function matrix of the state-space system (15) - (16) is defined by

$$G(s) = C(sI - A)^{-1}B + D, \tag{20}$$

in which *s* again denotes the Laplace transform variable.

*Example 3.* For a state-space model with $A = -1$, $B = C = 1$ and $D = 0$, the transfer function is $G(s) = (s + 1)^{-1}$.
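Equation (20) and Example 3 can be checked with scipy's state-space-to-transfer-function conversion; a sketch (mine, not from the text):

```python
# A sketch (not from the text) of equation (20): recovering
# G(s) = C(sI - A)^{-1}B + D from state-space parameters, as in Example 3.
from scipy import signal

A, B, C, D = [[-1.0]], [[1.0]], [[1.0]], [[0.0]]
num, den = signal.ss2tf(A, B, C, D)
print(num, den)    # numerator ~[0, 1], denominator [1, 1], i.e. G(s) = 1/(s + 1)
```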

*Example 4.* For state-space parameters $A = \begin{bmatrix} -3 & -2 \\ 1 & 0 \end{bmatrix}$, $B = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$, $C = \begin{bmatrix} 2 & 5 \end{bmatrix}$ and $D = 0$, the use of Cramer's rule, that is, $\begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$, yields the transfer function $G(s) = (2s + 5)(s + 1)^{-1}(s + 2)^{-1}$.

*Example 5.* Substituting $A = \begin{bmatrix} -1 & 0 \\ 0 & -2 \end{bmatrix}$ and $B = C = D = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ into (20) results in the transfer function matrix

$$G(s) = \begin{bmatrix} \dfrac{s+2}{s+1} & 0 \\ 0 & \dfrac{s+3}{s+2} \end{bmatrix}.$$

#### **1.2.12 Canonical Realisations**

The mapping of a polynomial fraction transfer function (12) to a state-space representation (20) is not unique. Two standard state-space realisations of polynomial fraction transfer functions are described below. Assume that: the transfer function has been expanded into the sum of a direct feed-through term plus a strictly proper transfer function, in which the order of the numerator polynomial is less than the order of the denominator polynomial; and the strictly proper transfer function has been normalised so that $a_n = 1$. Under these assumptions, the system can be realised in the controllable canonical form, which is parameterised by [10]


$$A = \begin{bmatrix} -a\_{n-1} & -a\_{n-2} & \dots & -a\_1 & -a\_0 \\ 1 & 0 & & \dots & 0 \\ 0 & 1 & & & \\ \vdots & & \ddots & 0 & 0 \\ 0 & 0 & \dots & 1 & 0 \end{bmatrix}, B = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \\ 0 \end{bmatrix} \text{ and } C = \begin{bmatrix} b\_m & b\_{m-1} & \dots & b\_1 & b\_0 \end{bmatrix}.$$

The system can be also realised in the observable canonical form which is parameterised by

$$A = \begin{bmatrix} -a\_{n-1} & 1 & 0 & \dots & 0 \\ -a\_{n-2} & 0 & 1 & & 0 \\ \vdots & & & \ddots & 0 \\ -a\_1 & & & 0 & 1 \\ -a\_0 & 0 & \dots & 0 & 0 \end{bmatrix}, B = \begin{bmatrix} b\_m \\ b\_{m-1} \\ \vdots \\ b\_1 \\ b\_0 \end{bmatrix} \text{ and } C = \begin{bmatrix} 1 & 0 & \dots & 0 & 0 \end{bmatrix}.$$
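As an illustration (mine, not from the text), the controllable canonical form above can be assembled mechanically from the polynomial coefficients; numpy is assumed and the helper name is hypothetical:

```python
# A sketch (not from the text): building the controllable canonical form above
# for a strictly proper G(s) with a_n normalised to 1.
import numpy as np

def controllable_canonical(a, b):
    """a = [a_{n-1}, ..., a_1, a_0]; b = [b_m, ..., b_1, b_0] with m < n."""
    n = len(a)
    A = np.zeros((n, n))
    A[0, :] = -np.asarray(a, dtype=float)   # first row holds -a_{n-1} ... -a_0
    A[1:, :-1] = np.eye(n - 1)              # shifted identity below it
    B = np.zeros((n, 1))
    B[0, 0] = 1.0
    C = np.zeros((1, n))
    C[0, n - len(b):] = b                   # align b_m ... b_0 to the right
    return A, B, C

# G(s) = (2s + 5)/(s^2 + 3s + 2), as in Example 4:
A, B, C = controllable_canonical([3, 2], [2, 5])
print(A)    # [[-3, -2], [1, 0]], matching Example 4's state matrix
```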

#### **1.2.13 Asymptotic Stability**

Consider a continuous-time, linear, time-invariant $n$th-order system $\mathcal{G}$ that operates on an input $w$ and produces an output $y$. The system $\mathcal{G}$ is said to be asymptotically stable if the output remains bounded, that is, $y \in \mathcal{L}_2$, for any $w \in \mathcal{L}_2$. This is also known as bounded-input-bounded-output stability. Two equivalent conditions for $\mathcal{G}$ to be asymptotically stable are:

• The real parts of the eigenvalues of the system's state matrix are in the left-hand-plane, that is, for $A$ of (20), $\text{Re}\{\lambda_i(A)\} < 0$, $i = 1, \dots, n$.

• The real parts of the poles of the system's transfer function are in the left-hand-plane, that is, for $\alpha_i$ of (13), $\text{Re}\{\alpha_i\} < 0$, $i = 1, \dots, n$.


*Example 6.* A state-space system having $A = -1$, $B = C = 1$ and $D = 0$ is stable, since $\lambda(A) = -1$ is in the left-hand-plane. Equivalently, the corresponding transfer function $G(s) = (s + 1)^{-1}$ has a pole at $s = -1$, which is in the left-hand-plane, and so the system is stable. Conversely, the transfer function $G^T(-s) = (1 - s)^{-1}$ is unstable because it has a singularity at the pole $s = 1$, which is in the right-hand side of the complex plane. $G^T(-s)$ is known as the adjoint of $G(s)$, which is discussed below.
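The eigenvalue test in Example 6 translates directly into code; a sketch (mine, not from the text; numpy assumed):

```python
# A sketch (not from the text) of Example 6's eigenvalue test: the system is
# asymptotically stable when every eigenvalue of A has negative real part.
import numpy as np

def is_asymptotically_stable(A):
    return bool(np.all(np.linalg.eigvals(A).real < 0))

print(is_asymptotically_stable(np.array([[-1.0]])))   # True:  G(s) = 1/(s + 1)
print(is_asymptotically_stable(np.array([[1.0]])))    # False: G^T(-s) = 1/(1 - s)
```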

#### **1.2.14 Adjoint Systems**

An important concept in the ensuing development of filters and smoothers is the adjoint of a system. Let $\mathcal{G}: \mathbb{R}^p \to \mathbb{R}^q$ be a linear system operating on the interval $[0, T]$. Then $\mathcal{G}^H: \mathbb{R}^q \to \mathbb{R}^p$, the adjoint of $\mathcal{G}$, is the unique linear system such that $\langle y, \mathcal{G}w \rangle = \langle \mathcal{G}^H y, w \rangle$ for all $y \in \mathbb{R}^q$ and $w \in \mathbb{R}^p$. The following derivation is a simplification of the time-varying version that appears in [11].


*Lemma 1 (State-space representation of an adjoint system): Suppose that a continuous-time linear time-invariant system is described by* 

$$\dot{x}(t) = Ax(t) + Bw(t), \tag{21}$$

$$y(t) = Cx(t) + Dw(t), \tag{22}$$

*with $x(t_0) = 0$. The adjoint $\mathcal{G}^H$ is the linear system having the realisation*

$$\dot{\zeta}(t) = -A^T \zeta(t) - C^T u(t), \tag{23}$$

$$z(t) = B^T \zeta(t) + D^T u(t), \tag{24}$$

*with ζ(T) = 0.* 


*Proof: The system (21) – (22) can be written equivalently as*

$$\begin{bmatrix} \frac{d}{dt}I - A & -B \\ C & D \end{bmatrix} \begin{bmatrix} x(t) \\ w(t) \end{bmatrix} = \begin{bmatrix} 0 \\ y(t) \end{bmatrix}, \tag{25}$$

*with $x(t_0) = 0$. Thus*

$$\left\langle \begin{bmatrix} \zeta \\ u \end{bmatrix}, \begin{bmatrix} \frac{d}{dt}I - A & -B \\ C & D \end{bmatrix} \begin{bmatrix} x \\ w \end{bmatrix} \right\rangle = \int_0^T \zeta^T \frac{dx}{dt}\, dt - \int_0^T \zeta^T (Ax + Bw)\, dt + \int_0^T u^T (Cx + Dw)\, dt. \tag{26}$$

*Integrating the first term on the right-hand side by parts gives*

$$\begin{aligned} \langle u, \mathcal{G}w \rangle &= \zeta^T(T)x(T) - \int_0^T \left( \frac{d\zeta^T}{dt} \right) x\, dt - \int_0^T \zeta^T (Ax + Bw)\, dt + \int_0^T u^T (Cx + Dw)\, dt \\ &= \left\langle \begin{bmatrix} -\left( \frac{d}{dt}I + A^T \right) & C^T \\ -B^T & D^T \end{bmatrix} \begin{bmatrix} \zeta \\ u \end{bmatrix}, \begin{bmatrix} x \\ w \end{bmatrix} \right\rangle + \zeta^T(T)x(T) \\ &= \langle \mathcal{G}^H u, w \rangle, \end{aligned} \tag{27}$$

*where $\mathcal{G}^H$ is given by (23) – (24).* □


Thus, the adjoint of a system having the parameters $\{A, B, C, D\}$ is a system with the parameters $\{-A^T, -C^T, B^T, D^T\}$. Adjoint systems have the property $(\mathcal{G}^H)^H = \mathcal{G}$. The adjoint of the transfer function matrix $G(s)$ is denoted as $G^H(s)$ and is defined by the transfer function matrix

$$G^H(s) = G^T(-s). \tag{28}$$

*Example 7.* Suppose that a system $\mathcal{G}$ has state-space parameters $A = -1$ and $B = C = D = 1$. From (23) – (24), an adjoint system has the state-space parameters $A = 1$, $B = D = 1$ and $C = -1$, and the corresponding transfer function is $G^H(s) = 1 - (s - 1)^{-1} = (-s + 2)(-s + 1)^{-1} = (s - 2)(s - 1)^{-1}$, which is unstable and non-minimum-phase. Alternatively, the adjoint of $G(s) = 1 + (s + 1)^{-1} = (s + 2)(s + 1)^{-1}$ can be obtained using (28), namely $G^H(s) = G^T(-s) = (-s + 2)(-s + 1)^{-1}$.
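Definition (28) is easy to sanity-check numerically for the SISO system of Example 7; the lambdas below are mine, not from the text:

```python
# A quick numerical check (not from the text) of Example 7 and (28): evaluating
# G^H(s) = G^T(-s) for G(s) = 1 + (s + 1)^{-1} at an arbitrary test point.
G = lambda s: 1.0 + 1.0 / (s + 1.0)         # G(s) = (s + 2)/(s + 1)
GH = lambda s: 1.0 - 1.0 / (s - 1.0)        # adjoint realisation from (23) - (24)

s0 = 0.3 + 0.7j
print(GH(s0))          # equals ...
print(G(-s0))          # ... G(-s0), confirming G^H(s) = G^T(-s) for this SISO G
```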

#### **1.2.15 Causal and Noncausal Systems**

A causal system is a system that depends exclusively on past and current inputs.

*Example 8.* The differential of $x(t)$ with respect to $t$ is defined by $\dot{x}(t) = \lim_{dt \to 0} \frac{x(t + dt) - x(t)}{dt}$. Consider

$$\dot{x}(t) = Ax(t) + Bw(t), \tag{29}$$

with Re{λᵢ(*A*)} < 0, *i* = 1, …, *n*. The positive sign of $\dot{x}(t)$ within (29) denotes a system that proceeds forward in time. This is called a causal system because it depends only on past and current inputs.

*Example 9.* The negative differential of *ξ*(*t*) with respect to *t* is defined by $-\dot{\xi}(t) = \lim_{dt \to 0} \frac{\xi(t - dt) - \xi(t)}{dt}$. Consider

$$-\dot{\xi}(t) = A^{T}\xi(t) + C^{T}u(t) \tag{30}$$

with Re{λᵢ(*A*ᵀ)} = Re{λᵢ(*A*)} < 0, *i* = 1, …, *n*. The negative sign of $\dot{\xi}(t)$ within (30) denotes a system that proceeds backwards in time. Since this system depends on future inputs, it is termed noncausal. Note that Re{λᵢ(*A*)} < 0 implies Re{λᵢ(−*A*)} > 0. Hence, if the causal system (21) – (22) is stable, then its adjoint (23) – (24) is unstable.

#### **1.2.16 Realising Unstable System Components**

Unstable systems are termed unrealisable because their outputs are not in ℒ₂, that is, they are unbounded. In other words, they cannot be implemented as forward-going systems. It follows from the above discussion that an unstable system component can be realised as a stable noncausal or backwards system.

Suppose that the time-domain system 𝒢 : *w* → *y* is stable. The adjoint system 𝒢ᴴ : *u* → *z* can be realised by the following three-step procedure.

1. Time-reverse the input signal *u*(*t*), that is, construct *u*(*τ*), where *τ* = *T* − *t* is a time-to-go variable (see [12]).
2. Realise the stable system



$$\dot{\xi}(\tau) = A^{T}\xi(\tau) + C^{T}u(\tau) \,, \tag{31}$$

$$z(\tau) = B^{T}\xi(\tau) + D^{T}u(\tau) \,, \tag{32}$$

   with the initial condition *ξ*(0) = 0.

3. Time-reverse the output signal *z*(*τ*), that is, construct *z*(*t*).

The above procedure is known as noncausal filtering or smoothing; see the discrete-time case described in [13]. Thus, a combination of causal and non-causal system components can be used to implement an otherwise unrealisable system. This approach will be exploited in the realisation of smoothers within subsequent sections.

*Example 10.* Suppose that it is required to realise the unstable system 𝒢(*s*) = *G*₂(*s*)*G*₁ᴴ(*s*) over an interval [0, *T*], where *G*₁(*s*) = (*s* + 1)⁻¹ and *G*₂(*s*) = (*s* + 2)⁻¹. This system can be realised using the processes shown in Fig. 2.

Figure 2. Realising an unstable 𝒢(*s*) = *G*₂(*s*)*G*₁ᴴ(*s*).
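The three-step realisation can be prototyped numerically. The sketch below is an illustrative assumption of this rewrite (the original presents no code): it uses scipy's state-space simulator, an arbitrary test input, and the fact that for these scalar systems the transposed realisation has the same transfer function, so the noncausal component reduces to running *G*₁ on time-reversed data.

```python
# A minimal sketch of Example 10: realise G(s) = G2(s) G1^H(s) over [0, T].
# The interval, step size and test input are assumptions.
import numpy as np
from scipy import signal

T, dt = 1.0, 0.001
t = np.arange(0.0, T, dt)
w = np.random.randn(t.size)              # arbitrary input over [0, T]

# Noncausal component G1^H(s), with G1(s) = 1/(s + 1): time-reverse the
# input, run the stable realisation forward (for a scalar system the
# transposed parameters give the same transfer function), then time-reverse.
G1 = signal.lti([1.0], [1.0, 1.0])
_, x_rev, _ = signal.lsim(G1, w[::-1], t)
x = x_rev[::-1]                          # output of the unstable G1^H(s)

# Causal component G2(s) = 1/(s + 2), realised forward in time as usual.
# The cascade order is immaterial for these commuting scalar systems.
G2 = signal.lti([1.0], [1.0, 2.0])
_, y, _ = signal.lsim(G2, x, t)          # y realises G2(s) G1^H(s) w
```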

#### **1.2.17 Power Spectral Density**

The power of a voltage signal applied to a 1-ohm load is defined as the squared value of the signal and is expressed in watts. The power spectral density is expressed as power per unit bandwidth, that is, W/Hz. Consider again a linear, time-invariant system *y* = 𝒢*w* and its corresponding transfer function matrix *G*(*s*). Assume that *w* is a zero-mean, stationary, white noise process with $E\{w(t)w^{T}(\tau)\} = Q\delta(t - \tau)$, in which *δ* denotes the Dirac delta function. Then $\Phi_{yy}(s)$, the power spectral density of *y*, is given by

$$\Phi\_{yy}(\mathbf{s}) = \mathbf{G} \mathbf{Q} \mathbf{G}^H(\mathbf{s}) \, , \tag{33}$$

which has the property $\Phi_{yy}(s) = \Phi_{yy}^{H}(s)$.

The total energy of a signal is the integral of the power of the signal over time and is expressed in watt-seconds or joules. From Parseval's theorem (9), the average total energy of *y(t)* is

$$\int_{-j\infty}^{j\infty} \Phi_{yy}(s)\,ds = \int_{-\infty}^{\infty} \left|y(t)\right|^{2} dt = \left\|y(t)\right\|_{2}^{2} = E\{y^{T}(t)y(t)\} \,, \tag{34}$$

which is equal to the area under the power spectral density curve.
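As a quick numerical sanity check, the discrete analogue of (34) can be verified with an FFT: the energy computed in the time domain matches the area under the (periodogram) power spectral density. The random test signal below is an assumption.

```python
# Discrete analogue of (34): time-domain energy equals frequency-domain energy.
import numpy as np

rng = np.random.default_rng(0)
y = rng.standard_normal(4096)

energy_time = np.sum(y**2)                   # ||y||_2^2 in the time domain
Y = np.fft.fft(y)
energy_freq = np.sum(np.abs(Y)**2) / y.size  # Parseval's theorem for the DFT

assert np.isclose(energy_time, energy_freq)
```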



#### **1.2.18 Spectral Factorisation**

Suppose that noisy measurements

$$z(t) = y(t) + v(t)\tag{35}$$

of a linear, time-invariant system 𝒢, described by (21) – (22), are available, where *v*(*t*) ∈ ℝ^q is an independent, zero-mean, stationary white noise process with $E\{v(t)v^{T}(\tau)\} = R\delta(t - \tau)$. Let

$$\Phi\_{zz}(\mathbf{s}) = \mathbf{G} \mathbf{Q} \mathbf{G}^H(\mathbf{s}) + \mathbf{R} \tag{36}$$

denote the spectral density matrix of the measurements *z*(*t*). Spectral factorisation was pioneered by Wiener (see [4] and [5]). It refers to the problem of decomposing a spectral density matrix into a product of a stable, minimum-phase matrix transfer function and its adjoint. In the case of the output power spectral density (36), a spectral factor Δ(*s*) satisfies ΔΔᴴ(*s*) = Φ_zz(*s*).

The problem of spectral factorisation within continuous-time Wiener filtering problems is studied in [14]. The roots of the transfer function polynomials need to be sorted into those within the left-hand-plane and those within the right-hand-plane. This is an eigenvalue decomposition problem – see the survey of spectral factorisation methods detailed in [15].

*Example 11.* In respect of the observation spectral density (36), suppose that *G*(*s*) = (*s* + 1)⁻¹ and *Q* = *R* = 1, which results in Φ_zz(*s*) = (−*s*² + 2)(−*s*² + 1)⁻¹. By inspection, the spectral factor Δ(*s*) = (*s* + √2)(*s* + 1)⁻¹ is stable, minimum-phase and satisfies ΔΔᴴ(*s*) = Φ_zz(*s*).
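For rational spectra, the root sorting described above is easy to sketch numerically. The snippet below reproduces Example 11's spectral factor by collecting the left-hand-plane roots; the use of numpy's polynomial root finder is an assumption of this rewrite, not part of the original development.

```python
# A sketch of spectral factorisation by root sorting for Example 11, where
# Phi_zz(s) = (-s^2 + 2)/(-s^2 + 1).
import numpy as np

num = np.array([-1.0, 0.0, 2.0])       # -s^2 + 2
den = np.array([-1.0, 0.0, 1.0])       # -s^2 + 1

zeros = np.roots(num)                  # +/- sqrt(2)
poles = np.roots(den)                  # +/- 1

stable_zeros = zeros[zeros.real < 0]   # keep -sqrt(2)
stable_poles = poles[poles.real < 0]   # keep -1

delta_num = np.poly(stable_zeros)      # s + sqrt(2)
delta_den = np.poly(stable_poles)      # s + 1, so Delta(s) = (s + sqrt(2))/(s + 1)
# The leading coefficients of num and den cancel here, so the gain is unity.
```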

#### **1.3 Minimum-Mean-Square-Error Filtering**

#### **1.3.1 Filter Derivation**

Now that some underlying frequency-domain concepts have been introduced, the Wiener filter [4] – [6] can be described. A Wiener-Hopf derivation of the Wiener filter appears in [4], [6]. This section describes a simpler completing-the-square approach (see [14], [16]). Consider a stable linear time-invariant system having a transfer function matrix *G*₂(*s*) = *C*₂(*sI* − *A*)⁻¹*B* + *D*₂. Let *Y*₂(*s*), *W*(*s*), *V*(*s*) and *Z*(*s*) denote the Laplace transforms of the system's output, process noise, measurement noise and observations, respectively, so that

$$Z(s) = Y_{2}(s) + V(s) \,. \tag{37}$$

Consider also a fictitious reference system having the transfer function *G*₁(*s*) = *C*₁(*sI* − *A*)⁻¹*B* + *D*₁, as shown in Fig. 3. The problem is to design a filter transfer function *H*(*s*) to calculate estimates $\hat{Y}_1(s)$ = *H*(*s*)*Z*(*s*) of *Y*₁(*s*) so that the energy $\int_{-j\infty}^{j\infty} E^{H}(s)E(s)\,ds$ of the estimation error

$$E(\mathbf{s}) = Y\_1(\mathbf{s}) - \hat{Y}\_1(\mathbf{s}) \tag{38}$$

is minimised.


Figure 3. The s-domain general filtering problem.

It follows from Fig. 3 that *E*(*s*) is generated by

$$E(\mathbf{s}) = -\begin{bmatrix} H(\mathbf{s}) & H\mathbf{G}\_2(\mathbf{s}) - \mathbf{G}\_1(\mathbf{s}) \end{bmatrix} \begin{bmatrix} V(\mathbf{s}) \\ W(\mathbf{s}) \end{bmatrix}. \tag{39}$$

The error power spectral density matrix is denoted by $\Phi_{ee}(s)$ and is given by the covariance of *E*(*s*), that is,

$$\begin{aligned} \Phi_{ee}(s) &= E(s)E^{H}(s) \\ &= \begin{bmatrix} H(s) & HG_{2}(s) - G_{1}(s) \end{bmatrix} \begin{bmatrix} R & 0 \\ 0 & Q \end{bmatrix} \begin{bmatrix} H^{H}(s) \\ G_{2}^{H}H^{H}(s) - G_{1}^{H}(s) \end{bmatrix} \\ &= G_{1}QG_{1}^{H}(s) - G_{1}QG_{2}^{H}H^{H}(s) - HG_{2}QG_{1}^{H}(s) + H\Delta\Delta^{H}H^{H}(s) \,, \end{aligned} \tag{40}$$

where


$$
\Delta\boldsymbol{\Delta}^{H}(\mathbf{s}) = \mathbf{G}\_{2}\mathbf{Q}\mathbf{G}\_{2}^{H}(\mathbf{s}) + \boldsymbol{R} \tag{41}
$$

is the spectral density matrix of the measurements. The quantity Δ(*s*) is a spectral factor, which is unique up to the product of an inner matrix. Denote $\Delta^{-H}(s) = (\Delta^{H})^{-1}(s)$. Completing the square within (40) yields

$$\begin{aligned} \Phi_{ee}(s) &= G_{1}QG_{1}^{H}(s) - G_{1}QG_{2}^{H}(\Delta\Delta^{H})^{-1}G_{2}QG_{1}^{H}(s) \\ &\quad + \big(H\Delta(s) - G_{1}QG_{2}^{H}\Delta^{-H}(s)\big)\big(H\Delta(s) - G_{1}QG_{2}^{H}\Delta^{-H}(s)\big)^{H} \,. \end{aligned} \tag{42}$$

It follows that the total energy of the error signal is given by

$$\begin{aligned} \int_{-j\infty}^{j\infty} \Phi_{ee}(s)\,ds &= \int_{-j\infty}^{j\infty} G_{1}QG_{1}^{H}(s) - G_{1}QG_{2}^{H}(\Delta\Delta^{H})^{-1}G_{2}QG_{1}^{H}(s)\,ds \\ &\quad + \int_{-j\infty}^{j\infty} \big(H\Delta(s) - G_{1}QG_{2}^{H}\Delta^{-H}(s)\big)\big(H\Delta(s) - G_{1}QG_{2}^{H}\Delta^{-H}(s)\big)^{H} ds \,. \end{aligned} \tag{43}$$



The first term on the right-hand-side of (43) is independent of *H*(*s*) and represents a lower bound of $\int_{-j\infty}^{j\infty} \Phi_{ee}(s)\,ds$. The second term on the right-hand-side of (43) may be minimised by a judicious choice for *H*(*s*).

*Theorem 2: The above linear time-invariant filtering problem with the measurements (37) and estimation error (38) has the solution*

$$H(s) = G_{1}QG_{2}^{H}\Delta^{-H}\Delta^{-1}(s) \,, \tag{44}$$

*which minimises* $\int_{-j\infty}^{j\infty} \Phi_{ee}(s)\,ds$.

*Proof: The result follows by setting* $H\Delta(s) - G_{1}QG_{2}^{H}\Delta^{-H}(s) = 0$ *within (43).* □

By Parseval's theorem, the minimum mean-square-error solution (44) also minimises $\|e(t)\|_{2}^{2}$.

The solution (44) is unstable because the factor $G_{2}^{H}(\Delta^{H})^{-1}(s)$ possesses right-hand-plane poles. This optimal noncausal solution is actually a smoother, which can be realised by a combination of forward and backward processes. Wiener called (44) the optimal unrealisable solution because it cannot be realised by a memory-less network of capacitors, inductors and resistors [4].

The transfer function matrix of a realisable filter is given by

$$H(s) = \left\{G_{1}QG_{2}^{H}(\Delta^{H})^{-1}\right\}_{+} \Delta^{-1}(s) \,, \tag{45}$$

in which { }+ denotes the causal part. A procedure for finding the causal part of a transfer function is described below.

#### **1.3.2 Finding the Causal Part of a Transfer Function**

The causal part of a transfer function can be found by carrying out the following three steps.

1. If the transfer function is not strictly proper, that is, if the order of the numerator is not less than the order of the denominator, then perform synthetic division to isolate the constant term.
2. Expand out the (strictly proper) transfer function into the sum of stable and unstable partial fractions.
3. The causal part is the sum of the constant term and the stable partial fractions.
Incidentally, the noncausal part is what remains, namely the sum of the unstable partial fractions.

*Example 12.* Consider *G*(*s*) = (*s*² − *β*²)(*s*² − *α*²)⁻¹ with *α*, *β* < 0. Since *G*(*s*) possesses equal order numerator and denominator polynomials, synthetic division is required, which yields *G*(*s*) = 1 + (*α*² − *β*²)(*s*² − *α*²)⁻¹. A partial fraction expansion results in


$$\frac{\alpha^{2}-\beta^{2}}{s^{2}-\alpha^{2}} = \frac{0.5\alpha^{-1}(\alpha^{2}-\beta^{2})}{s-\alpha} - \frac{0.5\alpha^{-1}(\alpha^{2}-\beta^{2})}{s+\alpha} \,.$$

Thus, since *α* < 0, the causal part of *G*(*s*) is {*G*(*s*)}₊ = 1 + 0.5*α*⁻¹(*α*² − *β*²)(*s* − *α*)⁻¹. The noncausal part of *G*(*s*) is denoted as {*G*(*s*)}₋ and is given by {*G*(*s*)}₋ = −0.5*α*⁻¹(*α*² − *β*²)(*s* + *α*)⁻¹. It is easily verified that *G*(*s*) = {*G*(*s*)}₊ + {*G*(*s*)}₋.
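The three steps can be automated for rational transfer functions with distinct poles. The sketch below is an assumption of this rewrite: the helper name `causal_part` and the example system are illustrative, and scipy's partial fraction routines carry out the synthetic division and expansion.

```python
# A sketch of the three-step causal-part extraction using numerical partial
# fractions, assuming distinct poles.
import numpy as np
from scipy import signal

def causal_part(num, den):
    # residue() performs the synthetic division (direct term k) and the
    # expansion G(s) = k(s) + sum_i r_i / (s - p_i).
    r, p, k = signal.residue(num, den)
    stable = p.real < 0                  # keep the left-hand-plane fractions
    return signal.invres(r[stable], p[stable], k)

# Example: G(s) = (s + 3)/((s + 1)(s - 1)) has causal part -1/(s + 1).
num_c, den_c = causal_part([1.0, 3.0], np.polymul([1.0, 1.0], [1.0, -1.0]))
```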

Figure 4. The s-domain output estimation problem.

#### **1.3.3 Minimum-Mean-Square-Error Output Estimation**

In output estimation, the reference system is the same as the generating system, as depicted in Fig. 4. The simplification of the optimal noncausal solution (44) of Theorem 2 for the case *G*1(*s*) = *G*2(*s*) can be expressed as

$$\begin{aligned} H_{OE}(s) &= G_{2}QG_{2}^{H}\Delta^{-H}\Delta^{-1}(s) \\ &= G_{2}QG_{2}^{H}(\Delta\Delta^{H})^{-1}(s) \\ &= (\Delta\Delta^{H} - R)(\Delta\Delta^{H})^{-1}(s) \\ &= I - R\Delta^{-H}\Delta^{-1}(s) \,. \end{aligned} \tag{46}$$

The optimal causal solution for output estimation is

$$\begin{aligned} H_{OE}(s) &= \left\{G_{2}QG_{2}^{H}\Delta^{-H}\right\}_{+}\Delta^{-1}(s) \\ &= I - R\left\{\Delta^{-H}\right\}_{+}\Delta^{-1}(s) \\ &= I - R^{1/2}\Delta^{-1}(s) \,. \end{aligned} \tag{47}$$

When the measurement noise becomes negligibly small, the output estimator approaches a short circuit, that is,

$$\lim_{R \to 0} H_{OE}(s) = I \,. \tag{48}$$



The observation (48) can be verified by substituting ΔΔᴴ(*s*) = *G*₂*QG*₂ᴴ(*s*) into (46). This observation is consistent with intuition, that is, when the measurements are perfect, filtering will be superfluous.

Figure 5. Sample trajectories for Example 13: (a) measurement, (b) system output (dotted line) and filtered signal (solid line).

*Example 13.* Consider a scalar output estimation problem, where *G*₂(*s*) = (*s* − *α*)⁻¹, *α* = −1, *Q* = 1 and *R* = 0.0001. Then $G_{2}QG_{2}^{H}(s) = Q(\alpha^{2} - s^{2})^{-1}$ and $\Delta\Delta^{H}(s) = (Q + R\alpha^{2} - Rs^{2})(\alpha^{2} - s^{2})^{-1}$, which leads to $\Delta(s) = R^{1/2}\big(s + \sqrt{\alpha^{2} + Q/R}\big)(s - \alpha)^{-1}$. Therefore, $G_{2}QG_{2}^{H}(\Delta^{H})^{-1}(s) = QR^{-1/2}(s - \alpha)^{-1}\big({-s} + \sqrt{\alpha^{2} + Q/R}\big)^{-1}$, in which a common pole and zero were cancelled. Expanding into partial fractions and taking the causal part results in

$$\left\{G_{2}QG_{2}^{H}(\Delta^{H})^{-1}(s)\right\}_{+} = \frac{\left.\dfrac{Q}{R^{1/2}\left(-s + \sqrt{\alpha^{2} + Q/R}\right)}\right|_{s=\alpha}}{s - \alpha}$$


and


$$H_{OE}(s) = \left\{G_{2}QG_{2}^{H}\Delta^{-H}\right\}_{+}\Delta^{-1}(s) = \frac{\alpha + \sqrt{\alpha^{2} + Q/R}}{s + \sqrt{\alpha^{2} + Q/R}} \,.$$

Substituting *α* = −1, *Q* = 1 and *R* = 0.0001 yields *H*(*s*) = 99(*s* + 100)⁻¹. By inspection,

$\lim_{s \to 0} H(s) = \frac{99}{100}$, which illustrates the low measurement noise asymptote (48). Some

sample trajectories from a simulation conducted with δ*t* = 0.001 s are shown in Fig. 5. The input measurements are shown in Fig. 5(a). It can be seen that the filtered signal (the solid line of Fig. 5 (b)) estimates the system output (the dotted line of Fig. 5(b)).
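A simulation along the lines of Fig. 5 can be sketched as follows. The Euler-style approximation of the continuous-time white noises (scaling by 1/√δt) and the use of scipy's simulator are assumptions of this rewrite.

```python
# A sketch of the Example 13 simulation (compare Fig. 5).
import numpy as np
from scipy import signal

alpha, Q, R, dt = -1.0, 1.0, 0.0001, 0.001
t = np.arange(0.0, 0.5, dt)
w = np.sqrt(Q / dt) * np.random.randn(t.size)   # process noise approximation
v = np.sqrt(R / dt) * np.random.randn(t.size)   # measurement noise approximation

G2 = signal.lti([1.0], [1.0, -alpha])           # G2(s) = 1/(s - alpha) = 1/(s + 1)
_, y, _ = signal.lsim(G2, w, t)                 # system output
z = y + v                                       # observations

beta = np.sqrt(alpha**2 + Q / R)                # about 100 here
H = signal.lti([alpha + beta], [1.0, beta])     # H_OE(s) = (alpha + beta)/(s + beta)
_, yhat, _ = signal.lsim(H, z, t)               # filtered output estimate
rmse = np.sqrt(np.mean((yhat - y)**2))          # estimate tracks the output
```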

#### **1.3.4 Minimum-Mean-Square-Error Input Estimation**

In input estimation problems, it is desired to estimate the input process *w*(*t*), as depicted in Fig. 6. This is commonly known as an equalisation problem, in which it is desired to mitigate the distortion introduced by a communication channel *G*₂(*s*). The simplification of the general noncausal solution (44) of Theorem 2 for the case of *G*₁(*s*) = *I* results in

$$H\_{\rm IF}(\mathbf{s}) = \mathbf{Q} \mathbf{G}\_2^H \boldsymbol{\Delta}^{-H} \boldsymbol{\Delta}^{-1}(\mathbf{s}) \,. \tag{49}$$

Equation (49) is known as the optimum minimum-mean-square-error noncausal equaliser [12]. Assume that *G*₂(*s*) is proper, that is, the order of the numerator is the same as the order of the denominator, and that the zeros of *G*₂(*s*) are in the left-hand-plane. Under these conditions, when the measurement noise becomes negligibly small, the equaliser estimates the inverse of the system model, that is,

$$\lim_{R \to 0} H_{IE}(s) = G_{2}^{-1}(s) \,. \tag{50}$$

The observation (50) can be verified by substituting ΔΔᴴ(*s*) = *G*₂*QG*₂ᴴ(*s*) into (49). In other words, if the channel model is invertible and the signal-to-noise ratio is sufficiently high, the equaliser will estimate *w*(*t*). When measurement noise is present the equaliser no longer approximates the channel inverse because some filtering is also required. In the limit, when the signal-to-noise ratio is sufficiently low, the equaliser approaches an open circuit, namely,

$$\lim_{Q \to 0} H_{IE}(s) = 0 \,. \tag{51}$$

The observation (51) can be verified by substituting *Q* = 0 into (49). Thus, when the equalisation problem is dominated by measurement noise, the estimation error is minimised by ignoring the data.
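The limits (50) and (51) can be illustrated numerically. Carrying the scalar algebra of (49) through for an assumed channel *G*₂(*s*) = (*s* + 2)⁻¹ with *Q* = 1 gives *H_IE*(*s*) = *Q*(*s* + 2)/(*R*(*γ*² − *s*²)) with *γ*² = 4 + *Q*/*R*, which approaches the channel inverse *s* + 2 as *R* → 0 (this worked scalar case is an assumption of this rewrite, not the book's example):

```python
# Numeric illustration of (50): the equaliser approaches G2^-1(s) as R -> 0.
import numpy as np

s = 1j * np.logspace(-1, 1, 50)             # frequency grid on the jw-axis
Q = 1.0
for R in (1e-1, 1e-3, 1e-5):
    gamma2 = 4.0 + Q / R
    H_ie = Q * (s + 2.0) / (R * (gamma2 - s**2))
    gap = np.max(np.abs(H_ie - (s + 2.0)))  # distance from the inverse G2^-1(s)
    print(f"R = {R:.0e}: max |H_IE - G2^-1| = {gap:.3g}")
```

Setting *Q* = 0 in the same expression gives *H_IE*(*s*) = 0, consistent with (51).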


Figure 6. The s-domain input estimation problem.

#### **1.4 Conclusion**

Continuous-time, linear, time-invariant systems can be described by either a differential equation model or a state-space model. Signal models can be written in the time-domain as

$$\left(a_{n}\frac{d^{n}}{dt^{n}} + a_{n-1}\frac{d^{n-1}}{dt^{n-1}} + \dots + a_{1}\frac{d}{dt} + a_{0}\right) y(t) = \left(b_{m}\frac{d^{m}}{dt^{m}} + b_{m-1}\frac{d^{m-1}}{dt^{m-1}} + \dots + b_{1}\frac{d}{dt} + b_{0}\right) w(t) \,.$$

Under the time-invariance assumption, the system transfer function matrices exist, which are written as polynomial fractions in the Laplace transform variable

$$Y(s) = \left[\frac{b_{m}s^{m} + b_{m-1}s^{m-1} + \dots + b_{1}s + b_{0}}{a_{n}s^{n} + a_{n-1}s^{n-1} + \dots + a_{1}s + a_{0}}\right] W(s) = G(s)W(s) \,.$$

Thus, knowledge of a system's differential equation is sufficient to identify its transfer function. If the poles of a system's transfer function are all in the left-hand-plane then the system is asymptotically stable. That is, if the input to the system is bounded then the output of the system will be bounded.

The optimal solution minimises the energy of the error in the time domain. It is found in the frequency domain by minimising the mean-square-error. The main results are summarised in Table 1. The optimal noncausal solution has unstable factors. It can only be realised by a combination of forward and backward processes, which is known as smoothing. The optimal causal solution is also known as the Wiener filter.

In output estimation problems, *C*1 = *C*2, *D*1 = *D*2, that is, *G*1(*s*) = *G*2(*s*) and when the measurement noise becomes negligible, the solution approaches a short circuit. In input estimation or equalisation, *C*1 = 0, *D*1 = *I*, that is, *G*1(*s*) = *I* and when the measurement noise becomes negligible, the optimal equaliser approaches the channel inverse, provided the inverse exists. Conversely, when the problem is dominated by measurement noise then the equaliser approaches an open circuit.



|  | ASSUMPTIONS | MAIN RESULTS |
|---|---|---|
| Signals and systems | *E*{*w*(*t*)} = *E*{*W*(*s*)} = *E*{*v*(*t*)} = *E*{*V*(*s*)} = 0. *E*{*w*(*t*)*w*ᵀ(*t*)} = *E*{*W*(*s*)*W*ᵀ(*s*)} = *Q* > 0 and *E*{*v*(*t*)*v*ᵀ(*t*)} = *E*{*V*(*s*)*V*ᵀ(*s*)} = *R* > 0 are known. *A*, *B*, *C*₁, *C*₂, *D*₁ and *D*₂ are known. *G*₁(*s*) and *G*₂(*s*) are stable, i.e., Re{λᵢ(*A*)} < 0. | *G*₁(*s*) = *C*₁(*sI* − *A*)⁻¹*B* + *D*₁ and *G*₂(*s*) = *C*₂(*sI* − *A*)⁻¹*B* + *D*₂ |
| Spectral factorisation | Δ(*s*) and Δ⁻¹(*s*) are stable, i.e., the poles and zeros of Δ(*s*) are in the left-half-plane. | ΔΔᴴ(*s*) = *G*₂*QG*₂ᴴ(*s*) + *R* |
| Non-causal solution |  | *H*(*s*) = *G*₁*QG*₂ᴴ(Δᴴ)⁻¹Δ⁻¹(*s*) |
| Causal solution |  | *H*(*s*) = {*G*₁*QG*₂ᴴ(Δᴴ)⁻¹}₊Δ⁻¹(*s*) |

Table 1. Main results for the continuous-time general filtering problem.

#### **1.5 Problems**


**Problem 1.** Find the transfer functions and comment on the stability of the systems having the following differential equations.

(a) $\ddot{y} + 7\dot{y} + 12y = \ddot{w} + 3\dot{w} + 2w$.

(b) $\ddot{y} + 9\dot{y} + 20y = \ddot{w} + 5\dot{w} + 6w$.

(c) $\ddot{y} + 11\dot{y} + 30y = \ddot{w} + 7\dot{w} + 12w$.

(d) $\ddot{y} + 13\dot{y} + 42y = \ddot{w} + 9\dot{w} + 20w$.

(e) $\ddot{y} + 15\dot{y} + 56y = \ddot{w} + 11\dot{w} + 30w$.


**Problem 2.** Find the transfer functions and comment on the stability for systems having the following state-space parameters.

$$\begin{array}{rcl} \text{(a)} \quad A = \begin{bmatrix} -7 & -12 \\ 1 & 0 \end{bmatrix}, & B = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, & C = \begin{bmatrix} -6 & -14 \end{bmatrix} \text{ and } & D = \begin{bmatrix} 1 \end{bmatrix} \end{array}$$



$$\begin{array}{rcl} \text{(b)} & A = \begin{bmatrix} -7 & 20 \\ 1 & 0 \end{bmatrix}, \; B = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \; C = \begin{bmatrix} -2 & 26 \end{bmatrix} \text{ and } D = 1 \,. \\[2ex] \text{(c)} & A = \begin{bmatrix} -11 & -30 \\ 1 & 0 \end{bmatrix}, \; B = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \; C = \begin{bmatrix} -18 & -18 \end{bmatrix} \text{ and } D = 1 \,. \\[2ex] \text{(d)} & A = \begin{bmatrix} 13 & -42 \\ 1 & 0 \end{bmatrix}, \; B = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \; C = \begin{bmatrix} 22 & -22 \end{bmatrix} \text{ and } D = 1 \,. \\[2ex] \text{(e)} & A = \begin{bmatrix} -15 & -56 \\ 1 & 0 \end{bmatrix}, \; B = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \; C = \begin{bmatrix} -4 & -26 \end{bmatrix} \text{ and } D = 1 \,. \end{array}$$

**Problem 3.** Calculate the spectral factors for $\Phi_{zz}(s) = GQG^{H}(s) + R$ having the following models and noise statistics.

(a) *G*(*s*) = (*s* + 1)⁻¹, *Q* = 2 and *R* = 1.

(b) *G*(*s*) = (*s* + 2)⁻¹, *Q* = 5 and *R* = 1.

(c) *G*(*s*) = (*s* + 3)⁻¹, *Q* = 7 and *R* = 1.

(d) *G*(*s*) = (*s* + 4)⁻¹, *Q* = 9 and *R* = 1.

(e) *G*(*s*) = (*s* + 5)⁻¹, *Q* = 11 and *R* = 1.


**Problem 4.** Calculate the optimal causal output estimators for Problem 3.

**Problem 5.** Consider the error spectral density matrix

$$\begin{aligned} \Phi_{ee}(s) &= [H\Delta - G_{1}QG_{2}^{H}(\Delta^{H})^{-1}][H\Delta - G_{1}QG_{2}^{H}(\Delta^{H})^{-1}]^{H}(s) \\ &\quad + [G_{1}QG_{1}^{H} - G_{1}QG_{2}^{H}(\Delta\Delta^{H})^{-1}G_{2}QG_{1}^{H}](s) \,. \end{aligned}$$

(a) Derive the optimal output estimator.

(b) Derive the optimal causal output estimator.

(c) Derive the optimal input estimator.



**Problem 6 [**16**].** In respect of the configuration in Fig. 2, suppose that $A = \begin{bmatrix} -1 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & -3 \end{bmatrix}$, $B = \begin{bmatrix} 25 \\ 25 \\ 25 \end{bmatrix}$, $C_2 = \begin{bmatrix} 1 & 2 & 1 \end{bmatrix}$, $C_1 = \begin{bmatrix} 1 & 1 & 1 \end{bmatrix}$, *D* = 0, *Q* = 1 and *R* = 1. Show that the optimal causal filter is given by *H*(*s*) = (16.9*s*² + 86.5*s* + 97.3)(*s*³ + 8.64*s*² + 30.3*s* + 50.3)⁻¹.

**Problem 7 [**18**].** Suppose that $GQG^{H}(s) = \dfrac{3600}{s^{2}(s^{2} - 169)}$ and *R*(*s*) = 1. Show that the optimal causal filter for output estimation is given by *H_OE*(*s*) = (4*s* + 60)(*s*² + 17*s* + 60)⁻¹.

#### **1.6 Glossary**

The following terms have been introduced within this section.

| Term | Description |
|------|-------------|
| ℝ | The space of real numbers. |
| ℝⁿ | The space of real-valued *n*-element column vectors. |
| *t* | The real-valued continuous-time variable. For example, *t* ∈ (−∞, ∞) and *t* ∈ [0, ∞) denote −∞ < *t* < ∞ and 0 ≤ *t* < ∞, respectively. |
| *w*(*t*) ∈ ℝⁿ | A continuous-time, real-valued, *n*-element stationary stochastic input signal. |
| *w* | The set of *w*(*t*) over a prescribed interval. |
| *v*(*t*) | A stationary stochastic measurement noise signal. |
| 𝒢 : ℝᵖ → ℝ^q | A linear system that operates on a *p*-element input signal and produces a *q*-element output signal. |
| *y* = 𝒢*w* | The output of a linear system 𝒢 that operates on an input signal *w*. |
| *A*, *B*, *C*, *D* | Time-invariant state-space matrices of appropriate dimension. The system 𝒢 is assumed to have the realisation ẋ(*t*) = *Ax*(*t*) + *Bw*(*t*), *y*(*t*) = *Cx*(*t*) + *Dw*(*t*), in which *w*(*t*) is known as the process noise or input signal. |
| *Q* and *R* | Time-invariant covariance matrices of the stochastic signals *w*(*t*) and *v*(*t*), respectively. |
| *s* | The Laplace transform variable. |
| *δ*(*t*) | The Dirac delta function. |
| *Y*(*s*) | The Laplace transform of a continuous-time signal *y*(*t*). |
| *G*(*s*) | The transfer function matrix of the system ẋ(*t*) = *Ax*(*t*) + *Bw*(*t*), *y*(*t*) = *Cx*(*t*) + *Dw*(*t*), given by *G*(*s*) = *C*(*sI* − *A*)⁻¹*B* + *D*. |
| ⟨*v*, *w*⟩ | The inner product of two continuous-time signals *v* and *w*, defined by ⟨*v*, *w*⟩ = ∫ *v*ᵀ*w* *dt*. |
| ‖*w*‖₂ | The 2-norm of the continuous-time signal *w*, defined by ‖*w*‖₂² = ⟨*w*, *w*⟩ = ∫ *w*ᵀ*w* *dt*. |
| ℒ₂ | The set of continuous-time signals having finite 2-norm, which is known as the Lebesgue 2-space. |
| λᵢ(*A*) | The eigenvalues of *A*. Re{λᵢ(*A*)} denotes the real part of the eigenvalues of *A*. |
| Asymptotic stability | A linear system 𝒢 is said to be asymptotically stable if its output *y* ∈ ℒ₂ for any *w* ∈ ℒ₂. If the Re{λᵢ(*A*)}, or equivalently the real parts of the transfer function's poles, are in the left-hand-plane, then the system is stable. |
| Adjoint of 𝒢 | The adjoint of a system 𝒢 having the state-space parameters {*A*, *B*, *C*, *D*} is a system 𝒢ᴴ parameterised by {−*A*ᵀ, −*C*ᵀ, *B*ᵀ, *D*ᵀ}. |
| *G*ᴴ(*s*) | The adjoint (or Hermitian transpose) of the transfer function matrix *G*(*s*). |
| Φ_zz(*s*) | The spectral density matrix of the measurements *z*. |
| Δ(*s*) | The spectral factor of Φ_zz(*s*), which satisfies ΔΔᴴ(*s*) = *GQG*ᴴ(*s*) + *R* and Δᴴ(*s*) = Δᵀ(−*s*). |
| *G*⁻¹(*s*) | Inverse of the transfer function matrix *G*(*s*). |
| *G*⁻ᴴ(*s*) | Inverse of the adjoint transfer function matrix *G*ᴴ(*s*). |
| {*G*(*s*)}₊ | Causal part of the transfer function matrix *G*(*s*). |
| *H*(*s*) | Transfer function matrix of the minimum mean-square-error solution. |
| *H_OE*(*s*) | Transfer function matrix of the minimum mean-square-error solution specialised for output estimation. |
| *H_IE*(*s*) | Transfer function matrix of the minimum mean-square-error solution specialised for input estimation. |


#### **1.7 References**

[1] O. Neugebauer, *A History of Ancient Mathematical Astronomy*, Springer, Berlin and New York, 1975.

[2] C. F. Gauss, *Theoria Motus Corporum Coelestium in Sectionibus Conicis Solem Ambientum*, Hamburg, 1809 (Translated: *Theory of the Motion of the Heavenly Bodies*, Dover, New York, 1963).

[3] A. N. Kolmogorov, "Sur l'interpolation et extrapolation des suites stationnaires", *Comptes Rendus de l'Académie des Sciences*, vol. 208, pp. 2043 – 2045, 1939.

[4] N. Wiener, *Extrapolation, Interpolation and Smoothing of Stationary Time Series with Engineering Applications*, The MIT Press, Cambridge, Mass.; Wiley, New York; Chapman & Hall, London, 1949.

[5] P. Masani, "Wiener's Contributions to Generalized Harmonic Analysis, Prediction Theory and Filter Theory", *Bulletin of the American Mathematical Society*, vol. 72, no. 1, pt. 2, pp. 73 – 125, 1966.

[6] T. Kailath, *Lectures on Wiener and Kalman Filtering*, Springer-Verlag, Wien; New York, 1981.

[7] M.-A. Parseval des Chênes, *Mémoires présentés à l'Institut des Sciences, Lettres et Arts, par divers savans, et lus dans ses assemblées. Sciences mathématiques et physiques (Savans étrangers)*, vol. 1, pp. 638 – 648, 1806.

[8] C. A. Desoer and M. Vidyasagar, *Feedback Systems: Input-Output Properties*, Academic Press, N.Y., 1975.

[9] G. A. Einicke, "Asymptotic Optimality of the Minimum-Variance Fixed-Interval Smoother", *IEEE Transactions on Signal Processing*, vol. 55, no. 4, pp. 1543 – 1547, Apr. 2007.

[10] T. Kailath, *Linear Systems*, Prentice-Hall Inc., Englewood Cliffs, New Jersey, 1980.

[11] D. J. N. Limebeer, B. D. O. Anderson, P. Khargonekar and M. Green, "A Game Theoretic Approach to H∞ Control for Time-varying Systems", *SIAM Journal of Control and Optimization*, vol. 30, no. 2, pp. 262 – 283, 1992.

[12] M. Green and D. J. N. Limebeer, *Linear Robust Control*, Prentice-Hall Inc., Englewood Cliffs, New Jersey, 1995.

[13] C. S. Burrus, J. H. McClellan, A. V. Oppenheim, T. W. Parks, R. W. Schafer and H. W. Schuessler, *Computer-Based Exercises for Signal Processing Using Matlab*, Prentice-Hall, Englewood Cliffs, New Jersey, 1994.

[14] U. Shaked, "A general transfer function approach to linear stationary filtering and steady-state optimal control problems", *International Journal of Control*, vol. 24, no. 6, pp. 741 – 770, 1976.

[15] A. H. Sayed and T. Kailath, "A Survey of Spectral Factorization Methods", *Numerical Linear Algebra with Applications*, vol. 8, pp. 467 – 496, 2001.

[16] U. Shaked, "H∞-Minimum Error State Estimation of Linear Stationary Processes", *IEEE Transactions on Automatic Control*, vol. 35, no. 5, pp. 554 – 558, May 1990.

[17] S. A. Kassam and H. V. Poor, "Robust Techniques for Signal Processing: A Survey", *Proceedings of the IEEE*, vol. 73, no. 3, pp. 433 – 481, Mar. 1985.

[18] A. P. Sage and J. L. Melsa, *Estimation Theory with Applications to Communications and Control*, McGraw-Hill Book Company, New York, 1971.

## **Discrete-Time Minimum-Mean-Square-Error Filtering**

#### **2.1 Introduction**

This chapter reviews the solutions for the discrete-time, linear stationary filtering problems that are attributed to Wiener [1] and Kolmogorov [2]. As in the continuous-time case, a model-based approach is employed. Here, a linear model is specified by the coefficients of the input and output difference equations. It is shown that the same coefficients appear in the system's (frequency domain) transfer function. In other words, frequency domain model representations can be written down without background knowledge of z-transforms.


In the 1960s and 1970s, continuous-time filters were implemented on analogue computers. This practice has been discontinued for two main reasons. First, analogue multipliers and op amp circuits exhibit poor performance whenever (temperature-sensitive) calibrations become out of date. Second, updated software releases are faster to turn around than hardware design iterations. Continuous-time filters are now routinely implemented using digital computers, provided that the signal sampling rates and data processing rates are sufficiently high. Alternatively, continuous-time model parameters may be converted into discrete-time and differential equations can be transformed into difference equations. The ensuing discrete-time filter solutions are then amenable to more economical implementation, namely, employing relatively lower processing rates.
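For instance, a continuous-time state-space model can be converted to discrete-time under a zero-order-hold assumption; the first-order system and sampling period below are illustrative assumptions of this rewrite.

```python
# A sketch of converting continuous-time model parameters to discrete-time.
from scipy import signal

A, B, C, D = [[-1.0]], [[1.0]], [[1.0]], [[0.0]]   # dx/dt = Ax + Bw, y = Cx + Dw
dt = 0.01                                          # sampling period in seconds

# Zero-order-hold discretisation yields the difference-equation parameters.
Ad, Bd, Cd, Dd, _ = signal.cont2discrete((A, B, C, D), dt, method='zoh')
```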

The discrete-time Wiener filtering problem is solved in the frequency domain. Once again, it is shown that the optimum minimum-mean-square-error solution is found by completing the square. The optimum solution is noncausal, which can only be implemented by forward and backward processes. This solution is actually a smoother and the optimum filter is found by taking the causal part.

The developments rely on solving a spectral factorisation problem, which requires pole-zero cancellations. Therefore, some pertinent discrete-time concepts are introduced in Section 2.2 prior to deriving the filtering results. The discussion of the prerequisite concepts is comparatively brief since it mirrors the continuous-time material introduced previously. In Section 2.3 it is shown that the structure of the filter solutions is unchanged – only the spectral factors are calculated differently.



#### **2.2 Prerequisites**

#### **2.2.1 Spaces**

Discrete-time real-valued stochastic processes are denoted as $v_k = [v_{1,k}, v_{2,k}, \dots, v_{n,k}]^T$ and $w_k = [w_{1,k}, w_{2,k}, \dots, w_{n,k}]^T$, where $v_{i,k}, w_{i,k} \in \mathbb{R}$, *i* = 1, … *n* and *k* ∈ (–∞, ∞). The *v_k* and *w_k* are said to belong to the space ℝⁿ. In this chapter, the vector *w* denotes the set of *w_k* over all time *k*, that is, *w* = {*w_k*, *k* ∈ (–∞, ∞)}. The inner product ⟨*v*, *w*⟩ of two discrete-time vector processes *v* and *w* is defined by

$$\left\langle v, w \right\rangle = \sum_{k=-\infty}^{\infty} v_{k}^{T} w_{k} \,. \tag{1}$$

The 2-norm or Euclidean norm of a discrete-time vector process *w*, $\|w\|_2$, is defined as $\|w\|_2 = \sqrt{\langle w, w\rangle} = \sqrt{\sum_{k=-\infty}^{\infty} w_k^T w_k}$. The square of the 2-norm, that is, $\|w\|_2^2 = \langle w, w \rangle = \sum_{k=-\infty}^{\infty} w_k^T w_k$, is

commonly known as energy of the signal *w*. The Lebesgue 2-space is denoted by <sup>2</sup> and is defined as the set of discrete-time processes having a finite 2-norm. Thus, *w* <sup>2</sup> means that the energy of *w* is bounded. See [3] for more detailed discussions of spaces and norms.

#### **2.2.2 Discrete-time Polynomial Fraction Systems**

Consider a linear, time-invariant system $\mathcal{G}$ that operates on an input process $w_k$ and produces an output process $y_k$, that is, $y = \mathcal{G}w$. Suppose that the difference equation for this system is

$$a_n y_{k-n} + a_{n-1} y_{k-n+1} + \dots + a_1 y_{k-1} + a_0 y_k = b_m w_{k-m} + b_{m-1} w_{k-m+1} + \dots + b_1 w_{k-1} + b_0 w_k \,, \tag{2}$$

where $a_0, \dots, a_n$ and $b_0, \dots, b_m$ are real-valued constant coefficients, with $a_n \neq 0$ and zero initial conditions.

*Example 1.* The difference equation $y_k = 0.1x_k + 0.2x_{k-1} + 0.3y_{k-1}$ specifies a system in which the coefficients are $a_0 = 1$, $a_1 = -0.3$, $b_0 = 0.1$ and $b_1 = 0.2$. Note that $y_k$ is known as the current output and $y_{k-1}$ is known as a past output.
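As a concrete check, the recursion of Example 1 can be simulated directly; the sketch below assumes a unit-impulse input and zero initial conditions (the input choice is illustrative, not part of the example).

```python
import numpy as np

# Simulate Example 1: y[k] = 0.1*x[k] + 0.2*x[k-1] + 0.3*y[k-1].
# Zero initial conditions; a unit-impulse input is assumed for illustration.
N = 10
x = np.zeros(N)
x[0] = 1.0
y = np.zeros(N)
for k in range(N):
    xm1 = x[k - 1] if k > 0 else 0.0   # past input x[k-1]
    ym1 = y[k - 1] if k > 0 else 0.0   # past output y[k-1]
    y[k] = 0.1 * x[k] + 0.2 * xm1 + 0.3 * ym1

print(y[:4])  # impulse response: 0.1, 0.23, 0.069, 0.0207
```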

#### **2.2.3 The Z-Transform of a Discrete-time Sequence**

The two-sided z-transform of a discrete-time process, *yk*, is denoted by *Y*(*z*) and is defined by

$$Y(z) = \sum_{k=-\infty}^{\infty} y_k z^{-k} \,, \tag{3}$$

where $z = e^{j\omega T}$ and $j = \sqrt{-1}$. Given a process $y_k$ with z-transform $Y(z)$, $y_k$ can be calculated from $Y(z)$ by taking the inverse z-transform,


$$y_k = \frac{1}{2\pi j} \oint Y(z) z^{k-1} \, dz \,, \tag{4}$$

where the contour of integration is the unit circle.

*Theorem 1 Parseval's Theorem:* 

$$\sum_{k=-\infty}^{\infty} \left| y_k \right|^2 = \frac{1}{2\pi j} \oint \left| Y(z) \right|^2 \frac{dz}{z} \,. \tag{5}$$

That is, the energy in the time domain equals the energy in the frequency domain.
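This can be verified numerically on a finite-length surrogate using the DFT form of Parseval's relation, $\sum_k |y_k|^2 = \frac{1}{N}\sum_m |Y_m|^2$; the random test signal below is an assumption for the demonstration.

```python
import numpy as np

# Parseval check on a finite record: sum_k |y_k|^2 == (1/N) sum_m |Y_m|^2,
# where Y is the DFT of y. The random test signal is an assumption.
rng = np.random.default_rng(0)
y = rng.standard_normal(64)
Y = np.fft.fft(y)
time_energy = np.sum(np.abs(y) ** 2)
freq_energy = np.sum(np.abs(Y) ** 2) / len(y)
print(np.allclose(time_energy, freq_energy))  # True
```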

#### **2.2.4 Polynomial Fraction Transfer Functions**

In the continuous-time case, a system's differential equations lead to a transfer function in the Laplace transform variable. Here, in discrete-time, a system's difference equations lead to a transfer function in the z-transform variable. Applying the z-transform to both sides of (2) yields

$$\begin{aligned} \left(a_n z^{-n} + a_{n-1} z^{-n+1} + \dots + a_1 z^{-1} + a_0 \right) Y(z) \qquad\qquad \\ = \left(b_m z^{-m} + b_{m-1} z^{-m+1} + \dots + b_1 z^{-1} + b_0 \right) W(z) \,. \end{aligned} \tag{6}$$

Therefore


$$Y(z) = \left[ \frac{b_m z^{-m} + b_{m-1} z^{-m+1} + \dots + b_1 z^{-1} + b_0}{a_n z^{-n} + a_{n-1} z^{-n+1} + \dots + a_1 z^{-1} + a_0} \right] W(z) \tag{7}$$

$$= G(z) W(z) \,,$$

where

$$G(z) = \frac{b_m z^{-m} + b_{m-1} z^{-m+1} + \dots + b_1 z^{-1} + b_0}{a_n z^{-n} + a_{n-1} z^{-n+1} + \dots + a_1 z^{-1} + a_0} \tag{8}$$

is known as the transfer function of the system. It can be seen that knowledge of the system difference equation (2) is sufficient to identify its transfer function (8).

#### **2.2.5 Poles and Zeros**

The numerator and denominator polynomials of (8) can be factored into *m* and *n* linear factors, respectively, to give

$$G(z) = \frac{b_m(z-\beta_1)(z-\beta_2)\dots(z-\beta_m)}{a_n(z-\alpha_1)(z-\alpha_2)\dots(z-\alpha_n)} \,. \tag{9}$$

The numerator of *G*(*z*) is zero when *z* = *βi*, *i* = 1 … *m*. These values of *z* are called the zeros of *G*(*z*). Zeros inside the unit circle are called minimum-phase whereas zeros outside the unit


circle are called non-minimum phase. The denominator of *G*(*z*) is zero when *z* = *αi*, *i* = 1 … *n*. These values of *z* are called the poles of *G*(*z*).

*Example 2.* Consider a system described by the difference equation $y_k + 0.3y_{k-1} - 0.04y_{k-2} = w_k + 0.5w_{k-1}$. It follows from (2) and (8) that the corresponding transfer function is given by

$$G(z) = \frac{1 + 0.5z^{-1}}{1 + 0.3z^{-1} - 0.04z^{-2}} = \frac{z^2 + 0.5z}{z^2 + 0.3z - 0.04} = \frac{z(z + 0.5)}{(z - 0.1)(z + 0.4)} \,,$$

which possesses poles at *z* = 0.1, − 0.4 and zeros at *z* = 0, − 0.5.
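The poles and zeros of Example 2 are readily checked numerically, for instance:

```python
import numpy as np

# Example 2: G(z) = (z^2 + 0.5 z) / (z^2 + 0.3 z - 0.04).
zeros = np.roots([1.0, 0.5, 0.0])     # -> 0, -0.5
poles = np.roots([1.0, 0.3, -0.04])   # -> 0.1, -0.4
print(zeros, poles)
print(np.all(np.abs(zeros) < 1))      # True: both zeros are minimum-phase
```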

#### **2.2.6 Polynomial Fraction Transfer Function Matrix**

In the single-input-single-output case, it is assumed that $w(z)$, $G(z)$ and $y(z) \in \mathbb{C}$. In the multiple-input-multiple-output case, $G(z)$ is a transfer function matrix. For example, suppose that $w(z) \in \mathbb{C}^m$ and $y(z) \in \mathbb{C}^p$; then $G(z) \in \mathbb{C}^{p \times m}$, namely

$$G(z) = \begin{bmatrix} G_{11}(z) & G_{12}(z) & \dots & G_{1m}(z) \\ G_{21}(z) & G_{22}(z) & & \vdots \\ \vdots & & \ddots & \vdots \\ G_{p1}(z) & \dots & & G_{pm}(z) \end{bmatrix} \,, \tag{10}$$

where the components *Gij*(*z*) have the polynomial transfer function form within (8) or (9).

#### **2.2.7 State-Space Transfer Function Matrix**

The polynomial fraction transfer function matrix (10) can be written in the state-space representation

$$\mathbf{G}(z) = \mathbf{C}(zI - A)^{-1}B + D\ \tag{11}$$

where $A \in \mathbb{R}^{n \times n}$, $B \in \mathbb{R}^{n \times m}$, $C \in \mathbb{R}^{p \times n}$ and $D \in \mathbb{R}^{p \times m}$.

*Example 3.* For a state-space model with $A = -0.5$, $B = C = 1$ and $D = 0$, the transfer function is $G(z) = (z + 0.5)^{-1}$.

*Example 4.* For state-space parameters $A = \begin{bmatrix} -0.3 & 0.04 \\ 1 & 0 \end{bmatrix}$, $B = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$, $C = \begin{bmatrix} 0.2 & 0.04 \end{bmatrix}$ and $D = 1$, the use of Cramer's rule, that is, $\begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} = \frac{1}{ad - bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$, yields the transfer function $G(z) = \frac{z(z + 0.5)}{(z - 0.1)(z + 0.4)}$.
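The transfer function of Example 4 can be cross-checked with scipy.signal.ss2tf, which implements $G(z) = C(zI - A)^{-1}B + D$; the parameters below are those of the example.

```python
import numpy as np
from scipy.signal import ss2tf

# Example 4: recover G(z) = C(zI - A)^{-1} B + D from the state-space model.
A = np.array([[-0.3, 0.04],
              [ 1.0, 0.0 ]])
B = np.array([[1.0], [0.0]])
C = np.array([[0.2, 0.04]])
D = np.array([[1.0]])
num, den = ss2tf(A, B, C, D)
print(num)  # [[ 1.   0.5  0. ]]  -> z^2 + 0.5 z = z(z + 0.5)
print(den)  # [ 1.    0.3  -0.04] -> (z - 0.1)(z + 0.4)
```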


#### **2.2.8 State-Space Realisation**


The state-space transfer function matrix (11) can be realised as a discrete-time system $\mathcal{G} : \mathbb{R}^m \to \mathbb{R}^p$ with

$$x_{k+1} = Ax_k + Bw_k \,, \tag{12}$$

$$y_k = Cx_k + Dw_k \,, \tag{13}$$

where $w_k \in \mathbb{R}^m$ is an input sequence, $x_k \in \mathbb{R}^n$ is a state vector and $y_k \in \mathbb{R}^p$ is an output. This system is depicted in Fig. 1. It is assumed that $w_k$ is a zero-mean, stationary process with $E\{w_j w_k^T\} = Q\delta_{jk}$, where $\delta_{jk} = \begin{cases} 1 & \text{if } j = k \\ 0 & \text{if } j \neq k \end{cases}$ is the Kronecker delta function.

Figure 1. Discrete-time state-space system.

In most applications, discrete-time implementations are desired; however, the polynomial fraction or state-space transfer function parameters may be known in continuous-time. Therefore, two methods for transforming continuous-time parameters to discrete-time are set out below.
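Before moving to those methods, a minimal simulation of the recursion (12) – (13) is shown below; the parameters $A$, $B$, $C$, $D$, $Q$ and the noise seed are illustrative assumptions.

```python
import numpy as np

# Simulate the recursion (12) - (13) with E{w_j w_k^T} = Q * delta_jk.
# The parameters A, B, C, D, Q below are illustrative assumptions.
rng = np.random.default_rng(1)
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([[1.0], [0.5]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.0]])
Q = 1.0
x = np.zeros((2, 1))
y = []
for k in range(100):
    w = np.sqrt(Q) * rng.standard_normal((1, 1))
    y.append((C @ x + D @ w).item())  # output equation (13)
    x = A @ x + B @ w                 # state recursion (12)
print(y[:5])
```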

#### **2.2.9 The Bilinear Approximation**

Transfer functions in the z-plane can be mapped exactly to the s-plane by substituting $z = e^{sT_s}$, where $s = j\omega$ and $T_s$ is the sampling period. Conversely, the substitution

$$\begin{aligned} s &= \frac{1}{T\_s} \log(z) \\\\ &= \frac{2}{T\_s} \left[ \frac{z-1}{z+1} + \frac{1}{3} \left( \frac{z-1}{z+1} \right)^3 + \frac{1}{5} \left( \frac{z-1}{z+1} \right)^5 + \frac{1}{7} \left( \frac{z-1}{z+1} \right)^7 + \dots \right] \end{aligned} \tag{14}$$

can be used to map s-plane transfer functions into the z-plane. The bilinear transform is a first order approximation to (14), namely,


$$s \approx \frac{2}{T\_s} \left[ \frac{z-1}{z+1} \right]. \tag{15}$$

*Example 5.* Consider the continuous-time transfer function $H(s) = (s + 2)^{-1}$ with $T_s = 2$. Substituting (15) yields the discrete-time transfer function $H(z) = (z + 1)(3z + 1)^{-1}$. The higher order terms within the series of (14) can be included to improve the accuracy of converting a continuous-time model to discrete time.
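The same conversion can be reproduced with scipy.signal.bilinear, which applies the substitution $s = 2f_s(z-1)/(z+1)$, matching (15) with $f_s = 1/T_s = 0.5$.

```python
from scipy.signal import bilinear

# Example 5: H(s) = 1/(s + 2), Ts = 2, i.e. fs = 0.5; scipy applies
# the substitution s = 2*fs*(z - 1)/(z + 1), which is (15).
b, a = bilinear([1.0], [1.0, 2.0], fs=0.5)
print(b, a)  # ~[0.333 0.333], [1. 0.333] -> H(z) = (z + 1)/(3z + 1)
```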

#### **2.2.10 Discretisation of Continuous-time Systems**

The discrete-time state-space parameters, denoted here by {*AD*, *BD*, *CD*, *DD*, *QD*, *RD*}, can be obtained by discretising the continuous-time system

$$\dot{x}(t) = A_C x(t) + B_C w(t) \,, \tag{16}$$

$$y(t) = C_C x(t) + D_C w(t) \,, \tag{17}$$

$$z(t) = y(t) + v(t) \, \tag{18}$$

where $E\{w(t)w^T(\tau)\} = Q_C\,\delta(t-\tau)$ and $E\{v(t)v^T(\tau)\} = R_C\,\delta(t-\tau)$. Premultiplying (16) by $e^{-A_C t}$ and recognising that $\frac{d}{dt}\left(e^{-A_C t}x(t)\right) = e^{-A_C t}\dot{x}(t) - e^{-A_C t}A_C x(t)$ yields

$$\frac{d}{dt}(e^{-A\_{\mathbb{C}}t}\mathbf{x}(t)) = e^{-A\_{\mathbb{C}}t}B\_{\mathbb{C}}w(t)\,. \tag{19}$$

Integrating (19) results in

$$e^{-A_C t}x(t) - e^{-A_C t_0}x(t_0) = \int_{t_0}^{t} e^{-A_C\tau}B_C w(\tau)\,d\tau \tag{20}$$

and hence

$$\begin{aligned} x(t) &= e^{A_C(t - t_0)} x(t_0) + e^{A_C t}\int_{t_0}^{t} e^{-A_C\tau} B_C w(\tau)\,d\tau \\ &= e^{A_C(t - t_0)} x(t_0) + \int_{t_0}^{t} e^{A_C(t - \tau)} B_C w(\tau)\,d\tau \end{aligned} \tag{21}$$

is a solution to the differential equation (16). Suppose that *x*(*t*) is available at integer *k* multiples of *Ts*. Assuming that *w*(*t*) is constant during the sampling interval and substituting *t*0 = *kT*s, *t* = (*k*+1)*Ts* into (21) yields

$$x((k+1)T_s) = e^{A_C T_s} x(kT_s) + \int_{kT_s}^{(k+1)T_s} e^{A_C((k+1)T_s - \tau)} B_C\,d\tau\, w(kT_s) \,. \tag{22}$$

With the identifications *xk* = *x*(*kTs*) and *wk* = *w*(*kTs*) in (22), it can be seen that


$$A_D = e^{A_C T_s} \,, \tag{23}$$

$$B_D = \int_{kT_s}^{(k+1)T_s} e^{A_C((k+1)T_s - \tau)} B_C\,d\tau \,. \tag{24}$$

The *τ* within the definite integral (24) varies from *kTs* to (*k*+1)*Ts*. For a change of variable λ = (*k+1*)*Ts* – *τ*, the limits of integration become λ = *Ts* and λ = 0, which results in the simplification

$$\begin{aligned} B_D &= -\int_{T_s}^{0} e^{A_C\lambda} B_C\,d\lambda \\ &= \int_{0}^{T_s} e^{A_C\lambda} B_C\,d\lambda \,. \end{aligned} \tag{25}$$

Denoting $E\{w_j w_k^T\} = Q_D\delta_{jk}$ and using (25), it can be shown that [4]

$$Q_D = \int_{0}^{T_s} e^{A_C\lambda} B_C Q_C B_C^T e^{A_C^T\lambda}\,d\lambda \,. \tag{26}$$

The exponential matrix is defined as

$$e^{A_C t} = I + A_C t + \frac{A_C^2 t^2}{2!} + \dots + \frac{A_C^N t^N}{N!} + \dots \,, \tag{27}$$

which leads to


$$A\_D = I + A\_C T\_s + \frac{(A\_C T\_s)^2}{2!} + \frac{(A\_C T\_s)^3}{3!} + \frac{(A\_C T\_s)^4}{4!} + \cdots,\tag{28}$$

$$B_D = \left(T_s I + \frac{A_C T_s^2}{2!} + \frac{A_C^2 T_s^3}{3!} + \frac{A_C^3 T_s^4}{4!} + \dots\right)B_C \,, \tag{29}$$

$$Q\_D = B\_C Q\_C B\_C^T T\_s + \frac{(A\_C B\_C Q\_C B\_C^T + B\_C Q\_C B\_C^T A\_C^T) T\_s^2}{2!} + \dotsb \tag{30}$$

It is common practice ([4] – [6]) to truncate the above series after terms linear in $T_s$. Some higher order terms can be retained in applications where parameter accuracy is critical. Since the limit as $N \to \infty$ of $T_s^N/N!$ is 0, the above series are valid for any value of $T_s$. However, the sample period needs to be sufficiently small, otherwise the above discretisations will be erroneous. According to the Nyquist-Shannon sampling theorem, the sampling rate is required to be at least twice the highest frequency component of the continuous-time signal. In respect of (17), the output map may be written as

$$\mathbf{y}(kT\_s) = \mathbf{C}\_\mathbf{C} \mathbf{x}(kT\_s) + D\_\mathbf{C} \mathbf{w}(kT\_s) \tag{31}$$


and thus

$$C_D = C_C \,, \tag{32}$$

$$D\_{\mathbb{D}} = D\_{\mathbb{C}} \, . \tag{33}$$

Following the approach of [7], it is assumed that the continuous-time signals are integrated between samples; for example, the discretised measurement noise is $v(kT_s) = \frac{1}{T_s}\int_{kT_s}^{(k+1)T_s} v(\tau)\,d\tau$. Then the corresponding measurement noise covariance is

$$R_D = \frac{1}{T_s^2}\int_{kT_s}^{(k+1)T_s} R_C\,d\tau = \frac{1}{T_s}R_C \,. \tag{34}$$

In some applications, such as inertial and satellite navigation [8], the underlying dynamic equations are in continuous-time, whereas the filters are implemented in discrete-time. In this case, any underlying continuous-time equations together with (28) – (30) can be calculated within a high rate foreground task, so that the discretised state-space parameters will be sufficiently accurate. The discrete-time filter recursions can then be executed within a lower rate background task.
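A minimal discretisation sketch appears below. The exact $A_D$ and $B_D$ of (23) and (25) are evaluated with an augmented matrix exponential (a standard construction that is not part of the text) and compared against first-order truncations of the series (28) – (29); the continuous-time parameters are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import expm

# Discretise x' = Ac x + Bc w per (23) and (25). The integral for B_D is
# obtained from an augmented matrix exponential (a standard construction,
# not from the text); Ac, Bc and Ts are illustrative assumptions.
Ac = np.array([[0.0, 1.0],
               [-1.0, -0.4]])
Bc = np.array([[0.0], [1.0]])
Ts = 0.1
n, m = Ac.shape[0], Bc.shape[1]
M = np.zeros((n + m, n + m))
M[:n, :n], M[:n, n:] = Ac, Bc
E = expm(M * Ts)
Ad, Bd = E[:n, :n], E[:n, n:]          # exact (23) and (25)
Ad1 = np.eye(n) + Ac * Ts              # series (28) truncated after Ts
Bd1 = Bc * Ts                          # series (29) truncated after Ts
print(np.abs(Ad - Ad1).max(), np.abs(Bd - Bd1).max())  # O(Ts^2) errors
```

The comparison illustrates the remark above: for a sufficiently small $T_s$, the truncated series are close to the exact discretisation.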

#### **2.2.11 Asymptotic Stability**

Consider a discrete-time, linear, time-invariant system $\mathcal{G}$ that operates on an input process $w$ and produces an output process $y$. The system $\mathcal{G}$ is said to be asymptotically stable if the output remains bounded, that is, $y \in \ell_2$, for any input $w \in \ell_2$. Two equivalent conditions for $\mathcal{G}$ to be asymptotically stable are as follows.

(i) The eigenvalues of the system's state matrix are inside the unit circle, that is, for $\lambda_i(A)$ of (11), $|\lambda_i(A)| < 1$.
(ii) The poles of the system's transfer function are inside the unit circle, that is, for $\alpha_i$ of (9), $|\alpha_i| < 1$.
*Example 6.* A state-space system having $A = -0.5$, $B = C = 1$ and $D = 0$ is stable, since $\lambda(A) = -0.5$ is inside the unit circle. Equivalently, the corresponding transfer function $G(z) = (z + 0.5)^{-1}$ has a pole at $z = -0.5$, which is inside the unit circle, and so the system is stable.
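Both stability tests are one-liners, as the sketch below illustrates for Example 6.

```python
import numpy as np

# Example 6 both ways: eigenvalues of A and poles of G(z) = (z + 0.5)^{-1}.
A = np.array([[-0.5]])
print(np.all(np.abs(np.linalg.eigvals(A)) < 1))   # True, condition (i)
print(np.all(np.abs(np.roots([1.0, 0.5])) < 1))   # True, condition (ii)
```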

#### **2.2.12 Adjoint Systems**

Let $\mathcal{G} : \mathbb{R}^p \to \mathbb{R}^q$ be a linear system operating on the interval [0, $T$]. Then $\mathcal{G}^H : \mathbb{R}^q \to \mathbb{R}^p$, the adjoint of $\mathcal{G}$, is the unique linear system such that, for all $\alpha \in \mathbb{R}^q$ and $w \in \mathbb{R}^p$, $\langle \alpha, \mathcal{G}w \rangle = \langle \mathcal{G}^H\alpha, w \rangle$. The following derivation is a simplification of the time-varying version that appears in [9].


*Lemma 1 (State-space representation of an adjoint system): Suppose that a discrete-time linear time-invariant system $\mathcal{G}$ is described by*

$$x_{k+1} = Ax_k + Bw_k \,, \tag{35}$$

$$y_k = Cx_k + Dw_k \,, \tag{36}$$

*with $x_0 = 0$. The adjoint $\mathcal{G}^H$ is the linear system having the realisation*

$$\zeta_{k-1} = A^T\zeta_k - C^T\alpha_k \,, \tag{37}$$

$$\beta_k = -B^T\zeta_k + D^T\alpha_k \,, \tag{38}$$

*with $\zeta_T = 0$.*


*Proof: The system (35) – (36) can be written equivalently* 

$$\begin{bmatrix} zI - A & -B \\ C & D \end{bmatrix}\begin{bmatrix} x \\ w \end{bmatrix} = \begin{bmatrix} 0 \\ y \end{bmatrix} \tag{39}$$

*with x0 = 0. Thus* 

$$\langle \alpha, \mathcal{G}w \rangle = \left\langle \begin{bmatrix} \zeta \\ \alpha \end{bmatrix}, \begin{bmatrix} zI - A & -B \\ C & D \end{bmatrix}\begin{bmatrix} x \\ w \end{bmatrix} \right\rangle \tag{40}$$

$$= \sum_{k=1}^{N} \zeta_k^T x_{k+1} - \sum_{k=1}^{N} \zeta_k^T(Ax_k + Bw_k) + \sum_{k=1}^{N} \alpha_k^T(Cx_k + Dw_k)$$

$$= \left\langle \begin{bmatrix} z^{-1}I - A^T & C^T \\ -B^T & D^T \end{bmatrix}\begin{bmatrix} \zeta \\ \alpha \end{bmatrix}, \begin{bmatrix} x \\ w \end{bmatrix} \right\rangle$$

$$= \langle \mathcal{G}^H\alpha, w \rangle \,, \tag{41}$$

*where $\mathcal{G}^H$ is given by (37) – (38).* □

Thus, the adjoint of a discrete-time system having the parameters $\begin{bmatrix} A & B \\ C & D \end{bmatrix}$ is a system with parameters $\begin{bmatrix} A^T & -C^T \\ -B^T & D^T \end{bmatrix}$. Adjoint systems have the property $(\mathcal{G}^H)^H = \mathcal{G}$. The adjoint of


the transfer function matrix *G*(*z*) is denoted as *GH*(*z*) and is defined by the transfer function matrix

$$G^H(z) = G^T(z^{-1}) \,. \tag{42}$$

*Example 7.* Suppose that a system has the state-space parameters $A = -0.5$ and $B = C = D = 1$. From Lemma 1, an adjoint system has the state-space parameters $A = -0.5$, $B = C = -1$, $D = 1$ and the corresponding transfer function is $G^H(z) = 1 + (z^{-1} + 0.5)^{-1} = (3z + 2)(z + 2)^{-1}$, which is unstable and non-minimum-phase. Alternatively, the adjoint of $G(z) = 1 + (z + 0.5)^{-1} = (z + 1.5)(z + 0.5)^{-1}$ can be obtained using (42), namely, $G^H(z) = G^T(z^{-1}) = (3z + 2)(z + 2)^{-1}$.
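On the unit circle, (42) implies $G^H(e^{j\omega}) = \overline{G(e^{j\omega})}$ for real-coefficient systems, which gives a quick numerical check of Example 7.

```python
import numpy as np

# Example 7 check: G^H(e^{jw}) equals conj(G(e^{jw})) on the unit circle.
omega = np.linspace(-np.pi, np.pi, 7)
z = np.exp(1j * omega)
G = (z + 1.5) / (z + 0.5)           # G(z) = 1 + (z + 0.5)^{-1}
GH = (3 * z + 2) / (z + 2)          # adjoint from the example
print(np.allclose(GH, np.conj(G)))  # True
```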

#### **2.2.13 Causal Systems**

A causal system is a system whose output depends exclusively on past and current inputs and outputs.

*Example 8.* Consider *xk*+1 = 0.3*xk* + 0.4*xk*-1 + *wk*. Since the output *xk*+1 depends only on past states *xk*, *xk*-1, and past inputs *wk*, this system is causal.

*Example 9.* Consider *xk* = 0.3*xk+*1 + 0.4*xk* + *wk+*1. Since the output *xk* depends on future outputs *xk+*1 and future *wk+*1 inputs, this system is non-causal.

#### **2.2.14 Realising Unstable System Components**

Unstable system components are termed unrealisable because their outputs are not in $\ell_2$, that is, they are unbounded. In other words, unstable systems cannot produce a useful output. However, an unstable causal component can be realised as a stable non-causal or backwards component. Consider the system (35) – (36) in which the eigenvalues of $A$ all lie outside the unit circle. In this case, a stable adjoint system $\beta = \mathcal{G}^H\alpha$ can be realised by the following three-step procedure.

(i) Time-reverse the input signal $\alpha_k$, that is, construct $\alpha_\tau$, where $\tau = N - k$ is a time-to-go variable.
(ii) Realise the stable system

$$\zeta_{\tau+1} = A^T\zeta_\tau + C^T\alpha_\tau \,, \tag{43}$$

$$\beta_\tau = B^T\zeta_\tau + D^T\alpha_\tau \,, \tag{44}$$

with $\zeta_0 = 0$.
(iii) Time-reverse the output signal $\beta_\tau$, that is, construct $\beta_k$.

Thus if a system consists of a cascade of stable and unstable components, it can be realised by a combination of causal and non-causal components. This approach will be exploited in the realisation of smoothers subsequently.


*Example 10.* Suppose that it is desired to realise the system $G(z) = G_2^H(z)G_1(z)$, in which $G_1(z) = (z + 0.6)^{-1}$ and $G_2^H(z) = z(0.9z + 1)^{-1}$, that is, $G_2(z) = (z + 0.9)^{-1}$. This system can be realised using the processes shown in Fig. 2.

Figure 2. Realising an unstable $G(z) = G_2^H(z)G_1(z)$.
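A sketch of this forward/backward realisation for Example 10 follows; scipy's lfilter runs each stable factor causally, the time reversals implement steps (i) and (iii), and edge effects at the ends of the finite record are ignored in this sketch.

```python
import numpy as np
from scipy.signal import lfilter

# Example 10: realise G(z) = G2^H(z) G1(z), where G2^H(z) = z/(0.9z + 1) is
# unstable/non-causal, via the three-step time-reversal procedure.
def g1(u):
    return lfilter([0.0, 1.0], [1.0, 0.6], u)   # causal G1(z) = 1/(z + 0.6)

def g2H(u):
    rev = u[::-1]                                # (i) time-reverse the input
    out = lfilter([0.0, 1.0], [1.0, 0.9], rev)   # (ii) stable G2(z) = 1/(z + 0.9)
    return out[::-1]                             # (iii) time-reverse the output

rng = np.random.default_rng(2)
w = rng.standard_normal(200)
y = g2H(g1(w))                                   # realises G2^H(z) G1(z)
print(y[:3])
```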

#### **2.2.15 Power Spectral Density**

Consider again a linear, time-invariant system $y = \mathcal{G}w$ and its corresponding transfer function matrix $G(z)$. Then $\Phi_{yy}(z)$, the power spectral density of $y$, is given by

$$\Phi_{yy}(z) = GQG^H(z) \,, \tag{45}$$

which has the property $\Phi_{yy}(z) = \Phi_{yy}^T(z^{-1})$. From Parseval's Theorem (5), the average total energy of $y$ is given by

$$\frac{1}{2\pi j}\oint \Phi_{yy}(z)\,\frac{dz}{z} = \sum_{k=-\infty}^{\infty} \left| y_k \right|^2 = \left\| y \right\|_2^2 = E\{y_k^T y_k\} \,, \tag{46}$$

which equals the area under the power spectral density curve.

#### **2.2.16 Spectral Factorisation**

To avoid confusion with the z-transform variable, denote the noisy measurements of *y*(*z*) = *G*(*z*)*w*(*z*) by

$$u(z) = y(z) + v(z) \,, \tag{47}$$

where $v(z) \in \mathbb{C}^p$ is the z-transform of an independent, zero-mean, stationary, white measurement noise process with $E\{v_j v_k^T\} = R\delta_{jk}$. Let

$$\Phi_{uu}(z) = GQG^H(z) + R \tag{48}$$

denote the spectral density matrix of the measurements $u$. A discrete-time transfer function is said to be minimum phase if its zeros lie inside the unit circle. Conversely, transfer functions having outside-unit-circle zeros are known as non-minimum phase.

Suppose that Ф*uu*(*z*) is a spectral density matrix of transfer functions possessing equal order numerator and denominator polynomials that do not have roots on the unit circle. Then the spectral factor matrix Δ(*z*) satisfies the following.

(i) $\Delta(z)\Delta^H(z) = \Phi_{uu}(z)$.
(ii) $\Delta(z)$ is causal, that is, the poles of $\Delta(z)$ are inside the unit circle.
(iii) $\Delta^{-1}(z)$ is causal, that is, the zeros of $\Delta(z)$, which are the poles of $\Delta^{-1}(z)$, are inside the unit circle.
The problem of spectral factorisation within discrete-time Wiener filtering problems is studied in [10]. The roots of the transfer function polynomials need to be sorted into those inside the unit circle and those outside the unit circle. Spectral factors can be found using Levinson-Durbin and Schur algorithms, Cholesky decomposition, Riccati equation solution [11] and Newton-Raphson iteration [12].

*Example 11.* Applying the bilinear transform (15) to the continuous-time low-pass plant $G(s) = (s + 1)^{-1}$ for a sample frequency of 2 Hz yields $G(z) = 0.2(z + 1)(z - 0.6)^{-1}$. With $Q = R = 1$, the measurement spectral density (48) is $\Phi_{uu}(z) = \frac{(1.08z - 0.517)(-0.517z + 1.08)}{(z - 0.6)(-0.6z + 1.0)}$. By inspection, $\Delta(z) = (1.08z - 0.517)(z - 0.6)^{-1}$ has inside-unit-circle poles and zeros that satisfy $\Delta(z)\Delta^H(z) = \Phi_{uu}(z)$.

*Example 12.* Consider the high-pass plant $G(z) = 4.98(z - 0.6)(z + 0.99)^{-1}$ and $Q = R = 1$. The spectral density is $\Phi_{uu}(z) = \frac{(5.39z - 2.58)(-2.58z + 5.39)}{(z + 0.99)(0.99z + 1.0)}$. Thus the stable, minimum-phase spectral factor is $\Delta(z) = (5.39z - 2.58)(z + 0.99)^{-1}$, since it has inside-unit-circle poles and zeros.
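For scalar plants the factorisation can be carried out numerically by forming the numerator of $\Phi_{uu}(z)$, splitting its roots about the unit circle and matching the gain; the sketch below reproduces Example 11 and assumes distinct roots off the unit circle.

```python
import numpy as np

# Example 11: G(z) = 0.2(z + 1)/(z - 0.6), Q = R = 1. The numerator of
# Phi_uu(z), scaled by z, is an ordinary polynomial with reciprocal root
# pairs; Delta(z) keeps the inside-unit-circle roots with a matched gain.
b = np.array([0.2, 0.2])                # numerator of G(z)
a = np.array([1.0, -0.6])               # denominator of G(z)
Q, R = 1.0, 1.0
N = Q * np.polymul(b, b[::-1]) + R * np.polymul(a, a[::-1])
r = np.roots(N)
r_in, r_out = r[np.abs(r) < 1], r[np.abs(r) > 1]
g = np.sqrt(np.real(N[0] * np.prod(-r_out)))   # leading gain of Delta(z)
print(np.real(g * np.poly(r_in)))   # ~[1.08 -0.517]: Delta(z) numerator
```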

#### **2.2.17 Calculating Causal Parts**

Suppose that a discrete-time transfer function has the form

$$G(z) = c_0 + \sum_{i=1, |a_i| < 1}^{n} \frac{d_i}{z - a_i} + \sum_{j=1, |b_j| > 1}^{m} \frac{e_j}{z - b_j} \tag{49}$$

$$= c_0 + G_{iucp}(z) + G_{oucp}(z) \,,$$

where $c_0$, $d_i$, $e_j \in \mathbb{R}$, $G_{iucp}(z) = \sum_{i=1, |a_i| < 1}^{n} \frac{d_i}{z - a_i}$ is the sum of partial fractions having inside-unit-circle poles and $G_{oucp}(z) = \sum_{j=1, |b_j| > 1}^{m} \frac{e_j}{z - b_j}$ is the sum of partial fractions having outside-unit-circle poles. Assume that the roots of $G(z)$ are distinct and do not lie on the unit circle. In this case the partial fraction coefficients $d_i$ and $e_j$ within (49) can be calculated from the numerator and denominator polynomials of $G(z)$ via $d_i = \left.(z - a_i)G(z)\right|_{z = a_i}$ and $e_j = \left.(z - b_j)G(z)\right|_{z = b_j}$. Previously, in continuous-time, the convention was to define constants to be causal. This is consistent with ensuring that the non-causal part of the discrete-time


transfer function is zero at *z* = 0. Thus, the non-causal part of *G*(*z*), denoted by {*G*(*z*)}− , is obtained as

$$\{G(z)\}_- = G_{oucp}(z) - G_{oucp}(0) \tag{50}$$

and the causal part of $G(z)$, denoted by $\{G(z)\}_+$, is whatever remains, that is,

$$\begin{aligned} \{G(z)\}_+ &= G(z) - \{G(z)\}_- \\ &= c_0 + G_{iucp}(z) + G_{oucp}(0) \,. \end{aligned} \tag{51}$$

Hence, the causal part of a transfer function can be found by carrying out the following three steps.

(i) If the transfer function is not strictly proper, that is, if the order of the numerator is not less than the order of the denominator, perform synthetic division to extract the constant term.
(ii) Expand out the (strictly proper) transfer function into the sum of partial fractions (49).
(iii) Obtain the causal part from (51), namely, take the sum of the constant term, the partial fractions with inside-unit-circle poles and the partial fractions with outside-unit-circle poles evaluated at $z = 0$.
*Example 13.* Consider the strictly proper transfer function $G(z) = \frac{3z - 3.2}{z^2 - 2.6z + 1.2} = \frac{3z - 3.2}{(z - 0.6)(z - 2)} = \frac{1}{z - 0.6} + \frac{2}{z - 2}$. It follows from (50) and (51) that $\{G(z)\}_- = \frac{2}{z - 2} + 1 = \frac{z}{z - 2}$ and $\{G(z)\}_+ = \frac{1}{z - 0.6} - 1 = \frac{1.6 - z}{z - 0.6}$, respectively. It is easily verified that $G(z) = \{G(z)\}_+ + \{G(z)\}_-$.

*Example 14.* Consider the proper transfer function $G(z) = \frac{2z^2 - 8.2z + 5.6}{z^2 - 2.6z + 1.2}$. Carrying out synthetic division results in $G(z) = 2 - \frac{1}{z - 0.6} - \frac{2}{z - 2}$. It follows from (50) and (51) that $\{G(z)\}_- = -\frac{2}{z - 2} - 1 = -\frac{z}{z - 2}$ and $\{G(z)\}_+ = 2 - \frac{1}{z - 0.6} + 1 = \frac{3z - 2.8}{z - 0.6}$, respectively.
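The residue formulas $d_i = \left.(z - a_i)G(z)\right|_{z = a_i}$ make step (ii) and the $G_{oucp}(0)$ correction easy to automate for distinct poles; the sketch below reproduces Example 13.

```python
import numpy as np

# Example 13: G(z) = (3z - 3.2)/((z - 0.6)(z - 2)) expanded into partial
# fractions d_i/(z - p_i), then split per (50) - (51). Distinct poles assumed.
num = np.array([3.0, -3.2])
poles = np.array([0.6, 2.0])
res = np.array([np.polyval(num, p) / np.prod(p - np.delete(poles, i))
                for i, p in enumerate(poles)])
outside = np.abs(poles) > 1
G_oucp0 = np.sum(res[outside] / (0.0 - poles[outside]))   # G_oucp(0)
print(res)       # [1. 2.] -> 1/(z - 0.6) + 2/(z - 2)
print(G_oucp0)   # -1.0 -> {G}_+ = 1/(z - 0.6) - 1, {G}_- = 2/(z - 2) + 1
```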


Figure 3. The general z-domain filtering problem.

#### **2.3 Minimum-Mean-Square-Error Filtering**

#### **2.3.1 Filter Derivation**

This section derives the optimal non-causal minimum-mean-square-error solution for the problem configuration of Fig. 3. The derivation is identical to the continuous-time case which is presented in Chapter 1. It is assumed that the parameters of the stable transfer function $G_2(z) = C_2(zI - A)^{-1}B + D_2$ are known. Let $Y_2(z)$, $W(z)$, $V(z)$ and $U(z)$ denote the z-transforms of a system's output, process noise, measurement noise and observations, respectively. Then it follows from (47) that the z-transform of the measurements is

$$\mathcal{U}(z) = \mathcal{Y}\_2(z) + \mathcal{V}(z) \,. \tag{52}$$

Consider a fictitious reference system $G_1(z) = C_1(zI - A)^{-1}B + D_1$ as shown in Fig. 3. The problem is to design a filter transfer function $H(z)$ to calculate estimates $\hat{Y}_1(z) = H(z)U(z)$ of $Y_1(z)$ so that the energy $\oint E^H(z)E(z)\,dz$ of the estimation error

$$E(z) = \hat{Y}_1(z) - Y_1(z) \tag{53}$$

is minimised. It can be seen from Fig. 3 that the estimation error is generated by the system

$$E(z) = \begin{bmatrix} H(z) & HG_2(z) - G_1(z) \end{bmatrix} \begin{bmatrix} V(z) \\ W(z) \end{bmatrix} \,. \tag{54}$$

The error power spectrum density matrix is given by the covariance of *E*(*z*), that is,

$$\Phi_{ee}(z) = E(z)E^H(z) \tag{55}$$


$$= \begin{bmatrix} H(z) & H\mathbf{G}\_2(z) - \mathbf{G}\_1(z) \\ \end{bmatrix} \begin{bmatrix} R & \mathbf{0} \\ \mathbf{0} & \mathbf{Q} \end{bmatrix} \begin{bmatrix} H^H(z) \\ \mathbf{G}\_2^H H^H(z) - \mathbf{G}\_1^H(z) \end{bmatrix}$$

$$= \mathbf{G}\_1 \mathbf{Q} \mathbf{G}\_1^H(z) - \mathbf{G}\_1 \mathbf{Q} \mathbf{G}\_2^H H^H(z) - H \mathbf{G}\_2 \mathbf{Q} \mathbf{G}\_1^H(z) + H \boldsymbol{\Delta} \boldsymbol{\Delta}^H H^H(z),$$

where


$$
\Delta\Delta^H(z) = G\_2 Q G\_2^H(z) + R \tag{56}
$$

is the spectral density matrix of the measurements. Completing the square within (55) yields

$$\begin{aligned} \Phi_{ee}(z) &= G_1QG_1^H(z) - G_1QG_2^H(\Delta\Delta^H)^{-1}G_2QG_1^H(z) \\ &\quad + (H\Delta(z) - G_1QG_2^H\Delta^{-H}(z))(H\Delta(z) - G_1QG_2^H\Delta^{-H}(z))^H \,, \end{aligned} \tag{57}$$

in which $\Delta^{-H}(z) = (\Delta^H(z))^{-1}$. It follows that the total energy of the error signal can be expressed as

$$\begin{aligned} \oint \Phi_{ee}(z)\,dz &= \oint G_1QG_1^H(z) - G_1QG_2^H(\Delta\Delta^H)^{-1}G_2QG_1^H(z)\,dz \\ &\quad + \oint \left(H\Delta(z) - G_1QG_2^H\Delta^{-H}(z)\right)\left(H\Delta(z) - G_1QG_2^H\Delta^{-H}(z)\right)^H dz \,. \end{aligned} \tag{58}$$

The first term on the right-hand-side of (58) is independent of $H(z)$ and represents a lower bound on $\oint \Phi_{ee}(z)\,dz$. The second term on the right-hand-side of (58) may be minimised by a judicious choice for $H(z)$.

*Theorem 1: The optimal solution for the above linear time-invariant estimation problem with measurements (52) and error (53) is* 

$$H(z) = \mathbf{G}\_1 \mathbf{Q} \mathbf{G}\_2^H \boldsymbol{\Delta}^{-H} \boldsymbol{\Delta}^{-1}(z) \,. \tag{59}$$

*which minimises* $\oint \Phi_{ee}(z)dz$.

*Proof: The result follows by setting* $H\Delta(z) - G_1QG_2^H\Delta^{-H}(z)$ *equal to the zero matrix within (58).* □

By Parseval's theorem, the minimum mean-square-error solution (59) also minimises $\|e\|_2^2$. The solution (59) is non-causal because the factor $G_2^H\Delta^{-H}(z)$ possesses outside-unit-circle poles. This optimal non-causal solution is actually a smoother, which can be realised by a combination of forward and backward processes.
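The non-causal solution (59) is straightforward to evaluate numerically on the unit circle. The sketch below is not from the text; the scalar choice *G*1 = *G*2 = (*z* + 0.2)(*z* + 0.5)-1 with *Q* = *R* = 1 is an assumed illustration, and the check confirms that the completed square in (57) vanishes when *H*(*z*) is chosen according to (59).

```python
# Sketch (assumed scalar example): evaluate the non-causal solution (59) on the
# unit circle and confirm that the completed square in (57) vanishes.
import numpy as np

w = np.linspace(-np.pi, np.pi, 512)
z = np.exp(1j * w)
Q, R = 1.0, 1.0

G1 = (z + 0.2) / (z + 0.5)             # reference system (here G1 = G2)
G2 = G1
spectrum = G2 * Q * np.conj(G2) + R    # Delta Delta^H(z) of (56), pointwise
H = G1 * Q * np.conj(G2) / spectrum    # non-causal solution (59)

# |Delta(e^{jw})| = sqrt(spectrum); the completed square in (57) involves
# H*Delta - G1*Q*G2^H*Delta^{-H}, which is identically zero for this H.
residual = H * np.sqrt(spectrum) - G1 * Q * np.conj(G2) / np.sqrt(spectrum)
print(np.max(np.abs(residual)))        # ~1e-16
```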


The transfer function matrix of the optimal causal solution or filter is obtained by setting the causal part of $H\Delta(z) - G_1QG_2^H\Delta^{-H}(z)$ equal to the zero matrix, resulting in $\{H\Delta(z)\}_+ = \{G_1QG_2^H\Delta^{-H}(z)\}_+$, that is, $H(z)\Delta(z) = \{G_1QG_2^H\Delta^{-H}(z)\}_+$, which implies

$$H(z) = \{G_1QG_2^H(\Delta^H)^{-1}\}_+\,\Delta^{-1}(z)\,. \tag{60}$$
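The causal-part operation {·}+ in (60) can be approximated numerically by sampling a transfer function on the unit circle, inverse transforming to its two-sided impulse response and discarding the negative-time taps. The sketch below is a minimal scalar illustration (assuming the Δ(*z*) of Example 15 below); it recovers {Δ-*H*(*z*)}+ = Δ-*H*(0) ≈ 0.699.

```python
# Sketch (scalar case assumed) of extracting the causal part {.}_+ numerically:
# sample on the unit circle, inverse-FFT to the two-sided impulse response,
# keep the taps at k >= 0 and FFT back. Delta(z) = (1.43z + 0.489)/(z + 0.5)
# from Example 15 below checks {Delta^{-H}(z)}_+ = Delta^{-H}(0) ~ 0.699.
import numpy as np

N = 1024
z = np.exp(2j * np.pi * np.arange(N) / N)

Delta = (1.43 * z + 0.489) / (z + 0.5)
F = 1.0 / np.conj(Delta)               # Delta^{-H}(z) on the unit circle

h = np.fft.ifft(F)                     # two-sided impulse response; the upper
h[N // 2:] = 0.0                       # half holds the anticausal (k < 0) taps
F_causal = np.fft.fft(h)               # transform the causal part back

print(F_causal[0].real)                # ~0.699, i.e. Delta^{-H}(0), at all bins
```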

#### **2.3.2 Output Estimation**

Figure 4. The z-domain output estimation problem.

In output estimation, it is desired to estimate the output *Y*2(*z*) from the measurements *U*(*z*), in which case the reference system is the same as the generating system, as shown in Fig. 4. The optimal non-causal solution (59) with *G*1(*z*) = *G*2(*z*) becomes

$$H\_{\rm OE}(z) = \mathbf{G}\_2 \mathbf{Q} \mathbf{G}\_2^H \boldsymbol{\Delta}^{-H} \boldsymbol{\Delta}^{-1}(z) \,. \tag{61}$$

Substituting $G_2QG_2^H(z) = \Delta\Delta^H(z) - R$ into (61) leads to the alternative form

$$\begin{aligned} H_{OE}(z) &= (\Delta\Delta^H - R)(\Delta\Delta^H)^{-1}(z) \\ &= I - R\Delta^{-H}\Delta^{-1}(z)\,. \end{aligned} \tag{62}$$

The solutions (61) and (62) are non-causal since $G_2^H(z)$ and $\Delta^{-H}(z)$ are non-causal. The optimal smoother or non-causal filter for output estimation is obtained by substituting *G*1(*z*) = *G*2(*z*) into (60), namely,

$$H_{OE}(z) = \{G_2QG_2^H\Delta^{-H}\}_+\,\Delta^{-1}(z)\,. \tag{63}$$

An alternative form arises by substituting $G_2QG_2^H(z) = \Delta\Delta^H(z) - R$ into (63), which results in

$$\begin{aligned} H_{OE}(z) &= \{\Delta(z) - R\Delta^{-H}(z)\}_+\,\Delta^{-1}(z) \\ &= I - R\{\Delta^{-H}(z)\}_+\,\Delta^{-1}(z)\,. \end{aligned} \tag{64}$$

In [10], it is recognised that $\{\Delta^{-H}(z)\}_+ = \lim_{z \to 0} \Delta^{-H}(z)$, which is equivalent to $\{\Delta^{-H}(z)\}_+ = \Delta^{-H}(0)$. It follows that


$$H\_{\rm OE}(z) = I - R\Delta^{-H}(0)\Delta^{-1}(z) \, , \tag{65}$$

which eliminates the need for calculating causal parts.

*Example 15.* Consider *G*2(*z*) = (*z* + 0.2)(*z* + 0.5)-1 together with *R* = *Q* = 1. The spectral factor is Δ(*z*) = (1.43*z* + 0.489)(*z* + 0.5)-1, which leads to $G_2QG_2^H\Delta^{-H}(z)$ = (0.2*z*2 + 1.04*z* + 0.2)(0.489*z*2 + 1.67*z* + 0.716)-1 and $\{G_2QG_2^H\Delta^{-H}(z)\}_+$ = (0.734*z* + 0.14)(*z* + 0.5)-1. Hence, from (63), *HOE*(*z*) = (0.513*z* + 0.098)(*z* + 0.341)-1. The same solution can be calculated using Δ*-H*(0) = 0.698 within (65).
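A minimal sketch of the above calculation follows, assuming the scalar, first-order structure of Example 15: the numerator of ΔΔ*H*(*z*) is factorised by solving a quadratic for the root inside the unit circle, and *HOE*(*z*) is then assembled from (65).

```python
# Sketch (scalar, first-order case) reproducing Example 15.
import numpy as np

beta, alpha = 0.2, 0.5                 # G2(z) = (z + beta)/(z + alpha)
Q, R = 1.0, 1.0

# Numerator of Delta*Delta^H(z) is k0 + k1*(z + 1/z) over the common denominator.
k0 = Q * (1 + beta**2) + R * (1 + alpha**2)
k1 = Q * beta + R * alpha

b = np.roots([1.0, -k0 / k1, 1.0])     # solves b + 1/b = k0/k1
b = b[np.abs(b) < 1][0].real           # keep the root inside the unit circle
g = np.sqrt(k1 / b)                    # gain of the spectral factor
print(g, g * b)                        # ~1.43, 0.489: Delta(z) = (1.43z + 0.489)/(z + 0.5)

# Output estimator via (65): H_OE(z) = 1 - R*Delta^{-H}(0)*Delta^{-1}(z).
dH0 = 1.0 / g                          # Delta^{-H}(0) ~ 0.699
num = np.array([g - R * dH0, g * b - R * dH0 * alpha])
den = np.array([g, g * b])
print(num / den[0], den / den[0])      # ~[0.513, 0.098] over [1, 0.341]
```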

When the measurement noise becomes negligibly small, the output estimator approaches a short circuit, that is,

$$\lim_{R \to 0} H_{OE}(e^{j\omega T}) = I\,. \tag{66}$$

The above observation can be verified by substituting *R* = 0 into (65). This asymptote is consistent with intuition, that is, when the measurements are perfect, output estimation will be superfluous.

*Example 16.* Substituting *R* = 0.001 within Example 15 yields the filter *H*(*z*) = (0.999*z* + 0.2)(*z* + 0.2)-1, which illustrates the low measurement noise asymptote (66).

#### **2.3.3 Input Estimation**

In input estimation or equalisation problems, *G*2(*z*) is known as the channel model and it is desired to estimate the input process *w(t)*, as depicted in Fig. 5. The simplification of the optimum non-causal solution (59) for the case of *G*1(*z*) = *I* is

$$H_{IE}(z) = QG_2^H\Delta^{-H}\Delta^{-1}(z)\,. \tag{67}$$

Assume that: the channel model *G*2(*z*) is proper, that is, the order of the numerator is the same as the order of the denominator; and that the channel model *G*2(*z*) is stable and minimum phase, that is, its poles and zeros are inside the unit circle. The causal equaliser for proper, stable, minimum-phase channels is obtained by substituting *G*1(*z*) = *I* into (60)

$$\begin{aligned} H_{IE}(z) &= \{QG_2^H\Delta^{-H}\}_+\,\Delta^{-1}(z) \\ &= QG_2^H(0)\Delta^{-H}(0)\Delta^{-1}(z)\,. \end{aligned} \tag{68}$$

Under the above assumptions, the causal equaliser may be written equivalently as

$$\begin{aligned} H_{IE}(z) &= \{G_2^{-1}G_2QG_2^H\Delta^{-H}\}_+\,\Delta^{-1}(z) \tag{69} \\ &= \{G_2^{-1}(\Delta\Delta^H - R)\Delta^{-H}\}_+\,\Delta^{-1}(z) \\ &= G_2^{-1}(I - R\{\Delta^{-H}\}_+\,\Delta^{-1}(z)) \tag{70} \\ &= G_2^{-1}(I - R\Delta^{-H}(0)\Delta^{-1}(z))\,. \end{aligned}$$


Thus, the equaliser is equivalent to a product of the channel inverse and the output estimator. It follows that when the measurement noise becomes negligibly small, the equaliser estimates the inverse of the system model, that is,

$$\lim_{R \to 0} H_{IE}(z) = G_2^{-1}(z)\,. \tag{71}$$

The above observation follows by substituting *R* = 0 into (69). In other words, if the channel model is invertible and the signal-to-noise ratio is sufficiently high, the equaliser will estimate *w*(*t*). When measurement noise is present, the solution trades off channel inversion and filtering. In the high measurement noise case, the equaliser approaches an open circuit, that is,

$$\lim_{Q \to 0} H_{IE}(e^{j\omega T}) = 0\,. \tag{72}$$

The above observation can be verified by substituting ΔΔ*H* = *R* into (70). Thus, when the equalisation problem is dominated by measurement noise, the estimation error is minimised by ignoring the data.

Figure 5. The z-domain input estimation problem.

*Example 17.* Consider the high-pass plant *G*2(*s*) = 100(*s* + 0.1)(*s* + 10)-1. Application of the bilinear transform for a sample frequency of 2 Hz yields *G*2(*z*) = (29.2857*z* − 27.8571)(*z* + 0.4286)-1. With *Q* = 1 and *R* = 0.001, the spectral factor is Δ(*z*) = (29.2861*z* − 27.8568)(*z* + 0.4286)-1. From (67), *HIE*(*z*) = (*z* + 0.4286)(29.2861*z* − 27.8568)-1, which is approximately the inverse of the high-pass plant and illustrates (71).

*Example 18.* Applying the bilinear transform for a sample frequency of 2 Hz to the low-pass plant *G*2(*s*) = (*s* + 10)(*s* + 0.1)-1 results in *G*2(*z*) = (3.4146*z* − 1.4634)(*z* − 0.9512)-1. With *Q* = 1 and *R* = 0.001, the spectral factor is Δ(*z*) = (3.4151*z* − 1.4629)(*z* − 0.9512)-1. From (67), *HIE*(*z*) = (*z* − 0.9512)(3.4156*z* − 1.4631)-1, which is approximately the inverse of the low-pass plant and is consistent with (71).
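The sketch below (an assumed reconstruction, reusing the factorisation idea from the Example 15 sketch) reproduces Example 17 with scipy's bilinear transform and checks the low-measurement-noise asymptote (71) on the unit circle.

```python
# Sketch reproducing Example 17 (assumed scalar factorisation, as in the
# Example 15 sketch, applied to the bilinear-transformed plant).
import numpy as np
from scipy import signal

# G2(s) = 100(s + 0.1)/(s + 10), sampled at 2 Hz via the bilinear transform.
bz, az = signal.bilinear([100.0, 10.0], [1.0, 10.0], fs=2.0)
bz, az = bz / az[0], az / az[0]
print(bz, az)                       # ~[29.2857, -27.8571], [1, 0.4286]

Q, R = 1.0, 0.001
k0 = Q * (bz[0]**2 + bz[1]**2) + R * (az[0]**2 + az[1]**2)
k1 = Q * bz[0] * bz[1] + R * az[0] * az[1]
b = np.roots([1.0, -k0 / k1, 1.0])
b = b[np.abs(b) < 1][0].real        # spectral-factor zero inside the unit circle
g = np.sqrt(k1 / b)
print(g, g * b)                     # ~29.286, -27.857: Delta(z) of Example 17

# H_IE of (67) on the unit circle approaches G2^{-1}(z) as R -> 0, as in (71).
z = np.exp(1j * np.linspace(0.01, np.pi, 256))
G2 = np.polyval(bz, z) / np.polyval(az, z)
H_IE = Q * np.conj(G2) / (np.abs(G2)**2 * Q + R)  # scalar form of (67)
print(np.max(np.abs(H_IE - 1 / G2)))              # small: equaliser ~ channel inverse
```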

| | ASSUMPTIONS | MAIN RESULTS |
| --- | --- | --- |
| Signals and systems | *E*{*wk*} = *E*{*vk*} = 0. $E\{w_kw_k^T\}$ = *Q* > 0 and $E\{v_kv_k^T\}$ = *R* > 0 are known. *A*, *B*, *C*1, *C*2, *D*1 and *D*2 are known. *G*1(*z*) and *G*2(*z*) are stable, *i.e.*, \|*λi*(*A*)\| < 1. | $G_1(z) = C_1(zI - A)^{-1}B + D_1$, $G_2(z) = C_2(zI - A)^{-1}B + D_2$ |
| Spectral factorisation | Δ(*z*) and Δ-1(*z*) are stable, *i.e.*, the poles and zeros of Δ(*z*) are inside the unit circle. | $\Delta\Delta^H(z) = G_2QG_2^H(z) + R$ |
| Noncausal solution | | $H(z) = G_1QG_2^H\Delta^{-H}\Delta^{-1}(z)$ |
| Causal solution | | $H(z) = \{G_1QG_2^H\Delta^{-H}\}_+\Delta^{-1}(z)$ |

Table 1. Main results for the discrete-time general filtering problem.

#### **2.4 Conclusion**

Systems are written in the time-domain as difference equations

$$a_n y_{k-n} + a_{n-1}y_{k-n+1} + \dots + a_1 y_{k-1} + a_0 y_k = b_m w_{k-m} + b_{m-1}w_{k-m+1} + \dots + b_1 w_{k-1} + b_0 w_k\,,$$

which can be expressed as polynomial transfer functions in the z-transform variable

$$Y(z) = \left[\frac{b_m z^{-m} + b_{m-1}z^{-m+1} + \dots + b_1z^{-1} + b_0}{a_n z^{-n} + a_{n-1}z^{-n+1} + \dots + a_1z^{-1} + a_0}\right]W(z) = G(z)W(z)\,.$$

It can be seen that knowledge of a system's difference equation is sufficient to identify its transfer function. The optimal Wiener solution minimises both the energy of the error and the mean-square error; the main results are summarised in Table 1. The noncausal (or smoother) solution has unstable factors and can only be realised by a combination of forward and backward processes.
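For instance, the correspondence between difference equations, transfer functions and state-space realisations can be exercised with standard tools; the coefficients below are assumed illustrative values rather than an example from the text.

```python
# Sketch (illustrative coefficients): from difference-equation coefficients to
# the transfer function G(z) and a state-space realisation, using scipy's
# conventions (coefficients of z^0, z^-1, ... as in the displayed polynomials).
import numpy as np
from scipy import signal

b = [1.0, 0.2]      # b0 + b1*z^-1: numerator of G(z)
a = [1.0, 0.5]      # a0 + a1*z^-1: denominator of G(z)

wk = np.random.default_rng(0).standard_normal(1000)  # unity-variance input
yk = signal.lfilter(b, a, wk)                         # realises y_k = G(z) w_k

A, B, C, D = signal.tf2ss(b, a)       # equivalent state-space realisation
print(np.abs(np.linalg.eigvals(A)))   # < 1, i.e. the system is stable
```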

It is noted that $\{\Delta^{-H}(z)\}_+ = \lim_{z \to 0}\Delta^{-H}(z) = \Delta^{-H}(0)$, which can simplify calculating causal parts. For example, in output estimation problems where *G*1(*z*) = *G*2(*z*), the minimum-mean-square-error solution is *HOE*(*z*) = *I* – *R*Δ-*H*(0)Δ-1(*z*). In the single-input-single-output case, when the measurement noise becomes negligible, the output estimator approaches a short circuit. Conversely, when the single-input-single-output problem is dominated by measurement noise, the output estimator approaches an open circuit.

In input estimation problems, *G*1(*z*) = *I*. If the channel model is invertible, the optimal causal equaliser is given by *HIE*(*z*) = *QG*2*H*(0)Δ-*H*(0)Δ-1(*z*). When the measurement noise becomes negligible, that is, *HIE*(*z*) → *G*2-1(*z*), the optimal equaliser approaches the channel inverse. Conversely, when the problem is dominated by measurement noise, the equaliser approaches an open circuit.


#### **2.5 Problems**

**Problem 1.** Consider the error spectral density matrix

$$\begin{aligned} \Phi_{ee}(z) &= [H\Delta - G_1QG_2^H(\Delta^H)^{-1}][H\Delta - G_1QG_2^H(\Delta^H)^{-1}]^H(z) \\ &\quad + [G_1QG_1^H - G_1QG_2^H(\Delta\Delta^H)^{-1}G_2QG_1^H](z)\,. \end{aligned}$$

(a) Derive the optimal non-causal solution.

(b) Derive the optimal causal filter from (a).

(c) Derive the optimal non-causal output estimator.

(d) Derive the optimal causal filter from (c).

(e) Derive the optimal non-causal input estimator.

(f) Derive the optimal causal equaliser assuming that the channel inverse exists.

**Problem 2.** Derive the asymptotes for the following single-input-single-output estimation problems.

(a) Non-causal output estimation at *R* = 0.

(b) Non-causal output estimation at *Q* = 0.

(c) Causal output estimation at *R* = 0.

(d) Causal output estimation at *Q* = 0.

(e) Non-causal input estimation at *R* = 0.

(f) Non-causal input estimation at *Q* = 0.

(g) Causal input estimation at *R* = 0.

(h) Causal input estimation at *Q* = 0.


**Problem 3.** In respect of the output estimation problem with *G*(*z*) = (*z* − β)(*z* − α)-1, α = – 0.5, β = – 0.3 and *Q* = 1, verify the following.

(a) *R* = 10 yields *H*(*z*) = (0.0948*z* + 0.0272)(*z* + 0.4798)-1.
(b) *R* = 1 yields *H*(*z*) = (0.5059*z* + 0.1482)(*z* + 0.3953)-1.
(c) *R* = 0.1 yields *H*(*z*) = (0.9094*z* + 0.2717)(*z* + 0.3170)-1.
(d) *R* = 0.01 yields *H*(*z*) = (0.9901*z* + 0.2969)(*z* + 0.3018)-1.
(e) *R* = 0.001 yields *H*(*z*) = (0.9990*z* + 0.2997)(*z* + 0.3002)-1.


**Problem 4.** In respect of the input estimation problem with *G*(*z*) = (*z* − β)(*z* − α)-1, α = – 0.1, β = – 0.9 and *Q* = 1, verify the following.

(a) *R* = 10 yields *H*(*z*) = (*z* + 0.1)(11.5988*z* + 1.9000)-1.
(b) *R* = 1 yields *H*(*z*) = (*z* + 0.1)(2.4040*z* + 1.0000)-1.
(c) *R* = 0.1 yields *H*(*z*) = (*z* + 0.1)(1.2468*z* + 0.9100)-1.
(d) *R* = 0.01 yields *H*(*z*) = (*z* + 0.1)(1.0381*z* + 0.9010)-1.
(e) *R* = 0.001 yields *H*(*z*) = (*z* + 0.1)(1.0043*z* + 0.9001)-1.

#### **2.6 Glossary**

The following terms have been introduced within this section.

*k*: The integer-valued time variable. For example, *k* ∈ (-∞, ∞) and *k* ∈ [0, ∞) denote −∞ < *k* < ∞ and 0 ≤ *k* < ∞, respectively.

*wk* ∈ ℝ*n*: A discrete-time, real-valued, *n*-element stochastic input signal.

*w*: The set of *wk* over a prescribed interval.

*vk*: A stationary stochastic measurement noise signal.

𝒜 : ℝ*p* → ℝ*q*: A linear system that operates on a *p*-element input signal and produces a *q*-element output signal.

*y* = 𝒜*w*: The output of a linear system 𝒜 that operates on an input signal *w*.

*A*, *B*, *C*, *D*: Time-invariant state-space matrices of appropriate dimension. The system 𝒜 is assumed to have the realisation *xk+1* = *Axk* + *Bwk*, *yk* = *Cxk* + *Dwk*, in which *wk* is known as the process noise or input signal.

*Q* and *R*: Time-invariant covariance matrices of the stochastic signals *wk* and *vk*, respectively.

*δjk*: The Kronecker delta function.

*Y*(*z*): The z-transform of a discrete-time signal *yk*.

*G*(*z*): The transfer function matrix of the system *xk+1* = *Axk* + *Bwk*, *yk* = *Cxk* + *Dwk*, which is given by *G*(*z*) = *C*(*zI* − *A*)−1*B* + *D*.

*GH*(*z*): The adjoint (or Hermitian transpose) of the transfer function matrix *G*(*z*).

*G*–1(*z*): The inverse of the transfer function matrix *G*(*z*).

*G*–*H*(*z*): The inverse of the adjoint transfer function matrix *GH*(*z*).

{*G*(*z*)}+: The causal part of the transfer function matrix *G*(*z*).

Adjoint of 𝒜: The adjoint of a system having the state-space parameters {*A*, *B*, *C*, *D*} is a system parameterised by {*AT*, –*CT*, –*BT*, *DT*}.

Δ(*z*): The spectral factor of Φ*uu*(*z*) which satisfies ΔΔ*H*(*z*) = *GQGH*(*z*) + *R*. For brevity, denote Δ-*H*(*z*) = (Δ*H*)-1(*z*).

Φ*ee*(*z*): The spectral density matrix of the estimation error *e*.

*H*(*z*): Transfer function matrix of the minimum-mean-square-error solution.

*HOE*(*z*): Transfer function matrix of the minimum-mean-square-error solution specialised for output estimation.

*HIE*(*z*): Transfer function matrix of the minimum-mean-square-error solution specialised for input estimation.

⟨*v*, *w*⟩: The inner product of two discrete-time signals *v* and *w*, which is defined by $\langle v, w \rangle = \sum_k v_k^T w_k$.

$\|w\|_2$: The 2-norm of the discrete-time signal *w*, which is defined by $\|w\|_2^2 = \langle w, w \rangle = \sum_k w_k^T w_k$.

ℓ2: The set of discrete-time signals having finite 2-norm, which is known as the Lebesgue 2-space (see [3]).

*Ts*: Sample period.

Asymptotic stability: A linear discrete-time system 𝒜 is said to be asymptotically stable if its output *y* ∈ ℓ2 for any *w* ∈ ℓ2. If the eigenvalues of the state matrix are inside the unit circle, or equivalently, if the poles of the transfer function are inside the unit circle, then the system is stable.
<sup>&</sup>quot;If your result needs a statistician then you should design a better experiment." *Baron Ernest Rutherford*



#### **2.7 References**


[1] N. Wiener, *Extrapolation, Interpolation and Smoothing of Stationary Time Series with Engineering Applications*, The MIT Press, Cambridge Mass.; Wiley, New York; Chapman & Hall, London, 1949.

[2] P. Masani, "Wiener's Contributions to Generalized Harmonic Analysis, Prediction Theory and Filter Theory", *Bulletin of the American Mathematical Society*, vol. 72, no. 1, pt. 2, pp. 73 – 125, 1966.

[3] C. A. Desoer and M. Vidyasagar, *Feedback Systems: Input Output Properties*, Academic Press, N.Y., 1975.

[4] F. L. Lewis, L. Xie and D. Popa, *Optimal and Robust Estimation With an Introduction to Stochastic Control Theory*, Second Edition, CRC Press, Taylor & Francis Group, 2008.

[5] A. P. Sage and J. L. Melsa, *Estimation Theory with Applications to Communications and Control*, McGraw-Hill Book Company, New York, 1971.

[6] K. Ogata, *Discrete-time Control Systems*, Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1987.

[7] S. Shats and U. Shaked, "Discrete-Time Filtering of Noise Correlated Continuous-Time Processes: Modelling and Derivation of the Sampling Period Sensitivities", *IEEE Transactions on Automatic Control*, vol. 36, no. 1, pp. 115 – 119, Jan. 1991.

[8] P. G. Savage, *Strapdown Analytics*, Strapdown Associates, Maple Plain, Minnesota, USA, vol. 1 and 2, 2000.

[9] G. A. Einicke, "Optimal and Robust Noncausal Filter Formulations", *IEEE Transactions on Signal Processing*, vol. 54, no. 3, pp. 1069 – 1077, Mar. 2006.

[10] U. Shaked, "A transfer function approach to the linear discrete stationary filtering and the steady state discrete optimal control problems", *International Journal of Control*, vol. 29, no. 2, pp. 279 – 291, 1979.

[11] A. H. Sayed and T. Kailath, "A Survey of Spectral Factorization Methods", *Numerical Linear Algebra with Applications*, vol. 8, pp. 467 – 496, 2001.

[12] H. J. Orchard and A. N. Wilson, "On the Computation of a Minimum-Phase Spectral Factor", *IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications*, vol. 50, no. 3, pp. 365 – 375, Mar. 2003.

### **Continuous-Time Minimum-Variance Filtering**


#### **3.1 Introduction**

Rudolf E. Kalman studied discrete-time linear dynamic systems for his master's thesis at MIT in 1954. He commenced work at the Research Institute for Advanced Studies (RIAS) in Baltimore during 1957 and nominated Richard S. Bucy to join him in 1958 [1]. Bucy recognised that the nonlinear ordinary differential equation studied by an Italian mathematician, Count Jacopo F. Riccati, in around 1720, now called the Riccati equation, is equivalent to the Wiener-Hopf equation for the case of finite dimensional systems [1], [2]. In November 1958, Kalman recast the frequency domain methods developed by Norbert Wiener and Andrei N. Kolmogorov in the 1940s in state-space form [2]. Kalman noted in his 1960 paper [3] that generalising the Wiener solution to nonstationary problems was difficult, which motivated his development of the optimal discrete-time filter in a state-space framework. He described the continuous-time version with Bucy in 1961 [4] and published a generalisation in 1963 [5]. Bucy later investigated the monotonicity and stability of the underlying Riccati equation [6]. The continuous-time minimum-variance filter is now commonly attributed to both Kalman and Bucy.

Compared to the Wiener filter, Kalman's state-space approach has a number of advantages.


Kalman's research at the RIAS was concerned with estimation and control for aerospace systems which was funded by the Air Force Office of Scientific Research. His explanation of why the dynamics-based Kalman filter is more important than the purely stochastic Wiener filter is that "Newton is more important than Gauss" [1]. The continuous-time Kalman filter produces state estimates $\hat{x}(t)$ from the solution of a simple differential equation

$$
\dot{\hat{x}}(t) = A(t)\hat{x}(t) + K(t)\left(z(t) - C(t)\hat{x}(t)\right),
$$

where *K*(*t*) is the filter gain matrix.
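As a preview, the sketch below (an illustrative time-invariant example, not one from the text) propagates this differential equation with a steady-state gain *K* = *PCTR*-1, where *P* is obtained from the algebraic Riccati equation whose derivation appears later in the chapter.

```python
# Minimal sketch (assumed time-invariant example) of propagating the state
# estimate ODE above with a steady-state gain K = P C^T R^{-1}, where P solves
# the continuous-time algebraic Riccati equation.
import numpy as np
from scipy import linalg

A = np.array([[-1.0, 1.0], [0.0, -2.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
Q = np.array([[1.0]])          # process noise covariance
R = np.array([[0.1]])          # measurement noise covariance

# Filter Riccati equation A P + P A^T - P C^T R^{-1} C P + B Q B^T = 0,
# posed in scipy's dual (control) form.
P = linalg.solve_continuous_are(A.T, C.T, B @ Q @ B.T, R)
K = P @ C.T @ np.linalg.inv(R)

dt, n = 0.001, 20000
rng = np.random.default_rng(1)
x = np.zeros((2, 1)); xh = np.zeros((2, 1))
for _ in range(n):
    w = rng.standard_normal((1, 1)) * np.sqrt(Q / dt)  # white-noise increments
    v = rng.standard_normal((1, 1)) * np.sqrt(R / dt)
    z = C @ x + v
    x += dt * (A @ x + B @ w)               # Euler step of the signal model
    xh += dt * (A @ xh + K @ (z - C @ xh))  # Euler step of the filter ODE
print(np.linalg.norm(x - xh))               # estimate tracks the state
```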


in which it is tacitly assumed that the model is correct, the noises are zero-mean, white and uncorrelated. It is straightforward to include nonzero means, coloured and correlated noises. In practice, the true model can be elusive but a simple (low-order) solution may return a cost benefit.

The Kalman filter can be derived in many different ways. In an early account [3], a quadratic cost function was minimised using orthogonal projections. Other derivation methods include deriving a maximum *a posteriori* estimate, using Itô's calculus, calculus-of-variations, dynamic programming, invariant imbedding and from the Wiener-Hopf equation [6] - [17]. This chapter provides a brief derivation of the optimal filter using a conditional mean (or equivalently, a least mean square error) approach.

The developments begin by introducing a time-varying state-space model. Next, the state transition matrix is defined, which is used to derive a Lyapunov differential equation. The Kalman filter follows immediately from a conditional mean formula. Its filter gain is obtained by solving a Riccati differential equation corresponding to the estimation error system. Generalisations for problems possessing deterministic inputs, correlated process and measurement noises, and direct feedthrough terms are described subsequently. Finally, it is shown that the Kalman filter reverts to the Wiener filter when the problems are timeinvariant.


#### **3.2 Prerequisites**

#### **3.2.1 The Time-varying Signal Model**

The focus initially is on time-varying problems over a finite time interval *t* ∈ [0, *T*]. A system 𝒜 : ℝ*m* → ℝ*p* is assumed to have the state-space representation

$$
\dot{\mathbf{x}}(t) = A(t)\mathbf{x}(t) + B(t)w(t) \,. \tag{1}
$$

$$\mathbf{y}(t) = \mathbf{C}(t)\mathbf{x}(t) + D(t)\mathbf{w}(t) \,. \tag{2}$$

where *A*(*t*) ∈ ℝ*n*×*n*, *B*(*t*) ∈ ℝ*n*×*m*, *C*(*t*) ∈ ℝ*p*×*n*, *D*(*t*) ∈ ℝ*p*×*m* and *w*(*t*) is a zero-mean white process noise with *E*{*w*(*t*)*wT*(*τ*)} = *Q*(*t*)*δ*(*t* – *τ*), in which *δ*(*t*) is the Dirac delta function.


This system is depicted in Fig. 1. In many problems of interest, signals are band-limited, that is, the direct feedthrough matrix, *D*(*t*), is zero. Therefore, the simpler case of *D*(*t*) = 0 is addressed first and the inclusion of a nonzero *D*(*t*) is considered afterwards.

Figure 1. The continuous-time system 𝒜 operates on the input signal *w*(*t*) ∈ ℝ*m* and produces the output signal *y*(*t*) ∈ ℝ*p*.

#### **3.2.2 The State Transition Matrix**

The state transition matrix, which concerns the linear differential equation (1), is introduced below.

*Lemma 1: The equation (1) has the solution* 

$$\mathbf{x}(t) = \Phi(t, t\_0)\mathbf{x}(t\_0) + \int\_{t\_0}^{t} \Phi(t, s)B(s)w(s)ds \,\,\,\,\tag{3}$$

*where the state transition matrix* Φ(*t*, *t*0) *satisfies*

$$
\dot{\Phi}(t, t\_0) = \frac{d\Phi(t, t\_0)}{dt} = A(t)\Phi(t, t\_0) \,. \tag{4}
$$

*with boundary condition* 

$$\Phi(t, t) = I\,. \tag{5}$$

*Proof: Differentiating both sides of (3) and using Leibnitz's rule, that is,*

$$\frac{d}{dt}\int_{\alpha(t)}^{\beta(t)} f(t,\tau)\,d\tau = \int_{\alpha(t)}^{\beta(t)}\frac{\partial f(t,\tau)}{\partial t}\,d\tau + f(t,\beta(t))\frac{d\beta(t)}{dt} - f(t,\alpha(t))\frac{d\alpha(t)}{dt}\,, \textit{ gives}$$

$$\dot{\mathbf{x}}(t) = \dot{\Phi}(t,t\_0)\mathbf{x}(t\_0) + \int\_{t\_0}^{t} \dot{\Phi}(t,\tau)B(\tau)w(\tau)d\tau + \Phi(t,t)B(t)w(t) \,. \tag{6}$$

*Substituting (4) and (5) into the right-hand-side of (6) results in* 

$$\dot{\mathbf{x}}(t) = A(t) \left( \Phi(t, t\_0) \mathbf{x}(t\_0) + \int\_{t\_0}^t \Phi(t, \tau) B(\tau) w(\tau) d\tau \right) + B(t) w(t) \,. \tag{7}$$
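For the special case of a constant *A*, the state transition matrix reduces to the matrix exponential $\Phi(t, t_0) = e^{A(t - t_0)}$; the sketch below (an assumed illustrative matrix) checks properties (4) and (5) numerically.

```python
# Sketch (time-invariant special case assumed): for constant A the state
# transition matrix is Phi(t, t0) = expm(A (t - t0)), satisfying (4) and (5).
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0], [-2.0, -3.0]])
t0, t, dt = 0.0, 1.5, 1e-6

Phi = expm(A * (t - t0))
dPhi = (expm(A * (t + dt - t0)) - Phi) / dt    # finite-difference d(Phi)/dt

print(np.allclose(dPhi, A @ Phi, atol=1e-4))   # checks (4)
print(np.allclose(expm(A * 0.0), np.eye(2)))   # checks the boundary condition (5)
```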

#### **3.2.3 The Lyapunov Differential Equation**

The mathematical expectation, *E*{*x*(*t*)*xT*(*τ*)} of *x*(*t*)*xT*(*τ*), is required below, which is defined as

$$E\{x(t)x^{T}(\tau)\} = \int_{-\infty}^{\infty} x(t)x^{T}(\tau)\,f_{xx^{T}}(x(t)x^{T}(\tau))\,dx(t)\,, \tag{8}$$

where $f_{xx^{T}}(x(t)x^{T}(\tau))$ is the probability density function of *x*(*t*)*xT*(*τ*). A useful property of expectations is demonstrated in the following example.


*Example 1.* Suppose that *x*(*t*) is a stochastic random variable and *h*(*t*) is a continuous function, then

$$E\left\{\int\_{a}^{b} h(t)\mathbf{x}(t)\mathbf{x}^{T}(\tau)dt\right\} = \int\_{a}^{b} h(t)E\{\mathbf{x}(t)\mathbf{x}^{T}(\tau)\}dt\tag{9}$$

To verify this, expand the left-hand-side of (9) to give

$$\begin{aligned} E\left\{\int_{a}^{b} h(t)x(t)x^{T}(\tau)\,dt\right\} &= \int_{-\infty}^{\infty}\int_{a}^{b} h(t)x(t)x^{T}(\tau)\,dt\,f_{xx^{T}}(x(t)x^{T}(\tau))\,dx(t) \\ &= \int_{-\infty}^{\infty}\int_{a}^{b} h(t)x(t)x^{T}(\tau)f_{xx^{T}}(x(t)x^{T}(\tau))\,dt\,dx(t)\,. \end{aligned} \tag{10}$$

Using Fubini's theorem, that is, $\int_{c}^{d}\int_{a}^{b} g(x,y)\,dx\,dy = \int_{a}^{b}\int_{c}^{d} g(x,y)\,dy\,dx$, within (10) results in

$$\begin{aligned} E\left\{\int_{a}^{b} h(t)x(t)x^{T}(\tau)\,dt\right\} &= \int_{a}^{b}\int_{-\infty}^{\infty} h(t)x(t)x^{T}(\tau)f_{xx^{T}}(x(t)x^{T}(\tau))\,dx(t)\,dt \\ &= \int_{a}^{b} h(t)\int_{-\infty}^{\infty} x(t)x^{T}(\tau)f_{xx^{T}}(x(t)x^{T}(\tau))\,dx(t)\,dt\,. \end{aligned} \tag{11}$$

The result (9) follows from the definition (8) within (11).

The Dirac delta function, which is zero for *t* ≠ 0, satisfies the identity $\int_{-\infty}^{\infty}\delta(t)\,dt = 1$. In the foregoing development, use is made of the partitioning

$$\int_{-\infty}^{0}\delta(t)\,dt = \int_{0}^{\infty}\delta(t)\,dt = 0.5\,. \tag{12}$$

*Lemma 2: In respect of equation (1), assume that w(t) is a zero-mean white process with E*{*w*(*t*)*wT*(*τ*)} *= Q*(*t*)*δ*(*t* – *τ*) *that is uncorrelated with x*(*t*0)*, namely, E*{*w*(*t*)*xT*(*t*0)} *= 0. Then the covariances P*(*t*,*τ*) *= E*{*x*(*t*)*xT*(*τ*)} *and* $\dot{P}(t,\tau) = \frac{d}{dt}E\{x(t)x^{T}(\tau)\}$ *satisfy the Lyapunov differential equation*

$$\dot{P}(t,\tau) = A(t)P(t,\tau) + P(t,\tau)A^\top(\tau) + B(t)Q(t)B^\top(t) \,. \tag{13}$$

*Proof: Using (1) within* $\frac{d}{dt}E\{x(t)x^{T}(\tau)\} = E\{\dot{x}(t)x^{T}(\tau) + x(t)\dot{x}^{T}(\tau)\}$ *yields*

$$\begin{aligned} \dot{P}(t,\tau) &= E\{A(t)x(t)x^{T}(\tau) + B(t)w(t)x^{T}(\tau)\} + E\{x(t)x^{T}(\tau)A^{T}(\tau) + x(t)w^{T}(\tau)B^{T}(\tau)\} \\ &= A(t)P(t,\tau) + P(t,\tau)A^{T}(\tau) + E\{B(t)w(t)x^{T}(\tau)\} + E\{x(t)w^{T}(\tau)B^{T}(\tau)\}\,. \end{aligned} \tag{14}$$


*It follows from (1) and (3) that* 


$$\begin{split} E\{B(t)w(t)\mathbf{x}^{\top}(\tau)\} &= B(t)E\{w(t)\mathbf{x}^{\top}(0)\Phi(t,0)\} + B(t)E\left\{\int\_{t\_0}^{t} w(t)w^{\top}(\tau)\mathbf{B}^{\top}(\tau)\Phi(t,\tau)d\tau\right\} \\ &= B(t)E\{w(t)\mathbf{x}^{\top}(0)\Phi(t,0)\} + B(t)\int\_{t\_0}^{t} E\{w(t)w^{\top}(\tau)\}\mathbf{B}^{\top}(\tau)\Phi(t,\tau)d\tau \ . \end{split} \tag{15}$$

*The assumptions E{w*(*t*)x*T*(*t*0)*} = 0 and E{w*(*t*)*wT*(*τ*)*} = Q*(*t)δ*(*t – τ*) *together with (15) lead to* 

$$\begin{aligned} E\{B(t)w(t)x^{T}(\tau)\} &= B(t)Q(t)\int_{t_0}^{t}\delta(t-\tau)B^{T}(\tau)\Phi(t,\tau)\,d\tau \\ &= 0.5B(t)Q(t)B^{T}(t)\,. \end{aligned} \tag{16}$$

*The above Lyapunov differential equation follows by substituting (16) into (14).* �

In the case *τ* = *t*, denote *P*(*t*,*t*) = *E*{*x*(*t*)*xT*(*t*)} and $\dot{P}(t,t) = \frac{d}{dt}E\{x(t)x^{T}(t)\}$. Then the corresponding Lyapunov differential equation is written as

$$\dot{P}(t) = A(t)P(t) + P(t)A^{T}(t) + B(t)Q(t)B^{T}(t)\,. \tag{17}$$
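A minimal sketch follows (assuming constant *A*, *B* and *Q*, which are illustrative values): it integrates (17) numerically, and for a stable *A* the covariance settles at the solution of the corresponding algebraic Lyapunov equation.

```python
# Sketch (constant A, B, Q assumed) integrating the Lyapunov differential
# equation (17); the steady state solves A P + P A^T + B Q B^T = 0.
import numpy as np
from scipy.integrate import solve_ivp
from scipy.linalg import solve_continuous_lyapunov

A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
Q = np.array([[1.0]])

def pdot(t, p):
    P = p.reshape(2, 2)
    dP = A @ P + P @ A.T + B @ Q @ B.T    # right-hand side of (17)
    return dP.ravel()

sol = solve_ivp(pdot, (0.0, 20.0), np.zeros(4), rtol=1e-8)
P_T = sol.y[:, -1].reshape(2, 2)

P_ss = solve_continuous_lyapunov(A, -B @ Q @ B.T)  # A P + P A^T = -B Q B^T
print(np.allclose(P_T, P_ss, atol=1e-6))           # True for stable A
```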

#### **3.2.4 Conditional Expectations**

The minimum-variance filter derivation that follows employs a conditional expectation formula, which is set out as follows. Consider a stochastic vector [*xT*(*t*) *yT*(*t*)]*T* having means and covariances

$$E\left\{\begin{bmatrix} x(t) \\ y(t) \end{bmatrix}\right\} = \begin{bmatrix} \overline{x} \\ \overline{y} \end{bmatrix} \tag{18}$$

and

$$E\left[\begin{bmatrix}\mathbf{x}(t) - \overline{\mathbf{x}} \\ \mathbf{y}(t) - \overline{\mathbf{y}} \end{bmatrix} \begin{bmatrix} \mathbf{x}^{\top}(t) - \overline{\mathbf{x}}^{\top} & \mathbf{y}^{\top}(t) - \overline{\mathbf{y}}^{\top} \end{bmatrix}\right] = \begin{bmatrix} \boldsymbol{\Sigma}\_{\text{xx}} & \boldsymbol{\Sigma}\_{\text{xy}} \\ \boldsymbol{\Sigma}\_{\text{yx}} & \boldsymbol{\Sigma}\_{\text{yy}} \end{bmatrix}.\tag{19}$$

respectively, where $\Sigma_{yx} = \Sigma_{xy}^{T}$. Suppose that it is desired to obtain an estimate of *x*(*t*) given *y*(*t*), denoted by $E\{x(t)\,|\,y(t)\}$, which minimises $E\{(x(t) - E\{x(t)\,|\,y(t)\})(x(t) - E\{x(t)\,|\,y(t)\})^{T}\}$. A standard approach (*e.g.*, see [18]) is to assume that the solution for $E\{x(t)\,|\,y(t)\}$ is affine to *y*(*t*), namely,

$$E\{\mathbf{x}(t) \mid y(t)\} = Ay(t) + b,\tag{20}$$


where *A* and *b* are unknowns to be found. It follows from (20) that

$$\begin{aligned} E\{(x(t) - E\{x(t)\,|\,y(t)\})&(x(t) - E\{x(t)\,|\,y(t)\})^{T}\} \\ &= E\{x(t)x^{T}(t) - x(t)y^{T}(t)A^{T} - x(t)b^{T} - Ay(t)x^{T}(t) \\ &\qquad + Ay(t)y^{T}(t)A^{T} + Ay(t)b^{T} - bx^{T}(t) + by^{T}(t)A^{T} + bb^{T}\}\,. \end{aligned} \tag{21}$$

Substituting $E\{x(t)x^{T}(t)\} = \overline{x}\,\overline{x}^{T} + \Sigma_{xx}$, $E\{x(t)y^{T}(t)\} = \overline{x}\,\overline{y}^{T} + \Sigma_{xy}$, $E\{y(t)x^{T}(t)\} = \overline{y}\,\overline{x}^{T} + \Sigma_{yx}$ and $E\{y(t)y^{T}(t)\} = \overline{y}\,\overline{y}^{T} + \Sigma_{yy}$ into (21) and completing the squares yields

$$\begin{split} \operatorname{E}\Big\{ (\mathbf{x}(t) - E\langle \mathbf{x}(t) \mid y(t) \rangle)(\mathbf{x}(t) - E\langle \mathbf{x}(t) \mid y(t) \rangle)^{\top} \Big\} \\ = (\overline{\mathbf{x}} - A\overline{\mathbf{y}} - b)(\overline{\mathbf{x}} - A\overline{\mathbf{y}} - b)^{\top} + \begin{bmatrix} I & -A \\ \end{bmatrix} \begin{bmatrix} \Sigma\_{xx} & \Sigma\_{xy} \\ \Sigma\_{yx} & \Sigma\_{yy} \end{bmatrix} \begin{bmatrix} I \\ -A^{\top} \\ \end{bmatrix}. \end{split} \tag{22}$$

The second term on the right-hand-side of (22) can be rearranged as

$$\begin{bmatrix} I & -A \end{bmatrix}\begin{bmatrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy} \end{bmatrix}\begin{bmatrix} I \\ -A^{T} \end{bmatrix} = (A - \Sigma_{xy}\Sigma_{yy}^{-1})\Sigma_{yy}(A - \Sigma_{xy}\Sigma_{yy}^{-1})^{T} + \Sigma_{xx} - \Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx}\,.$$

Thus, the choice $A = \Sigma_{xy}\Sigma_{yy}^{-1}$ and $b = \overline{x} - A\overline{y}$ minimises (22), which gives

$$E\{\mathbf{x}(t) \mid \mathbf{y}(t)\} = \overline{\mathbf{x}} + \Sigma\_{xy} \Sigma\_{yy}^{-1} \left(\mathbf{y}(t) - \overline{\mathbf{y}}\right) \tag{23}$$

and

$$E\left\{(x(t) - E\{x(t)\,|\,y(t)\})(x(t) - E\{x(t)\,|\,y(t)\})^{T}\right\} = \Sigma_{xx} - \Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx}\,. \tag{24}$$

The conditional mean estimate (23) is also known as the linear least mean square estimate [18]. An important property of the conditional mean estimate is established below.

*Lemma 3 (Orthogonal projections): In respect of the conditional mean estimate (23), in which the mean and covariances are respectively defined in (18) and (19), the error vector* 

$$\tilde{\mathbf{x}}(t) = \mathbf{x}(t) - E\{\mathbf{x}(t) \mid \mathbf{y}(t)\}\,. \tag{25}$$

*is orthogonal to y*(*t*)*, that is,* $E\{\tilde{x}(t)y^{T}(t)\} = 0$.


*Proof [8],[18]: From (23) and (25), it can be seen that* 

$$\begin{aligned} E\left\{(\tilde{x}(t) - E\{\tilde{x}(t)\})(y(t) - E\{y(t)\})^{T}\right\} &= E\left\{(x(t) - \overline{x} - \Sigma_{xy}\Sigma_{yy}^{-1}(y(t) - \overline{y}))(y(t) - \overline{y})^{T}\right\} \\ &= \Sigma_{xy} - \Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yy} \\ &= 0\,. \ \Box \end{aligned}$$
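The conditional mean formula (23), the error covariance (24) and the orthogonality property of Lemma 3 can be checked by Monte Carlo; the jointly distributed samples below are an assumed illustration.

```python
# Sketch (illustrative jointly Gaussian data assumed) checking the conditional
# mean estimate (23), the error covariance (24) and Lemma 3 by Monte Carlo.
import numpy as np

rng = np.random.default_rng(2)
n = 200000
x = rng.standard_normal(n)
y = 0.8 * x + 0.6 * rng.standard_normal(n)     # correlated observation of x

xbar, ybar = x.mean(), y.mean()
Sxx = np.mean((x - xbar) ** 2)
Sxy = np.mean((x - xbar) * (y - ybar))
Syy = np.mean((y - ybar) ** 2)

xhat = xbar + Sxy / Syy * (y - ybar)           # conditional mean estimate (23)
xtilde = x - xhat                              # error vector (25)

print(np.mean(xtilde * y))                     # ~0, the orthogonality of Lemma 3
print(np.mean(xtilde ** 2), Sxx - Sxy**2 / Syy)  # both equal the covariance (24)
```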

Sufficient background material has now been introduced for the finite-horizon filter (for time-varying systems) to be derived.

#### **3.3 The Continuous-time Minimum-Variance Filter**

#### **3.3.1 Derivation of the Optimal Filter**

Consider again a linear time-varying system mapping $\mathbb{R}^{m} \to \mathbb{R}^{p}$ and having the state-space realisation

$$
\dot{\mathbf{x}}(t) = A(t)\mathbf{x}(t) + B(t)w(t) \,. \tag{26}
$$

$$\mathbf{y}(t) = \mathbf{C}(t)\mathbf{x}(t) \,. \tag{27}$$

where *A*(*t*), *B*(*t*), *C*(*t*) are of appropriate dimensions and *w(t*) is a white process with

$$E\{w(t)\} \equiv 0, \; E\{w(t)w^T(\tau)\} \equiv Q(t)\delta(t-\tau). \tag{28}$$

Suppose that observations

$$z(t) = y(t) + v(t) \tag{29}$$

are available, where $v(t) \in \mathbb{R}^{p}$ is a white measurement noise process with

$$E\{v(t)\} \equiv 0,\; E\{v(t)v^{\top}(\tau)\} \equiv R(t)\delta(t - \tau) \tag{30}$$

and


$$E\{w(t)v^{\top}(\tau)\} = 0. \tag{31}$$

The objective is to design a linear system that operates on the measurements $z(t)$ and produces an estimate $\hat{y}(t \mid t) = C(t)\hat{x}(t \mid t)$ of $y(t) = C(t)x(t)$ given measurements at time $t$, so that the covariance $E\{e(t \mid t)e^{\top}(t \mid t)\}$ is minimised, where $e(t \mid t) = x(t) - \hat{x}(t \mid t)$. This output estimation problem is depicted in Fig. 2.

Figure 2. The continuous-time output estimation problem. The objective is to find an estimate $\hat{y}(t \mid t)$ of $y(t)$ which minimises $E\{(y(t) - \hat{y}(t \mid t))(y(t) - \hat{y}(t \mid t))^{\top}\}$.

"Art has a double face, of expression and illusion, just like science has a double face: the reality of error and the phantom of truth." *René Daumal*


It is desired that the estimate $\hat{x}(t \mid t)$ of $x(t)$ and its derivative $\dot{\hat{x}}(t \mid t)$ be unbiased, namely

$$E\{\mathbf{x}(t) - \hat{\mathbf{x}}(t \mid t)\} = \mathbf{0} \tag{32}$$

$$E\{\dot{x}(t) - \dot{\hat{x}}(t \mid t)\} = 0\,. \tag{33}$$

If $\hat{x}(t \mid t)$ is a conditional mean estimate then, from Lemma 3, criterion (32) will be met. Criterion (33) can be satisfied if it is additionally assumed that $E\{\dot{\hat{x}}(t \mid t)\} = A(t)\hat{x}(t \mid t)$, since this yields $E\{\dot{x}(t) - \dot{\hat{x}}(t \mid t)\} = A(t)E\{x(t) - \hat{x}(t \mid t)\} = 0$. Thus, substituting $E\{\dot{\hat{x}}(t \mid t)\} = A(t)\hat{x}(t \mid t)$ and $E\{z(t)\} = C(t)\hat{x}(t \mid t)$ into (23) yields the conditional mean estimate

$$\begin{aligned} \dot{\hat{x}}(t \mid t) &= A(t)\hat{x}(t \mid t) + K(t)\big(z(t) - C(t)\hat{x}(t \mid t)\big) \\ &= \big(A(t) - K(t)C(t)\big)\hat{x}(t \mid t) + K(t)z(t), \end{aligned} \tag{34}$$

where $K(t) = E\{x(t)z^{\top}(t)\}E\{z(t)z^{\top}(t)\}^{-1}$. Equation (34) is known as the continuous-time Kalman filter (or the Kalman-Bucy filter) and is depicted in Fig. 3. This filter employs the state matrix $A(t)$ akin to that of the signal generating model, which Kalman and Bucy call the message process [4]. The matrix $K(t)$ is known as the filter gain, which operates on the error residual, namely, the difference between the measurement $z(t)$ and the estimated output $C(t)\hat{x}(t \mid t)$. The calculation of an optimal gain is addressed in the next section.

Figure 3. The continuous-time Kalman filter, which is also known as the Kalman-Bucy filter. The filter calculates conditional mean estimates $\hat{x}(t \mid t)$ from the measurements $z(t)$.

#### **3.3.2 The Riccati Differential Equation**

Denote the state estimation error by $\tilde{x}(t \mid t) = x(t) - \hat{x}(t \mid t)$. It is shown below that the filter minimises the error covariance $E\{\tilde{x}(t \mid t)\tilde{x}^{\top}(t \mid t)\}$ if the gain is calculated as

$$K(t) = P(t)\mathbf{C}^{T}(t)R^{-1}(t) \,. \tag{35}$$

"Somewhere, something incredible is waiting to be known." *Carl Edward Sagan*


in which $P(t) = E\{\tilde{x}(t \mid t)\tilde{x}^{\top}(t \mid t)\}$ is the solution of the Riccati differential equation

$$\dot{P}(t) = A(t)P(t) + P(t)A^{\top}(t) - P(t)C^{\top}(t)R^{-1}(t)C(t)P(t) + B(t)Q(t)B^{\top}(t). \tag{36}$$

*Lemma 4: In respect of the state estimation problem defined by (26) - (31), suppose that there exists a solution* 

$$P(t) = P^{\top}(t) \ge 0 \tag{37}$$

*for the Riccati differential equation (36) satisfying*

$$A(t) - P(t)C^{\top}(t) \ge 0 \tag{38}$$

*for all t in the interval [0, T]. Then the filter (34) having the gain (35) minimises $P(t) = E\{\tilde{x}(t \mid t)\tilde{x}^{\top}(t \mid t)\}$.*

*Proof: Subtracting (34) from (26) results in* 

$$
\dot{\tilde{\mathbf{x}}}(t|t) = \left( A(t) - K(t)\mathbb{C}(t) \right) \tilde{\mathbf{x}}(t|t) + B(t)w(t) - K(t)v(t) \,. \tag{39}
$$

*Applying Lemma 2 to the error system (39) gives* 

$$\dot{P}(t) = \left(A(t) - K(t)\mathbb{C}(t)\right)P(t) + P(t)(A(t) - K(t)\mathbb{C}(t))^\top + K(t)R(t)K^\top(t) + B(t)Q(t)B^\top(t)\tag{40}$$

*which can be rearranged as* 

$$\begin{aligned} \dot{P}(t) &= A(t)P(t) + P(t)A^\top(t) + B(t)Q(t)B^\top(t) \\ &+ \left(K(t) - P(t)\mathbf{C}^\top(t)R^{-1}(t)\right)R(t)\left(K^\top(t) - R^{-1}(t)\mathbf{C}^\top(t)P(t)\right) + P(t)\mathbf{C}^\top(t)R^{-1}(t)\mathbf{C}(t)P(t) \end{aligned} \tag{41}$$

*Setting the derivative of (41) with respect to $K(t)$ equal to the zero matrix results in a stationary point at (35), at which (40) reverts to (36). From the differential of (40),*

$$\ddot{P}(t) = \big(A(t) - P(t)C^{\top}(t)R^{-1}(t)C(t)\big)\dot{P}(t) + \dot{P}(t)\big(A(t) - P(t)C^{\top}(t)R^{-1}(t)C(t)\big)^{\top}, \tag{42}$$

*and it can be seen that $P(t) \ge 0$ provided that the assumptions (37) – (38) hold. Therefore, $P(t) = E\{\tilde{x}(t \mid t)\tilde{x}^{\top}(t \mid t)\}$ is minimised at (35). □*

The above development is somewhat brief and not very rigorous. Further discussions appear in [4] – [17]. It is tendered to show that the Kalman filter minimises the error covariance, provided of course that the problem assumptions are correct. In the case that it is desired to estimate an arbitrary linear combination *C*1(*t*) of states, the optimal filter is given by the system

$$
\dot{\hat{\mathbf{x}}}(t|\mathbf{t}) = A(t)\hat{\mathbf{x}}(t|\mathbf{t}) + K(t)\left(z(t) - \mathbf{C}(t)\hat{\mathbf{x}}(t|\mathbf{t})\right), \tag{43}
$$

$$
\hat{y}\_1(t) = C\_1(t)\hat{\mathbf{x}}(t) \,. \tag{44}
$$

This filter minimises the error covariance $C_1(t)P(t)C_1^{\top}(t)$. The generalisation of the Kalman filter for problems possessing deterministic inputs, correlated noises, and a direct feedthrough term is developed below.

"The worst wheel of the cart makes the most noise." *Benjamin Franklin*
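Readers who prefer code can integrate the above recursions directly. The following is a minimal Euler-integration skeleton for (35), (36) and (43) – (44); the function name, the fixed step `dt` and the choice of Euler's method are illustrative assumptions rather than part of the original development.

```python
import numpy as np

def kalman_bucy_step(x_hat, P, z, A, B, C, C1, Q, R, dt):
    """One Euler step of the filter (43)-(44) and the Riccati equation (36)."""
    R_inv = np.linalg.inv(R)
    K = P @ C.T @ R_inv                                      # gain (35)
    x_hat = x_hat + dt * (A @ x_hat + K @ (z - C @ x_hat))   # state estimate (43)
    P = P + dt * (A @ P + P @ A.T
                  - P @ C.T @ R_inv @ C @ P + B @ Q @ B.T)   # Riccati equation (36)
    return C1 @ x_hat, x_hat, P                              # output estimate (44)
```

In practice, a stiffer ODE integrator or a square-root implementation may be preferred; the skeleton above only illustrates the data flow.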


#### **3.3.3 Including Deterministic Inputs**

Suppose that the signal model is described by

$$
\dot{\mathbf{x}}(t) = A(t)\mathbf{x}(t) + B(t)w(t) + \mu(t) \tag{45}
$$

$$\mathbf{y}(t) = \mathbf{C}(t)\mathbf{x}(t) + \boldsymbol{\pi}(t) \,. \tag{46}$$

where *μ*(*t*) and *π*(*t*) are deterministic (or known) inputs. In this case, the filtered state estimate can be obtained by including the deterministic inputs as follows

$$
\dot{\hat{\mathbf{x}}}(t|t) = A(t)\hat{\mathbf{x}}(t|t) + K(t)\left(z(t) - C(t)\hat{\mathbf{x}}(t|t) - \pi(t)\right) + \mu(t) \tag{47}
$$

$$
\hat{\mathbf{y}}(t) = \mathbf{C}(t)\hat{\mathbf{x}}(t) + \boldsymbol{\pi}(t) \,. \tag{48}
$$

It is easily verified that subtracting (47) from (45) yields the error system (39) and therefore, the Kalman filter's differential Riccati equation remains unchanged.

*Example 2.* Suppose that an object is falling under the influence of a gravitational field and it is desired to estimate its position over [0, *t*] from noisy measurements. Denote the object's vertical position, velocity and acceleration by $x(t)$, $\dot{x}(t)$ and $\ddot{x}(t)$, respectively. Let *g* denote the gravitational constant. Then $\ddot{x}(t) = -g$ implies $\dot{x}(t) = \dot{x}(0) - gt$, so the model may be written as

$$\begin{aligned} \begin{bmatrix} \dot{x}(t) \\ \ddot{x}(t) \end{bmatrix} &= A\begin{bmatrix} x(t) \\ \dot{x}(t) \end{bmatrix} + \mu(t), \\ z(t) &= C\begin{bmatrix} x(t) \\ \dot{x}(t) \end{bmatrix} + v(t), \end{aligned} \tag{49}$$

where $A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$ is the state matrix, $\mu(t) = \begin{bmatrix} 0 \\ -g \end{bmatrix}$ is a deterministic input and $C = \begin{bmatrix} 1 & 0 \end{bmatrix}$ is the output mapping. Thus, the Kalman filter has the form

$$\begin{bmatrix} \dot{\hat{x}}(t \mid t) \\ \ddot{\hat{x}}(t \mid t) \end{bmatrix} = A\begin{bmatrix} \hat{x}(t \mid t) \\ \dot{\hat{x}}(t \mid t) \end{bmatrix} + K\left(z(t) - C\begin{bmatrix} \hat{x}(t \mid t) \\ \dot{\hat{x}}(t \mid t) \end{bmatrix}\right) + \mu(t)\,, \tag{50}$$

$$\hat{y}(t \mid t) = C\hat{x}(t \mid t)\,, \tag{51}$$

where the gain *K* is calculated from (35) and (36), in which *BQBT* = 0.
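The example can be simulated in a few lines. The sketch below Euler-integrates the model (49), the filter (50) and the Riccati equation (36) jointly; the numerical values of g, R, dt and the initial conditions are assumptions made purely for the demonstration.

```python
import numpy as np

# A sketch of the falling-body filter of Example 2; the constants below
# are illustrative assumptions.
g, R, dt, T = 9.81, 4.0, 0.001, 10.0
A = np.array([[0.0, 1.0], [0.0, 0.0]])
C = np.array([[1.0, 0.0]])
mu = np.array([0.0, -g])             # deterministic input
x = np.array([1000.0, 0.0])          # true position and velocity
x_hat = np.array([900.0, 10.0])      # deliberately biased initial estimate
P = np.diag([100.0, 25.0])           # initial error covariance

rng = np.random.default_rng(1)
for _ in range(int(T / dt)):
    x = x + dt * (A @ x + mu)                        # model (49)
    z = C @ x + rng.normal(0.0, np.sqrt(R / dt))     # noisy position measurement
    K = P @ C.T / R                                  # gain (35)
    x_hat = x_hat + dt * (A @ x_hat + K @ (z - C @ x_hat) + mu)   # filter (50)
    P = P + dt * (A @ P + P @ A.T - P @ C.T @ C @ P / R)          # (36), BQB^T = 0
print(x - x_hat)  # estimation error after T seconds
```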

"These, Gentlemen, are the opinions upon which I base my facts." *Winston Leonard Spencer-Churchill*

#### **3.3.4 Including Correlated Process and Measurement Noise**

Suppose that the process and measurement noises are correlated, that is,

$$E\left[\begin{bmatrix}w(t)\\v(t)\end{bmatrix}\begin{bmatrix}w^{\top}(\tau)&v^{\top}(\tau)\end{bmatrix}\right] = \begin{bmatrix}Q(t)&S(t)\\S^{\top}(t)&R(t)\end{bmatrix}\delta(t-\tau)\tag{52}$$

The equation for calculating the optimal state estimate remains of the form (34), however, the differential Riccati equation and hence the filter gain are different. The generalisation of the optimal filter that takes into account (52) was published by Kalman in 1963 [5]. Kalman's approach was to first work out the corresponding discrete-time Riccati equation and then derive the continuous-time version.

The correlated noises can be accommodated by defining the signal model equivalently as

$$
\dot{\mathbf{x}}(t) = A(t)\mathbf{x}(t) + B(t)\overline{\mathbf{w}}(t) + \mu(t) \tag{53}
$$

where


$$
\overline{A}(t) = A(t) - B(t)S(t)R^{-1}(t)C(t) \tag{54}
$$

is a new state matrix,

$$
\overline{w}(t) = w(t) - S(t)R^{-1}(t)v(t) \tag{55}
$$

is a new stochastic input that is uncorrelated with *v*(*t*), and

$$
\mu(t) = B(t)S(t)R^{-1}(t)y(t)\tag{56}
$$

is a deterministic signal. It can easily be verified that the system (53) with the parameters (54) – (56) has the structure (26) with $E\{\bar{w}(t)v^{\top}(\tau)\} = 0$. It is convenient to define

$$\begin{aligned} \bar{Q}(t)\delta(t - \tau) &= E\{\bar{w}(t)\bar{w}^{\top}(\tau)\} \\ &= E\{w(t)w^{\top}(\tau)\} - E\{w(t)v^{\top}(\tau)\}R^{-1}(\tau)S^{\top}(\tau) - S(t)R^{-1}(t)E\{v(t)w^{\top}(\tau)\} \\ &\qquad + S(t)R^{-1}(t)E\{v(t)v^{\top}(\tau)\}R^{-1}(\tau)S^{\top}(\tau) \\ &= \big(Q(t) - S(t)R^{-1}(t)S^{\top}(t)\big)\delta(t - \tau). \end{aligned} \tag{57}$$

"I am tired of all this thing called science here. We have spent millions in that sort of thing for the last few years, and it is time it should be stopped." *Simon Cameron*


The corresponding Riccati differential equation is obtained by substituting $\bar{A}(t)$ for $A(t)$ and $\bar{Q}(t)$ for $Q(t)$ within (36), namely,

$$\dot{P}(t) = \bar{A}(t)P(t) + P(t)\bar{A}^{\top}(t) - P(t)C^{\top}(t)R^{-1}(t)C(t)P(t) + B(t)\bar{Q}(t)B^{\top}(t)\,. \tag{58}$$

This can be rearranged to give

$$\dot{P}(t) = A(t)P(t) + P(t)A^{\top}(t) - K(t)R(t)K^{\top}(t) + B(t)Q(t)B^{\top}(t)\,, \tag{59}$$

in which the gain is now calculated as

$$K(t) = \left(P(t)\mathbf{C}^{\top}(t) + B(t)S(t)\right)\mathbf{R}^{-1}(t)\,. \tag{60}$$
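The transformed parameters and the gain above are straightforward to compute. The sketch below illustrates (54), (57) and (60) with small assumed matrices; the values of A, B, C, Q, R and S are hypothetical.

```python
import numpy as np

# A sketch of the correlated-noise transformation (54), (57) and the
# gain (60); the matrices below are illustrative assumptions.
A = np.array([[-1.0, 0.2], [0.0, -0.5]])
B = np.eye(2)
C = np.array([[1.0, 0.0]])
Q = np.eye(2)
R = np.array([[0.1]])
S = np.array([[0.05], [0.02]])        # E{w(t)v^T(tau)} = S delta(t - tau)

R_inv = np.linalg.inv(R)
A_bar = A - B @ S @ R_inv @ C         # new state matrix (54)
Q_bar = Q - S @ R_inv @ S.T           # transformed noise covariance (57)

P, dt = np.eye(2), 0.001
for _ in range(20_000):
    K = (P @ C.T + B @ S) @ R_inv     # gain (60)
    P = P + dt * (A_bar @ P + P @ A_bar.T
                  - P @ C.T @ R_inv @ C @ P + B @ Q_bar @ B.T)   # Riccati (58)
print(K)  # approximate steady-state gain
```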

#### **3.3.5 Including a Direct Feedthrough Matrix**

The approach of the previous section can be used to address signal models that possess a direct feedthrough matrix, namely,

$$\dot{\mathbf{x}}(t) = A(t)\mathbf{x}(t) + B(t)w(t) \,. \tag{61}$$

$$\mathbf{y}(t) = \mathbf{C}(t)\mathbf{x}(t) + D(t)\mathbf{w}(t) \,. \tag{62}$$

As before, the optimal state estimate is given by

$$
\dot{\hat{\mathbf{x}}}(t|\,t) = \mathbf{A}(t)\hat{\mathbf{x}}(t|\,t) + \mathbf{K}(t)\Big\{\mathbf{z}(t) - \mathbf{C}(t)\hat{\mathbf{x}}(t|\,t)\Big\},\tag{63}
$$

where the gain is obtained by substituting *S*(*t*) = *Q*(*t*)*DT*(*t*) into (60),

$$K(t) = \left(P(t)C^T(t) + B(t)Q(t)D^T(t)\right)R^{-1}(t) \,. \tag{64}$$

in which *P*(*t*) is the solution of the Riccati differential equation

$$\begin{aligned} \dot{P}(t) &= \big(A(t) - B(t)Q(t)D^{\top}(t)R^{-1}(t)C(t)\big)P(t) + P(t)\big(A(t) - B(t)Q(t)D^{\top}(t)R^{-1}(t)C(t)\big)^{\top} \\ &\qquad - P(t)C^{\top}(t)R^{-1}(t)C(t)P(t) + B(t)\big(Q(t) - Q(t)D^{\top}(t)R^{-1}(t)D(t)Q(t)\big)B^{\top}(t). \end{aligned}$$

Note that the above Riccati equation simplifies to

$$\dot{P}(t) = A(t)P(t) + P(t)A^{\top}(t) - K(t)R(t)K^{\top}(t) + B(t)Q(t)B^{\top}(t)\,. \tag{65}$$

"No human investigation can be called real science if it cannot be demonstrated mathematically." *Leonardo di ser Piero da Vinci*


#### **3.4 The Continuous-time Steady-State Minimum-Variance Filter**

#### **3.4.1 Riccati Differential Equation Monotonicity**

This section sets out the simplifications for the case where the signal model is stationary (or time-invariant). In this situation the structure of the Kalman filter is unchanged but the gain is fixed and can be pre-calculated. Consider the linear time-invariant system

$$\dot{x}(t) = Ax(t) + Bw(t)\,, \tag{66}$$

$$\mathbf{y}(t) = \mathbf{C}\mathbf{x}(t) \,. \tag{67}$$

together with the observations

$$z(t) = y(t) + v(t) \, \tag{68}$$

assuming that Re{*λ<sub>i</sub>*(*A*)} < 0, $E\{w(t)\} = 0$, $E\{w(t)w^{\top}(\tau)\} = Q\delta(t - \tau)$, $E\{v(t)\} = 0$, $E\{v(t)v^{\top}(\tau)\} = R\delta(t - \tau)$ and $E\{w(t)v^{\top}(\tau)\} = 0$. It follows from the approach of Section 3 that the Riccati differential equation for the corresponding Kalman filter is given by

$$\dot{P}(t) = AP(t) + P(t)A^T - P(t)\mathbf{C}^T \mathbf{R}^{-1} \mathbf{C} P(t) + BQB^T. \tag{69}$$

It will be shown that the solution for *P*(*t*) monotonically approaches a steady-state asymptote, in which case the filter gain can be calculated before running the filter. The following result is required to establish that the solutions of the above Riccati differential equation are monotonic.

*Lemma 5 [11], [19], [20]: Suppose that X(t) is a solution of the Lyapunov differential equation* 

$$\dot{X}(t) = AX(t) + X(t)A^{\top} \tag{70}$$

*over an interval $t \in [0, T]$. Then the existence of a solution $X(t_0) \ge 0$ implies $X(t) \ge 0$ for all $t \in [0, T]$.*

*Proof: Denote the transition matrix of $\dot{x}(t) = -A^{\top}x(t)$ by $\Phi(t,\tau)$, for which $\dot{\Phi}(t,\tau) = -A^{\top}\Phi(t,\tau)$ and $\dot{\Phi}^{\top}(t,\tau) = -\Phi^{\top}(t,\tau)A$. Let $P(t) = \Phi^{\top}(t,\tau)X(t)\Phi(t,\tau)$; then from (70)*

$$\begin{aligned} 0 &= \Phi^\top(t,\tau) \Big( \dot{X}(t) - AX(t) - X(t)A^\top \Big) \Phi(t,\tau) \\\\ &= \dot{\Phi}^\top(t,\tau)X(t)\Phi(t,\tau) + \Phi^\top(t,\tau)\dot{X}(t)\Phi(t,\tau) + \Phi^\top(t,\tau)X(t)\dot{\Phi}(t,\tau) \\\\ &= \dot{P}(t) \end{aligned}$$

*Therefore, a solution $X(t_0) \ge 0$ of (70) implies that $X(t) \ge 0$ for all $t \in [0, T]$. □*

The monotonicity of Riccati differential equations has been studied by Bucy [6], Wonham [23], Poubelle *et al* [19] and Freiling [20]. The latter's simple proof is employed below.

"Today's scientists have substituted mathematics for experiments, and they wander off through equation after equation, and eventually build a structure which has no relation to reality." *Nikola Tesla*


*Lemma 6 [19], [20]: Suppose for a t ≥ 0 and a δt > 0 there exist solutions P*(*t*) *≥ 0 and P*(*t + δt*) *≥ 0 of the Riccati differential equations* 

$$\dot{P}(t) = AP(t) + P(t)A^T - P(t)\mathbb{C}^T R^{-1} \mathbb{C}P(t) + B\mathbb{Q}B^T \tag{71}$$

*and* 

$$\dot{P}(t + \delta_t) = AP(t + \delta_t) + P(t + \delta_t)A^{\top} - P(t + \delta_t)C^{\top}R^{-1}CP(t + \delta_t) + BQB^{\top}, \tag{72}$$

*respectively, such that P*(*t*) *− P*(*t + δt* ) *≥ 0. Then the sequence of matrices P(t) is monotonic nonincreasing, that is,* 

$$P(t) - P(t + \delta_t) \ge 0, \text{ for all } t \ge \delta_t. \tag{73}$$

*Proof: The conditions of the Lemma are the initial step of an induction argument. For the induction step, denote $\dot{\tilde{P}}(t) = \dot{P}(t) - \dot{P}(t + \delta_t)$, $\tilde{P}(t) = P(t) - P(t + \delta_t)$ and $\bar{A} = A - 0.5\big(P(t) + P(t + \delta_t)\big)C^{\top}R^{-1}C$. Then*

$$\begin{aligned} \dot{\tilde{P}}(t) &= A\tilde{P}(t) + \tilde{P}(t)A^{\top} - P(t)C^{\top}R^{-1}CP(t) + P(t + \delta_t)C^{\top}R^{-1}CP(t + \delta_t) \\ &= \bar{A}\tilde{P}(t) + \tilde{P}(t)\bar{A}^{\top}, \end{aligned}$$

*which is of the form (70), and so the result (73) follows. �* 

A monotonic nondecreasing case can be established similarly – see [20].
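Lemma 6 can also be illustrated numerically. The scalar sketch below, with assumed parameters A = −1 and B = C = Q = R = 1, Euler-steps two copies of (71) offset by one step and checks that the ordering (73) is retained.

```python
# Scalar illustration of Lemma 6 with assumed parameters
# A = -1 and B = C = Q = R = 1, integrated with Euler steps.
A, B, C, Q, R, dt = -1.0, 1.0, 1.0, 1.0, 1.0, 0.01

def riccati_step(P):
    # One Euler step of (71): P_dot = 2*A*P - (C*P)^2 / R + B*Q*B.
    return P + dt * (2 * A * P - (C * P) ** 2 / R + B * Q * B)

P1, P2 = 1.0, riccati_step(1.0)   # P2 plays the role of P(t + dt)
for _ in range(1000):
    assert P1 - P2 >= 0.0          # the ordering (73) is retained
    P1, P2 = riccati_step(P1), riccati_step(P2)
```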

#### **3.4.2 Observability**

The continuous-time system (66) – (67) is termed completely observable if the initial states, $x(t_0)$, can be uniquely determined from the inputs and outputs, $w(t)$ and $y(t)$, respectively, over an interval [0, *T*]. A simple test for observability is given by the following lemma.

*Lemma 7 [10], [21]: Suppose that $A \in \mathbb{R}^{n \times n}$ and $C \in \mathbb{R}^{p \times n}$. The system is observable if and only if the observability matrix $O \in \mathbb{R}^{np \times n}$ is of rank n, where*

$$O = \begin{bmatrix} \mathbf{C} \\ \mathbf{CA} \\ \mathbf{CA}^2 \\ \vdots \\ \mathbf{CA}^{n-1} \end{bmatrix}. \tag{74}$$

"You can observe a lot by just watching." *Lawrence Peter (Yogi) Berra*


*Proof: Recall from Chapter 2 that the solution of (66) is* 

$$x(t) = e^{At}x(t_0) + \int_{t_0}^{t} e^{A(t - \tau)}Bw(\tau)\,d\tau\,. \tag{75}$$

*Since the input signal $w(t)$ within (66) is known, it suffices to consider the unforced system $\dot{x}(t) = Ax(t)$ and $y(t) = Cx(t)$, that is, $Bw(t) = 0$, which leads to*

$$\mathbf{y}(t) = \mathbf{C}e^{\mathbf{A}t}\mathbf{x}(t\_0) \,. \tag{76}$$

*The exponential matrix is defined as* 

$$e^{At} = I + At + \frac{A^{2}t^{2}}{2!} + \dots + \frac{A^{N}t^{N}}{N!} = \sum_{k=0}^{N-1} a_{k}(t)A^{k}, \tag{77}$$

*where $a_k(t) = \dfrac{t^k}{k!}$. Substituting (77) into (76) gives*

$$\begin{aligned} y(t) &= \sum_{k=0}^{N-1} a_k(t)CA^{k}x(t_0) \\ &= a_0(t)Cx(t_0) + a_1(t)CAx(t_0) + \dots + a_{N-1}(t)CA^{N-1}x(t_0) \\ &= \begin{bmatrix} a_0(t) & a_1(t) & \cdots & a_{N-1}(t) \end{bmatrix}\begin{bmatrix} C \\ CA \\ CA^{2} \\ \vdots \\ CA^{N-1} \end{bmatrix} x(t_0). \end{aligned} \tag{78}$$

*From the Cayley-Hamilton Theorem [22],* 

$$\text{rank}\begin{bmatrix} C \\ CA \\ CA^{2} \\ \vdots \\ CA^{N-1} \end{bmatrix} = \text{rank}\begin{bmatrix} C \\ CA \\ CA^{2} \\ \vdots \\ CA^{n-1} \end{bmatrix}$$

*for all N ≥ n. Therefore, we can take N = n within (78). Thus, equation (78) uniquely determines x*(*t0*) *if and only if O has full rank n. �* 

A system that does not satisfy the above criterion is said to be unobservable. An alternate proof for the above lemma is provided in [10]. If a signal model is not observable then a Kalman filter cannot estimate all the states from the measurements.
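The rank test (74) is easily mechanised. The sketch below builds the observability matrix for the first (A, C) pair of Example 3, which follows; the helper function name is an illustrative choice.

```python
import numpy as np

# A sketch of the rank test (74); the helper name is an assumed choice.
def observability_matrix(A: np.ndarray, C: np.ndarray) -> np.ndarray:
    n = A.shape[0]
    # Stack C, CA, ..., CA^{n-1}; by the Cayley-Hamilton theorem, higher
    # powers of A contribute no further rows (cf. the proof of Lemma 7).
    return np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(n)])

A = np.array([[1.0, 0.0], [0.0, 1.0]])   # the pair from Example 3 below
C = np.array([[1.0, 0.0]])
O = observability_matrix(A, C)
print(np.linalg.matrix_rank(O))          # 1 < n = 2, so (A, C) is unobservable
```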

"Who will observe the observers?" *Arthur Stanley Eddington*


*Example 3.* The pair $A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$, $C = \begin{bmatrix} 1 & 0 \end{bmatrix}$ is expected to be unobservable because one of the two states appears as a system output whereas the other is hidden. By inspection, the rank of the observability matrix $\begin{bmatrix} C \\ CA \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix}$ is 1. Suppose instead that $C = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$, namely measurements of both states are available. Since the observability matrix $\begin{bmatrix} C \\ CA \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}$ is of rank 2, the pair (*A*, *C*) is observable, that is, the states can be uniquely reconstructed from the measurements.

#### **3.4.3 The Algebraic Riccati Equation**

Some pertinent facts concerning the Riccati differential equation (69) are:

- Its solutions correspond to the covariance of the state estimation error.
- From Lemma 6, if it is suitably initialised then its solutions will be monotonically nonincreasing.
- If the pair (*A*, *C*) is observable then the states can be uniquely determined from the outputs.
In view of the above, it is not surprising that if the states can be estimated uniquely, in the limit as *t* approaches infinity, the Riccati differential equation will have a unique steady state solution.

*Lemma 8 [20], [23], [24]: Suppose that Re{λi(A)} < 0, the pair (A, C) is observable, then the solution of the Riccati differential equation (69) satisfies* 

$$\lim\_{t \to \infty} P(t) = P\_{\text{a.s.}} \tag{79}$$

*where P is the solution of the algebraic Riccati equation* 

$$0 = AP + PA^{\top} - PC^{\top}R^{-1}CP + BQB^{\top}.\tag{80}$$

A proof that the solution *P* is in fact unique appears in [24]. A standard way of calculating solutions to (80) arises by finding an appropriate set of Schur vectors for the Hamiltonian matrix $H = \begin{bmatrix} A^{\top} & -C^{\top}R^{-1}C \\ -BQB^{\top} & -A \end{bmatrix}$, see [25] and the Hamiltonian solver within *Matlab™*.

"Stand firm in your refusal to remain conscious during algebra. In real life, I assure you, there is no such thing as algebra." *Frances Ann Lebowitz*


| k | P(t) | Ṗ(t) |
|------|--------|-------------|
| 1 | 0.9800 | −2.00 |
| 10 | 0.8316 | −1.41 |
| 100 | 0.4419 | −8.13×10⁻² |
| 1000 | 0.4142 | −4.86×10⁻¹³ |

Table 1. Solutions of (69) for Example 4.

*Example 4.* Suppose that *A* = −1 and *B* = *C* = *Q* = *R* = 1, for which the solution of the algebraic Riccati equation (80) is *P* = 0.4142. Using Euler's integration method (see Chapter 1) with *δt* = 0.01 and *P*(0) = 1, the calculated solutions of the Riccati differential equation (69) are listed in Table 1. The data in the table demonstrate that the Riccati differential equation solution converges to the algebraic Riccati equation solution, with $\lim_{t \to \infty} \dot{P}(t) = 0$.
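The calculations of Example 4 can be reproduced as follows. The sketch solves (80) with scipy (whose solver is posed in control form, so A and C are transposed to obtain the filter-type solution) and then Euler-integrates (69); the printed values should agree with Table 1.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Reproducing Example 4; scipy's solver is posed in control form, so A and
# C are transposed to obtain the filter-type solution of (80).
A, B, C, Q, R = -1.0, 1.0, 1.0, 1.0, 1.0
P_are = solve_continuous_are(np.atleast_2d(A).T, np.atleast_2d(C).T,
                             np.atleast_2d(B * Q * B), np.atleast_2d(R))[0, 0]
print(P_are)  # 0.4142... = sqrt(2) - 1

# Euler integration of the Riccati differential equation (69).
P, dt = 1.0, 0.01
for k in range(1, 1001):
    P_dot = 2 * A * P - (C * P) ** 2 / R + B * Q * B
    P += dt * P_dot
    if k in (1, 10, 100, 1000):
        print(k, round(P, 4), P_dot)  # cf. the rows of Table 1
```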

The so-called infinite-horizon (or stationary) Kalman filter is obtained by substituting timeinvariant state-space parameters into (34) - (35) to give

$$
\dot{\hat{\mathbf{x}}}(t|\mathbf{t}) = (A - KC)\hat{\mathbf{x}}(t|\mathbf{t}) + Kz(t) \,. \tag{81}
$$

$$
\hat{y}(t \mid t) = \mathbb{C}\hat{\mathbf{x}}(t \mid t) \tag{82}
$$

where


$$K = \mathbf{P} \mathbf{C}^T \mathbf{R}^{-1} \tag{83}$$

in which *P* is calculated by solving the algebraic Riccati equation (80). The output estimation filter (81) – (82) has the transfer function

$$H\_{\rm OE}(\mathbf{s}) = \mathbb{C}(\mathbf{s}\mathbf{I} - \mathbf{A} + \mathbf{K}\mathbf{C})^{-1}\mathbf{K} \,. \tag{84}$$

*Example 5.* Suppose that a signal *y*(*t*) is generated by the system

$$y(t) = \left(\frac{b_m\dfrac{d^m}{dt^m} + b_{m-1}\dfrac{d^{m-1}}{dt^{m-1}} + \dots + b_1\dfrac{d}{dt} + b_0}{a_n\dfrac{d^n}{dt^n} + a_{n-1}\dfrac{d^{n-1}}{dt^{n-1}} + \dots + a_1\dfrac{d}{dt} + a_0}\right)w(t).$$

This system's transfer function is

$$G(s) = \frac{b_m s^m + b_{m-1}s^{m-1} + \dots + b_1 s + b_0}{a_n s^n + a_{n-1}s^{n-1} + \dots + a_1 s + a_0},$$

which can be realised in the controllable canonical form [10]

"If you think dogs can't count, try putting three dog biscuits in your pocket and then giving Fido two of them." *Phil Pastoret*


$$A = \begin{bmatrix} -a\_{n-1} & -a\_{n-2} & \dots & -a\_1 & -a\_0 \\ 1 & 0 & & \dots & 0 \\ 0 & 1 & & & \vdots \\ \vdots & & \ddots & 0 & 0 \\ 0 & 0 & \dots & 1 & 0 \end{bmatrix}, B = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \\ 0 \end{bmatrix} \text{ and } C = \begin{bmatrix} b\_m & b\_{m-1} & \dots & b\_1 & b\_0 \end{bmatrix}.$$

The optimal filter for estimating *y*(*t*) from noisy measurements (29) is obtained by using the above state-space parameters within (81) – (83). It has the structure depicted in Figs. 3 and 4. These figures illustrate two features of interest. First, the filter's model matches that within the signal generating process. Second, designing the filter is tantamount to finding an optimal gain.


Figure 4. The optimal filter for Example 5.
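A realisation in the controllable canonical form above can be constructed mechanically from the transfer function coefficients. The sketch below assumes a normalised denominator (a_n = 1); the helper name and the example coefficients are illustrative.

```python
import numpy as np

# A sketch constructing the controllable canonical form, assuming a
# normalised denominator (a_n = 1).
def canonical_form(b, a):
    """b = [b_m, ..., b_0]; a = [1, a_{n-1}, ..., a_0] with n = len(a) - 1."""
    n = len(a) - 1
    A = np.zeros((n, n))
    A[0, :] = -np.asarray(a[1:])        # first row: -a_{n-1}, ..., -a_0
    A[1:, :-1] = np.eye(n - 1)          # shift structure below the first row
    B = np.zeros((n, 1)); B[0, 0] = 1.0
    C = np.zeros((1, n)); C[0, n - len(b):] = b   # b_m, ..., b_0, right-aligned
    return A, B, C

A, B, C = canonical_form([1.0], [1.0, 3.0, 2.0])  # G(s) = 1/(s^2 + 3s + 2)
print(A, B, C, sep="\n")
```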

#### **3.4.4 Equivalence of the Wiener and Kalman Filters**

When the model parameters and noise statistics are time-invariant, the Kalman filter reverts to the Wiener filter. The equivalence of the Wiener and Kalman filters implies that spectral factorisation is the same as solving a Riccati equation. This observation is known as the Kalman-Yakubovich-Popov Lemma (or Positive Real Lemma) [15], [26], which assumes familiarity with the following Schur complement formula.

For any matrices $\Phi_{11}$, $\Phi_{12}$ and $\Phi_{22}$, where $\Phi_{11}$ and $\Phi_{22}$ are symmetric, the following are equivalent:

(i) $\begin{bmatrix} \Phi_{11} & \Phi_{12} \\ \Phi_{12}^{\top} & \Phi_{22} \end{bmatrix} \ge 0$.

(ii) $\Phi_{11} \ge 0$, $\Phi_{22} \ge \Phi_{12}^{\top}\Phi_{11}^{-1}\Phi_{12}$.

(iii) $\Phi_{22} \ge 0$, $\Phi_{11} \ge \Phi_{12}\Phi_{22}^{-1}\Phi_{12}^{\top}$.
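A quick numerical sanity check of the Schur complement equivalence, with an assumed positive semidefinite test matrix, is sketched below.

```python
import numpy as np

# Numerical sanity check of the Schur complement equivalence; the test
# matrix below is an assumed positive semidefinite example.
F11 = np.array([[2.0, 0.5], [0.5, 1.0]])
F12 = np.array([[0.3], [0.1]])
F22 = np.array([[1.0]])
F = np.block([[F11, F12], [F12.T, F22]])

def psd(M):
    # Nonnegative definiteness via the symmetric eigenvalue solver.
    return bool(np.all(np.linalg.eigvalsh(M) >= -1e-12))

print(psd(F))                                                    # (i)
print(psd(F11) and psd(F22 - F12.T @ np.linalg.inv(F11) @ F12))  # (ii)
print(psd(F22) and psd(F11 - F12 @ np.linalg.inv(F22) @ F12.T))  # (iii)
```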

"Mathematics is the queen of sciences and arithmetic is the queen of mathematics." *Carl Friedrich Gauss*

The Kalman-Yakubovich-Popov Lemma is set out below. Further details appear in [15] and a historical perspective is provided in [26]. A proof of this Lemma makes use of the identity

$$-PA^{\top} - AP = P(-\text{s}I - A^{\top}) + (\text{s}I - A)P \,. \tag{85}$$

*Lemma 9 [15], [26]: Consider the spectral density matrix*

$$
\Delta\Delta^H(\mathbf{s}) = \begin{bmatrix} \mathbf{C}(\mathbf{s}I - A)^{-1} & I \end{bmatrix} \begin{bmatrix} Q & \mathbf{0} \\ \mathbf{0} & R \end{bmatrix} \begin{bmatrix} (-\mathbf{s}I - A^T)^{-1}\mathbf{C}^T \\ I \end{bmatrix}.\tag{86}
$$

*Then the following statements are equivalent:* 

$$\text{(i)} \qquad \Delta\Delta^{H}(j\omega) \ge 0 \text{ for all } \omega \in (-\infty, \infty).$$

$$\text{(ii)} \qquad \begin{bmatrix} BQB^{\top} + AP + PA^{\top} & PC^{\top} \\ CP & R \end{bmatrix} \ge 0.$$

*(iii) There exists a nonnegative solution P of the algebraic Riccati equation (80).* 


*Proof: To establish equivalence between (i) and (iii), use (85) within (80) to obtain* 

$$P(-sI - A^{\top}) + (sI - A)P = BQB^{\top} - PC^{\top}R^{-1}CP\,. \tag{87}$$

*Premultiplying and postmultiplying (87) by $C(sI - A)^{-1}$ and $(-sI - A^{\top})^{-1}C^{\top}$, respectively, results in*

$$C(sI - A)^{-1}PC^{\top} + CP(-sI - A^{\top})^{-1}C^{\top} = C(sI - A)^{-1}\big(BQB^{\top} - PC^{\top}R^{-1}CP\big)(-sI - A^{\top})^{-1}C^{\top}. \tag{88}$$

*Hence,* 


$$\begin{aligned} \Delta\Delta^{H}(s) &= GQG^{H}(s) + R \\ &= C(sI - A)^{-1}BQB^{\top}(-sI - A^{\top})^{-1}C^{\top} + R \\ &= C(sI - A)^{-1}PC^{\top}R^{-1}CP(-sI - A^{\top})^{-1}C^{\top} + C(sI - A)^{-1}PC^{\top} + CP(-sI - A^{\top})^{-1}C^{\top} + R \\ &= \big(C(sI - A)^{-1}KR^{1/2} + R^{1/2}\big)\big(R^{1/2}K^{\top}(-sI - A^{\top})^{-1}C^{\top} + R^{1/2}\big) \\ &\ge 0. \end{aligned} \tag{89}$$

*The Schur complement formula can be used to verify the equivalence of (ii) and (iii). �* 

In Chapter 1, it is shown that the transfer function matrix of the optimal Wiener solution for output estimation is given by

$$H_{OE}(s) = I - R^{1/2}\Delta^{-1}(s), \tag{90}$$

where *s* = *jω* and

"Arithmetic is being able to count up to twenty without taking off your shoes." *Mickey Mouse*


$$
\Delta \boldsymbol{\Delta}^{H} (\mathbf{s}) = \mathbf{G} \boldsymbol{Q} \mathbf{G}^{H} (\mathbf{s}) + \boldsymbol{R} \,. \tag{91}
$$

is the spectral density matrix of the measurements. It follows from (91) that

$$
\Delta(\mathbf{s}) = \mathbf{C}(\mathbf{s}I - A)^{-1}\mathbf{K}\mathbf{R}^{1/2} + \mathbf{R}^{1/2}.\tag{92}
$$

The Wiener filter (90) requires the spectral factor inverse, $\Delta^{-1}(s)$, which can be found from (92) by using $[I + C(sI - A)^{-1}K]^{-1} = I - C(sI - A + KC)^{-1}K$ to obtain

$$
\Delta^{-1}(\mathbf{s}) = R^{-1/2} - R^{-1/2}\mathbf{C}(\mathbf{s}I - A + K\mathbf{C})^{-1}K.\tag{93}
$$

Substituting (93) into (90) yields

$$H_{OE}(s) = C(sI - A + KC)^{-1}K, \tag{94}$$

which is identical to the minimum-variance output estimator (84).

*Example 6.* Consider a scalar output estimation problem where $G(s) = (s + 1)^{-1}$, *Q* = 1, *R* = 0.0001 and the Wiener filter transfer function is

$$H(\mathbf{s}) = 99(\mathbf{s} + 100)^{-1}.\tag{95}$$

Applying the bilinear transform yields *A* = −1, *B* = *C* = 1, for which the solution of (80) is *P* = 0.0099. By substituting *K* = *PCTR-1* = 99 into (90), one obtains (95).
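The equivalence asserted in (94) can be confirmed numerically for this scalar example. The sketch below solves (80) in closed form and compares the spectral-factor route (90), (93) with the direct form (84) on the imaginary axis; the frequencies sampled are arbitrary choices.

```python
import numpy as np

# Numerical confirmation of (94) for the scalar example: the
# spectral-factor route (90), (93) agrees with the direct form (84).
A, B, C, Q, R = -1.0, 1.0, 1.0, 1.0, 0.0001
P = A * R + np.sqrt((A * R) ** 2 + R * B * Q * B)   # scalar solution of (80)
K = P * C / R
print(P, K)   # approx. 0.0099 and 99

for w in (0.1, 1.0, 10.0):                          # arbitrary test frequencies
    s = 1j * w
    H_direct = C * K / (s - A + K * C)                            # (84)
    delta_inv = R ** -0.5 - R ** -0.5 * C * K / (s - A + K * C)   # (93)
    H_wiener = 1.0 - R ** 0.5 * delta_inv                         # (90)
    assert abs(H_direct - H_wiener) < 1e-12
```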

#### **3.5 Conclusion**

The Kalman-Bucy filter, which produces state estimates $\hat{x}(t \mid t)$ and output estimates $\hat{y}(t \mid t)$ from the measurements $z(t) = y(t) + v(t)$ at time $t$, is summarised in Table 2. This filter minimises the variance of the state estimation error, $E\{(x(t) - \hat{x}(t \mid t))(x(t) - \hat{x}(t \mid t))^T\} = P(t)$, and the variance of the output estimation error, $E\{(y(t) - \hat{y}(t \mid t))(y(t) - \hat{y}(t \mid t))^T\} = C(t)P(t)C^T(t)$.

|  | ASSUMPTIONS | MAIN RESULTS |
|---|---|---|
| Signals and system | $E\{w(t)\} = E\{v(t)\} = 0$. $E\{w(t)w^T(t)\} = Q(t)$ and $E\{v(t)v^T(t)\} = R(t)$ are known. $A(t)$, $B(t)$ and $C(t)$ are known. | $\dot{x}(t) = A(t)x(t) + B(t)w(t)$, $y(t) = C(t)x(t)$, $z(t) = y(t) + v(t)$ |
| Filtered state and output |  | $\dot{\hat{x}}(t \mid t) = A(t)\hat{x}(t \mid t) + K(t)(z(t) - C(t)\hat{x}(t \mid t))$, $\hat{y}(t \mid t) = C(t)\hat{x}(t \mid t)$ |
| Filter gain and Riccati differential equation | $Q(t) > 0$ and $R(t) > 0$. | $K(t) = P(t)C^T(t)R^{-1}(t)$, where $\dot{P}(t) = A(t)P(t) + P(t)A^T(t) - P(t)C^T(t)R^{-1}(t)C(t)P(t) + B(t)Q(t)B^T(t)$ |

Table 2. Main results for time-varying output estimation.

When the model parameters and noise covariances are time-invariant, the gain is also time-invariant and can be precalculated. The time-invariant filtering results are summarised in Table 3. In this stationary case, spectral factorisation is equivalent to solving a Riccati equation, and the transfer function of the output estimation filter, $H_{OE}(s) = C(sI - A + KC)^{-1}K$, is identical to that of the Wiener filter. It is not surprising that the Wiener and Kalman filters are equivalent, since they are both derived by completing the square of the error covariance.

|  | ASSUMPTIONS | MAIN RESULTS |
|---|---|---|
| Signals and system | $E\{w(t)\} = E\{v(t)\} = 0$. $E\{w(t)w^T(t)\} = Q$ and $E\{v(t)v^T(t)\} = R$ are known. $A$, $B$ and $C$ are known. The pair $(A, C)$ is observable. | $\dot{x}(t) = Ax(t) + Bw(t)$, $y(t) = Cx(t)$, $z(t) = y(t) + v(t)$ |
| Filtered state and output |  | $\dot{\hat{x}}(t \mid t) = A\hat{x}(t \mid t) + K(z(t) - C\hat{x}(t \mid t))$, $\hat{y}(t \mid t) = C\hat{x}(t \mid t)$ |
| Filter gain and algebraic Riccati equation | $Q > 0$ and $R > 0$. | $K = PC^TR^{-1}$, where $AP + PA^T - PC^TR^{-1}CP + BQB^T = 0$ |
| Spectral factorisation |  | $\Delta(s) = C(sI - A)^{-1}KR^{1/2} + R^{1/2}$ |

Table 3. Main results for time-invariant output estimation.
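The algebraic Riccati equation in Table 3 can also be solved numerically. As a sketch, SciPy's `solve_continuous_are` routine handles the control-form equation, so passing the transposed parameters yields the filtering ARE $AP + PA^T - PC^TR^{-1}CP + BQB^T = 0$; the scalars of Example 5 are reused below.

```python
# Sketch: solve the filtering ARE of Table 3 via the control-form solver.
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[-1.0]])
B = np.array([[1.0]])
C = np.array([[1.0]])
Q = np.array([[1.0]])
R = np.array([[0.0001]])

# solve_continuous_are(a, b, q, r) solves a'X + Xa - Xb r^{-1} b'X + q = 0,
# so a = A', b = C', q = BQB' gives AP + PA' - PC'R^{-1}CP + BQB' = 0.
P = solve_continuous_are(A.T, C.T, B @ Q @ B.T, R)
K = P @ C.T @ np.linalg.inv(R)        # K = P C' R^{-1}
print(P, K)                           # ~0.0099 and ~99
```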

"Mathematics consists in proving the most obvious thing in the least obvious way." *George Polya*


"There are two ways to do great mathematics. The first is to be smarter than everybody else. The second way is to be stupider than everybody else - but persistent." *Raoul Bott*


#### **3.6 Problems**

**Problem 1.** Show that $\dot{x}(t) = A(t)x(t)$ has the solution $x(t) = \Phi(t,0)x(0)$, where $\dot{\Phi}(t,0) = A(t)\Phi(t,0)$ and $\Phi(t,t) = I$. Hint: use the approach of [13] and integrate both sides of $\dot{x}(t) = A(t)x(t)$.

**Problem 2.** Given that:

(i) the Lyapunov differential equation for the system $\dot{x}(t) = F(t)x(t) + G(t)w(t)$ is $\frac{d}{dt}E\{x(t)x^T(t)\} = F(t)E\{x(t)x^T(t)\} + E\{x(t)x^T(t)\}F^T(t) + G(t)Q(t)G^T(t)$;

(ii) the Kalman filter for the system $\dot{x}(t) = A(t)x(t) + B(t)w(t)$, $z(t) = C(t)x(t) + v(t)$ has the structure $\dot{\hat{x}}(t \mid t) = A(t)\hat{x}(t \mid t) + K(t)(z(t) - C(t)\hat{x}(t \mid t))$;

write a Riccati differential equation for the evolution of the state error covariance and determine the optimal gain matrix *K*(*t*).

**Problem 3.** Derive the Riccati differential equation for the model $\dot{x}(t) = A(t)x(t) + B(t)w(t)$, $z(t) = C(t)x(t) + v(t)$ with $E\{w(t)w^T(\tau)\} = Q(t)\delta(t - \tau)$, $E\{v(t)v^T(\tau)\} = R(t)\delta(t - \tau)$ and $E\{w(t)v^T(\tau)\} = S(t)\delta(t - \tau)$. Hint: consider $\dot{x}(t) = A(t)x(t) + B(t)w(t) + B(t)S(t)R^{-1}(t)(z(t) - C(t)x(t) - v(t))$.

**Problem 4.** For output estimation problems with $B = C = R = 1$, calculate the algebraic Riccati equation solution, filter gain and transfer function for the following. (a) $A = -1$ and $Q = 8$. (b) $A = -2$ and $Q = 12$. (c) $A = -3$ and $Q = 16$. (d) $A = -4$ and $Q = 20$. (e) $A = -5$ and $Q = 24$. (f) $A = -6$ and $Q = 28$. (g) $A = -7$ and $Q = 32$. (h) $A = -8$ and $Q = 36$. (i) $A = -9$ and $Q = 40$. (j) $A = -10$ and $Q = 44$.
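One way of checking answers to Problem 4 is the sketch below: with $B = C = R = 1$ the scalar ARE $2AP - P^2 + Q = 0$ has the positive root $P = A + \sqrt{A^2 + Q}$, the gain is $K = P$, and the transfer function is $H(s) = K(s - A + K)^{-1}$.

```python
# Sketch: tabulate ARE solutions, gains and transfer functions for Problem 4.
import numpy as np

cases = [(-1, 8), (-2, 12), (-3, 16), (-4, 20), (-5, 24),
         (-6, 28), (-7, 32), (-8, 36), (-9, 40), (-10, 44)]
for A, Q in cases:
    P = A + np.sqrt(A**2 + Q)     # positive root of 2AP - P^2 + Q = 0
    K = P                         # K = P C / R = P
    print(f"A = {A:3d}, Q = {Q:2d}:  P = {P:.4f},  K = {K:.4f},  "
          f"H(s) = {K:.4f}/(s + {K - A:.4f})")
```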


**Problem 5.** Prove the Kalman-Yakubovich-Popov Lemma for the case of

$$E\left\{\begin{bmatrix}\boldsymbol{w}(t)\\\boldsymbol{v}(t)\end{bmatrix}\begin{bmatrix}\boldsymbol{w}^{\boldsymbol{T}}(\boldsymbol{\tau})&\boldsymbol{v}^{\boldsymbol{T}}(\boldsymbol{\tau})\end{bmatrix}\right\}=\begin{bmatrix}\boldsymbol{Q}&\boldsymbol{S}\\\boldsymbol{S}^{\boldsymbol{T}}&\boldsymbol{R}\end{bmatrix}\delta(t-\boldsymbol{\tau})\text{, i.e., show}$$

$$\Delta\Delta^{H}(s) = \begin{bmatrix} C(sI - A)^{-1} & I \end{bmatrix}\begin{bmatrix} Q & S \\ S^{T} & R \end{bmatrix}\begin{bmatrix} (-sI - A^{T})^{-1}C^{T} \\ I \end{bmatrix}.$$

**Problem 6.** Derive a state-space formulation for the minimum-mean-square-error equaliser using $\Delta^{-1}(s) = R^{-1/2} - R^{-1/2}C(sI - A + KC)^{-1}K$.

"Mathematics is a game played according to certain simple rules with meaningless marks on paper." *David Hilbert*

#### **3.7 Glossary**

In addition to the terms listed in Section 1.6, the following have been used herein.

| Term | Description |
|---|---|
| $\mathcal{H}: \mathbb{R}^p \to \mathbb{R}^q$ | A linear system that operates on a *p*-element input signal and produces a *q*-element output signal. |
| $A(t)$, $B(t)$, $C(t)$, $D(t)$ | Time-varying state space matrices of appropriate dimension. The system $\mathcal{H}$ is assumed to have the realisation $\dot{x}(t) = A(t)x(t) + B(t)w(t)$, $y(t) = C(t)x(t) + D(t)w(t)$. |
| $A$, $B$, $C$, $D$ | Time-invariant state space matrices of appropriate dimension. |
| $\mathcal{H}^{H}$ | Adjoint of $\mathcal{H}$. The adjoint of a system having the state-space parameters $\{A(t), B(t), C(t), D(t)\}$ is a system parameterised by $\{-A^T(t), -C^T(t), B^T(t), D^T(t)\}$. |
| $\Phi(t,0)$ | State transition matrix, which satisfies $\frac{d\Phi(t,0)}{dt} = A(t)\Phi(t,0)$ with the boundary condition $\Phi(t,t) = I$. |
| $E\{\cdot\}$, $E\{x(t)\}$ | Expectation operator, expected value of $x(t)$. |
| $E\{x(t) \mid y(t)\}$ | Conditional expectation, namely the estimate of $x(t)$ given $y(t)$. |
| $\hat{x}(t \mid t)$ | Conditional mean estimate of the state $x(t)$ given data at time $t$. |
| $\tilde{x}(t \mid t)$ | State estimation error, which is defined by $\tilde{x}(t \mid t) = x(t) - \hat{x}(t \mid t)$. |
| $Q(t)$ and $R(t)$ | Covariance matrices of the nonstationary stochastic signals $w(t)$ and $v(t)$, respectively. |
| $Q$ and $R$ | Time-invariant covariance matrices of the stationary stochastic signals $w(t)$ and $v(t)$, respectively. |
| $K(t)$ | Time-varying filter gain matrix. |
| $K$ | Time-invariant filter gain matrix. |
| $P(t)$ | Time-varying error covariance, *i.e.*, $E\{\tilde{x}(t)\tilde{x}^T(t)\}$, which is the solution of a Riccati differential equation. |
| $P$ | Time-invariant error covariance, which is the solution of an algebraic Riccati equation. |
| $G(s)$ | Transfer function matrix of the signal model. |
| $H(s)$ | Transfer function matrix of the minimum-variance solution. |
| $H_{OE}(s)$ | Transfer function matrix of the minimum-variance solution specialised for output estimation. |
| $H$ | Hamiltonian matrix. |
| $O$ | Observability matrix. |
| $SNR$ | Signal to noise ratio. |

"A mathematician is a device for turning coffee into theorems." *Paul Erdos*

#### **3.8 References**

[1] R. W. Bass, "Some reminiscences of control theory and system theory in the period 1955 – 1960: Introduction of Dr. Rudolf E. Kalman", *Real Time*, Spring/Summer Issue, The University of Alabama in Huntsville, 2002.
[2] M. S. Grewal and A. P. Andrews, "Applications of Kalman Filtering in Aerospace 1960 to the Present", *IEEE Control Systems Magazine*, vol. 30, no. 3, pp. 69 – 78, June 2010.
[3] R. E. Kalman, "A New Approach to Linear Filtering and Prediction Problems", *Transactions of the ASME, Series D, Journal of Basic Engineering*, vol. 82, pp. 35 – 45, 1960.
[4] R. E. Kalman and R. S. Bucy, "New Results in Linear Filtering and Prediction Theory", *Transactions of the ASME, Series D, Journal of Basic Engineering*, vol. 83, pp. 95 – 107, 1961.
[5] R. E. Kalman, "New Methods in Wiener Filtering Theory", *Proc. First Symposium on Engineering Applications of Random Function Theory and Probability*, Wiley, New York, pp. 270 – 388, 1963.
[6] R. S. Bucy, "Global Theory of the Riccati Equation", *Journal of Computer and System Sciences*, vol. 1, pp. 349 – 361, 1967.
[7] A. H. Jazwinski, *Stochastic Processes and Filtering Theory*, Academic Press, Inc., New York, 1970.
[8] A. P. Sage and J. L. Melsa, *Estimation Theory with Applications to Communications and Control*, McGraw-Hill Book Company, New York, 1971.
[9] A. Gelb, *Applied Optimal Estimation*, The Analytic Sciences Corporation, USA, 1974.
[10] T. Kailath, *Linear Systems*, Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1980.
[11] H. W. Knobloch and H. K. Kwakernaak, *Lineare Kontrolltheorie*, Springer-Verlag, Berlin, 1980.
[12] R. G. Brown and P. Y. C. Hwang, *Introduction to Random Signals and Applied Kalman Filtering*, John Wiley & Sons, Inc., USA, 1983.
[13] P. A. Ruymgaart and T. T. Soong, *Mathematics of Kalman-Bucy Filtering*, Second Edition, Springer-Verlag, Berlin, 1988.
[14] M. S. Grewal and A. P. Andrews, *Kalman Filtering, Theory and Practice*, Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1993.
[15] T. Kailath, A. H. Sayed and B. Hassibi, *Linear Estimation*, Prentice-Hall, Upper Saddle River, New Jersey, 2000.
[16] D. Simon, *Optimal State Estimation, Kalman H∞ and Nonlinear Approaches*, John Wiley & Sons, Inc., Hoboken, New Jersey, 2006.
[17] F. L. Lewis, L. Xie and D. Popa, *Optimal and Robust Estimation With an Introduction to Stochastic Control Theory*, Second Edition, CRC Press, Taylor & Francis Group, 2008.
[18] T. Söderström, *Discrete-time Stochastic Systems: Estimation and Control*, Springer-Verlag London Ltd., 2002.
[19] M.-A. Poubelle, R. R. Bitmead and M. R. Gevers, "Fake Algebraic Riccati Techniques and Stability", *IEEE Transactions on Automatic Control*, vol. 33, no. 4, pp. 379 – 381, 1988.
[20] G. Freiling, V. Ionescu, H. Abou-Kandil and G. Jank, *Matrix Riccati Equations in Control and Systems Theory*, Birkhauser, Boston, 2003.
[21] K. Ogata, *Matlab for Control Engineers*, Pearson Prentice Hall, Upper Saddle River, New Jersey, 2008.
[22] T. Kaczorek, "Cayley-Hamilton Theorem" in Hazewinkel, Michiel, *Encyclopedia of Mathematics*, Springer, 2001.
[23] W. M. Wonham, "On a Matrix Riccati Equation of Stochastic Control", *SIAM Journal on Control*, vol. 6, no. 4, pp. 681 – 697, 1968.
[24] M.-A. Poubelle, I. R. Petersen, M. R. Gevers and R. R. Bitmead, "A Miscellany of Results on an Equation of Count J. F. Riccati", *IEEE Transactions on Automatic Control*, vol. 31, no. 7, pp. 651 – 654, 1986.
[25] A. J. Laub, "A Schur Method for Solving Algebraic Riccati Equations", *IEEE Transactions on Automatic Control*, vol. 24, no. 6, pp. 913 – 921, 1979.
[26] S. V. Gusev and A. L. Likhtarnikov, "Kalman-Popov-Yakubovich Lemma and the S-procedure: A Historical Essay", *Automation and Remote Control*, vol. 67, no. 11, pp. 1768 – 1810, 2006.

"But mathematics is the sister, as well as the servant, of the arts and is touched with the same madness and genius." *Harold Marston Morse*

"Mathematics are like Frenchmen: whatever you say to them they translate into their own language, and forthwith it is something entirely different." *Johann Wolfgang von Goethe*


### **Discrete-Time Minimum-Variance Prediction and Filtering**

#### **4.1 Introduction**

Kalman filters are employed wherever it is desired to recover data from the noise in an optimal way, such as satellite orbit estimation, aircraft guidance, radar, communication systems, navigation, medical diagnosis and finance. Continuous-time problems that possess differential equations may be easier to describe in a state-space framework; however, the resulting filters have higher implementation costs because an additional integration step and higher sampling rates are required. Conversely, although discrete-time state-space models may be less intuitive, the ensuing filter difference equations can be realised immediately.


The discrete-time Kalman filter calculates predicted states via the linear recursion

$$\hat{x}_{k+1/k} = A_k\hat{x}_{k/k-1} + K_k(z_k - C_k\hat{x}_{k/k-1}),$$

where the predictor gain, *Kk*, is a function of the noise statistics and the model parameters. The above formula was reported by Rudolf E. Kalman in the 1960s [1], [2]. He has since received many awards and prizes, including the National Medal of Science, which was presented to him by President Barack Obama in 2009.

The Kalman filter calculations are simple and well-established. A possibly troublesome obstacle is expressing problems at hand within a state-space framework. This chapter derives the main discrete-time results to provide familiarity with state-space techniques and filter application. The continuous-time and discrete-time minimum-square-error Wiener filters were derived using a completing-the-square approach in Chapters 1 and 2, respectively. Similarly for time-varying continuous-time signal models, the derivation of the minimum-variance Kalman filter, presented in Chapter 3, relied on a least-mean-square (or conditional-mean) formula. This formula is used again in the solution of the discrete-time prediction and filtering problems. Predictions can be used when the measurements are irregularly spaced or missing at the cost of increased mean-square-error.

This chapter develops the prediction and filtering results for the case where the problem is nonstationary or time-varying. It is routinely assumed that the process and measurement noises are zero mean and uncorrelated. Nonzero mean cases can be accommodated by including deterministic inputs within the state prediction and filter output updates. Correlated noises can be handled by adding a term within the predictor gain and the underlying Riccati equation. The same approach is employed when the signal model possesses a direct-feedthrough term. A simplification of the generalised regulator problem from control theory is presented, from which the solutions of output estimation, input estimation (or equalisation), state estimation and mixed filtering problems follow immediately.

"Man will occasionally stumble over the truth, but most of the time he will pick himself up and continue on." *Winston Leonard Spencer-Churchill*


Figure 1. The discrete-time system operates on the input signal $w_k \in \mathbb{R}^m$ and produces the output $y_k \in \mathbb{R}^p$.

#### **4.2 The Time-varying Signal Model**

A discrete-time time-varying system mapping $\mathbb{R}^m$ into $\mathbb{R}^p$ is assumed to have the state-space representation

$$x_{k+1} = A_kx_k + B_kw_k, \tag{1}$$

$$y_k = C_kx_k + D_kw_k, \tag{2}$$

where $A_k \in \mathbb{R}^{n \times n}$, $B_k \in \mathbb{R}^{n \times m}$, $C_k \in \mathbb{R}^{p \times n}$ and $D_k \in \mathbb{R}^{p \times m}$ over a finite interval $k \in [0, N]$. The $w_k$ is a stochastic white process with

$$E\{w_k\} = 0, \quad E\{w_jw_k^T\} = Q_k\delta_{jk}, \tag{3}$$

in which $\delta_{jk} = \begin{cases} 1 & \text{if } j = k \\ 0 & \text{if } j \neq k \end{cases}$ is the Kronecker delta function. This system is depicted in Fig. 1,

in which $z^{-1}$ is the unit delay operator. It is interesting to note that, at time $k$, the current state

$$x_k = A_{k-1}x_{k-1} + B_{k-1}w_{k-1} \tag{4}$$

does not involve *wk*. That is, unlike continuous-time systems, here there is a one-step delay between the input and output sequences. The simpler case of *Dk* = 0, namely,

$$y_k = C_kx_k, \tag{5}$$

is again considered prior to the inclusion of a nonzero *Dk*.
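The one-step delay noted above is easy to visualise in simulation. The following minimal sketch propagates (1) and (5) with made-up scalar, time-varying parameters; note that $y_k$ depends only on $w_0, \ldots, w_{k-1}$.

```python
# Sketch: simulate x_{k+1} = A_k x_k + B_k w_k and y_k = C_k x_k with
# made-up scalar, time-varying parameters.
import numpy as np

rng = np.random.default_rng(1)
N = 100
x = np.zeros(N + 1)                       # x[0] = 0
y = np.zeros(N)
for k in range(N):
    A_k = 0.9 + 0.05 * np.sin(0.1 * k)    # assumed time-varying A_k
    B_k, C_k = 1.0, 1.0
    w_k = rng.normal()                    # white w_k with Q_k = 1
    y[k] = C_k * x[k]                     # output involves w_0 ... w_{k-1} only
    x[k + 1] = A_k * x[k] + B_k * w_k     # state recursion (1)
```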

"Rudy Kalman applied the state-space model to the filtering problem, basically the same problem discussed by Wiener. The results were astonishing. The solution was recursive, and the fact that the estimates could use only the past of the observations posed no difficulties." *Jan C. Willems*

#### **4.3 The State Prediction Problem**


Suppose that observations of (5) are available, that is,

$$z_k = y_k + v_k, \tag{6}$$

where *vk* is a white measurement noise process with

$$E\{v_k\} = 0, \quad E\{v_jv_k^T\} = R_k\delta_{jk} \text{ and } E\{w_jv_k^T\} = 0. \tag{7}$$

Figure 2. The state prediction problem. The objective is to design a predictor which operates on the measurements and produces state estimates such that the variance of the error residual $e_{k/k-1}$ is minimised.

It is noted above for the state recursion (4) that there is a one-step delay between the current state and the input process. Similarly, it is expected that there will be a one-step delay between the current state estimate and the input measurement. Consequently, it is customary to denote $\hat{x}_{k/k-1}$ as the state estimate at time $k$, given measurements at time $k - 1$. The $\hat{x}_{k/k-1}$ is also known as the one-step-ahead state prediction. The objective here is to design a predictor that operates on the measurements $z_k$ and produces an estimate, $\hat{y}_{k/k-1} = C_k\hat{x}_{k/k-1}$, of $y_k = C_kx_k$, so that the covariance, $E\{e_{k/k-1}e_{k/k-1}^T\}$, of the error residual, $e_{k/k-1} = y_k - \hat{y}_{k/k-1}$, is minimised. This problem is depicted in Fig. 2.

#### **4.4 The Discrete-time Conditional Mean Estimate**

The predictor derivation that follows relies on the discrete-time version of the conditional-mean or least-mean-square estimate derived in Chapter 3, which is set out as follows. Consider a stochastic vector $\begin{bmatrix} \alpha_k^T & \beta_k^T \end{bmatrix}^T$ having means and covariances

$$E\left\{\begin{bmatrix} \alpha_k \\ \beta_k \end{bmatrix}\right\} = \begin{bmatrix} \bar{\alpha} \\ \bar{\beta} \end{bmatrix} \tag{8}$$


and

$$E\left\{ \begin{bmatrix} \alpha\_{k} \\ \beta\_{k} \end{bmatrix} \begin{bmatrix} \alpha\_{k}^{\top} & \beta\_{k}^{\top} \end{bmatrix} \right\} = \begin{bmatrix} \Sigma\_{\alpha\_{k}\alpha\_{k}} & \Sigma\_{\alpha\_{k}\beta\_{k}} \\ \Sigma\_{\beta\_{k}\alpha\_{k}} & \Sigma\_{\beta\_{k}\beta\_{k}} \end{bmatrix}. \tag{9}$$

respectively, where $\Sigma_{\beta_k\alpha_k} = \Sigma_{\alpha_k\beta_k}^T$. An estimate of $\alpha_k$ given $\beta_k$, denoted by $E\{\alpha_k \mid \beta_k\}$, which minimises $E\{(\alpha_k - E\{\alpha_k \mid \beta_k\})(\alpha_k - E\{\alpha_k \mid \beta_k\})^T\}$, is given by

$$E\{\boldsymbol{\alpha}\_{k} \mid \boldsymbol{\beta}\_{k}\} = \overline{\boldsymbol{\alpha}} + \boldsymbol{\Sigma}\_{\boldsymbol{\alpha}\_{k}\boldsymbol{\beta}\_{k}} \boldsymbol{\Sigma}\_{\boldsymbol{\beta}\_{k}\boldsymbol{\beta}\_{k}}^{-1} (\boldsymbol{\beta}\_{k} - \overline{\boldsymbol{\beta}}) \tag{10}$$

The above formula is developed in [3] and established for Gaussian distributions in [4]. A derivation is requested in the problems. If *αk* and *βk* are scalars then (10) degenerates to the linear regression formula as is demonstrated below.

*Example 1 (Linear regression [5]).* The least-squares estimate $\hat{\alpha}_k = a\beta_k + b$ of $\alpha_k$, given data $\alpha_k$, $\beta_k$ over $[1, N]$, can be found by minimising the performance objective $J = \frac{1}{N}\sum_{k=1}^{N}(\alpha_k - a\beta_k - b)^2$. Setting $\frac{dJ}{db} = 0$ yields $b = \bar{\alpha} - a\bar{\beta}$. Setting $\frac{dJ}{da} = 0$, substituting for $b$ and using the definitions (8) – (9), results in $a = \Sigma_{\alpha_k\beta_k}\Sigma_{\beta_k\beta_k}^{-1}$.

"Prediction is very difficult, especially if it's about the future." *Niels Henrik David Bohr*
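As a numerical illustration of Example 1 (with synthetic data standing in for $\alpha_k$, $\beta_k$), the sample versions of (8) – (10) recover the slope and intercept:

```python
# Sketch: the regression coefficients of Example 1 from sample moments.
import numpy as np

rng = np.random.default_rng(2)
beta = rng.normal(size=1000)
alpha = 2.5 * beta + 1.0 + 0.1 * rng.normal(size=1000)

a = np.cov(alpha, beta, bias=True)[0, 1] / np.var(beta)  # a = S_ab S_bb^{-1}
b = alpha.mean() - a * beta.mean()                       # b = mean(a) - a mean(b)
print(a, b)                                              # close to 2.5 and 1.0
```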

#### **4.5 Minimum-Variance Prediction**

It follows from (1), (6), together with the assumptions *E*{*wk*} = 0, *E*{*vk*} = 0, that *E*{*xk*+1} = *E*{*Akxk*} and *E*{*zk*} = *E*{*Ckxk*}. It is assumed that similar results hold in the case of predicted state estimates, that is,

$$E\left\{ \begin{bmatrix} \hat{\mathbf{x}}\_{k+1} \\ \mathbf{z}\_k \end{bmatrix} \right\} = \begin{bmatrix} A\_k \hat{\mathbf{x}}\_{k/k-1} \\ \mathbf{C}\_k \hat{\mathbf{x}}\_{k/k-1} \end{bmatrix}. \tag{11}$$

Substituting (11) into (10) and denoting $\hat{x}_{k+1/k} = E\{x_{k+1} \mid z_k\}$ yields the predicted state

$$
\hat{\mathbf{x}}\_{k+1/k} = \mathbf{A}\_k \hat{\mathbf{x}}\_{k/k-1} + \mathbf{K}\_k (\mathbf{z}\_k - \mathbf{C}\_k \hat{\mathbf{x}}\_{k/k-1}) \, \tag{12}
$$

where $K_k = E\{x_{k+1}z_k^T\}E\{z_kz_k^T\}^{-1}$ is known as the predictor gain, which is designed in the next section. Thus, the optimal one-step-ahead predictor follows immediately from the least-mean-square (or conditional-mean) formula. A more detailed derivation appears in [4]. The structure of the optimal predictor is shown in Fig. 3. It can be seen from the figure that the predictor produces estimates $\hat{y}_{k/k-1} = C_k\hat{x}_{k/k-1}$ from the measurements $z_k$.

"I admired Bohr very much. We had long talks together, long talks in which Bohr did practically all the talking." *Paul Adrien Maurice Dirac*

Figure 3. The optimal one-step-ahead predictor, which produces estimates $\hat{x}_{k+1/k}$ of $x_{k+1}$ given measurements $z_k$.

Let $\tilde{x}_{k/k-1} = x_k - \hat{x}_{k/k-1}$ denote the state prediction error. It is shown below that the expectation of the prediction error is zero, that is, the predicted state estimate is unbiased.

*Lemma 1: Suppose that* $\hat{x}_{0/0} = x_0$*, then*

$$E\{\tilde{\mathfrak{x}}\_{k+1/k}\}\_{} = 0 \tag{13}$$

*for all* $k \in [0, N]$.


*Proof: The condition* $\hat{x}_{0/0} = x_0$ *is equivalent to* $\tilde{x}_{0/0} = 0$*, which is the initialisation step for an induction argument. Subtracting (12) from (1) gives*

$$\tilde{x}_{k+1/k} = (A_k - K_kC_k)\tilde{x}_{k/k-1} + B_kw_k - K_kv_k \tag{14}$$

*and therefore* 

$$E\{\tilde{\mathbf{x}}\_{k+1/k}\} = (A\_k - K\_k \mathbf{C}\_k) E\{\tilde{\mathbf{x}}\_{k/k-1}\} + B\_k E\{w\_k\} - K\_k E\{v\_k\} \,. \tag{15}$$

*From assumptions (3) and (7), the last two terms of the right-hand-side of (15) are zero. Thus, (13) follows by induction. □*

#### **4.6 Design of the Predictor Gain**

It is shown below that the optimum predictor gain is that which minimises the prediction error covariance $P_{k+1/k} = E\{\tilde{x}_{k+1/k}\tilde{x}_{k+1/k}^T\}$.

*Lemma 2: In respect of the estimation problem defined by (1), (3), (5) – (7), suppose there exist solutions* $P_{k/k-1} = P_{k/k-1}^T \geq 0$ *to the Riccati difference equation*

$$P\_{k+1/k} = A\_k P\_{k/k-1} A\_k^T + B\_k Q\_k B\_k^T - A\_k P\_{k/k-1} \mathbf{C}\_k^T (\mathbf{C}\_k P\_{k/k-1} \mathbf{C}\_k^T + R\_k)^{-1} \mathbf{C}\_k P\_{k/k-1} A\_k^T \tag{16}$$

*over [0, N], then the predictor gain* 

$$K\_k = A\_k P\_{k/k-1} \mathbb{C}\_k^\top \left(\mathbb{C}\_k P\_{k/k-1} \mathbb{C}\_k^\top + R\_k\right)^{-1},\tag{17}$$

*within (12) minimises* $P_{k+1/k} = E\{\tilde{x}_{k+1/k}\tilde{x}_{k+1/k}^T\}$.

"When it comes to the future, there are three kinds of people: those who let it happen, those who make it happen, and those who wondered what happened." *John M. Richardson Jr.*

*Proof: Constructing* $P_{k+1/k} = E\{\tilde{x}_{k+1/k}\tilde{x}_{k+1/k}^T\}$ *using (3), (7), (14),* $E\{\tilde{x}_{k/k-1}w_k^T\} = 0$ *and* $E\{\tilde{x}_{k/k-1}v_k^T\} = 0$ *yields*

$$P\_{k+1/k} = (A\_k - K\_k C\_k) P\_{k/k-1} (A\_k - K\_k C\_k)^T + B\_k Q\_k B\_k^T + K\_k R\_k K\_k^T,\tag{18}$$

*which can be rearranged to give* 

$$\boldsymbol{P}\_{k+1\mid k} = \boldsymbol{A}\_{k}\boldsymbol{P}\_{k\mid k-1}\boldsymbol{A}\_{k}^{\top} - \boldsymbol{A}\_{k}\boldsymbol{P}\_{k\mid k-1}\boldsymbol{C}\_{k}^{\top}(\boldsymbol{C}\_{k}\boldsymbol{P}\_{k\mid k-1}\boldsymbol{C}\_{k}^{\top} + \boldsymbol{R}\_{k})^{-1}\boldsymbol{C}\_{k}\boldsymbol{P}\_{k\mid k-1}\boldsymbol{A}\_{k}^{\top} + \boldsymbol{B}\_{k}\boldsymbol{Q}\_{k}\boldsymbol{B}\_{k}^{\top}$$

$$+ (\boldsymbol{K}\_{k} - \boldsymbol{A}\_{k}\boldsymbol{P}\_{k\mid k-1}\boldsymbol{C}\_{k}^{\top}(\boldsymbol{C}\_{k}\boldsymbol{P}\_{k\mid k-1}\boldsymbol{C}\_{k}^{\top} + \boldsymbol{R}\_{k})^{-1})(\boldsymbol{C}\_{k}\boldsymbol{P}\_{k\mid k-1}\boldsymbol{C}\_{k}^{\top} + \boldsymbol{R}\_{k}) \tag{19}$$

$$\times (\boldsymbol{K}\_{k} - \boldsymbol{A}\_{k}\boldsymbol{P}\_{k\mid k-1}\boldsymbol{C}\_{k}^{\top}(\boldsymbol{C}\_{k}\boldsymbol{P}\_{k\mid k-1}\boldsymbol{C}\_{k}^{\top} + \boldsymbol{R}\_{k})^{-1})^{\top}\,,$$

*By inspection of (19), the predictor gain (17) minimises* $P_{k+1/k}$*. □*
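A minimal implementation of the one-step-ahead predictor, assuming the notation of (12), (16) and (17), is sketched below; the parameter sequences would be supplied by the application.

```python
# Sketch: one iteration of the minimum-variance predictor (12), (16), (17).
import numpy as np

def predict_step(x_pred, P, z, A, B, C, Q, R):
    """Map x_{k/k-1}, P_{k/k-1} and z_k to x_{k+1/k}, P_{k+1/k}."""
    S = C @ P @ C.T + R                                # innovation covariance
    K = A @ P @ C.T @ np.linalg.inv(S)                 # predictor gain (17)
    x_next = A @ x_pred + K @ (z - C @ x_pred)         # state recursion (12)
    P_next = A @ P @ A.T + B @ Q @ B.T - K @ S @ K.T   # Riccati update (16)
    return x_next, P_next
```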


#### **4.7 Minimum-Variance Filtering**

It can be seen from (12) that the predicted state estimate $\hat{x}_{k/k-1}$ is calculated using the previous measurement $z_{k-1}$ as opposed to the current data $z_k$. A state estimate, given the data at time $k$, which is known as the filtered state, can similarly be obtained using the linear least squares or conditional-mean formula. In Lemma 1 it was shown that the predicted state estimate is unbiased. Therefore, it is assumed that the expected value of the filtered state equals the expected value of the predicted state, namely,

$$E\left\{ \begin{bmatrix} \hat{\mathbf{x}}\_{k/k} \\ \boldsymbol{z}\_{k} \end{bmatrix} \right\} = \begin{bmatrix} \hat{\mathbf{x}}\_{k/k-1} \\ \mathbf{C}\_{k} \hat{\mathbf{x}}\_{k/k-1} \end{bmatrix}. \tag{20}$$

Substituting (20) into (10) and denoting $\hat{x}_{k/k} = E\{x_k \mid z_k\}$ yields the filtered estimate

$$\hat{x}_{k/k} = \hat{x}_{k/k-1} + L_k(z_k - C_k\hat{x}_{k/k-1}), \tag{21}$$

where $L_k = E\{x_kz_k^T\}E\{z_kz_k^T\}^{-1}$ is known as the filter gain, which is designed subsequently. Let $\tilde{x}_{k/k} = x_k - \hat{x}_{k/k}$ denote the filtered state error. It is shown below that the expectation of the filtered error is zero, that is, the filtered state estimate is unbiased.

*Lemma 3: Suppose that* $\hat{x}_{0/0} = x_0$*, then*

$$E\{\tilde{\mathfrak{x}}\_{k/k}\} = \mathbf{0} \tag{22}$$

*for all* $k \in [0, N]$.

"To be creative you have to contribute something different from what you've done before. Your results need not be original to the world; few results truly meet that criterion. In fact, most results are built on the work of others." *Lynne C. Levesque*

*Proof: Following the approach of [6], combining (4) – (6) results in* $z_k = C_kA_{k-1}x_{k-1} + C_kB_{k-1}w_{k-1} + v_k$*, which together with (21) yields*

$$
\tilde{\mathbf{x}}\_{k/k} = (I - L\_k \mathbf{C}\_k) A\_{k-1} \tilde{\mathbf{x}}\_{k-1/k-1} + (I - L\_k \mathbf{C}\_k) B\_{k-1} w\_{k-1} - L\_k \mathbf{v}\_k \tag{23}
$$

*From (23) and the assumptions (3), (7), it follows that* 

$$\begin{split} E\{\tilde{\mathbf{x}}\_{k/k}\} &= (I - \mathbf{L}\_k \mathbf{C}\_k) A\_{k-1} E\{\tilde{\mathbf{x}}\_{k-1/k-1}\} \\ &= (I - \mathbf{L}\_k \mathbf{C}\_k) A\_{k-1} \cdots (I - \mathbf{L}\_1 \mathbf{C}\_1) A\_0 E\{\tilde{\mathbf{x}}\_{0/0}\} \,. \end{split} \tag{24}$$

*Hence, with the initial condition* $\hat{x}_{0/0} = x_0$*,* $E\{\tilde{x}_{k/k}\} = 0$*. □*

#### **4.8 Design of the Filter Gain**

It is shown below that the optimum filter gain is that which minimises the covariance $E\{\tilde{x}_{k/k}\tilde{x}_{k/k}^T\}$, where $\tilde{x}_{k/k} = x_k - \hat{x}_{k/k}$ is the filter error.

*Lemma 4: In respect of the estimation problem defined by (1), (3), (5) – (7), suppose there exists a solution* $P_{k/k} = P_{k/k}^T \geq 0$ *to the Riccati difference equation*

$$P\_{k/k} = P\_{k/k-1} - P\_{k/k-1} \mathbf{C}\_k^\top \left(\mathbf{C}\_k P\_{k/k-1} \mathbf{C}\_k^\top + \mathbf{R}\_k\right)^{-1} \mathbf{C}\_k P\_{k/k-1} \tag{25}$$

*over [0, N], then the filter gain* 

$$L\_k = P\_{k/k-1} \mathbb{C}\_k^T \left( \mathbb{C}\_k P\_{k/k-1} \mathbb{C}\_k^T + R\_k \right)^{-1},\tag{26}$$

*within (21) minimises* $P_{k/k} = E\{\tilde{x}_{k/k}\tilde{x}_{k/k}^T\}$.

*Proof: Subtracting* $\hat{x}_{k/k}$ *from* $x_k$ *yields* $\tilde{x}_{k/k} = x_k - \hat{x}_{k/k} = x_k - \hat{x}_{k/k-1} - L_k(C_kx_k + v_k - C_k\hat{x}_{k/k-1})$*, that is,*

$$
\tilde{\mathbf{x}}\_{k/k} = (I - L\_k \mathbf{C}\_k) \tilde{\mathbf{x}}\_{k/k - 1} - L\_k \mathbf{v}\_k \tag{27}
$$

*and* 


$$P_{k/k} = (I - L_kC_k)P_{k/k-1}(I - L_kC_k)^T + L_kR_kL_k^T, \tag{28}$$

*which can be rearranged as*

$$\begin{aligned} P\_{k/k} &= P\_{k/k-1} - P\_{k/k-1} \mathbf{C}\_k^T (\mathbf{C}\_k P\_{k/k-1} \mathbf{C}\_k^T + \mathbf{R}\_k)^{-1} \mathbf{C}\_k P\_{k/k-1} \\ &+ (L\_k - P\_{k/k-1} \mathbf{C}\_k^T (\mathbf{C}\_k P\_{k/k-1} \mathbf{C}\_k^T + \mathbf{R}\_k)^{-1}) (\mathbf{C}\_k P\_{k/k-1} \mathbf{C}\_k^T + \mathbf{R}\_k) (L\_k - P\_{k/k-1} \mathbf{C}\_k^T (\mathbf{C}\_k P\_{k/k-1} \mathbf{C}\_k^T + \mathbf{R}\_k)^{-1})^T \end{aligned} \tag{29}$$

*By inspection of (29), the filter gain (26) minimises* $P_{k/k}$*. □*

*Example 2 (Data Fusion).* Consider a filtering problem in which there are two measurements of the same state variable (possibly from different sensors), namely $A_k$, $B_k$, $Q_k$, $C_k = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ and $R_k = \begin{bmatrix} R_{1,k} & 0 \\ 0 & R_{2,k} \end{bmatrix}$, with $R_{1,k}, R_{2,k} \in \mathbb{R}$. Let $P_{k/k-1}$ denote the solution of the Riccati difference equation (25). By applying Cramer's rule within (26) it can be found that the filter gain is given by

"A professor is one who can speak on any subject - for precisely fifty minutes." *Norbert Wiener*


$$L_k = \begin{bmatrix} \dfrac{R_{2,k}P_{k/k-1}}{R_{2,k}P_{k/k-1} + R_{1,k}P_{k/k-1} + R_{1,k}R_{2,k}} & \dfrac{R_{1,k}P_{k/k-1}}{R_{2,k}P_{k/k-1} + R_{1,k}P_{k/k-1} + R_{1,k}R_{2,k}} \end{bmatrix},$$

from which it follows that $\lim_{R_{1,k} \to 0} L_k = \begin{bmatrix} 1 & 0 \end{bmatrix}$ and $\lim_{R_{2,k} \to 0} L_k = \begin{bmatrix} 0 & 1 \end{bmatrix}$. That is, when the first measurement is noise free, the filter ignores the second measurement and *vice versa*.

Thus, the Kalman filter weights the data according to the prevailing measurement qualities.
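The limiting behaviour in Example 2 can be observed numerically. The sketch below evaluates the fusion gain $L_k = P_{k/k-1}C_k^T(C_kP_{k/k-1}C_k^T + R_k)^{-1}$ for a fixed $P_{k/k-1} = 1$ and assumed noise variances:

```python
# Sketch: the data-fusion gain of Example 2 and its noise-free limits.
import numpy as np

def fusion_gain(P, r1, r2):
    C = np.array([[1.0], [1.0]])              # two measurements of one state
    S = C @ np.array([[P]]) @ C.T + np.diag([r1, r2])
    return (np.array([[P]]) @ C.T @ np.linalg.inv(S)).ravel()

print(fusion_gain(1.0, 1e-9, 1.0))            # ~[1 0]: trust the first sensor
print(fusion_gain(1.0, 1.0, 1e-9))            # ~[0 1]: trust the second sensor
```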

#### **4.9 The Predictor-Corrector Form**

The Kalman filter may be written in the following predictor-corrector form. The corrected (or filtered) error covariances and states are respectively given by

$$\begin{aligned} P_{k/k} &= P_{k/k-1} - P_{k/k-1}C_k^T(C_kP_{k/k-1}C_k^T + R_k)^{-1}C_kP_{k/k-1} \\ &= P_{k/k-1} - L_k(C_kP_{k/k-1}C_k^T + R_k)L_k^T \\ &= (I - L_kC_k)P_{k/k-1}, \end{aligned} \tag{30}$$

$$\begin{aligned} \hat{x}_{k/k} &= \hat{x}_{k/k-1} + L_k(z_k - C_k\hat{x}_{k/k-1}) \\ &= (I - L_kC_k)\hat{x}_{k/k-1} + L_kz_k, \end{aligned} \tag{31}$$

where $L_k = P_{k/k-1}C_k^T(C_kP_{k/k-1}C_k^T + R_k)^{-1}$. Equation (31) is also known as the measurement update. The predicted state and error covariances are respectively given by

$$\begin{aligned} \hat{x}_{k+1/k} &= A_k\hat{x}_{k/k} \\ &= (A_k - K_kC_k)\hat{x}_{k/k-1} + K_kz_k, \end{aligned} \tag{32}$$

$$P_{k+1/k} = A_kP_{k/k}A_k^T + B_kQ_kB_k^T, \tag{33}$$

where $K_k = A_kP_{k/k-1}C_k^T(C_kP_{k/k-1}C_k^T + R_k)^{-1}$. It can be seen from (31) that the corrected estimate, $\hat{x}_{k/k}$, is obtained using measurements up to time $k$. This contrasts with the prediction at time $k + 1$ in (32), which is based on all previous measurements. The output estimate is given by

$$\begin{aligned} \hat{\mathbf{y}}\_{k/k} &= \mathbf{C}\_k \hat{\mathbf{x}}\_{k/k} \\ &= \mathbf{C}\_k \hat{\mathbf{x}}\_{k/k-1} + \mathbf{C}\_k L\_k (\mathbf{z}\_k - \mathbf{C}\_k \hat{\mathbf{x}}\_{k/k-1}) \\ &= \mathbf{C}\_k (I - L\_k \mathbf{C}\_k) \hat{\mathbf{x}}\_{k/k-1} + \mathbf{C}\_k L\_k \mathbf{z}\_k \end{aligned} \tag{34}$$
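Collecting (30) – (34) gives the familiar filter loop. The sketch below assumes time-invariant parameters (the time-varying case simply indexes $A$, $B$, $C$, $Q$ and $R$ by $k$):

```python
# Sketch: predictor-corrector Kalman filter, eqs. (30) - (34).
import numpy as np

def kalman_filter(zs, A, B, C, Q, R, x0, P0):
    n = len(x0)
    x_pred, P_pred = x0, P0
    outputs = []
    for z in zs:
        S = C @ P_pred @ C.T + R
        L = P_pred @ C.T @ np.linalg.inv(S)        # filter gain
        x_corr = x_pred + L @ (z - C @ x_pred)     # corrected state (31)
        P_corr = (np.eye(n) - L @ C) @ P_pred      # corrected covariance (30)
        outputs.append(C @ x_corr)                 # output estimate (34)
        x_pred = A @ x_corr                        # predicted state (32)
        P_pred = A @ P_corr @ A.T + B @ Q @ B.T    # predicted covariance (33)
    return outputs
```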

"Before the advent of the Kalman filter, most mathematical work was based on Norbert Wiener's ideas, but the 'Wiener filtering' had proved difficult to apply. Kalman's approach, based on the use of state space techniques and a recursive least-squares algorithm, opened up many new theoretical and practical possibilities. The impact of Kalman filtering on all areas of applied mathematics, engineering, and sciences has been tremendous." *Eduardo Daniel Sontag*

#### **4.10 The A Posteriori Filter**


The above predictor-corrector form is used in the construction of extended Kalman filters for nonlinear estimation problems (see Chapter 10). When state predictions are not explicitly required, the following one-line recursion for the filtered state can be employed. Substituting $\hat{x}_{k/k-1} = A_{k-1}\hat{x}_{k-1/k-1}$ into $\hat{x}_{k/k} = (I - L_kC_k)\hat{x}_{k/k-1} + L_kz_k$ yields $\hat{x}_{k/k} = (I - L_kC_k)A_{k-1}\hat{x}_{k-1/k-1} + L_kz_k$. Hence, the output estimator may be written as

$$\begin{bmatrix} \hat{x}_{k/k} \\ \hat{y}_{k/k} \end{bmatrix} = \begin{bmatrix} (I - L_kC_k)A_{k-1} & L_k \\ C_k(I - L_kC_k)A_{k-1} & C_kL_k \end{bmatrix}\begin{bmatrix} \hat{x}_{k-1/k-1} \\ z_k \end{bmatrix}. \tag{35}$$

This form is called the *a posteriori* filter within [7], [8] and [9]. The absence of a direct feedthrough matrix above reduces the complexity of the robust filter designs described in [7], [8] and [9].

#### **4.11 The Information Form**

Algebraically equivalent recursions of the Kalman filter can be obtained by propagating a so-called corrected information state

$$\hat{\underline{x}}_{k/k} = P_{k/k}^{-1}\hat{x}_{k/k}, \tag{36}$$

and a predicted information state

$$\hat{\underline{x}}_{k+1/k} = P_{k+1/k}^{-1}\hat{x}_{k+1/k}. \tag{37}$$

The expression

$$(A + BCD)^{-1} = A^{-1} - A^{-1}B(C^{-1} + DA^{-1}B)^{-1}DA^{-1}, \tag{38}$$

which is variously known as the Matrix Inversion Lemma, the Sherman-Morrison formula and Woodbury's identity, is used to derive the information filter, see [3], [4], [11], [14] and [15]. To confirm the above identity, premultiply both sides of (38) by $(A + BCD)$ to obtain

$$\begin{aligned} I &= I + BCDA^{-1} - B(C^{-1} + DA^{-1}B)^{-1}DA^{-1} - BCDA^{-1}B(C^{-1} + DA^{-1}B)^{-1}DA^{-1} \\ &= I + BCDA^{-1} - B(I + CDA^{-1}B)(C^{-1} + DA^{-1}B)^{-1}DA^{-1} \\ &= I + BCDA^{-1} - BC(C^{-1} + DA^{-1}B)(C^{-1} + DA^{-1}B)^{-1}DA^{-1} \\ &= I, \end{aligned}$$

<sup>&</sup>quot;I have been aware from the outset that the deep analysis of something which is now called Kalman filtering was of major importance. But even with this immodesty I did not quite anticipate all the reactions to this work." *Rudolf Emil Kalman*


from which the result follows. From the above Matrix Inversion Lemma and (30) it follows that

$$
\begin{aligned}
P_{k/k}^{-1} &= \left(P_{k/k-1} - P_{k/k-1}C_k^T(C_kP_{k/k-1}C_k^T + R_k)^{-1}C_kP_{k/k-1}\right)^{-1} \\
&= P_{k/k-1}^{-1} + C_k^TR_k^{-1}C_k \,,
\end{aligned} \tag{39}
$$

assuming that $P_{k/k-1}^{-1}$ and $R_k^{-1}$ exist. An expression for $P_{k+1/k}^{-1}$ can be obtained from the Matrix Inversion Lemma and (33), namely,

$$\begin{aligned} P\_{k+1/k}^{-1} &= (A\_k P\_{k/k} A\_k^T + B\_k Q\_k B\_k^T)^{-1} \\ &= (F\_k^{-1} + B\_k Q\_k B\_k^T)^{-1} \end{aligned} \tag{40}$$

where $F_k = (A_kP_{k/k}A_k^T)^{-1} = A_k^{-T}P_{k/k}^{-1}A_k^{-1}$, which gives

$$P\_{k+1/k}^{-1} = (I - F\_k B\_k (B\_k^T F\_k B\_k + Q\_k^{-1})^{-1} B\_k^T) F\_k \,. \tag{41}$$

Another useful identity is

$$(A + BCD)^{-1}BC = A^{-1}(I + BCDA^{-1})^{-1}BC$$

$$= A^{-1}B(I + CDA^{-1}B)^{-1}C \tag{42}$$

$$= A^{-1}B(C^{-1} + DA^{-1}B)^{-1} \,.$$

From (42) and (39), the filter gain can be expressed as

$$
\begin{aligned}
L_k &= P_{k/k-1}C_k^T(C_kP_{k/k-1}C_k^T + R_k)^{-1} \\
&= (P_{k/k-1}^{-1} + C_k^TR_k^{-1}C_k)^{-1}C_k^TR_k^{-1} \\
&= P_{k/k}C_k^TR_k^{-1} \,.
\end{aligned} \tag{43}
$$

Premultiplying (39) by $P_{k/k}$ and rearranging gives

$$I - L\_k \mathbf{C}\_k = P\_{k/k} P\_{k/k - 1}^{-1} \,. \tag{44}$$

It follows from (31), (36) and (44) that the corrected information state is given by

$$
\begin{aligned}
\hat{\underline{x}}_{k/k} &= P_{k/k}^{-1}\hat{x}_{k/k} \\
&= P_{k/k}^{-1}(I - L_kC_k)\hat{x}_{k/k-1} + P_{k/k}^{-1}L_kz_k \\
&= \hat{\underline{x}}_{k/k-1} + C_k^TR_k^{-1}z_k \,.
\end{aligned} \tag{45}
$$

"Information is the oxygen of the modern age. It seeps through the walls topped by barbed wire, it wafts across the electrified borders." *Ronald Wilson Reagan*


The predicted information state follows from (37), (41) and the definition of *Fk*, namely,

$$
\begin{aligned}
\hat{\underline{x}}_{k+1/k} &= P_{k+1/k}^{-1}\hat{x}_{k+1/k} \\
&= P_{k+1/k}^{-1}A_k\hat{x}_{k/k} \\
&= (I - F_kB_k(B_k^TF_kB_k + Q_k^{-1})^{-1}B_k^T)F_kA_k\hat{x}_{k/k} \\
&= (I - F_kB_k(B_k^TF_kB_k + Q_k^{-1})^{-1}B_k^T)A_k^{-T}\hat{\underline{x}}_{k/k} \,.
\end{aligned} \tag{46}
$$

Recall from Lemma 1 and Lemma 3 that $E\{\tilde{x}_{k+1/k}\} = 0$ and $E\{\tilde{x}_{k/k}\} = 0$, provided $\hat{x}_{0/0} = x_0$. Similarly, with $\hat{\underline{x}}_{0/0} = P_{0/0}^{-1}x_0$, it follows that $E\{x_{k+1} - P_{k+1/k}\hat{\underline{x}}_{k+1/k}\} = 0$ and $E\{x_k - P_{k/k}\hat{\underline{x}}_{k/k}\} = 0$. That is, the information states (scaled by the appropriate covariances) will be unbiased, provided that the filter is suitably initialised. The calculation cost and potential for numerical instability can influence decisions on whether to implement the predictor-corrector form (30) - (33) or the information form (39) - (46) of the Kalman filter. The filters have similar complexity; both require a *p* × *p* matrix inverse in the measurement updates (31) and (45). However, inverting the measurement covariance matrix for the information filter may be troublesome when the measurement noise is negligible.
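For comparison, a minimal sketch of one information-form cycle is given below (not from the text). It assumes that $A_k$ is invertible and that $Q_k^{-1}$ and $R_k^{-1}$ exist.

```python
import numpy as np

def information_filter_step(y, Y, z, A, B, C, Qinv, Rinv):
    """One cycle of the information form (39), (41), (45), (46).
    y is the information state, Y = P^{-1} the information matrix."""
    # Measurement update (39), (45)
    Y_corr = Y + C.T @ Rinv @ C
    y_corr = y + C.T @ Rinv @ z
    # Time update (41), (46), with F_k = A^{-T} P_{k/k}^{-1} A^{-1}
    A_inv = np.linalg.inv(A)
    F = A_inv.T @ Y_corr @ A_inv
    G = np.eye(len(y)) - F @ B @ np.linalg.inv(B.T @ F @ B + Qinv) @ B.T
    Y_pred = G @ F                    # (41)
    y_pred = G @ A_inv.T @ y_corr     # (46)
    return y_corr, Y_corr, y_pred, Y_pred
```

The explicit use of `Rinv` illustrates the caveat above: the information form becomes troublesome when the measurement noise, and hence $R_k$, is nearly singular.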

#### **4.12 Comparison with Recursive Least Squares**

The recursive least squares (RLS) algorithm is equivalent to the Kalman filter designed with the simplifications *Ak* = *I* and *Bk* = *0*; see the derivations within [10], [11]. For convenience, consider a more general RLS algorithm that retains the correct *Ak* but relies on the simplifying assumption *Bk* = 0. Under these conditions, denote the RLS algorithm's predictor gain by

$$\underline{K}_k = A_k\underline{P}_{k/k-1}C_k^T(C_k\underline{P}_{k/k-1}C_k^T + R_k)^{-1} \,, \tag{47}$$

where $\underline{P}_{k/k-1}$ is obtained from the Riccati difference equation

$$\underline{P}_{k+1/k} = A_k\underline{P}_{k/k-1}A_k^T - A_k\underline{P}_{k/k-1}C_k^T(C_k\underline{P}_{k/k-1}C_k^T + R_k)^{-1}C_k\underline{P}_{k/k-1}A_k^T \,. \tag{48}$$

It is argued below that the cost of the above model simplification is an increase in mean-square-error.

*Lemma 5: Let $P_{k+1/k}$ denote the predicted error covariance within (33) for the optimal filter. Under the above conditions, the predicted error covariance, $\overline{P}_{k/k-1}$, exhibited by the RLS algorithm satisfies*

$$P\_{k/k-1} \le \overline{P}\_{k/k-1} \,. \tag{49}$$

"All of the books in the world contain no more information than is broadcast as video in a single large American city in a single year. Not all bits have equal value." *Carl Edward Sagan*


*Proof: From the approach of Lemma 2, the RLS algorithm's predicted error covariance is given by* 

$$
\begin{aligned}
\overline{P}_{k+1/k} &= A_k\overline{P}_{k/k-1}A_k^T - A_k\overline{P}_{k/k-1}C_k^T(C_k\overline{P}_{k/k-1}C_k^T + R_k)^{-1}C_k\overline{P}_{k/k-1}A_k^T + B_kQ_kB_k^T \\
&\quad + \left(\underline{K}_k - A_k\overline{P}_{k/k-1}C_k^T(C_k\overline{P}_{k/k-1}C_k^T + R_k)^{-1}\right)\left(C_k\overline{P}_{k/k-1}C_k^T + R_k\right) \\
&\qquad \times \left(\underline{K}_k - A_k\overline{P}_{k/k-1}C_k^T(C_k\overline{P}_{k/k-1}C_k^T + R_k)^{-1}\right)^T .
\end{aligned} \tag{50}
$$

*The last term on the right-hand-side of (50) is nonzero since the above RLS algorithm relies on the erroneous assumption $B_kQ_kB_k^T = 0$. Therefore (49) follows. □*
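The effect described by Lemma 5 can be checked numerically. The scalar Python sketch below (illustrative parameters only, not from the text) iterates the optimal Riccati equation alongside the RLS design recursion (48) and the error covariance actually exhibited by the RLS predictor.

```python
# Scalar illustration of Lemma 5. P_opt is the optimal predicted error
# covariance, P_des is the RLS design covariance from (48) (which assumes
# B_k Q_k B_k^T = 0) and P_act is the error covariance actually exhibited
# by the RLS predictor when the true model has B = 1.
A, B, C, Q, R = 0.9, 1.0, 1.0, 1.0, 1.0
P_opt = P_des = P_act = 1.0
for k in range(50):
    P_opt = A*P_opt*A - (A*P_opt*C)**2/(C*P_opt*C + R) + B*Q*B
    K_rls = A*P_des*C/(C*P_des*C + R)                       # (47)
    P_des = A*P_des*A - K_rls*C*P_des*A                     # (48)
    P_act = (A - K_rls*C)**2*P_act + B*Q*B + K_rls*R*K_rls  # cf. (50)
print(P_opt, P_act)  # P_opt < P_act, consistent with (49)
```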

#### **4.13 Repeated Predictions**

When there are gaps in the data record, or the data is irregularly spaced, state predictions can be calculated an arbitrary number of steps ahead. The one-step-ahead prediction is given by (32). The two, three and *j*-step-ahead predictions, given data at time *k*, are calculated as

$$
\hat{x}_{k+2/k} = A_{k+1}\hat{x}_{k+1/k} \,, \tag{51}
$$

$$
\hat{x}_{k+3/k} = A_{k+2}\hat{x}_{k+2/k} \,, \tag{52}
$$

$$
\hat{x}_{k+j/k} = A_{k+j-1}\hat{x}_{k+j-1/k} \,, \tag{53}
$$

see also [4], [12]. The corresponding predicted error covariances are given by

$$P_{k+2/k} = A_{k+1}P_{k+1/k}A_{k+1}^T + B_{k+1}Q_{k+1}B_{k+1}^T \,, \tag{54}$$

$$P_{k+3/k} = A_{k+2}P_{k+2/k}A_{k+2}^T + B_{k+2}Q_{k+2}B_{k+2}^T \,, \tag{55}$$

$$P_{k+j/k} = A_{k+j-1}P_{k+j-1/k}A_{k+j-1}^T + B_{k+j-1}Q_{k+j-1}B_{k+j-1}^T \,. \tag{56}$$

Another way to handle missing measurements at time *i* is to set *Ci* = 0, which leads to the same predicted states and error covariances. However, the cost of relying on repeated predictions is an increased mean-square-error which is demonstrated below.

#### *Lemma 6:*

*(i) $P_{k/k} \le P_{k/k-1}$. (ii) Suppose that*

$$A\_k A\_k^T + B\_k Q\_k B\_k^T \ge I \tag{57}$$

*for all $k \in [0, N]$, then $P_{k+j/k} \ge P_{k+j-1/k}$ for all $(j+k) \in [0, N]$.*

"Where a calculator on the ENIAC is equipped with 18,000 vacuum tubes and weighs 30 tons, computers in the future may have only 1,000 vacuum tubes and perhaps weigh 1.5 tons." *Popular Mechanics*, 1949

*Proof:*

*(i) The claim follows by inspection of (30), since $L_k(C_kP_{k/k-1}C_k^T + R_k)L_k^T \ge 0$. Thus, the filter outperforms the one-step-ahead predictor.*

*(ii) For $P_{k+j-1/k} \ge 0$, condition (57) yields $A_{k+j-1}P_{k+j-1/k}A_{k+j-1}^T + B_{k+j-1}Q_{k+j-1}B_{k+j-1}^T \ge P_{k+j-1/k}$, which together with (56) results in $P_{k+j/k} \ge P_{k+j-1/k}$. □*
*Example 3.* Consider a filtering problem where *A* = 0.9 and *B* = *C* = *Q* = *R* = 1, for which $AA^T + BQB^T$ = 1.81 > 1. The predicted error covariances, $P_{k+j/k}$, $j$ = 1, …, 10, are plotted in Fig. 4. The monotonically increasing sequence of error variances shown in the figure demonstrates that degraded performance occurs during repeated predictions. Fig. 5 shows some sample trajectories of the model output (dotted line), filter output (crosses) and predictions (circles) assuming that $z_3$ … $z_8$ are unavailable. It can be seen from the figure that the prediction error increases with time $k$, which illustrates Lemma 6.

Figure 4. Predicted error variances for Example 3.

Figure 5. Sample trajectories for Example 3: $y_k$ (dotted line), $\hat{y}_{k/k}$ (crosses) and $\hat{y}_{k+j/k}$ (circles).
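The variance growth of Example 3 can be reproduced with a few lines of Python (a sketch, not from the text):

```python
# Example 3 parameters: A = 0.9, B = C = Q = R = 1, so A*A + B*Q*B = 1.81 > 1.
A, B, C, Q, R = 0.9, 1.0, 1.0, 1.0, 1.0
P = 1.0
for k in range(50):                         # filter to near steady state
    P = A * (P - P*C*(C*P*C + R)**-1*C*P) * A + B*Q*B
variances = []
for j in range(10):                         # repeated predictions (54) - (56)
    P = A*P*A + B*Q*B
    variances.append(P)
print(variances)  # monotonically increasing, as in Fig. 4 and Lemma 6 (ii)
```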

#### **4.14 Accommodating Deterministic Inputs**

Suppose that the signal model is described by

$$x_{k+1} = A_kx_k + B_kw_k + \mu_k \,, \tag{58}$$

$$y_k = C_kx_k + \pi_k \,, \tag{59}$$

where $\mu_k$ and $\pi_k$ are deterministic inputs (such as known non-zero means). The modifications to the Kalman recursions can be found by assuming $\hat{x}_{k+1/k} = A_k\hat{x}_{k/k} + \mu_k$ and $\hat{y}_{k/k-1} = C_k\hat{x}_{k/k-1} + \pi_k$. The filtered and predicted states are then given by

$$
\hat{\mathbf{x}}\_{k/k} = \hat{\mathbf{x}}\_{k/k-1} + L\_k (\mathbf{z}\_k - \mathbf{C}\_k \hat{\mathbf{x}}\_{k/k-1} - \boldsymbol{\pi}\_k) \tag{60}
$$

"I think there is a world market for maybe five computers." *Thomas John Watson*


and

$$
\hat{x}_{k+1/k} = A_k\hat{x}_{k/k} + \mu_k \tag{61}
$$

$$= A_k\hat{x}_{k/k-1} + K_k(z_k - C_k\hat{x}_{k/k-1} - \pi_k) + \mu_k \,, \tag{62}$$

respectively. Subtracting (62) from (58) gives

$$\begin{aligned} \tilde{\mathbf{x}}\_{k+1/k} &= A\_k \tilde{\mathbf{x}}\_{k/k-1} - K\_k (\mathbf{C}\_k \tilde{\mathbf{x}}\_{k/k-1} + \boldsymbol{\pi}\_k + \boldsymbol{\upsilon}\_k - \boldsymbol{\pi}\_k) + B\_k \boldsymbol{w}\_k + \mu\_k - \mu\_k \\ &= (A\_k - K\_k \mathbf{C}\_k) \tilde{\mathbf{x}}\_{k/k-1} + B\_k \boldsymbol{w}\_k - K\_k \boldsymbol{\upsilon}\_k. \end{aligned} \tag{63}$$

where $\tilde{x}_{k/k-1} = x_k - \hat{x}_{k/k-1}$. Therefore, the predicted error covariance,

$$\begin{aligned} P\_{k+1/k} &= (A\_k - K\_k \mathbf{C}\_k) P\_{k/k-1} (A\_k - K\_k \mathbf{C}\_k)^T + B\_k \mathbf{Q}\_k B\_k^T + K\_k R\_k K\_k^T \\ &= A\_k P\_{k/k-1} A\_k^T - K\_k (\mathbf{C}\_k P\_{k/k-1} \mathbf{C}\_k^T + R\_k) \mathbf{K}\_k^T + B\_k \mathbf{Q}\_k B\_k^T \end{aligned} \tag{64}$$

is unchanged. The filtered output is given by

$$
\hat{\mathbf{y}}\_{k/k} = \mathbb{C}\_k \hat{\mathbf{x}}\_{k/k} + \boldsymbol{\pi}\_k \,. \tag{65}
$$

Figure 6. Measurements (dotted line) and filtered states (solid line) for Example 4.

*Example 4.* Consider a filtering problem where *A* = diag(0.1, 0.1), *B* = *C* = diag(1, 1), *Q* = *R* = diag(0.001, 0.001), with $\mu_k = \begin{bmatrix} \sin(2k) & \cos(3k) \end{bmatrix}^T$. The filtered states calculated from (60) are shown in Fig. 6. The resulting Lissajous figure illustrates that states having nonzero means can be modelled using deterministic inputs.
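A sketch reproducing Example 4 appears below (not from the text). The deterministic input is assumed to be $\mu_k = [\sin(2k) \;\; \cos(3k)]^T$ with $\pi_k = 0$; plotting the filtered states traces out the Lissajous figure of Fig. 6.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.diag([0.1, 0.1]); B = C = np.eye(2)
Q = R = np.diag([0.001, 0.001])
x = np.zeros(2)                     # true state
xp = np.zeros(2); P = np.eye(2)     # predicted state and covariance
filtered = []
for k in range(500):
    mu = np.array([np.sin(2*k), np.cos(3*k)])  # assumed deterministic input
    x = A @ x + B @ rng.multivariate_normal(np.zeros(2), Q) + mu  # (58)
    z = C @ x + rng.multivariate_normal(np.zeros(2), R)
    L = P @ C.T @ np.linalg.inv(C @ P @ C.T + R)
    xc = xp + L @ (z - C @ xp)                 # (60), with pi_k = 0
    P = A @ (P - L @ C @ P) @ A.T + B @ Q @ B.T
    xp = A @ xc + mu                           # (61)
    filtered.append(xc)
```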

"There is no reason anyone would want a computer in their home." *Kenneth Harry Olson*

#### **4.15 Correlated Process and Measurement Noises**

Consider the case where the process and measurement noises are correlated

$$E\left[\begin{bmatrix}w\_j\\v\_j\end{bmatrix}\begin{bmatrix}w\_k^T & v\_k^T\end{bmatrix}\right] = \begin{bmatrix}Q\_k & S\_k\\S\_k^T & R\_k\end{bmatrix} \delta\_{jk} \tag{66}$$

The generalisation of the optimal filter that takes the above into account was published by Kalman in 1963 [2]. The expressions for the state prediction

$$
\hat{\mathbf{x}}\_{k+1/k} = A\_k \hat{\mathbf{x}}\_{k/k-1} + \mathbf{K}\_k (\mathbf{z}\_k - \mathbf{C}\_k \hat{\mathbf{x}}\_{k/k-1}) \tag{67}
$$

and the state prediction error


$$
\tilde{x}_{k+1/k} = (A_k - K_kC_k)\tilde{x}_{k/k-1} + B_kw_k - K_kv_k \tag{68}
$$

remain the same. It follows from (68) that

$$E\{\tilde{\mathbf{x}}\_{k+1/k}\} = (A\_k - K\_k \mathbf{C}\_k) E\{\tilde{\mathbf{x}}\_{k/k-1}\} + \begin{bmatrix} B\_k & -K\_k \\ \end{bmatrix} \begin{bmatrix} E\{w\_k\} \\ E\{v\_k\} \end{bmatrix}.\tag{69}$$

As before, the optimum predictor gain is that which minimises the prediction error covariance $E\{\tilde{x}_{k+1/k}\tilde{x}_{k+1/k}^T\}$.

*Lemma 7: In respect of the estimation problem defined by (1), (5), (6) with noise covariance (66), suppose there exist solutions $P_{k/k-1} = P_{k/k-1}^T \ge 0$ to the Riccati difference equation*

$$P\_{k+1\mid k} = A\_k P\_{k\mid k-1} A\_k^\top + B\_k Q\_k B\_k^\top - (A\_k P\_{k\mid k-1} C\_k^\top + B\_k S\_k)(C\_k P\_{k\mid k-1} C\_k^\top + R\_k)^{-1} (A\_k P\_{k\mid k-1} C\_k^\top + B\_k S\_k)^\top \tag{70}$$

*over [0, N], then the state prediction (67) with the gain* 

$$\mathbf{K}\_k = (A\_k P\_{k/k-1} \mathbf{C}\_k^\top + B\_k \mathbf{S}\_k)(\mathbf{C}\_k P\_{k/k-1} \mathbf{C}\_k^\top + \mathbf{R}\_k)^{-1},\tag{71}$$

*minimises $P_{k+1/k} = E\{\tilde{x}_{k+1/k}\tilde{x}_{k+1/k}^T\}$.*

*Proof: It follows from (69) that* 

$$
\begin{aligned}
E\{\tilde{x}_{k+1/k}\tilde{x}_{k+1/k}^T\} &= (A_k - K_kC_k)E\{\tilde{x}_{k/k-1}\tilde{x}_{k/k-1}^T\}(A_k - K_kC_k)^T + \begin{bmatrix} B_k & -K_k \end{bmatrix}\begin{bmatrix} Q_k & S_k \\ S_k^T & R_k \end{bmatrix}\begin{bmatrix} B_k^T \\ -K_k^T \end{bmatrix} \\
&= (A_k - K_kC_k)E\{\tilde{x}_{k/k-1}\tilde{x}_{k/k-1}^T\}(A_k - K_kC_k)^T + B_kQ_kB_k^T + K_kR_kK_k^T - B_kS_kK_k^T - K_kS_k^TB_k^T .
\end{aligned} \tag{72}
$$

"640K ought to be enough for anybody." *William Henry (Bill) Gates III*


*Expanding (72) and denoting $P_{k+1/k} = E\{\tilde{x}_{k+1/k}\tilde{x}_{k+1/k}^T\}$ gives*

$$P\_{k+1\mid k} = A\_k P\_{k\mid k-1} A\_k^\top + B\_k Q\_k B\_k^\top - (A\_k P\_{k\mid k-1} \mathbb{C}\_k^\top + B\_k S\_k) (\mathbf{C}\_k P\_{k\mid k-1} \mathbb{C}\_k^\top + R\_k)^{-1} (A\_k P\_{k\mid k-1} \mathbb{C}\_k^\top + B\_k S\_k)^{\top}$$

$$+ \left( K\_k - (A\_k P\_{k\mid k-1} \mathbb{C}\_k^\top + B\_k S\_k) (\mathbf{C}\_k P\_{k\mid k-1} \mathbb{C}\_k^\top + R\_k)^{-1} \right) \left( \mathbb{C}\_k P\_{k\mid k-1} \mathbb{C}\_k^\top + R\_k \right)$$

$$\times \left( K\_k - (A\_k P\_{k\mid k-1} \mathbb{C}\_k^\top + B\_k S\_k) (\mathbf{C}\_k P\_{k\mid k-1} \mathbb{C}\_k^\top + R\_k)^{-1} \right)^\top. \tag{73}$$

*By inspection of (73), the predictor gain (71) minimises $P_{k+1/k}$. □*

Thus, the predictor gain is calculated differently when *wk* and *vk* are correlated. The calculation of the filtered state and filtered error covariance are unchanged, *viz.*

$$
\hat{\mathbf{x}}\_{k/k} = (I - L\_k \mathbf{C}\_k) \hat{\mathbf{x}}\_{k/k-1} + L\_k \mathbf{z}\_k \tag{74}
$$

$$P\_{k/k} = (I - L\_k \mathbb{C}\_k) P\_{k/k-1} (I - L\_k \mathbb{C}\_k)^\top + L\_k R\_k L\_k^\top \,. \tag{75}$$

where

$$L\_k = P\_{k/k-1} \mathbb{C}\_k^T \left( \mathbb{C}\_k P\_{k/k-1} \mathbb{C}\_k^T + R\_k \right)^{-1}. \tag{76}$$

However, $P_{k/k-1}$ is now obtained from the Riccati difference equation (70).
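A minimal sketch of one cycle of the correlated-noise predictor (67), (70), (71) follows (not from the text):

```python
import numpy as np

def correlated_predictor_step(x, P, z, A, B, C, Q, R, S):
    """One step of the predictor (67) with gain (71) and Riccati equation (70)."""
    G = A @ P @ C.T + B @ S              # numerator of the gain (71)
    W = C @ P @ C.T + R                  # innovation covariance
    K = G @ np.linalg.inv(W)             # (71)
    x_next = A @ x + K @ (z - C @ x)     # (67)
    P_next = A @ P @ A.T + B @ Q @ B.T - G @ np.linalg.inv(W) @ G.T  # (70)
    return x_next, P_next
```

With $S_k = 0$ the gain and Riccati equation revert to the uncorrelated-noise forms of the previous sections.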

#### **4.16 Including a Direct-Feedthrough Matrix**

Suppose now that the signal model possesses a direct-feedthrough matrix, *Dk*, namely

$$\mathbf{x}\_{k+1} = A\_k \mathbf{x}\_k + B\_k \mathbf{w}\_k \tag{77}$$

$$\mathbf{y}\_k = \mathbf{C}\_k \mathbf{x}\_k + \mathbf{D}\_k \mathbf{w}\_k \,. \tag{78}$$

Let the observations be denoted by

$$
\underline{z}_k = C_kx_k + \underline{v}_k \,, \tag{79}
$$

where $\underline{v}_k = D_kw_k + v_k$, under the assumptions (3) and (7). It follows that

$$E\left\{\begin{bmatrix} w_j \\ \underline{v}_j \end{bmatrix}\begin{bmatrix} w_k^T & \underline{v}_k^T \end{bmatrix}\right\} = \begin{bmatrix} Q_k & Q_kD_k^T \\ D_kQ_k & D_kQ_kD_k^T + R_k \end{bmatrix}\delta_{jk} \,. \tag{80}$$

The approach of the previous section may be used to obtain the minimum-variance predictor for the above system. Using (80) within Lemma 7 yields the predictor gain

$$K_k = (A_kP_{k/k-1}C_k^T + B_kQ_kD_k^T)\Omega_k^{-1} \,, \tag{81}$$

"Everything that can be invented has been invented." *Charles Holland Duell*

where


$$\Omega_k = C_kP_{k/k-1}C_k^T + D_kQ_kD_k^T + R_k \tag{82}$$

and $P_{k/k-1}$ is the solution of the Riccati difference equation

$$P_{k+1/k} = A_kP_{k/k-1}A_k^T - K_k\Omega_kK_k^T + B_kQ_kB_k^T \,. \tag{83}$$

The filtered states can be calculated from (74), (82), (83) and $L_k = P_{k/k-1}C_k^T\Omega_k^{-1}$.
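The direct-feedthrough recursions can be sketched as follows (not from the text); `Omega` corresponds to $\Omega_k$ in (82).

```python
import numpy as np

def feedthrough_step(x, P, z, A, B, C, D, Q, R):
    """Recursions (74) and (81) - (83) for the model (77) - (79)."""
    Omega = C @ P @ C.T + D @ Q @ D.T + R                    # (82)
    Oinv = np.linalg.inv(Omega)
    K = (A @ P @ C.T + B @ Q @ D.T) @ Oinv                   # (81)
    L = P @ C.T @ Oinv                                       # filter gain
    x_corr = (np.eye(len(x)) - L @ C) @ x + L @ z            # (74)
    x_next = A @ x + K @ (z - C @ x)
    P_next = A @ P @ A.T - K @ Omega @ K.T + B @ Q @ B.T     # (83)
    return x_corr, x_next, P_next
```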

#### **4.17 Solution of the General Filtering Problem**

The general filtering problem is shown in Fig. 7, in which it is desired to develop a filter that operates on noisy measurements of a system $\mathcal{G}_2$ and estimates the output of a second system $\mathcal{G}_1$. Frequency domain solutions for time-invariant systems were developed in Chapters 1 and 2. Here, for the time-varying case, it is assumed that the system $\mathcal{G}_2$ has the state-space realisation

$$\mathbf{x}\_{k+1} = A\_k \mathbf{x}\_k + B\_k \mathbf{w}\_{k'} \tag{84}$$

$$\mathbf{y}\_{2,k} = \mathbf{C}\_{2,k}\mathbf{x}\_k + D\_{2,k}\mathbf{w}\_k \,. \tag{85}$$

Figure 7. The general filtering problem. The objective is to estimate the output of $\mathcal{G}_1$ from noisy measurements of $\mathcal{G}_2$.

Suppose that the system $\mathcal{G}_1$ has the realisation (84) and

$$\mathbf{y}\_{1,k} = \mathbf{C}\_{1,k}\mathbf{x}\_k + D\_{1,k}\mathbf{w}\_k \,. \tag{86}$$

The objective is to produce estimates $\hat{y}_{1,k/k}$ of $y_{1,k}$ from the measurements

$$\mathbf{z}\_k = \mathbf{C}\_{2,k}\mathbf{x}\_k + \underline{\mathbf{v}}\_k \tag{87}$$

"He was a multimillionaire. Wanna know how he made all of his money? He designed the little diagrams that tell which way to put batteries on." *Stephen Wright*


where $\underline{v}_k = D_{2,k}w_k + v_k$, so that the variance of the estimation error,

$$e_{k/k} = y_{1,k} - \hat{y}_{1,k/k} \,, \tag{88}$$

is minimised. The predicted state follows immediately from the results of the previous sections, namely,

$$\begin{aligned} \hat{\mathbf{x}}\_{k+1/k} &= A\_k \hat{\mathbf{x}}\_{k/k-1} + \mathbf{K}\_k (\mathbf{z}\_k - \mathbf{C}\_{2,k} \hat{\mathbf{x}}\_{k/k-1}) \\ &= (A\_k - K\_k \mathbf{C}\_{2,k}) \hat{\mathbf{x}}\_{k/k-1} + K\_k \mathbf{z}\_k \end{aligned} \tag{89}$$

where

$$K_k = (A_kP_{k/k-1}C_{2,k}^T + B_kQ_kD_{2,k}^T)\Omega_k^{-1} \tag{90}$$

and

$$\Omega_k = C_{2,k}P_{k/k-1}C_{2,k}^T + D_{2,k}Q_kD_{2,k}^T + R_k \,, \tag{91}$$

in which $P_{k/k-1}$ evolves from

$$P_{k+1/k} = A_kP_{k/k-1}A_k^T - K_k\Omega_kK_k^T + B_kQ_kB_k^T \,. \tag{92}$$

In view of the structure (89), an output estimate of the form

$$
\begin{aligned}
\hat{y}_{1,k/k} &= C_{1,k}\hat{x}_{k/k-1} + L_k(z_k - C_{2,k}\hat{x}_{k/k-1}) \\
&= (C_{1,k} - L_kC_{2,k})\hat{x}_{k/k-1} + L_kz_k
\end{aligned} \tag{93}
$$

is sought, where *Lk* is a filter gain to be designed. Subtracting (93) from (86) gives

$$
\begin{aligned}
e_{k/k} &= y_{1,k} - \hat{y}_{1,k/k} \\
&= (C_{1,k} - L_kC_{2,k})\tilde{x}_{k/k-1} + \begin{bmatrix} D_{1,k} & -L_k \end{bmatrix}\begin{bmatrix} w_k \\ \underline{v}_k \end{bmatrix} .
\end{aligned} \tag{94}
$$

It is shown below that an optimum filter gain can be found by minimising the output error covariance $E\{e_{k/k}e_{k/k}^T\}$.

*Lemma 8: In respect of the estimation problem defined by (84) - (88), the output estimate $\hat{y}_{1,k/k}$ with the filter gain*

$$L_k = (C_{1,k}P_{k/k-1}C_{2,k}^T + D_{1,k}Q_kD_{2,k}^T)\Omega_k^{-1} \tag{95}$$

*minimises $E\{e_{k/k}e_{k/k}^T\}$.*

"This 'telephone' has too many shortcomings to be seriously considered as a means of communication. The device is inherently of no value to us." *Western Union* memo, 1876

*Proof: It follows from (94) that* 


$$
E\{e_{k/k}e_{k/k}^T\} = (C_{1,k} - L_kC_{2,k})P_{k/k-1}(C_{1,k} - L_kC_{2,k})^T + \begin{bmatrix} D_{1,k} & -L_k \end{bmatrix}\begin{bmatrix} Q_k & Q_kD_{2,k}^T \\ D_{2,k}Q_k & D_{2,k}Q_kD_{2,k}^T + R_k \end{bmatrix}\begin{bmatrix} D_{1,k}^T \\ -L_k^T \end{bmatrix} \,, \tag{96}
$$

which can be expanded to give

$$
\begin{aligned}
E\{e_{k/k}e_{k/k}^T\} &= C_{1,k}P_{k/k-1}C_{1,k}^T + D_{1,k}Q_kD_{1,k}^T \\
&\quad - (C_{1,k}P_{k/k-1}C_{2,k}^T + D_{1,k}Q_kD_{2,k}^T)\Omega_k^{-1}(C_{1,k}P_{k/k-1}C_{2,k}^T + D_{1,k}Q_kD_{2,k}^T)^T \\
&\quad + \left(L_k - (C_{1,k}P_{k/k-1}C_{2,k}^T + D_{1,k}Q_kD_{2,k}^T)\Omega_k^{-1}\right)\Omega_k\left(L_k - (C_{1,k}P_{k/k-1}C_{2,k}^T + D_{1,k}Q_kD_{2,k}^T)\Omega_k^{-1}\right)^T .
\end{aligned} \tag{97}
$$

*By inspection of (97), the filter gain (95) minimises $E\{e_{k/k}e_{k/k}^T\}$. □*

The filter gain (95) has been generalised to include arbitrary *C*1*,k*, *D*1*,k*, and *D*2*,k*. For state estimation, *C2* = *I* and *D2* = 0, in which case (95) reverts to the simpler form (26). The problem (84) – (88) can be written compactly in the following generalised regulator framework from control theory [13].

$$
\begin{bmatrix} x_{k+1} \\ e_{k/k} \\ z_k \end{bmatrix} = \begin{bmatrix} A_k & B_{1,1,k} & 0 \\ C_{1,1,k} & D_{1,1,k} & D_{1,2,k} \\ C_{2,1,k} & D_{2,1,k} & 0 \end{bmatrix}\begin{bmatrix} x_k \\ v_k \\ w_k \\ \hat{y}_{1,k/k} \end{bmatrix} \,, \tag{98}
$$

where $B_{1,1,k} = \begin{bmatrix} 0 & B_k \end{bmatrix}$, $C_{1,1,k} = C_{1,k}$, $C_{2,1,k} = C_{2,k}$, $D_{1,1,k} = \begin{bmatrix} 0 & D_{1,k} \end{bmatrix}$, $D_{1,2,k} = -I$ and $D_{2,1,k} = \begin{bmatrix} I & D_{2,k} \end{bmatrix}$. With the above definitions, the minimum-variance solution can be written as

$$
\hat{\mathbf{x}}\_{k \ast 1/k} = \mathbf{A}\_k \hat{\mathbf{x}}\_{k/k-1} + \mathbf{K}\_k (\mathbf{z}\_k - \mathbf{C}\_{2,1,k} \hat{\mathbf{x}}\_{k/k-1}) \,\tag{99}
$$

$$
\hat{y}\_{1,k/k} = \mathbf{C}\_{1,1,k}\hat{\mathbf{x}}\_{k/k-1} + L\_k(\mathbf{z}\_k - \mathbf{C}\_{2,1,k}\hat{\mathbf{x}}\_{k/k-1}) \,. \tag{100}
$$

"The wireless music box has no imaginable commercial value. Who would pay for a message sent to nobody in particular?" *David Sarnoff*


where

$$K_k = \left(A_kP_{k/k-1}C_{2,1,k}^T + B_{1,1,k}\begin{bmatrix} R_k & 0 \\ 0 & Q_k \end{bmatrix}D_{2,1,k}^T\right)\left(C_{2,1,k}P_{k/k-1}C_{2,1,k}^T + D_{2,1,k}\begin{bmatrix} R_k & 0 \\ 0 & Q_k \end{bmatrix}D_{2,1,k}^T\right)^{-1} \,, \tag{101}$$

$$L_k = \left(C_{1,1,k}P_{k/k-1}C_{2,1,k}^T + D_{1,1,k}\begin{bmatrix} R_k & 0 \\ 0 & Q_k \end{bmatrix}D_{2,1,k}^T\right)\left(C_{2,1,k}P_{k/k-1}C_{2,1,k}^T + D_{2,1,k}\begin{bmatrix} R_k & 0 \\ 0 & Q_k \end{bmatrix}D_{2,1,k}^T\right)^{-1} \,, \tag{102}$$

in which $P_{k/k-1}$ is the solution of the Riccati difference equation

$$P_{k+1/k} = A_kP_{k/k-1}A_k^T - K_k\left(C_{2,1,k}P_{k/k-1}C_{2,1,k}^T + D_{2,1,k}\begin{bmatrix} R_k & 0 \\ 0 & Q_k \end{bmatrix}D_{2,1,k}^T\right)K_k^T + B_{1,1,k}\begin{bmatrix} R_k & 0 \\ 0 & Q_k \end{bmatrix}B_{1,1,k}^T \,. \tag{103}$$

The application of the solution (99) – (100) to output estimation, input estimation (or equalisation), state estimation and mixed filtering problems is demonstrated in the example below.

Figure 8. The mixed filtering and equalisation problem considered in Example 5. The objective is to estimate the output of the plant which has been corrupted by the channel and the measurement noise *vk*.

*Example 5.*

(i) For output estimation problems, where $C_{1,k} = C_{2,k}$ and $D_{1,k} = D_{2,k}$, the predictor gain (101) and filter gain (102) are identical to the previously derived (90) and (95), respectively.

(ii) For state estimation problems, set $C_{1,k} = I$ and $D_{1,k} = 0$.

(iii) For equalisation problems, set $C_{1,k} = 0$ and $D_{1,k} = I$.

(iv) Consider the mixed filtering and equalisation problem depicted in Fig. 8, where the output of the plant has been corrupted by the channel and the measurement noise $v_k$. Assume that the plant has the realisation

$$\begin{bmatrix} x_{1,k+1} \\ y_{1,k} \end{bmatrix} = \begin{bmatrix} A_{1,k} & B_{1,k} \\ C_{1,k} & D_{1,k} \end{bmatrix}\begin{bmatrix} x_{1,k} \\ w_k \end{bmatrix}$$

and that the channel has the realisation $\{A_{2,k}, B_{2,k}, C_{2,k}, D_{2,k}\}$. Noting the realisation of the cascaded system (see Problem 7), the minimum-variance solution can be found by setting $A_k = \begin{bmatrix} A_{2,k} & B_{2,k}C_{1,k} \\ 0 & A_{1,k} \end{bmatrix}$, $B_{1,1,k} = \begin{bmatrix} 0 & B_{2,k}D_{1,k} \\ 0 & B_{1,k} \end{bmatrix}$, $B_{1,2,k} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$, $C_{1,1,k} = \begin{bmatrix} 0 & C_{1,k} \end{bmatrix}$, $C_{2,1,k} = \begin{bmatrix} C_{2,k} & D_{2,k}C_{1,k} \end{bmatrix}$, $D_{1,1,k} = \begin{bmatrix} 0 & D_{1,k} \end{bmatrix}$ and $D_{2,1,k} = \begin{bmatrix} I & D_{2,k}D_{1,k} \end{bmatrix}$.

"Video won't be able to hold on to any market it captures after the first six months. People will soon get tired of staring at a plywood box every night." *Daryl Francis Zanuck*
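A compact sketch of the general filtering solution (89) - (95) is given below (not from the text). The special cases of Example 5 follow from the indicated choices of `C1` and `D1`.

```python
import numpy as np

def general_filter_step(x, P, z, A, B, C1, C2, D1, D2, Q, R):
    """Predicted state (89), output estimate (93), gains (90), (95), Riccati (92)."""
    Omega = C2 @ P @ C2.T + D2 @ Q @ D2.T + R                # (91)
    Oinv = np.linalg.inv(Omega)
    K = (A @ P @ C2.T + B @ Q @ D2.T) @ Oinv                 # (90)
    L = (C1 @ P @ C2.T + D1 @ Q @ D2.T) @ Oinv               # (95)
    y1_est = C1 @ x + L @ (z - C2 @ x)                       # (93)
    x_next = A @ x + K @ (z - C2 @ x)                        # (89)
    P_next = A @ P @ A.T - K @ Omega @ K.T + B @ Q @ B.T     # (92)
    return y1_est, x_next, P_next
```

Setting `C1 = C2, D1 = D2` gives output estimation, `C1 = I, D1 = 0` state estimation, and `C1 = 0, D1 = I` equalisation, as in Example 5.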

#### **4.18 Hybrid Continuous-Discrete Filtering**

Often a system's dynamics evolve continuously but measurements can only be observed in discrete time increments. This problem is modelled in [20] as

$$\dot{\mathbf{x}}(t) = A(t)\mathbf{x}(t) + B(t)\mathbf{w}(t) \,. \tag{104}$$

$$\mathbf{z}\_k = \mathbf{C}\_k \mathbf{x}\_k + \mathbf{v}\_k \tag{105}$$

where $E\{w(t)\} = 0$, $E\{w(t)w^T(\tau)\} = Q(t)\delta(t - \tau)$, $E\{v_k\} = 0$, $E\{v_jv_k^T\} = R_k\delta_{jk}$ and $x_k = x(kT_s)$, in which $T_s$ is the sampling interval. Following the approach of [20], state estimates can be obtained from a hybrid of continuous-time and discrete-time filtering equations. The predicted states and error covariances are obtained from

$$
\dot{\hat{\mathbf{x}}}(t) = A(t)\hat{\mathbf{x}}(t) \,. \tag{106}
$$

$$
\dot{P}(t) = A(t)P(t) + P(t)A^T(t) + B(t)Q(t)B^T(t) \,. \tag{107}
$$

Define $\hat{x}_{k/k-1} = \hat{x}(t)$ and $P_{k/k-1} = P(t)$ at $t = kT_s$. The corrected states and error covariances are given by

$$
\hat{\mathfrak{X}}\_{k/k} = \hat{\mathfrak{x}}\_{k/k-1} + L\_k(\mathbf{z}\_k - \mathbf{C}\_k \hat{\mathfrak{x}}\_{k/k-1}) \, , \tag{108}
$$

$$P_{k/k} = (I - L_kC_k)P_{k/k-1} \,, \tag{109}$$

where $L_k = P_{k/k-1}C_k^T(C_kP_{k/k-1}C_k^T + R_k)^{-1}$. The above filter is a linear system having jumps at the discrete observation times. The states evolve according to the continuous-time dynamics (106) in-between the sampling instants. This filter is applied in [20] for recovery of cardiac dynamics from medical image sequences.
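A hybrid filter of this kind can be sketched with a simple Euler integration of (106) - (107) between the sampling instants (an illustrative sketch, not from [20]; the step count `substeps` is arbitrary):

```python
import numpy as np

def hybrid_filter(x0, P0, z_seq, A, B, C, Q, R, Ts, substeps=100):
    """Euler integration of (106) - (107) between samples, with the
    discrete corrections (108) - (109) applied at t = k*Ts."""
    x, P = np.array(x0, float), np.array(P0, float)
    dt = Ts / substeps
    estimates = []
    for z in z_seq:
        for _ in range(substeps):         # propagate (106) and (107)
            x = x + dt * (A @ x)
            P = P + dt * (A @ P + P @ A.T + B @ Q @ B.T)
        L = P @ C.T @ np.linalg.inv(C @ P @ C.T + R)
        x = x + L @ (z - C @ x)           # (108)
        P = (np.eye(len(x)) - L @ C) @ P  # (109)
        estimates.append(x.copy())
    return estimates
```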

#### **4.19 Conclusion**

A linear, time-varying system is assumed to have the realisation *xk*+1 = *Ak*x*k* + *Bkwk* and *y2,k* = *C*2,*kxk* + *D*2,*kwk*. In the general filtering problem, it is desired to estimate the output of a second reference system which is modelled as *y1,k* = *C*1,*kxk* + *D*1,*kwk*. The Kalman filter which estimates *y1,k* from the measurements *zk* = *y2,k* + *vk* at time *k* is listed in Table 1.

"Louis Pasteur's theory of germs is ridiculous fiction." *Pierre Pachet*, Professor of Physiology at Toulouse, 1872


If the state-space parameters are known exactly then this filter minimises the predicted and corrected error covariances $E\{(x_k - \hat{x}_{k/k-1})(x_k - \hat{x}_{k/k-1})^T\}$ and $E\{(x_k - \hat{x}_{k/k})(x_k - \hat{x}_{k/k})^T\}$, respectively. When there are gaps in the data record, or the data is irregularly spaced, state predictions can be calculated an arbitrary number of steps ahead, at the cost of increased mean-square-error.


| | ASSUMPTIONS | MAIN RESULTS |
|---|---|---|
| Signals and system | $E\{w_k\} = E\{v_k\} = 0$. $E\{w_kw_k^T\} = Q_k$ and $E\{v_kv_k^T\} = R_k$ are known. $A_k$, $B_k$, $C_{1,k}$, $C_{2,k}$, $D_{1,k}$, $D_{2,k}$ are known. | $x_{k+1} = A_kx_k + B_kw_k$, $y_{2,k} = C_{2,k}x_k + D_{2,k}w_k$, $z_k = y_{2,k} + v_k$, $y_{1,k} = C_{1,k}x_k + D_{1,k}w_k$ |
| Predicted state and output estimate | | $\hat{x}_{k+1/k} = (A_k - K_kC_{2,k})\hat{x}_{k/k-1} + K_kz_k$, $\hat{y}_{1,k/k} = (C_{1,k} - L_kC_{2,k})\hat{x}_{k/k-1} + L_kz_k$ |
| Predictor gain, filter gain and Riccati difference equation | $Q_k > 0$, $R_k > 0$, $C_{2,k}P_{k/k-1}C_{2,k}^T + D_{2,k}Q_kD_{2,k}^T + R_k > 0$. | $\Omega_k = C_{2,k}P_{k/k-1}C_{2,k}^T + D_{2,k}Q_kD_{2,k}^T + R_k$, $K_k = (A_kP_{k/k-1}C_{2,k}^T + B_kQ_kD_{2,k}^T)\Omega_k^{-1}$, $L_k = (C_{1,k}P_{k/k-1}C_{2,k}^T + D_{1,k}Q_kD_{2,k}^T)\Omega_k^{-1}$, $P_{k+1/k} = A_kP_{k/k-1}A_k^T - K_k\Omega_kK_k^T + B_kQ_kB_k^T$ |

Table 1.1. Main results for the general filtering problem.

The filtering solution is specialised to output estimation with $C_{1,k} = C_{2,k}$ and $D_{1,k} = D_{2,k}$. In the case of input estimation (or equalisation), $C_{1,k} = 0$ and $D_{1,k} = I$, which results in $\hat{w}_{k/k} = -L_kC_{2,k}\hat{x}_{k/k-1} + L_kz_k$, where the filter gain is instead calculated as $L_k = Q_kD_{2,k}^T(C_{2,k}P_{k/k-1}C_{2,k}^T + D_{2,k}Q_kD_{2,k}^T + R_k)^{-1}$.

For problems where $C_{1,k} = I$ (state estimation) and $D_{1,k} = D_{2,k} = 0$, the filtered state calculation simplifies to $\hat{x}_{k/k} = (I - L_kC_{2,k})\hat{x}_{k/k-1} + L_kz_k$, where $\hat{x}_{k/k-1} = A_{k-1}\hat{x}_{k-1/k-1}$ and $L_k = P_{k/k-1}C_{2,k}^T(C_{2,k}P_{k/k-1}C_{2,k}^T + R_k)^{-1}$. This predictor-corrector form is used to obtain robust, hybrid and extended Kalman filters. When the predicted states are not explicitly required, the state corrections can be calculated from the one-line recursion $\hat{x}_{k/k} = (I - L_kC_{2,k})A_{k-1}\hat{x}_{k-1/k-1} + L_kz_k$.

"Heavier-than-air flying machines are impossible." *Baron William Thomson Kelvin*

If the simplifications *Bk* = *D*2*,k* = 0 are assumed and the pair (*Ak*, *C*2*,k*) is retained, the Kalman filter degenerates to the RLS algorithm. However, the cost of this model simplification is an increase in mean-square-error.

#### **4.20 Problems**


**Problem 1.** Suppose that $E\left\{\begin{bmatrix}\alpha_k \\ \beta_k\end{bmatrix}\right\} = \begin{bmatrix}\bar{\alpha} \\ \bar{\beta}\end{bmatrix}$ and $E\left\{\begin{bmatrix}\alpha_k \\ \beta_k\end{bmatrix}\begin{bmatrix}\alpha_k^T & \beta_k^T\end{bmatrix}\right\} = \begin{bmatrix}\Sigma_{\alpha\alpha} & \Sigma_{\alpha\beta} \\ \Sigma_{\beta\alpha} & \Sigma_{\beta\beta}\end{bmatrix}$. Show that an estimate of $\beta_k$ given $\alpha_k$, which minimises $E\{(\beta_k - E\{\beta_k|\alpha_k\})(\beta_k - E\{\beta_k|\alpha_k\})^T\}$, is given by $E\{\beta_k|\alpha_k\} = \bar{\beta} + \Sigma_{\beta\alpha}\Sigma_{\alpha\alpha}^{-1}(\alpha_k - \bar{\alpha})$.
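This result can be checked numerically by simulation. A minimal sketch (illustrative moments; the formula is applied in its centred-covariance form) compares the mean-square error of the conditional-mean estimate against that of the unconditioned mean:

```python
import numpy as np

rng = np.random.default_rng(0)
mean = np.array([1.0, -2.0])                 # [alpha_bar, beta_bar]
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])               # joint covariance
alpha, beta = rng.multivariate_normal(mean, Sigma, size=100_000).T

# Conditional-mean estimate of beta_k given alpha_k
beta_hat = mean[1] + Sigma[1, 0] / Sigma[0, 0] * (alpha - mean[0])

print(np.mean((beta - beta_hat) ** 2))       # ~0.68 = 1.0 - 0.8**2 / 2.0
print(np.mean((beta - mean[1]) ** 2))        # ~1.0, i.e. worse
```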

**Problem 2.** Derive the predicted error covariance

$P_{k+1/k} = A_kP_{k/k-1}A_k^T - A_kP_{k/k-1}C_k^T(C_kP_{k/k-1}C_k^T + R_k)^{-1}C_kP_{k/k-1}A_k^T + B_kQ_kB_k^T$ from the state prediction $\hat{x}_{k+1/k} = A_k\hat{x}_{k/k-1} + K_k(z_k - C_k\hat{x}_{k/k-1})$, the model $x_{k+1} = A_kx_k + B_kw_k$, $y_k = C_kx_k$ and the measurements $z_k = y_k + v_k$.

**Problem 3.** Assuming the state correction $\hat{x}_{k/k} = \hat{x}_{k/k-1} + L_k(z_k - C_k\hat{x}_{k/k-1})$, show that the corrected error covariance is given by $P_{k/k} = P_{k/k-1} - L_k(C_kP_{k/k-1}C_k^T + R_k)L_k^T$.

**Problem 4 [11], [14], [17], [18], [19].** Consider the standard discrete-time filter equations

$$\hat{x}_{k/k-1} = A_k\hat{x}_{k-1/k-1},$$

$$\hat{x}_{k/k} = \hat{x}_{k/k-1} + L_k(z_k - C_k\hat{x}_{k/k-1}),$$

$$P_{k/k-1} = A_kP_{k-1/k-1}A_k^T + B_kQ_kB_k^T,$$

$$P_{k/k} = P_{k/k-1} - L_k(C_kP_{k/k-1}C_k^T + R_k)L_k^T,$$

where $L_k = P_{k/k-1}C_k^T(C_kP_{k/k-1}C_k^T + R_k)^{-1}$. Derive the continuous-time filter equations, namely

$$\dot{\hat{x}}(t_k) = A(t_k)\hat{x}(t_k) + K(t_k)\big(z(t_k) - C(t_k)\hat{x}(t_k)\big),$$

$$\dot{P}(t_k) = A(t_k)P(t_k) + P(t_k)A^T(t_k) - P(t_k)C^T(t_k)R^{-1}(t_k)C(t_k)P(t_k) + B(t_k)Q(t_k)B^T(t_k),$$

"But what is it good for?" Engineer at the Advanced Computing Systems Division of IBM, commenting on the microchip, 1968


where $K(t_k) = P(t_k)C^T(t_k)R^{-1}(t_k)$. (Hint: Introduce the quantities $A_k = I + A(t_k)\Delta t$, $B(t_k) = B_k$, $C(t_k) = C_k$, $Q(t_k) = Q_k/\Delta t$, $R(t_k) = R_k\Delta t$, $\hat{x}(t_k) = \hat{x}_{k/k}$, $P(t_k) = P_{k/k}$, $\dot{\hat{x}}(t_k) = \lim_{\Delta t \to 0}\frac{\hat{x}_{k+1/k} - \hat{x}_{k/k-1}}{\Delta t}$, $\dot{P}(t_k) = \lim_{\Delta t \to 0}\frac{P_{k+1/k} - P_{k/k-1}}{\Delta t}$ and $\Delta t = t_k - t_{k-1}$.)

**Problem 5.** Derive the two-step-ahead predicted error covariance $P_{k+2/k} = A_{k+1}P_{k+1/k}A_{k+1}^T + B_{k+1}Q_{k+1}B_{k+1}^T$.

**Problem 6.** Verify that the Riccati difference equation $P_{k+1/k} = A_kP_{k/k-1}A_k^T - K_k(C_kP_{k/k-1}C_k^T + R_k)K_k^T + B_kQ_kB_k^T$, where $K_k = (A_kP_{k/k-1}C_k^T + B_kS_k)(C_kP_{k/k-1}C_k^T + R_k)^{-1}$, is equivalent to $P_{k+1/k} = (A_k - K_kC_k)P_{k/k-1}(A_k - K_kC_k)^T + K_kR_kK_k^T + B_kQ_kB_k^T - B_kS_kK_k^T - K_kS_k^TB_k^T$.

**Problem 7 [16].** Suppose that the systems $y_{1,k} = \mathcal{G}_1w_k$ and $y_{2,k} = \mathcal{G}_2w_k$ have the state-space realisations

$$
\begin{bmatrix} \mathbf{x}\_{1,k+1} \\ \mathbf{y}\_{1,k} \end{bmatrix} = \begin{bmatrix} A\_{1,k} & B\_{1,k} \\ \mathbf{C}\_{1,k} & D\_{1,k} \end{bmatrix} \begin{bmatrix} \mathbf{x}\_{1,k} \\ \mathbf{w}\_k \end{bmatrix} \text{ and } \begin{bmatrix} \mathbf{x}\_{2,k+1} \\ \mathbf{y}\_{2,k} \end{bmatrix} = \begin{bmatrix} A\_{2,k} & B\_{2,k} \\ \mathbf{C}\_{2,k} & D\_{2,k} \end{bmatrix} \begin{bmatrix} \mathbf{x}\_{2,k} \\ \mathbf{w}\_k \end{bmatrix}.
$$

Show that the system $y_{3,k} = \mathcal{G}_2\mathcal{G}_1w_k$ is given by

$$
\begin{bmatrix} x_{1,k+1} \\ x_{2,k+1} \\ y_{3,k} \end{bmatrix} = \begin{bmatrix} A_{1,k} & 0 & B_{1,k} \\ B_{2,k}C_{1,k} & A_{2,k} & B_{2,k}D_{1,k} \\ D_{2,k}C_{1,k} & C_{2,k} & D_{2,k}D_{1,k} \end{bmatrix} \begin{bmatrix} x_{1,k} \\ x_{2,k} \\ w_k \end{bmatrix}.
$$
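For experimenting with such compositions, the partitioned realisation can be formed mechanically. The helper below is a sketch (hypothetical function name, compatible matrix dimensions assumed) that returns the state-space parameters of $y_{3,k} = \mathcal{G}_2\mathcal{G}_1w_k$:

```python
import numpy as np

def series(A1, B1, C1, D1, A2, B2, C2, D2):
    """State-space parameters of the series system G2 G1."""
    A = np.block([[A1, np.zeros((A1.shape[0], A2.shape[1]))],
                  [B2 @ C1, A2]])
    B = np.vstack([B1, B2 @ D1])
    C = np.hstack([D2 @ C1, C2])
    D = D2 @ D1
    return A, B, C, D
```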

#### **4.21 Glossary**

In addition to the notation listed in Section 2.6, the following nomenclature has been used herein.

| Symbol | Description |
|---|---|
| $\mathcal{G}$ | A system that is assumed to have the realisation $x_{k+1} = A_kx_k + B_kw_k$ and $y_k = C_kx_k + D_kw_k$, where $A_k$, $B_k$, $C_k$ and $D_k$ are time-varying matrices of appropriate dimension. |
| $\mathcal{G}^H$ | Adjoint of $\mathcal{G}$. The adjoint of a system having the state-space parameters $\{A_k, B_k, C_k, D_k\}$ is a system parameterised by $\{A_k^T, C_k^T, B_k^T, D_k^T\}$. |
| $Q_k$, $R_k$ | Time-varying covariance matrices of stochastic signals $w_k$ and $v_k$, respectively. |
| $K_k$ | Time-varying predictor gain matrix. |
| $L_k$ | Time-varying filter gain matrix. |
| $\hat{x}_{k+1/k}$ | Predicted estimate of the state $x_{k+1}$ given measurements at time $k$. |
| $\tilde{x}_{k+1/k}$ | Predicted state estimation error, defined by $\tilde{x}_{k+1/k} = x_{k+1} - \hat{x}_{k+1/k}$. |
| $P_{k+1/k}$ | Predicted error covariance matrix at time $k + 1$ given measurements at time $k$. |
| $\hat{x}_{k/k}$ | Filtered estimate of the state $x_k$ given measurements at time $k$. |
| $\tilde{x}_{k/k}$ | Filtered state estimation error, defined by $\tilde{x}_{k/k} = x_k - \hat{x}_{k/k}$. |
| $P_{k/k}$ | Corrected error covariance matrix at time $k$ given measurements at time $k$. |
| RLS | Recursive Least Squares. |


"What sir, would you make a ship sail against the wind and currents by lighting a bonfire under her deck? I pray you excuse me. I have no time to listen to such nonsense." *Napoléon Bonaparte*


#### **4.22 References**

[1] R. E. Kalman, "A New Approach to Linear Filtering and Prediction Problems", *Transactions of the ASME, Series D, Journal of Basic Engineering*, vol. 82, pp. 35 – 45, 1960.

[2] R. E. Kalman, "New Methods in Wiener Filtering Theory", *Proc. First Symposium on Engineering Applications of Random Function Theory and Probability*, Wiley, New York, pp. 270 – 388, 1963.

[3] T. Söderström, *Discrete-time Stochastic Systems: Estimation and Control*, Springer-Verlag London Ltd., 2002.

[4] B. D. O. Anderson and J. B. Moore, *Optimal Filtering*, Prentice-Hall Inc., Englewood Cliffs, New Jersey, 1979.

[5] G. S. Maddala, *Introduction to Econometrics*, Second Edition, Macmillan Publishing Co., New York, 1992.

[6] C. K. Chui and G. Chen, *Kalman Filtering with Real-Time Applications*, 3rd Ed., Springer-Verlag, Berlin, 1999.

[7] I. Yaesh and U. Shaked, "H∞-Optimal Estimation – The Discrete Time Case", *Proceedings of the MTNS*, pp. 261 – 267, Jun. 1991.

[8] U. Shaked and Y. Theodor, "H∞ Optimal Estimation: A Tutorial", *Proceedings 31st IEEE Conference on Decision and Control*, pp. 2278 – 2286, Tucson, Arizona, Dec. 1992.

[9] F. L. Lewis, L. Xie and D. Popa, *Optimal and Robust Estimation: With an Introduction to Stochastic Control Theory*, Second Edition, Series in Automation and Control Engineering, Taylor & Francis Group, LLC, 2008.

[10] T. Kailath, A. H. Sayed and B. Hassibi, *Linear Estimation*, Prentice-Hall, Inc., Upper Saddle River, New Jersey, 2000.

[11] D. Simon, *Optimal State Estimation, Kalman H∞ and Nonlinear Approaches*, John Wiley & Sons, Inc., Hoboken, New Jersey, 2006.

[12] P. J. Brockwell and R. A. Davis, *Time Series: Theory and Methods*, Second Edition, Springer-Verlag New York, Inc., 1991.

[13] D. J. N. Limebeer, M. Green and D. Walker, "Discrete-time H∞ Control", *Proceedings 28th IEEE Conference on Decision and Control*, Tampa, pp. 392 – 396, Dec. 1989.

[14] R. G. Brown and P. Y. C. Hwang, *Introduction to Random Signals and Applied Kalman Filtering*, Second Edition, John Wiley & Sons, Inc., New York, 1992.

[15] K. Ogata, *Discrete-time Control Systems*, Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1987.

[16] M. Green and D. J. N. Limebeer, *Linear Robust Control*, Prentice-Hall Inc., Englewood Cliffs, New Jersey, 1995.

[17] A. H. Jazwinski, *Stochastic Processes and Filtering Theory*, Academic Press, Inc., New York, 1970.

[18] A. P. Sage and J. L. Melsa, *Estimation Theory with Applications to Communications and Control*, McGraw-Hill Book Company, New York, 1971.

[19] A. Gelb, *Applied Optimal Estimation*, The Analytic Sciences Corporation, USA, 1974.

[20] S. Tong and P. Shi, "Sampled-Data Filtering Framework for Cardiac Motion Recovery: Optimal Estimation of Continuous Dynamics From Discrete Measurements", *IEEE Transactions on Biomedical Engineering*, vol. 54, no. 10, pp. 1750 – 1761, Oct. 2007.

"The horse is here today, but the automobile is only a novelty - a fad." President of *Michigan Savings Bank*

"Airplanes are interesting toys but of no military value." *Marechal Ferdinand Foch*


### **Discrete-Time Steady-State Minimum-Variance Prediction and Filtering**

#### **5.1 Introduction**


This chapter presents the minimum-variance filtering results simplified for the case when the model parameters are time-invariant and the noise processes are stationary. The filtering objective remains the same, namely, the task is to estimate a signal in such a way as to minimise the filter error covariance.


A somewhat naïve approach is to apply the standard filter recursions using the time-invariant problem parameters. Although this approach is valid, it involves recalculating the Riccati difference equation solution and filter gain at each time-step, which is computationally expensive. A lower implementation cost can be realised by recognising that the Riccati difference equation solution asymptotically approaches the solution of an algebraic Riccati equation. In this case, the algebraic Riccati equation solution and hence the filter gain can be calculated before running the filter.

The steady-state discrete-time Kalman filtering literature is vast and some of the more accessible accounts [1] – [14] are canvassed here. The filtering problem and the application of the standard time-varying filter recursions are described in Section 2. An important criterion for checking whether the states can be uniquely reconstructed from the measurements is observability. For example, sometimes states may be internal or sensor measurements might not be available, which can result in the system having hidden modes. Section 3 describes two common tests for observability, namely, checking that an observability matrix or an observability gramian is of full rank. The subject of Riccati equation monotonicity and convergence has been studied extensively by Chan [4], De Souza [5], [6], Bitmead [7], [8], Wimmer [9] and Wonham [10], which is discussed in Section 4. Chan *et al.* [4] also showed that if the underlying system is stable and observable then the minimum-variance filter is stable. Section 6 describes a discrete-time version of the Kalman-Yakubovich-Popov Lemma, which states for time-invariant systems that solving a Riccati equation is equivalent to spectral factorisation. In this case, the Wiener and Kalman filters are the same.

"Science is nothing but trained and organized common sense differing from the latter only as a veteran may differ from a raw recruit: and its methods differ from those of common sense only as far as the guardsman's cut and thrust differ from the manner in which a savage wields his club." *Thomas Henry Huxley*


#### **5.2 Time-Invariant Filtering Problem**

#### **5.2.1 The Time-Invariant Signal Model**

A discrete-time time-invariant system (or plant) $\mathcal{G}: \mathbb{R}^m \to \mathbb{R}^p$ is assumed to have the state-space representation

$$x_{k+1} = Ax_k + Bw_k, \tag{1}$$

$$y_k = Cx_k + Dw_k, \tag{2}$$

where $A \in \mathbb{R}^{n \times n}$, $B \in \mathbb{R}^{n \times m}$, $C \in \mathbb{R}^{p \times n}$, $D \in \mathbb{R}^{p \times m}$, and $w_k$ is a stationary process with $E\{w_k\} = 0$ and $E\{w_jw_k^T\} = Q\delta_{jk}$. For convenience, the simplification $D = 0$ is initially assumed within the developments. A nonzero feedthrough matrix, $D$, can be accommodated as described in Chapter 4. Observations $z_k$ of the system output $y_k$ are again modelled as

$$z_k = y_k + v_k, \tag{3}$$

where $v_k$ is a stationary measurement noise sequence over an interval $k \in [1, N]$, with $E\{v_k\} = 0$, $E\{w_jv_k^T\} = 0$ and $E\{v_jv_k^T\} = R\delta_{jk}$. An objective is to design a filter that operates on the above measurements and produces an estimate, $\hat{y}_{k/k} = C\hat{x}_{k/k}$, of $y_k$ so that the covariance, $E\{\tilde{y}_{k/k}\tilde{y}_{k/k}^T\}$, of the filter error, $\tilde{y}_{k/k} = y_k - \hat{y}_{k/k}$, is minimised.

#### **5.2.2 Application of the Time-Varying Filter Recursions**

A naïve but entirely valid approach to state estimation is to apply the standard minimum-variance filter recursions of Section 4 for the problem (1) – (3). The predicted and corrected state estimates are given by

$$\hat{x}_{k+1/k} = (A - K_kC)\hat{x}_{k/k-1} + K_kz_k, \tag{4}$$

$$\hat{x}_{k/k} = (I - L_kC)\hat{x}_{k/k-1} + L_kz_k, \tag{5}$$

where $L_k = P_{k/k-1}C^T(CP_{k/k-1}C^T + R)^{-1}$ is the filter gain and $K_k = AP_{k/k-1}C^T(CP_{k/k-1}C^T + R)^{-1}$ is the predictor gain, in which $P_{k/k-1} = E\{\tilde{x}_{k/k-1}\tilde{x}_{k/k-1}^T\}$ is obtained from the Riccati difference equation

$$P\_{k+1} = AP\_kA^T - AP\_k\mathbf{C}^T(\mathbf{C}P\_k\mathbf{C}^T + \mathbf{R})^{-1}\mathbf{C}P\_kA^T + B\mathbf{Q}B^T.\tag{6}$$

As before, the above Riccati equation is iterated forward at each time *k* from an initial condition *P0*. A necessary condition for determining whether the states within (1) can be uniquely estimated is observability, which is discussed below.
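As an illustrative sketch (hypothetical names, NumPy assumed), the recursions (4) – (6) may be coded as:

```python
import numpy as np

def minimum_variance_filter(z, A, B, C, Q, R, x0, P0):
    """Apply the recursions (4) - (6) to a measurement sequence z."""
    x, P = x0, P0
    filtered = []
    for zk in z:
        S = C @ P @ C.T + R
        L = P @ C.T @ np.linalg.inv(S)               # filter gain
        K = A @ P @ C.T @ np.linalg.inv(S)           # predictor gain
        filtered.append((np.eye(len(x)) - L @ C) @ x + L @ zk)   # (5)
        x = (A - K @ C) @ x + K @ zk                 # (4)
        P = A @ P @ A.T - K @ S @ K.T + B @ Q @ B.T  # (6)
    return filtered
```

Note the Riccati update is written in the algebraically equivalent form $AP_kA^T - K_kS_kK_k^T + BQB^T$, with $S_k = CP_kC^T + R$.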

"We can understand almost anything, but we can't understand how we understand." *Albert Einstein*

#### **5.3 Observability**


#### **5.3.1 The Discrete-time Observability Matrix**

Observability is a fundamental concept in system theory. If a system is unobservable then it will not be possible to recover the states uniquely from the measurements. The pair (*A*, *C*) within the discrete-time system (1) – (2) is defined to be completely observable if the initial states, $x_0$, can be uniquely determined from the known inputs $w_k$ and outputs $y_k$ over an interval $k \in [0, N]$. A test for observability is to check whether an observability matrix is of full rank. The discrete-time observability matrix, which is defined in the lemma below, is the same as the continuous-time version. The proof is analogous to the presentation in Chapter 3.

*Lemma 1 [1], [2]: The discrete-time system (1) – (2) is completely observable if the observability matrix* 

$$O\_N = \begin{bmatrix} \mathbf{C} \\ \mathbf{CA} \\ \mathbf{CA^2} \\ \vdots \\ \mathbf{CA^N} \end{bmatrix}, N \ge n - 1,\tag{7}$$

*is of rank n.*

*Proof: Since the input wk is assumed to be known, it suffices to consider the unforced system* 

$$\mathbf{x}\_{k+1} = A\mathbf{x}\_k.\tag{8}$$

$$\mathbf{y}\_k = \mathbf{C} \mathbf{x}\_k \; . \tag{9}$$

*It follows from (8) – (9) that* 

$$\begin{aligned} y\_0 &= \mathbf{C} \mathbf{x}\_0 \\\\ y\_1 &= \mathbf{C} \mathbf{x}\_1 = \mathbf{C} \mathbf{A} \mathbf{x}\_0 \\\\ y\_2 &= \mathbf{C} \mathbf{x}\_2 = \mathbf{C} \mathbf{A}^2 \mathbf{x}\_0 \\\\ &\vdots \\\\ y\_N &= \mathbf{C} \mathbf{x}\_N = \mathbf{C} \mathbf{A}^N \mathbf{x}\_0 \end{aligned} \tag{10}$$

"What happens depends on our way of observing it or the fact that we observe it." *Werner Heisenberg*


*which can be written as* 

$$y = \begin{bmatrix} y_0 \\ y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix} = \begin{bmatrix} C \\ CA \\ CA^2 \\ \vdots \\ CA^N \end{bmatrix} x_0. \tag{11}$$

*From the Cayley-Hamilton Theorem, $A^k$, for $k \ge n$, can be expressed as a linear combination of $A^0, A^1, \ldots, A^{n-1}$. Thus, with $N \ge n - 1$, equation (11) uniquely determines $x_0$ if $O_N$ has full rank $n$. □*

Thus, if $O_N$ is of full rank then its inverse exists and so $x_0$ can be uniquely recovered as $x_0 = O_N^{-1}y$. Observability is a property of the deterministic model equations (8) – (9). Conversely, if the observability matrix is not of rank $n$ then the system (1) – (2) is termed unobservable and the unobservable states are called unobservable modes.
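A short numerical illustration of this recovery, assuming noise-free outputs and an arbitrary observable second-order pair:

```python
import numpy as np

A = np.array([[0.1, 0.2],
              [0.0, 0.4]])
C = np.array([[1.0, 1.0]])
x0 = np.array([0.5, -1.0])

# Observability matrix (7) with N = n - 1 and the stacked outputs (10)
O = np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(A.shape[0])])
y = O @ x0

print(np.allclose(np.linalg.solve(O, y), x0))   # True: x0 recovered from y
```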

#### **5.3.2 Discrete-time Observability Gramians**

Alternative tests for observability arise by checking the rank of one of the observability gramians that are described below.

*Lemma 2: The pair (A, C) is completely observable if the observability gramian* 

$$\mathcal{W}\_N = \mathcal{O}\_N^T \mathcal{O}\_N = \sum\_{k=0}^N (\mathcal{A}^T)^k \mathcal{C}^T \mathcal{C} \mathcal{A}^k \text{ , } N \ge n \text{-1} \tag{12}$$

*is of full rank.* 

*Proof: It follows from (8) – (9) that* 

$$y^Ty = x_0^T\begin{bmatrix} I & A^T & (A^T)^2 & \cdots & (A^T)^N \end{bmatrix}C^TC\begin{bmatrix} I \\ A \\ A^2 \\ \vdots \\ A^N \end{bmatrix}x_0. \tag{13}$$

*From the Cayley-Hamilton Theorem, $A^k$, for $k \ge n$, can be expressed as a linear combination of $A^0, A^1, \ldots, A^{n-1}$. Thus, with $N = n - 1$,*

$$\mathbf{y}^T \mathbf{y} = \mathbf{x}\_0^T \mathbf{O}\_N^T \mathbf{O}\_N \mathbf{x}\_0 = \mathbf{x}\_0^T \mathbf{W}\_N \mathbf{x}\_0 = \mathbf{x}\_0^T \left(\sum\_{k=0}^{n-1} (A^T)^k \mathbf{C}^T \mathbf{C} A^k \right) \mathbf{x}\_0 \tag{14}$$

*is unique provided that $W_N$ is of full rank. □*

"You affect the world by what you browse." *Tim Berners-Lee*

It is shown below that an equivalent observability gramian can be found from the solution of a Lyapunov equation.

*Lemma 3: Suppose that the system (8) – (9) is stable, that is, |λi(A)| < 1, i = 1 to n, then the pair (A, C) is completely observable if the nonnegative symmetric solution of the Lyapunov equation*

$$\mathcal{W} = \mathbf{A}^{\mathsf{T}} \mathbf{W} \mathbf{A} + \mathbf{C}^{\mathsf{T}} \mathbf{C} \ . \tag{15}$$

*is of full rank.* 


*Proof: Pre-multiplying $C^TC = W - A^TWA$ by $(A^T)^k$, post-multiplying by $A^k$ and summing from $k = 0$ to $N$ results in*

$$\sum_{k=0}^{N}(A^T)^kC^TCA^k = \sum_{k=0}^{N}(A^T)^kWA^k - \sum_{k=0}^{N}(A^T)^{k+1}WA^{k+1} = W - (A^T)^{N+1}WA^{N+1}. \tag{16}$$

*Since $\lim_{k \to \infty}(A^T)^kWA^k = 0$, by inspection of (16), $\lim_{N \to \infty}W_N = W$ is a solution of the Lyapunov equation (15). Observability follows from Lemma 2. □*

It is noted below that observability is equivalent to asymptotic stability.

*Lemma 4 [3]: Under the conditions of Lemma 3, $x_0 \in \ell_2$ implies $y \in \ell_2$.*

*Proof: It follows from (16) that* $\sum_{k=0}^{N}(A^T)^kC^TCA^k \le W_N$ *and therefore*

$$\|y\|_2^2 = \sum_{k=0}^{N}y_k^Ty_k = x_0^T\left(\sum_{k=0}^{N}(A^T)^kC^TCA^k\right)x_0 \le x_0^TW_Nx_0 \le x_0^TWx_0,$$

*from which the claim follows. □*

Another criterion that is encountered in the context of filtering and smoothing is detectability. A linear time-invariant system is said to be detectable when all its modes, and in particular its unobservable modes, are stable. An observable system is also detectable.

**Example 1.** (i) Consider a stable second-order system with $A = \begin{bmatrix}0.1 & 0.2 \\ 0 & 0.4\end{bmatrix}$ and $C = \begin{bmatrix}1 & 1\end{bmatrix}$. The observability matrix from (7) and the observability gramian from (12) are $O_1 = \begin{bmatrix}C \\ CA\end{bmatrix} = \begin{bmatrix}1 & 1 \\ 0.1 & 0.6\end{bmatrix}$ and $W_1 = O_1^TO_1 = \begin{bmatrix}1.01 & 1.06 \\ 1.06 & 1.36\end{bmatrix}$, respectively. It can easily be verified that the

"It is a good morning exercise for a research scientist to discard a pet hypothesis every day before breakfast." *Konrad Zacharias Lorenz*


solution of the Lyapunov equation (15) is $W = \begin{bmatrix}1.01 & 1.06 \\ 1.06 & 1.44\end{bmatrix} = W_4$ to three significant figures. Since rank$(O_1)$ = rank$(W_1)$ = rank$(W_4)$ = 2, the pair $(A, C)$ is observable.

(ii) Now suppose that measurements of the first state are not available, that is, $C = \begin{bmatrix}0 & 1\end{bmatrix}$. Since $O_1 = \begin{bmatrix}0 & 1 \\ 0 & 0.4\end{bmatrix}$ and $W_1 = \begin{bmatrix}0 & 0 \\ 0 & 1.16\end{bmatrix}$ are of rank 1, the pair $(A, C)$ is unobservable. This system is detectable because the unobservable mode is stable.
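These matrices and ranks are easily reproduced numerically; a sketch (assuming SciPy is available for the Lyapunov solve):

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

A = np.array([[0.1, 0.2],
              [0.0, 0.4]])

for C in (np.array([[1.0, 1.0]]), np.array([[0.0, 1.0]])):  # cases (i) and (ii)
    O1 = np.vstack([C, C @ A])                  # observability matrix (7)
    W1 = O1.T @ O1                              # observability gramian (12)
    W = solve_discrete_lyapunov(A.T, C.T @ C)   # Lyapunov equation (15)
    print(np.linalg.matrix_rank(O1), np.linalg.matrix_rank(W1), np.linalg.matrix_rank(W))
```

The first case prints rank 2 throughout (observable); the second prints rank 1 (unobservable but detectable, since the unobservable mode is stable).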

#### **5.4 Riccati Equation Properties**

#### **5.4.1 Monotonicity**

It will be shown below that the solution $P_{k+1/k}$ of the Riccati difference equation (6) monotonically approaches a steady-state asymptote, in which case the gain is also time-invariant and can be pre-calculated. Establishing monotonicity requires the following result. It is well known that the difference between the solutions of two Riccati equations also obeys a Riccati equation, see Theorem 4.3 of [4], (2.12) of [5], Lemma 3.1 of [6], (4.2) of [7], Lemma 10.1 of [8], (2.11) of [9] and (2.4) of [10].

*Theorem 1: Riccati Equation Comparison Theorem [4] – [10]: Suppose for a t ≥ 0 and for all k ≥ 0 the two Riccati difference equations* 

$$P_{t+k} = AP_{t+k-1}A^T - AP_{t+k-1}C^T(CP_{t+k-1}C^T + R)^{-1}CP_{t+k-1}A^T + BQB^T, \tag{17}$$

$$P_{t+k+1} = AP_{t+k}A^T - AP_{t+k}C^T(CP_{t+k}C^T + R)^{-1}CP_{t+k}A^T + BQB^T, \tag{18}$$

*have solutions $P_{t+k} \ge 0$ and $P_{t+k+1} \ge 0$, respectively. Then $\tilde{P}_{t+k} = P_{t+k} - P_{t+k+1}$ satisfies*

$$\tilde{P}_{t+k+1} = \bar{A}_{t+k+1}\tilde{P}_{t+k}\bar{A}_{t+k+1}^T - \bar{A}_{t+k+1}\tilde{P}_{t+k}C^T(C\tilde{P}_{t+k}C^T + \bar{R}_{t+k})^{-1}C\tilde{P}_{t+k}\bar{A}_{t+k+1}^T, \tag{19}$$

*where $\bar{A}_{t+k+1} = A - AP_{t+k+1}C^T(CP_{t+k+1}C^T + R)^{-1}C$ and $\bar{R}_{t+k} = CP_{t+k}C^T + R$.*

The above result can be verified by substituting $\bar{A}_{t+k+1}$ and $\bar{R}_{t+k}$ into (19). The above theorem is used below to establish Riccati difference equation monotonicity.

*Theorem 2 [6], [9], [10], [11]: Under the conditions of Theorem 1, suppose that the Riccati difference equation (19) has a solution $\tilde{P}_{t+k} \ge 0$ for a $t \ge 0$ and $k = 0$. Then $P_{t+k} \ge P_{t+k+1}$ for all $k \ge 0$.*

"We follow abstract assumptions to see where they lead, and then decide whether the detailed differences from the real world matter." *Clinton Richard Dawkins*

*Proof: The assumption $\tilde{P}_{t+k} \ge 0$ is the initial condition for an induction argument. For the induction step, it follows from $(C\tilde{P}_{t+k}C^T)(C\tilde{P}_{t+k}C^T + \bar{R}_{t+k})^{-1} \le I$ that $\tilde{P}_{t+k} \ge \tilde{P}_{t+k}C^T(C\tilde{P}_{t+k}C^T + \bar{R}_{t+k})^{-1}C\tilde{P}_{t+k}$, which together with Theorem 1 implies $\tilde{P}_{t+k+1} \ge 0$. □*

The above theorem serves to establish conditions under which a Riccati difference equation solution monotonically approaches its steady state solution. This requires a Riccati equation convergence result which is presented below.

#### **5.4.2 Convergence**


When the model parameters and second-order noise statistics are constant then the predictor gain is also time-invariant and can be pre-calculated as

$$K = APC^T(CPC^T + R)^{-1}, \tag{20}$$

where *P* is the symmetric positive definite solution of the algebraic Riccati equation

$$P = APA^T - APC^T(\text{CPC}^T + R)^{-1} \text{CPA}^T + BQB^T \tag{21}$$

$$= (A - KC)P(A - KC)^T + BQB^T + KRK^T. \tag{22}$$

A real symmetric nonnegative definite solution of the algebraic Riccati equation (21) is said to be a *strong solution* if the eigenvalues of (*A* – *KC*) lie inside or on the unit circle [4], [5]. If there are no eigenvalues on the unit circle then the strong solution is termed the *stabilising solution*. The following lemma by Chan, Goodwin and Sin [4] sets out conditions for the existence of solutions for the algebraic Riccati equation (21).

*Lemma 5 [4]: Provided that the pair (A, C) is detectable, then*

*i) the strong solution of the algebraic Riccati equation (21) exists and is unique;*

*ii) if A has no modes on the unit circle then the strong solution coincides with the stabilising solution.*
A detailed proof is presented in [4]. If the linear time-invariant system (1) – (2) is stable and completely observable and the solution *Pk* of the Riccati difference equation (6) is suitably initialised, then in the limit as *k* approaches infinity, *Pk* will asymptotically converge to the solution of the algebraic Riccati equation. This convergence property is formally restated below.

*Lemma 6 [4]: Subject to:*

*i) the pair (A, C) is observable;*

*ii) $|\lambda_i(A)| \le 1$, $i$ = 1 to $n$;*

*iii) $(P_0 - P) \ge 0$;*

"We know very little, and yet it is astonishing that we know so much, and still more astonishing that so little knowledge can give us so much power." *Bertrand Arthur William Russell*


*then the solution of the Riccati difference equation (6) satisfies* 

$$\lim_{k \to \infty} P_k = P. \tag{23}$$

A proof appears in [4]. This important property is used in [6], which is in turn cited within [7] and [8]. Similar results are reported in [5], [13] and [14]. Convergence can occur exponentially fast which is demonstrated by the following numerical example.

*Example 2.* Consider an output estimation problem where *A* = 0.9 and *B* = *C* = *Q* = *R* = 1. The solution to the algebraic Riccati equation (21) is *P* = 1.4839. Some calculated solutions of the Riccati difference equation (6) initialised with *P0 = 10P* are shown in Table 1. The data in the table demonstrate that the Riccati difference equation solution converges to the algebraic Riccati equation solution, which illustrates the Lemma.


| $k$ | $P_k$ | $P_{k-1} - P_k$ |
|---|---|---|
| 1 | 1.7588 | 13.0801 |
| 2 | 1.5164 | 0.2425 |
| 5 | 1.4840 | 4.7955×10⁻⁴ |
| 10 | 1.4839 | 1.8698×10⁻⁸ |

Table 1. Solutions of (21) for Example 2.
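The figures in Table 1 can be reproduced by iterating (6) directly; a minimal sketch for the scalar model of Example 2:

```python
import numpy as np

A, Q, R = 0.9, 1.0, 1.0                     # Example 2: B = C = 1
b = 1 - A**2 - Q                            # with R = 1, (21) reduces to P**2 + b*P - Q = 0
P_are = (-b + np.sqrt(b**2 + 4 * Q)) / 2    # 1.4839
P = 10 * P_are                              # initial condition P0 = 10P
for k in range(1, 11):
    P = A * P * A - (A * P)**2 / (P + R) + Q    # Riccati difference equation (6)
    if k in (1, 2, 5, 10):
        print(k, round(P, 4))               # reproduces the P_k column of Table 1 (to within rounding)
```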

#### **5.5 The Steady-State Minimum-Variance Filter**

#### **5.5.1 State Estimation**

The formulation of the steady-state Kalman filter (which is also known as the limiting Kalman filter) follows by allowing *k* to approach infinity and using the result of Lemma 6. That is, the filter employs fixed gains that are calculated using the solution of the algebraic Riccati equation (21) instead of the Riccati difference equation (6). The filtered state is calculated as

$$\begin{aligned} \hat{\mathbf{x}}\_{k/k} &= \hat{\mathbf{x}}\_{k/k-1} + \mathbf{L}(\mathbf{z}\_k - \mathbf{C}\hat{\mathbf{x}}\_{k/k-1}) \\ &= (I - \mathbf{L}\mathbf{C})\hat{\mathbf{x}}\_{k/k-1} + \mathbf{L}\mathbf{z}\_k. \end{aligned} \tag{24}$$

where $L = PC^T(CPC^T + R)^{-1}$ is the time-invariant filter gain, in which $P$ is the solution of the algebraic Riccati equation (21). The predicted state is given by

$$\begin{aligned} \hat{x}_{k+1/k} &= A\hat{x}_{k/k} \\ &= (A - KC)\hat{x}_{k/k-1} + Kz_k, \end{aligned} \tag{25}$$

where the time-invariant predictor gain, *K*, is calculated from (20).
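A sketch of the limiting filter (hypothetical names): iterate (6) offline until it settles, freeze the gains, then run the fixed-gain recursions (24) – (25):

```python
import numpy as np

def steady_state_gains(A, B, C, Q, R, iterations=200):
    """Approximate the ARE solution (21) by iterating (6); form K of (20) and L."""
    P = B @ Q @ B.T
    for _ in range(iterations):
        S = C @ P @ C.T + R
        P = A @ P @ A.T - A @ P @ C.T @ np.linalg.inv(S) @ C @ P @ A.T + B @ Q @ B.T
    S = C @ P @ C.T + R
    return A @ P @ C.T @ np.linalg.inv(S), P @ C.T @ np.linalg.inv(S), P

def steady_state_filter(z, A, C, K, L, x0):
    """Fixed-gain recursions (24) - (25)."""
    x, filtered = x0, []
    for zk in z:
        filtered.append((np.eye(len(x)) - L @ C) @ x + L @ zk)  # (24)
        x = (A - K @ C) @ x + K @ zk                            # (25)
    return filtered
```

Compared with the sketch in Section 5.2.2, the gain and Riccati computations have moved out of the measurement loop.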

"Great is the power of steady misrepresentation - but the history of science shows how, fortunately, this power does not endure long". *Charles Robert Darwin*

#### **5.5.2 Asymptotic Stability**


The asymptotic stability of the filter (24) – (25) is asserted in two ways. First, recall from Lemma 4 (ii) that if |*λi*(*A*)| < 1, *i* = 1 to *n*, and the pair (*A*, *C*) is completely observable, then |*λi*(*A − KC*)| < 1, *i* = 1 to *n.* That is, since the eigenvalues of the filter's state matrix are within the unit circle, the filter is asymptotically stable. Second, according to the Lyapunov stability theory [1], the unforced system (8) is asymptotically stable if there exists a scalar continuous function *V*(*x*), satisfying the following.

*(i)* $V(x) > 0$ for $x \ne 0$.

*(ii)* $V(x_{k+1}) - V(x_k) \le 0$ for $x_k \ne 0$.

*(iii)* $V(0) = 0$.

*(iv)* $V(x) \to \infty$ as $\|x\|_2 \to \infty$.


Consider the function $V(x_k) = x_k^TPx_k$, where $P$ is a real positive definite symmetric matrix. Observe that $V(x_{k+1}) - V(x_k) = x_{k+1}^TPx_{k+1} - x_k^TPx_k = x_k^T(A^TPA - P)x_k \le 0$. Therefore, the above stability requirements are satisfied if, for a real symmetric positive definite $Q$, there exists a real symmetric positive definite $P$ solution to the Lyapunov equation

$$
\boldsymbol{A}\boldsymbol{P}\mathbf{A}^{\top} - \boldsymbol{P} = -\boldsymbol{Q} \,. \tag{26}$$

By inspection, the design algebraic Riccati equation (22) is of the form (26) and so the filter is said to be stable in the sense of Lyapunov.

#### **5.5.3 Output Estimation**

For output estimation problems, the filter gain, *L,* is calculated differently. The output estimate is given by

$$\begin{aligned} \hat{y}_{k/k} &= C\hat{x}_{k/k} \\ &= C\hat{x}_{k/k-1} + L(z_k - C\hat{x}_{k/k-1}) \\ &= (C - LC)\hat{x}_{k/k-1} + Lz_k, \end{aligned} \tag{27}$$

where the filter gain is now obtained by $L = CPC^T(CPC^T + R)^{-1}$. The output estimation filter (24) – (25) can be written compactly as

$$\begin{bmatrix} \hat{x}_{k+1/k} \\ \hat{y}_{k/k} \end{bmatrix} = \begin{bmatrix} A - KC & K \\ C - LC & L \end{bmatrix} \begin{bmatrix} \hat{x}_{k/k-1} \\ z_k \end{bmatrix}, \tag{28}$$

from which its transfer function is

$$H\_{O\mathbb{E}}(z) = (\mathbb{C} - \text{LC})(zI - A + \text{KC})^{-1}K + \text{L.} \tag{29}$$
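The response (29) can be evaluated on the unit circle to inspect the fixed-gain estimator in the frequency domain. A self-contained sketch using the scalar model of Example 2 (so that $P$ = 1.4839):

```python
import numpy as np

A = np.array([[0.9]]); C = np.array([[1.0]])
P = np.array([[1.4839]]); R = np.array([[1.0]])
S = C @ P @ C.T + R
K = A @ P @ C.T @ np.linalg.inv(S)       # predictor gain (20)
L = C @ P @ C.T @ np.linalg.inv(S)       # output-estimation filter gain

def H_OE(z):
    """Transfer function (29) at a complex point z."""
    n = A.shape[0]
    return (C - L @ C) @ np.linalg.inv(z * np.eye(n) - A + K @ C) @ K + L

for w in (0.0, np.pi / 2, np.pi):
    print(w, abs(H_OE(np.exp(1j * w))[0, 0]))   # low-pass: magnitude falls with w
```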

"The scientists of today think deeply instead of clearly. One must be sane to think clearly, but one can think deeply and be quite insane." *Nikola Tesla*


#### **5.6 Equivalence of the Wiener and Kalman Filters**

As in continuous-time, solving a discrete-time algebraic Riccati equation is equivalent to spectral factorisation and the corresponding Kalman-Yakubovich-Popov Lemma (or Positive Real Lemma) is set out below. A proof of this Lemma makes use of the following identity

$$P - APA^T = (zI - A)P(z^{-1}I - A^T) + AP(z^{-1}I - A^T) + (zI - A)PA^T \,. \tag{30}$$

*Lemma 7. Consider the spectral density matrix* 

$$
\Delta\Delta^H(z) = \begin{bmatrix} \mathbf{C}(zI - A)^{-1} & I \end{bmatrix} \begin{bmatrix} Q & \mathbf{0} \\ \mathbf{0} & R \end{bmatrix} \begin{bmatrix} (z^{-1}I - A^T)^{-1}\mathbf{C}^T \\ I \end{bmatrix}.\tag{31}
$$

*Then the following statements are equivalent.* 

*(i)* $\Delta\Delta^H(e^{j\omega}) \ge 0$, *for all* $\omega \in (-\pi, \pi)$.

*(ii)* $$\begin{bmatrix} BQB^T - P + APA^T & APC^T \\ CPA^T & CPC^T + R \end{bmatrix} \ge 0\,.$$

*(iii) There exists a nonnegative solution P of the algebraic Riccati equation (21).* 


*Proof: Following the approach of [12], to establish equivalence between (i) and (iii), use (21) within (30) to obtain* 

$$BQB^T - APC^T(CPC^T + R)^{-1}CPA^T = (zI - A)P(z^{-1}I - A^T) + AP(z^{-1}I - A^T) + (zI - A)PA^T\,. \tag{32}$$

*Premultiplying and postmultiplying (32) by $C(zI - A)^{-1}$ and $(z^{-1}I - A^T)^{-1}C^T$, respectively, results in*

$$C(zI - A)^{-1}\left(BQB^T - APC^T\Omega^{-1}CPA^T\right)(z^{-1}I - A^T)^{-1}C^T = CPC^T + C(zI - A)^{-1}APC^T + CPA^T(z^{-1}I - A^T)^{-1}C^T\,,$$

*where $\Omega = CPC^T + R$. Hence,*

$$\begin{aligned}
\Delta\Delta^H(z) &= GQG^H(z) + R \\
&= C(zI - A)^{-1}BQB^T(z^{-1}I - A^T)^{-1}C^T + R \\
&= C(zI - A)^{-1}APC^T\Omega^{-1}CPA^T(z^{-1}I - A^T)^{-1}C^T + C(zI - A)^{-1}APC^T + CPA^T(z^{-1}I - A^T)^{-1}C^T + CPC^T + R \\
&= \left(C(zI - A)^{-1}K + I\right)\Omega\left(K^T(z^{-1}I - A^T)^{-1}C^T + I\right) \\
&\ge 0.
\end{aligned} \tag{33}$$

*The Schur complement formula can be used to verify the equivalence of (ii) and (iii). □*
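Lemma 7 lends itself to a simple numerical check. The sketch below, for an assumed scalar model, iterates the algebraic Riccati equation and then confirms that the block matrix of statement (ii) is positive semidefinite (one eigenvalue is zero to numerical precision) and that the spectral density (31) is nonnegative at a sample point on the unit circle.

```matlab
% Minimal numerical check of Lemma 7 for an assumed scalar model.
A = 0.5; B = 1; C = 1; Q = 2; R = 1;
P = B*Q*B';
for i = 1:500                           % iterate the ARE (21)
    P = A*P*A' - A*P*C'*((C*P*C' + R)\(C*P*A')) + B*Q*B';
end
M = [B*Q*B' - P + A*P*A', A*P*C'; C*P*A', C*P*C' + R];
eig(M)                                  % statement (ii): eigenvalues >= 0
z = exp(1i*0.3);                        % spot check of statement (i)
dd = C*inv(z - A)*B*Q*B'*inv(1/z - A')*C' + R;
real(dd)                                % should be nonnegative
```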

"Any intelligent fool can make things bigger and more complex... It takes a touch of genius - and a lot of courage to move in the opposite direction." *Albert Einstein*


In Chapter 2, it is shown that the transfer function matrix of the optimal Wiener solution for output estimation is given by

$$H\_{\rm OE}(z) = I - R\{\boldsymbol{\Delta}^{-H}\}\_{+} \boldsymbol{\Delta}^{-1}(z) \, \tag{34}$$

where $\{\;\}_+$ denotes the causal part. This filter produces estimates $\hat{y}_{k/k}$ from measurements $z_k$. By inspection of (33) it follows that the spectral factor is

$$
\Delta(z) = \mathbb{C}(zI - A)^{-1}K\Omega^{1/2} + \Omega^{1/2} \,. \tag{35}
$$

The Wiener output estimator (34) involves $\Delta^{-1}(z)$, which can be found using (35) and a special case of the matrix inversion lemma, namely, $[I + C(zI - A)^{-1}K]^{-1} = I - C(zI - A + KC)^{-1}K$. Thus, the spectral factor inverse is

$$
\Delta^{-1}(z) = \Omega^{-1/2} - \Omega^{-1/2}\mathbf{C}(zI - A + K\mathbf{C})^{-1}K\,\,. \tag{36}
$$

It can be seen from (36) that $\{\Delta^{-H}\}_+ = \Omega^{-1/2}$. Recognising that $I - R\Omega^{-1} = (CPC^T + R)(CPC^T + R)^{-1} - R(CPC^T + R)^{-1} = CPC^T(CPC^T + R)^{-1} = L$, the Wiener filter (34) can be written equivalently as

$$\begin{aligned}
H_{OE}(z) &= I - R\Omega^{-1/2}\Delta^{-1}(z) \\
&= I - R\Omega^{-1} + R\Omega^{-1}C(zI - A + KC)^{-1}K \\
&= L + (C - LC)(zI - A + KC)^{-1}K\,,
\end{aligned} \tag{37}$$

which is identical to the transfer function matrix of the Kalman filter for output estimation (29). In Chapter 2, it is shown that the transfer function matrix of the input estimator (or equaliser) for proper, stable, minimum-phase plants is

$$H\_{\rm l\!E}(z) = G^{-1}(z)(I - R\{\boldsymbol{\Delta}^{-H}\}\_{+} \boldsymbol{\Delta}^{-1}(z))\,. \tag{38}$$

Substituting (35) into (38) gives

$$H\_{\rm lE}(z) = \mathbb{G}^{-1}(z) H\_{\rm OE}(z) \,. \tag{39}$$

The above Wiener equaliser transfer function matrices require common poles and zeros to be cancelled. Although the solution (39) is not minimum-order (since some pole-zero cancellations can be made), its structure is instructive. In particular, an estimate of $w_k$ can be obtained by operating the plant inverse on $\hat{y}_{k/k}$, provided the inverse exists. It follows immediately from $L = CPC^T(CPC^T + R)^{-1}$ that

$$\lim_{R \to 0} L = I\,. \tag{40}$$

By inspection of (34) and (40), it follows that

$$\lim_{R \to 0}\,\sup_{\omega \in (-\pi,\,\pi)}\left\|H_{OE}(e^{j\omega})\right\| = I\,. \tag{41}$$

Thus, under conditions of diminishing measurement noise, the output estimator will be devoid of dynamics and its maximum magnitude will approach the identity matrix.
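This asymptote is easy to observe numerically. The following minimal sketch, for an assumed scalar model, shows the filter gain approaching unity as *R* is reduced, consistent with (40).

```matlab
% Minimal sketch of the asymptote (40): L -> I as R -> 0 (scalar model).
A = 0.9; B = 1; C = 1; Q = 1;
for R = [1 1e-2 1e-4 1e-6]
    P = B*Q*B';
    for i = 1:1000                      % iterate the ARE for this R
        P = A*P*A' - A*P*C'*((C*P*C' + R)\(C*P*A')) + B*Q*B';
    end
    L = C*P*C'/(C*P*C' + R);
    fprintf('R = %g, L = %.6f\n', R, L);
end
```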

"It is not the possession of truth, but the success which attends the seeking after it, that enriches the seeker and brings happiness to him." *Max Karl Ernst Ludwig Planck*


Therefore, for proper, stable, minimum-phase plants, the equaliser asymptotically approaches the plant inverse as the measurement noise becomes negligible, that is,

$$\lim\_{R \to 0} H\_{l\mathbb{E}}(z) = G^{-1}(z) \,. \tag{42}$$

Time-invariant output and input estimation are demonstrated below.



*Example 3.* Consider a time-invariant input estimation problem in which the plant is given by

$$\begin{aligned} \mathbf{G}(z) &= (z+0.9)^2 (z+0.1)^{-2} \\ &= (z^2+1.8z+0.81)(z^2+0.2z+0.01)^{-1} \\ &= (1.6z+0.8)(z^2+0.2z+0.01)^{-1}+1, \end{aligned}$$

together with *Q* = 1 and *R* = 0.0001. The controllable canonical form (see Chapter 1) yields the parameters $A = \begin{bmatrix} -0.2 & -0.01 \\ 1 & 0 \end{bmatrix}$, $B = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$, $C = \begin{bmatrix} 1.6 & 0.8 \end{bmatrix}$ and $D = 1$. From Chapter 4, the corresponding algebraic Riccati equation is $P = APA^T - K\Omega K^T + BQB^T$, where $K = (APC^T + BQD^T)\Omega^{-1}$ and $\Omega = CPC^T + R + DQD^T$. The minimum-variance output estimator is calculated as

"There is no result in nature without a cause; understand the cause and you will have no need of the experiment." *Leonardo di ser Piero da Vinci*

```matlab
% (fragment: A, B, C, D, Q, R, P and N are assumed defined as in Example 3)
w = sqrt(Q)*randn(N,1);                 % process noise
x = [0;0];                              % initial state
for k = 1:N
    y(k) = C*x + D*w(k);                % plant output
    x = A*x + B*w(k);
end
v = sqrt(R)*randn(1,N);                 % measurement noise
z = y + v;                              % measurement
omega = C*P*(C') + D*Q*(D') + R;
K = (A*P*(C') + B*Q*(D'))*inv(omega);   % predictor gain
L = Q*(D')*inv(omega);                  % equaliser gain
x = [0;0];                              % initial state
for k = 1:N
    w_estimate(k) = - L*C*x + L*z(k);   % equaliser output
    x = (A - K*C)*x + K*z(k);           % predicted state
end
```

Figure 1. Fragment of *Matlab®* script for Example 3.

$$
\begin{bmatrix}
\hat{x}_{k+1/k} \\
\hat{y}_{k/k}
\end{bmatrix} = \begin{bmatrix}
(A - KC) & K \\
(C - LC) & L
\end{bmatrix} \begin{bmatrix}
\hat{x}_{k/k-1} \\
z_k
\end{bmatrix},
$$

where $L = (CPC^T + DQD^T)\Omega^{-1}$. The solution $P = \begin{bmatrix} 0.0026 & 0.0026 \\ 0.0026 & 0.0026 \end{bmatrix}$ for the algebraic Riccati equation was found using the Hamiltonian solver within *Matlab®*.
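Alternatively, the algebraic Riccati equation of this example can be solved by simple iteration. The sketch below uses the canonical-form parameters stated above; the converged *P* may be compared with the value quoted in the text.

```matlab
% Minimal sketch: solve the ARE of Example 3 by iteration (an alternative
% to the Hamiltonian solver). Parameters as reconstructed in the text.
A = [-0.2 -0.01; 1 0]; B = [1; 0]; C = [1.6 0.8]; D = 1; Q = 1; R = 0.0001;
P = zeros(2);
for i = 1:2000
    omega = C*P*C' + R + D*Q*D';
    K = (A*P*C' + B*Q*D')/omega;        % predictor gain
    P = A*P*A' - K*omega*K' + B*Q*B';
end
P                                       % compare with the solution above
```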

Figure 2. Sample trajectories for Example 3: (i) measurement sequence (dotted line); (ii) actual and estimated process noise sequences (superimposed solid lines).

The resulting transfer function of the output estimator is

$$H\_{OE}(z) = (z + 0.9)^2 (z + 0.9)^{-2},$$

which illustrates the low-measurement noise asymptote (41). The minimum-variance input estimator is calculated as

$$
\begin{bmatrix}
\hat{x}_{k+1/k} \\
\hat{w}_{k/k}
\end{bmatrix} = \begin{bmatrix}
(A - KC) & K \\
-LC & L
\end{bmatrix} \begin{bmatrix}
\hat{x}_{k/k-1} \\
z_k
\end{bmatrix},
$$

where $L = QD^T\Omega^{-1}$. The input estimator transfer function is

"Your theory is crazy, but it's not crazy enough to be true." *Niels Henrik David Bohr*


$$H\_{\rm IE}(z) = (z + 0.1)^2 \, (z + 0.9)^{-2},$$

which corresponds to the inverse of the plant and illustrates the asymptote (42). A simulation was generated based on the fragment of *Matlab®* script shown in Fig. 1 and some sample trajectories are provided in Fig. 2. It can be seen from the figure that the actual and estimated process noise sequences are superimposed, which demonstrates that an equaliser can be successful when the plant is invertible and the measurement noise is sufficiently low. In general, when measurement noise is not insignificant, the asymptotes (41) – (42) will not apply, as the minimum-variance equaliser solution will involve a trade-off between inverting the plant and filtering the noise.


| | ASSUMPTIONS | MAIN RESULTS |
|---|---|---|
| Signals and system | $E\{w_k\} = E\{v_k\} = 0$. $E\{w_kw_k^T\} = Q$ and $E\{v_kv_k^T\} = R$ are known. $A$, $B$ and $C$ are known. | $x_{k+1} = Ax_k + Bw_k$, $y_k = Cx_k$, $z_k = y_k + v_k$ |
| Filtered state and output estimate | $Q > 0$, $R > 0$ and $CPC^T + R > 0$. | $\hat{x}_{k+1/k} = (A - KC)\hat{x}_{k/k-1} + Kz_k$, $\hat{y}_{k/k} = (C - LC)\hat{x}_{k/k-1} + Lz_k$ |
| Predictor gain, filter gain and algebraic Riccati equation | The pair $(A, C)$ is observable. | $K = APC^T(CPC^T + R)^{-1}$, $L = CPC^T(CPC^T + R)^{-1}$, $P = APA^T - K(CPC^T + R)K^T + BQB^T$ |

Table 2. Main results for time-invariant output estimation.
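The recursions of Table 2 are readily simulated. The following minimal sketch, with an assumed scalar model, confirms that the output estimate exhibits a smaller error variance than the raw measurements.

```matlab
% Minimal sketch exercising the Table 2 recursions on an assumed scalar
% model: the output estimate should track y_k more closely than z_k does.
A = 0.9; B = 1; C = 1; Q = 1; R = 1; N = 10000;
P = B*Q*B';
for i = 1:500                           % ARE iteration
    P = A*P*A' - A*P*C'*((C*P*C' + R)\(C*P*A')) + B*Q*B';
end
K = A*P*C'/(C*P*C' + R); L = C*P*C'/(C*P*C' + R);
x = 0; xp = 0; e1 = 0; e2 = 0;
for k = 1:N
    y = C*x; z = y + sqrt(R)*randn;     % signal model and measurement
    yhat = (C - L*C)*xp + L*z;          % output estimate
    xp = (A - K*C)*xp + K*z;            % one-step-ahead prediction
    x = A*x + B*sqrt(Q)*randn;          % state update
    e1 = e1 + (z - y)^2; e2 = e2 + (yhat - y)^2;
end
[e1 e2]/N                               % filter error variance < R
```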

#### **5.7 Conclusion**

In the linear time-invariant case, it is assumed that the signal model and observations can be described by *xk+1* = *Axk* + *Bwk*, *yk* = *Cxk*, and *zk* = *yk* + *vk*, respectively, where the matrices *A*, *B*, *C*, *Q* and *R* are constant. The Kalman filter for this problem is listed in Table 2. If the pair (*A*, *C*) is completely observable, the solution of the corresponding Riccati difference equation monotonically converges to the unique solution of the algebraic Riccati equation that appears in the table.

"Clear thinking requires courage rather than intelligence." *Thomas Stephen Szasz*

The implementation cost is lower than for time-varying problems because the gains can be calculated before running the filter. If $|\lambda_i(A)| < 1$, $i = 1, \ldots, n$, and the pair (*A*, *C*) is completely observable, then $|\lambda_i(A - KC)| < 1$, that is, the steady-state filter is asymptotically stable. The output estimator has the transfer function

$$H_{OE}(z) = C(I - LC)(zI - A + KC)^{-1}K + CL\,.$$

Since the task of solving an algebraic Riccati equation is equivalent to spectral factorisation, the transfer functions of the minimum-mean-square-error and steady-state minimum-variance solutions are the same.

#### **5.8 Problems**


**Problem 1.** Calculate the observability matrices and comment on the observability of the following pairs.

$$\begin{array}{cccc} \text{(i)} \ A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}, & \text{C} = \begin{bmatrix} 2 & 4 \end{bmatrix}. & & \text{(ii)} \ A = \begin{bmatrix} 1 & -2 \\ -3 & -4 \end{bmatrix}, & \text{C} = \begin{bmatrix} 2 & 4 \end{bmatrix}. \end{array}$$

**Problem 2.** Generalise the proof of Lemma 1 (which addresses the unforced system *xk+1* = *Axk* and *yk* = *Cxk*) for the system *xk+1* = *Axk* + *Bwk* and *yk* = *Cxk* + *Dwk*.

**Problem 3.** Consider the two Riccati difference equations

$$P\_{t+k} = A P\_{t+k-1} A^T - A P\_{t+k-1} \mathbf{C}^T (\mathbf{C} P\_{t+k-1} \mathbf{C}^T + \mathbf{R})^{-1} \mathbf{C} P\_{t+k-1} A^T + B Q B^T$$

$$P\_{t+k+1} = A P\_{t+k} A^T - A P\_{t+k} \mathbf{C}^T (\mathbf{C} P\_{t+k} \mathbf{C}^T + \mathbf{R})^{-1} \mathbf{C} P\_{t+k} A^T + B Q B^T.$$

Show that a Riccati difference equation for $\bar{P}_{t+k} = P_{t+k+1} - P_{t+k}$ is given by

$$
\bar{P}_{t+k+1} = \bar{A}_{t+k}\bar{P}_{t+k}\bar{A}_{t+k}^T - \bar{A}_{t+k}\bar{P}_{t+k}C^T(C\bar{P}_{t+k}C^T + \bar{R}_{t+k})^{-1}C\bar{P}_{t+k}\bar{A}_{t+k}^T
$$

where $\bar{A}_{t+k} = A - AP_{t+k}C^T(CP_{t+k}C^T + R)^{-1}C$ and $\bar{R}_{t+k} = CP_{t+k}C^T + R$.

**Problem 4.** Suppose that measurements are generated by the single-input-single-output system $x_{k+1} = ax_k + w_k$, $z_k = x_k + v_k$, where $|a| < 1$, $E\{v_k\} = 0$, $E\{w_jw_k^T\} = (1 - a^2)\delta_{jk}$, $E\{v_jv_k^T\} = \delta_{jk}$ and $E\{w_jv_k^T\} = 0$.

(a) Find the predicted error variance.

(b) Find the predictor gain.

(c) Verify that the one-step-ahead minimum-variance predictor is realised by

$$
\hat{x}_{k+1/k} = \frac{a}{1 + \sqrt{1 - a^2}}\,\hat{x}_{k/k-1} + \frac{a\sqrt{1 - a^2}}{1 + \sqrt{1 - a^2}}\,z_k\,.
$$

(d) Find the filter gain.

(e) Write down the realisation of the minimum-variance filter.
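The predictor quoted in part (c) may be checked numerically. A minimal sketch (with an assumed value *a* = 0.8) follows: the ARE solution is $P = \sqrt{1 - a^2}$, whereupon $A - KC$ and $K$ reduce to the stated coefficients.

```matlab
% Minimal check of the Problem 4 (c) predictor for an assumed a = 0.8:
% with Q = 1 - a^2 and R = 1, the ARE solution is P = sqrt(1 - a^2).
a = 0.8; Q = 1 - a^2; R = 1;
P = Q;
for i = 1:500
    P = a*P*a - a*P*((P + R)\P)*a + Q;  % scalar Riccati iteration
end
K = a*P/(P + R);                        % predictor gain
[P, sqrt(1 - a^2)]                      % predicted error variance
[a - K, a/(1 + sqrt(1 - a^2))]          % coefficient of the estimate
[K, a*sqrt(1 - a^2)/(1 + sqrt(1 - a^2))] % coefficient of z_k
```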

"Thoughts, like fleas, jump from man to man. But they don't bite everybody." *Baron Stanislaw Jerzy Lec*


**Problem 5.** Assuming that a system *G* has the realisation $x_{k+1} = A_kx_k + B_kw_k$, $y_k = C_kx_k + D_kw_k$, expand $\Delta\Delta^H(z) = GQG^H(z) + R$ to obtain $\Delta(z)$ and the optimal output estimation filter.

#### **5.9 Glossary**

In addition to the terms listed in Section 2.6, the following notation has been used herein.

*A*, *B*, *C*, *D*: A linear time-invariant system is assumed to have the realisation $x_{k+1} = Ax_k + Bw_k$ and $y_k = Cx_k + Dw_k$, in which *A*, *B*, *C*, *D* are constant state-space matrices of appropriate dimension.

*Q*, *R*: Time-invariant covariance matrices of the stationary stochastic signals $w_k$ and $v_k$, respectively.

$H_{OE}(z)$: Transfer function matrix of the output estimator.

$H_{IE}(z)$: Transfer function matrix of the input estimator.

*P*: Steady-state error covariance matrix.

*K*: Time-invariant predictor gain matrix.

*L*: Time-invariant filter gain matrix.

*O*: Observability matrix.

*W*: Observability gramian.

$\Delta(z)$: Spectral factor.


#### **5.10 References**


[1] K. Ogata, *Discrete-time Control Systems*, Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1987.

[2] M. Gopal, *Modern Control Systems Theory*, New Age International Limited Publishers, New Delhi, 1993.

[3] M. R. Opmeer and R. F. Curtain, "Linear Quadratic Gaussian Balancing for Discrete-Time Infinite-Dimensional Linear Systems", *SIAM Journal of Control and Optimization*, vol. 43, no. 4, pp. 1196 – 1221, 2004.

[4] S. W. Chan, G. C. Goodwin and K. S. Sin, "Convergence Properties of the Riccati Difference Equation in Optimal Filtering of Nonstabilizable Systems", *IEEE Transactions on Automatic Control*, vol. 29, no. 2, pp. 110 – 118, Feb. 1984.

[5] C. E. De Souza, M. R. Gevers and G. C. Goodwin, "Riccati Equations in Optimal Filtering of Nonstabilizable Systems Having Singular State Transition Matrices", *IEEE Transactions on Automatic Control*, vol. 31, no. 9, pp. 831 – 838, Sep. 1986.

[6] C. E. De Souza, "On Stabilizing Properties of Solutions of the Riccati Difference Equation", *IEEE Transactions on Automatic Control*, vol. 34, no. 12, pp. 1313 – 1316, Dec. 1989.

[7] R. R. Bitmead, M. Gevers and V. Wertz, *Adaptive Optimal Control. The Thinking Man's GPC*, Prentice Hall, New York, 1990.

[8] R. R. Bitmead and M. Gevers, "Riccati Difference and Differential Equations: Convergence, Monotonicity and Stability", in S. Bittanti, A. J. Laub and J. C. Willems (Eds.), *The Riccati Equation*, Springer Verlag, 1991.

[9] H. K. Wimmer, "Monotonicity and Maximality of Solutions of Discrete-time Algebraic Riccati Equations", *Journal of Mathematical Systems, Estimation and Control*, vol. 2, no. 2, pp. 219 – 235, 1992.

[10] H. K. Wimmer and M. Pavon, "A comparison theorem for matrix Riccati difference equations", *Systems and Control Letters*, vol. 19, pp. 233 – 239, 1992.

[11] G. A. Einicke, "Asymptotic Optimality of the Minimum Variance Fixed-Interval Smoother", *IEEE Transactions on Signal Processing*, vol. 55, no. 4, pp. 1543 – 1547, Apr. 2007.

[12] B. D. O. Anderson and J. B. Moore, *Optimal Filtering*, Prentice-Hall Inc, Englewood Cliffs, New Jersey, 1979.

[13] G. Freiling, G. Jank and H. Abou-Kandil, "Generalized Riccati Difference and Differential Equations", *Linear Algebra and its Applications*, vol. 241, pp. 291 – 303, 1996.

[14] G. Freiling and V. Ionescu, "Time-varying discrete Riccati equation: some monotonicity results", *Linear Algebra and its Applications*, vol. 286, pp. 135 – 148, 1999.

"Nothing in life is to be feared, it is only to be understood. Now is the time to understand more, so that we may fear less." *Marie Sklodowska Curie*

"Man is but a reed, the most feeble thing in nature, but he is a thinking reed." *Blaise Pascal*


### **Continuous-Time Smoothing**

#### **6.1 Introduction**

The previously-described minimum-mean-square-error and minimum-variance filtering solutions operate on measurements up to the current time. If some processing delay can be tolerated then improved estimation performance can be realised through the use of smoothers. There are three state-space smoothing technique categories, namely, fixed-point, fixed-lag and fixed-interval smoothing. Fixed-point smoothing refers to estimating some linear combination of states at a previous instant in time. In the case of fixed-lag smoothing, a fixed time delay is assumed between the measurement and on-line estimation processes. Fixed-interval smoothing is for retrospective data analysis, where measurements recorded over an interval are used to obtain the improved estimates. Compared to filtering, smoothing has a higher implementation cost, as it has increased memory and calculation requirements.


A large number of smoothing solutions have been reported since Wiener's and Kalman's development of the optimal filtering results – see the early surveys [1] – [2]. The minimum-variance fixed-point and fixed-lag smoother solutions are well known. Two fixed-interval smoother solutions, namely the maximum-likelihood smoother developed by Rauch, Tung and Striebel [3], and the two-filter Fraser-Potter formula [4], have been in widespread use since the 1960s. However, the minimum-variance fixed-interval smoother is not well known. This smoother is simply a time-varying state-space generalisation of the optimal Wiener solution.

The main approaches for continuous-time fixed-point, fixed-lag and fixed-interval smoothing are canvassed here. It is assumed throughout that the underlying noise processes are zero mean and uncorrelated. Nonzero means and correlated processes can be handled using the approaches of Chapters 3 and 4. It is also assumed here that the noise statistics and state-space model parameters are known precisely. Note that techniques for estimating parameters and accommodating uncertainty are addressed subsequently.

Some prerequisite concepts, namely time-varying adjoint systems, backwards differential equations, Riccati equation comparison and the continuous-time maximum-likelihood method are covered in Section 6.2. Section 6.3 outlines a derivation of the fixed-point smoother by Meditch [5]. The fixed-lag smoother reported by Sage *et al* [6] and Moore [7] is the subject of Section 6.4. Section 6.5 deals with the Rauch-Tung-Striebel [3], Fraser-Potter [4] and minimum-variance fixed-interval smoother solutions [8] - [10]. As before, the approach here is to accompany the developments, where appropriate, with proofs about performance being attained. Smoothing is not a panacea for all ills. If the measurement noise is negligible then smoothing (and filtering) may be superfluous. Conversely, if measurement noise obliterates the signals then data recovery may not be possible. Therefore, estimator performance is often discussed in terms of the prevailing signal-to-noise ratio.

"Life has got a habit of not standing hitched. You got to ride it like you find it. You got to change with it. If a day goes by that don't change some of your old notions for new ones, that is just about like trying to milk a dead cow." *Woodrow Wilson Guthrie*


#### **6.2 Prerequisites**

#### **6.2.1 Time-varying Adjoint Systems**

Since fixed-interval smoothers employ backward processes, it is pertinent to introduce the adjoint of a time-varying continuous-time system. Let $\mathcal{G}: \mathbb{R}^p \to \mathbb{R}^q$ denote a linear time-varying system

$$
\dot{\mathbf{x}}(t) = A(t)\mathbf{x}(t) + B(t)w(t) \,. \tag{1}
$$

$$y(t) = \mathbf{C}(t)\mathbf{x}(t) + D(t)w(t) \,. \tag{2}$$

operating on the interval [0, *T*]. Let *w* denote the set of *w*(*t*) over all time *t*, that is, *w* = {*w*(*t*), *t* ∈ [0, *T*]}. Similarly, let $y = \mathcal{G}w$ denote {*y*(*t*), *t* ∈ [0, *T*]}. The adjoint of $\mathcal{G}$, denoted by $\mathcal{G}^H: \mathbb{R}^q \to \mathbb{R}^p$, is the unique linear system satisfying

$$\langle y, \mathcal{G}w\rangle = \langle \mathcal{G}^Hy, w\rangle \tag{3}$$

for all $y \in \mathbb{R}^q$ and $w \in \mathbb{R}^p$.

*Lemma 1: The adjoint $\mathcal{G}^H$ of the system $\mathcal{G}$ described by (1) – (2), with $x(t_0) = 0$, having the realisation*

$$
\dot{\zeta}(t) = -A^T(t)\zeta(t) - C^T(t)u(t)\,, \tag{4}
$$

$$\mathbf{z}(t) = \mathbf{B}^{\top}(t)\boldsymbol{\zeta}(t) + \mathbf{D}^{\top}(t)\boldsymbol{u}(t),\tag{5}$$

*with $\zeta(T) = 0$, satisfies (3).*

The proof follows *mutatis mutandis* from that of Lemma 1 of Chapter 3 and is set out in [11]. The original system (1) – (2) needs to be integrated forwards in time, whereas the adjoint system (4) – (5) needs to be integrated backwards in time. Some important properties of backward systems are discussed in the next section. The simplification *D(t)* = 0 is assumed below unless stated otherwise.
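The adjoint identity (3) can be verified numerically. The sketch below is minimal and assumes a scalar time-invariant example with *D*(*t*) = 0: the forward system (1) – (2) is integrated forwards and the adjoint (4) – (5) backwards by Euler steps, and the two inner products agree.

```matlab
% Minimal numerical check of <y, Gw> = <G^H y, w> for an assumed scalar
% time-invariant system with D(t) = 0.
T = 1; dt = 1e-4; N = round(T/dt);
A = -0.5; B = 1; C = 2;
w = randn(1,N); u = randn(1,N);         % arbitrary input signals
x = 0; Gw = zeros(1,N);
for k = 1:N                             % forward pass: y = G w
    Gw(k) = C*x;
    x = x + dt*(A*x + B*w(k));
end
zeta = 0; GHu = zeros(1,N);             % terminal condition zeta(T) = 0
for k = N:-1:1                          % backward pass: z = G^H u
    GHu(k) = B'*zeta;
    zeta = zeta + dt*(A'*zeta + C'*u(k));  % integrating (4) backwards
end
[sum(u.*Gw)*dt, sum(GHu.*w)*dt]         % the inner products of (3) agree
```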

#### **6.2.2 Backwards Differential Equations**

The adjoint state evolution (4) is rewritten as

$$-\dot{\zeta}(t) = A^T(t)\zeta(t) + C^T(t)u(t)\,. \tag{6}$$

"The simple faith in progress is not a conviction belonging to strength, but one belonging to acquiescence and hence to weakness." *Norbert Wiener*


The negative sign of the derivative within (6) indicates that this differential equation proceeds backwards in time. The corresponding state transition matrix is defined below.

*Lemma 2: The differential equation (6) has the solution* 

$$\zeta(t) = \Phi^H(t, t_0)\zeta(t_0) - \int_{t_0}^t \Phi^H(s, t)C^T(s)u(s)ds\,, \tag{7}$$

*where the adjoint state transition matrix, $\Phi^H(t, t_0)$, satisfies*

$$
\dot{\Phi}^H(t, t_0) = \frac{d\Phi^H(t, t_0)}{dt} = -A^T(t)\Phi^H(t, t_0)\,, \tag{8}
$$

*with boundary condition* 

$$
\Phi^H(t, t) = I\,. \tag{9}
$$

*Proof: Following the proof of Lemma 1 of Chapter 3, by differentiating (7) and substituting (4) – (5), it is easily verified that (7) is a solution of (6). □*

The Lyapunov equation corresponding to (6) is described next because it is required in the development of backwards Riccati equations.

*Lemma 3: In respect of the backwards differential equation (6), assume that u(t) is a zero-mean white process with $E\{u(t)u^T(\tau)\} = U(t)\delta(t - \tau)$ that is uncorrelated with $\zeta(t_0)$, namely, $E\{u(t)\zeta^T(t_0)\} = 0$.*

*Then the covariances $P(t, \tau) = E\{\zeta(t)\zeta^T(\tau)\}$ and $\dot{P}(t, \tau) = \frac{d}{dt}E\{\zeta(t)\zeta^T(\tau)\}$ satisfy the Lyapunov differential equation*

$$-\dot{P}(t, \tau) = A^T(t)P(t, \tau) + P(t, \tau)A(t) + C^T(t)U(t)C(t)\,. \tag{10}$$

*Proof: The backwards Lyapunov differential equation (10) can be obtained by using (6) and (7) within $\frac{d}{dt}E\{\zeta(t)\zeta^T(\tau)\} = E\{\dot{\zeta}(t)\zeta^T(\tau) + \zeta(t)\dot{\zeta}^T(\tau)\}$ (see the proof of Lemma 2 in Chapter 3). □*

#### **6.2.3 Comparison of Riccati Equations**

The following Riccati Equation comparison theorem is required subsequently to compare the performance of filters and smoothers.

*Theorem 1 (Riccati Equation Comparison Theorem) [12], [8]: Let $P_1(t) \ge 0$ and $P_2(t) \ge 0$ denote solutions of the Riccati differential equations*

$$\dot{P}_1(t) = A_1(t)P_1(t) + P_1(t)A_1^T(t) - P_1(t)S_1(t)P_1(t) + B_1(t)Q_1(t)B_1^T(t) \tag{11}$$

*and*

( ( ) ) ( ( )) *<sup>b</sup>*

The Gaussian likelihood function for *x*(*t*) is calculated from (15) and (16) as

<sup>1</sup> ( ( )) exp 0.5( ( ) ) ( ( ) )

It is often more convenient to work with the log-probability density function

1/2 / 2 <sup>1</sup> log ( ( )) log (2 ) 0.5( ( ) ) ( ) *<sup>n</sup> <sup>T</sup> xx xx p x t*

1/ 2 / 2 <sup>1</sup> log ( ( )) log (2 ) 0.5 ( ( ) ) ( ) . *<sup>n</sup> <sup>T</sup> xx xx f xt*

ˆ arg max log ( | ( )) *p x t*

ˆ arg max log ( | ( )) *f x t*

likelihood estimation is illustrated by the two examples that follow.

*Example 1.* Consider the first-order autoregressive system

( ( ), *axt* <sup>2</sup> )

  *w*

*<sup>w</sup>* , namely,

/ 2 <sup>0</sup> <sup>0</sup> <sup>1</sup> ( ( )) exp 0.5( ( ) ( )) (2 ) *T*

*f xt x t a x t dt*

*n w*

"Faced with the choice between changing one's mind and proving that there is no need to do so, almost

*xx <sup>n</sup>*

*f xt x t R x t dx*

1/ 2 / 2

*R*

*xx*

either maximises the log-probability density function

or maximises the log-likelihood function

*dx t*( )

everyone gets busy on the proof." *John Kenneth Galbraith*

from (22) that *x t* ( ) ~ <sup>0</sup>

log ( | ( )) *f x t* 

where *x t* ( ) =

(2 )

and the log-likelihood function

*<sup>a</sup> P a x t b p x t dx* . (16)

 

> 

(20)

. (21)

<sup>0</sup> *xt axt wt* () () () , (22)

*dt* , *w*(*t*) is a zero-mean Gaussian process and *a0* is unknown. It follows

. (23)

2 2

that

 *x t* 

or

<sup>1</sup>

 *R x t R x dx* 

*R x t R x dx*

Suppose that a given record of *x*(*t*) is assumed to be belong to a Gaussian distribution that is a function of an unknown quantity *θ*. A statistical approach for estimating the unknown *θ* is the method of maximum likelihood. This typically involves finding an estimate ˆ

to zero and solving for the unknown *θ*. Continuous-time maximum

So-called maximum likelihood estimates can be found by setting either log ( | ( )) *<sup>p</sup>*

*T*

. (17)

(18)

 (19)

$$\dot{P}_2(t) = A_2(t)P_2(t) + P_2(t)A_2^T(t) - P_2(t)S_2(t)P_2(t) + B_2(t)Q_2(t)B_2^T(t)\,, \tag{12}$$

*with $S_1(t) = C_1^T(t)R_1^{-1}(t)C_1(t)$ and $S_2(t) = C_2^T(t)R_2^{-1}(t)C_2(t)$, where $A_1(t)$, $B_1(t)$, $C_1(t)$, $Q_1(t) \ge 0$, $R_1(t) \ge 0$, $A_2(t)$, $B_2(t)$, $C_2(t)$, $Q_2(t) \ge 0$ and $R_2(t) \ge 0$ are of appropriate dimensions. If*

*(i) $P_1(t_0) \ge P_2(t_0)$ for a $t_0 \ge 0$, and*

*(ii)* $$\begin{bmatrix} Q_1(t) & A_1(t) \\ A_1^T(t) & -S_1(t) \end{bmatrix} \ge \begin{bmatrix} Q_2(t) & A_2(t) \\ A_2^T(t) & -S_2(t) \end{bmatrix} \text{ for all } t \ge t_0.$$

*Then* 

$$P\_1(t) \succeq P\_2(t) \tag{13}$$

*for all t ≥ t0.* 

*Proof: Condition (i) of the theorem is the initial step of an induction argument. For the induction step, denote $\dot{P}_3(t) = \dot{P}_1(t) - \dot{P}_2(t)$, $P_3(t) = P_1(t) - P_2(t)$ and $\bar{A}(t) = A_1(t) - P_2(t)S_1(t) - 0.5P_3(t)S_1(t)$. Then*

$$\dot{P}_3(t) = \bar{A}(t)P_3(t) + P_3(t)\bar{A}^T(t) + \begin{bmatrix} I & P_2(t) \end{bmatrix}\left(\begin{bmatrix} Q_1(t) & A_1(t) \\ A_1^T(t) & -S_1(t) \end{bmatrix} - \begin{bmatrix} Q_2(t) & A_2(t) \\ A_2^T(t) & -S_2(t) \end{bmatrix}\right)\begin{bmatrix} I \\ P_2(t) \end{bmatrix},$$

*which together with condition (ii) yields* 

$$
\dot{P}_3(t) \ge \bar{A}(t)P_3(t) + P_3(t)\bar{A}^T(t)\,. \tag{14}
$$

*Lemma 5 of Chapter 3 and (14) imply $P_3(t) \ge 0$ and the claim (13) follows. □*

"Progress always involves risk; you can't steal second base and keep your foot on first base." *Frederick James Wilcox*
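A minimal illustration of Theorem 1 follows, with two assumed scalar models that differ only in the measurement noise: $R_1 \ge R_2$ implies $S_1 \le S_2$, so condition (ii) holds and $P_1(t) \ge P_2(t)$ results.

```matlab
% Minimal sketch of the Riccati comparison theorem for assumed scalar
% models differing only in R: R1 >= R2 gives S1 <= S2 and hence P1 >= P2.
A = -1; B = 1; C = 1; Q = 1; R1 = 2; R2 = 0.5;
dt = 1e-3; N = 5000; P1 = 0; P2 = 0;
for k = 1:N                             % Euler integration of (11), (12)
    P1 = P1 + dt*(A*P1 + P1*A' - P1*(C'/R1*C)*P1 + B*Q*B');
    P2 = P2 + dt*(A*P2 + P2*A' - P2*(C'/R2*C)*P2 + B*Q*B');
end
[P1 P2]                                 % P1 >= P2, consistent with (13)
```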

#### **6.2.4 The Maximum-Likelihood Method**

Rauch, Tung and Striebel famously derived their fixed-interval smoother [3] using a maximum-likelihood technique, which is outlined as follows. Let $x(t) \sim \mathcal{N}(\mu, R_{xx})$ denote a continuous random variable having a Gaussian (or normal) distribution with mean $E\{x(t)\} = \mu$ and covariance $E\{(x(t) - \mu)(x(t) - \mu)^T\} = R_{xx}$. The continuous-time Gaussian probability density function of $x(t) \in \mathbb{R}^n$ is defined by

$$p(\mathbf{x}(t)) = \frac{1}{(2\pi)^{n/2} \left| \mathcal{R}\_{\text{xx}} \right|^{1/2}} \exp\left\{ -0.5(\mathbf{x}(t) - \boldsymbol{\mu})^{\mathrm{T}} \mathcal{R}\_{\text{xx}}^{-1} (\mathbf{x}(t) - \boldsymbol{\mu}) \right\},\tag{15}$$

in which |*Rxx*| denotes the determinant of *Rxx*. The probability that the continuous random variable *x*(*t*) with a given probability density function *p*(*x*(*t*)) lies within an interval [a, b] is given by the likelihood function (which is also known as the cumulative distribution function)

"The price of doing the same old thing is far higher than the price of change." *William Jefferson (Bill) Clinton*


$$P(a \le \mathbf{x}(t) \le b) = \int\_{a}^{b} p(\mathbf{x}(t)) d\mathbf{x} \,. \tag{16}$$

The Gaussian likelihood function for *x*(*t*) is calculated from (15) and (16) as

$$f(x(t)) = \frac{1}{(2\pi)^{n/2}\left|R_{xx}\right|^{1/2}}\int_a^b \exp\left\{-0.5(x(t) - \mu)^T R_{xx}^{-1}(x(t) - \mu)\right\}dx\,. \tag{17}$$

It is often more convenient to work with the log-probability density function

$$\log p(x(t)) = -\log\left((2\pi)^{n/2}\left|R_{xx}\right|^{1/2}\right) - 0.5(x(t) - \mu)^T R_{xx}^{-1}(x(t) - \mu) \tag{18}$$

and the log-likelihood function

$$\log f(x(t)) = -\log\left((2\pi)^{n/2}\left|R_{xx}\right|^{1/2}\right) - 0.5\int_a^b(x(t) - \mu)^T R_{xx}^{-1}(x(t) - \mu)dx\,. \tag{19}$$

Suppose that a given record of $x(t)$ is assumed to belong to a Gaussian distribution that is a function of an unknown quantity $\theta$. A statistical approach for estimating the unknown $\theta$ is the method of maximum likelihood. This typically involves finding an estimate $\hat{\theta}$ that either maximises the log-probability density function

$$\hat{\theta} = \arg\max_{\theta}\,\log p(\theta \mid x(t)) \tag{20}$$

or maximises the log-likelihood function

$$
\hat{\theta} = \arg\max_{\theta}\,\log f(\theta \mid x(t))\,. \tag{21}
$$

So-called maximum-likelihood estimates can be found by setting the derivative with respect to $\theta$ of either $\log p(\theta \mid x(t))$ or $\log f(\theta \mid x(t))$ to zero and solving for the unknown $\theta$. Continuous-time maximum-likelihood estimation is illustrated by the two examples that follow.

*Example 1.* Consider the first-order autoregressive system

$$
\dot{x}(t) = -a_0x(t) + w(t)\,, \tag{22}
$$

where $\dot{x}(t) = \frac{dx(t)}{dt}$, $w(t)$ is a zero-mean Gaussian process and $a_0$ is unknown. It follows from (22) that $\dot{x}(t) \sim \mathcal{N}(-a_0x(t), \sigma_w^2)$, namely,

$$f(\dot{x}(t)) = \frac{1}{(2\pi)^{n/2}\sigma_w}\int_0^T \exp\left\{-0.5(\dot{x}(t) + a_0x(t))^2\sigma_w^{-2}\right\}dt\,. \tag{23}$$

"Faced with the choice between changing one's mind and proving that there is no need to do so, almost everyone gets busy on the proof." *John Kenneth Galbraith*


Taking the logarithm of both sides gives

$$\log \,\, f(\dot{\mathbf{x}}(t)) = -\log \,(2\pi)^{n/2} \sigma\_w - 0.5 \sigma\_w^{-2} \int\_0^T (\dot{\mathbf{x}}(t) + a\_0 \mathbf{x}(t))^2 dt \,. \tag{24}$$

Setting $\partial \log f(\dot{x}(t))/\partial a_0 = 0$ results in $\int_0^T(\dot{x}(t) + a_0x(t))x(t)dt = 0$ and hence

$$
\hat{a}_0 = -\left(\int_0^T x^2(t)dt\right)^{-1}\int_0^T \dot{x}(t)x(t)dt\,. \tag{25}
$$
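The estimate (25) is easily tried on simulated data. A minimal sketch follows, with assumed values $a_0 = 2$ and unit process noise; the state is generated by an Euler-Maruyama recursion and the increments stand in for $\dot{x}(t)dt$.

```matlab
% Minimal sketch of the maximum-likelihood estimate (25) on simulated
% data, with assumed a0 = 2 and unit process noise.
a0 = 2; dt = 1e-3; N = 100000;
x = zeros(1,N);
for k = 1:N-1
    x(k+1) = x(k) + dt*(-a0*x(k)) + sqrt(dt)*randn;  % Euler-Maruyama
end
dx = diff(x);                           % increments approximating xdot*dt
a0_hat = -sum(dx.*x(1:N-1))/(dt*sum(x(1:N-1).^2))    % close to a0 = 2
```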

*Example 2.* Consider the third-order autoregressive system

$$
\dddot{x}(t) + a_2\ddot{x}(t) + a_1\dot{x}(t) + a_0x(t) = w(t)\,, \tag{26}
$$

where $\dddot{x}(t) = \frac{d^3x(t)}{dt^3}$ and $\ddot{x}(t) = \frac{d^2x(t)}{dt^2}$. The above system can be written in a controllable canonical form as

$$
\begin{bmatrix}
\dot{x}_1(t) \\
\dot{x}_2(t) \\
\dot{x}_3(t)
\end{bmatrix} = \begin{bmatrix}
-a_2 & -a_1 & -a_0 \\
1 & 0 & 0 \\
0 & 1 & 0
\end{bmatrix}\begin{bmatrix}
x_1(t) \\
x_2(t) \\
x_3(t)
\end{bmatrix} + \begin{bmatrix}
1 \\ 0 \\ 0
\end{bmatrix}w(t)\,. \tag{27}
$$

Assuming $\dot{x}_1(t) \sim \mathcal{N}(-a_2x_1(t) - a_1x_2(t) - a_0x_3(t),\, \sigma_w^2)$, taking logarithms, setting to zero the partial derivatives with respect to the unknown coefficients, and rearranging yields

$$
\begin{bmatrix}
\hat{a}_0 \\
\hat{a}_1 \\
\hat{a}_2
\end{bmatrix} = -\begin{bmatrix}
\int_0^T x_3^2\,dt & \int_0^T x_2x_3\,dt & \int_0^T x_1x_3\,dt \\
\int_0^T x_2x_3\,dt & \int_0^T x_2^2\,dt & \int_0^T x_2x_1\,dt \\
\int_0^T x_1x_3\,dt & \int_0^T x_2x_1\,dt & \int_0^T x_1^2\,dt
\end{bmatrix}^{-1}\begin{bmatrix}
\int_0^T \dot{x}_1x_3\,dt \\
\int_0^T \dot{x}_1x_2\,dt \\
\int_0^T \dot{x}_1x_1\,dt
\end{bmatrix}, \tag{28}
$$

in which state time dependence is omitted for brevity.

#### **6.3 Fixed-Point Smoothing**

#### **6.3.1 Problem Definition**

In continuous-time fixed-point smoothing, it is desired to calculate state estimates at one particular time of interest, *τ*, 0 ≤ *τ* ≤ *t*, from measurements *z*(*t*) over the interval *t* ∈ [0, *T*]. For example, suppose that a continuous measurement stream of a tennis ball's trajectory is available and it is desired to determine whether it bounced within the court boundary. In this case, a fixed-point smoother could be employed to estimate the ball position at the time of the bounce from the past and future measurements.

"When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is probably wrong." *Arthur Charles Clarke*


A solution for the continuous-time fixed-point smoothing problem can be developed from first principles, for example, see [5] - [6]. However, it is recognised in [13] that a simpler solution derivation follows by transforming the smoothing problem into a filtering problem that possesses an augmented state. Following the nomenclature of [14], consider an

augmented state vector having two components, namely, $x^{(a)}(t) = \begin{bmatrix} x(t) \\ \xi(t) \end{bmatrix}$. The first component, $x(t) \in \mathbb{R}^{n}$, is the state of the system $\dot{x}(t) = A(t)x(t) + B(t)w(t)$ and $y(t) = C(t)x(t)$. The second component, $\xi(t) \in \mathbb{R}^{n}$, equals $x(t)$ at time *t* = *τ*, that is, $\xi(t) = x(\tau)$. The corresponding signal model may be written as

$$\dot{\mathbf{x}}^{(a)}(t) = A^{(a)}(t)\mathbf{x}^{(a)}(t) + B^{(a)}(t)w(t) \tag{29}$$

$$z(t) = C^{(a)}(t)\mathbf{x}^{(a)}(t) + v(t) \,. \tag{30}$$

where $A^{(a)}(t) = \begin{bmatrix} A(t) & 0 \\ \delta(t-\tau)A(t) & 0 \end{bmatrix}$, $B^{(a)}(t) = \begin{bmatrix} B(t) \\ \delta(t-\tau)B(t) \end{bmatrix}$ and $C^{(a)}(t) = \begin{bmatrix} C(t) & 0 \end{bmatrix}$, in which $\delta(t-\tau) = \begin{cases} 1 & \text{if } t = \tau \\ 0 & \text{if } t \neq \tau \end{cases}$ is the Kronecker delta function. Note that the simplifications $A^{(a)}(t) = \begin{bmatrix} A(t) & 0 \\ 0 & 0 \end{bmatrix}$ and $B^{(a)}(t) = \begin{bmatrix} B(t) \\ 0 \end{bmatrix}$ arise for *t* > *τ*. The smoothing objective is to produce an estimate $\hat{\xi}(t)$ of $\xi(t)$ from the measurements *z*(*t*) over *t* ∈ [0, *T*].
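As a concrete illustration of the model (29) – (30), the sketch below assembles the simplified $t > \tau$ forms of $A^{(a)}$, $B^{(a)}$ and $C^{(a)}$; the underlying *A*, *B* and *C* are placeholder values, not taken from the text.

```python
import numpy as np

# Assemble the augmented fixed-point model matrices of (29) - (30) for
# t > tau; at t = tau the zero blocks of Aa and Ba are instead delta-weighted
# copies of A and B. The A, B, C below are illustrative placeholders.
def augment(A, B, C):
    n, m = B.shape
    p = C.shape[0]
    Aa = np.block([[A, np.zeros((n, n))],
                   [np.zeros((n, n)), np.zeros((n, n))]])
    Ba = np.vstack([B, np.zeros((n, m))])
    Ca = np.hstack([C, np.zeros((p, n))])
    return Aa, Ba, Ca

A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
Aa, Ba, Ca = augment(A, B, C)
print(Aa.shape, Ba.shape, Ca.shape)   # (4, 4) (4, 1) (1, 4)
```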

#### **6.3.2 Solution Derivation**

Employing the Kalman-Bucy filter recursions for the system (29) – (30) results in

$$\begin{aligned} \dot{\hat{\mathbf{x}}}^{(a)}(t) &= A^{(a)}(t)\hat{\mathbf{x}}^{(a)}(t) + K^{(a)}(t) \left( z(t) - C^{(a)}(t)\hat{\mathbf{x}}^{(a)}(t) \right) \\ &= \left( A^{(a)}(t) - K^{(a)}(t)C^{(a)}(t) \right) \hat{\mathbf{x}}^{(a)}(t) + K^{(a)}(t)z(t) \end{aligned} \tag{31}$$

where

$$K^{(a)}(t) = P^{(a)}(t)\big(C^{(a)}(t)\big)^{\top}R^{-1}(t)\,,\tag{32}$$

in which $P^{(a)}(t) \in \mathbb{R}^{2n \times 2n}$ is to be found. Consider the partitioning $K^{(a)}(t) = \begin{bmatrix} K(t) \\ \underline{K}(t) \end{bmatrix}$; then, for *t* > *τ*, (31) may be written as

$$
\begin{bmatrix}
\dot{\hat{\mathbf{x}}}(t \mid t) \\
\dot{\hat{\boldsymbol{\xi}}}(t)
\end{bmatrix} = \begin{bmatrix}
A(t) - K(t)C(t) & 0 \\
-\underline{K}(t)C(t) & 0
\end{bmatrix} \begin{bmatrix}
\hat{\mathbf{x}}(t \mid t) \\
\hat{\boldsymbol{\xi}}(t)
\end{bmatrix} + \begin{bmatrix}
K(t) \\
\underline{K}(t)
\end{bmatrix} z(t) \tag{33}
$$

"Don't be afraid to take a big step if one is indicated. You can't cross a chasm in two small jumps." *David Lloyd George*

Define the augmented error state as $\tilde{x}^{(a)}(t) = x^{(a)}(t) - \hat{x}^{(a)}(t)$, that is,

$$
\begin{bmatrix}
\tilde{\mathbf{x}}(t \mid t) \\
\tilde{\boldsymbol{\xi}}(t)
\end{bmatrix} = \begin{bmatrix}
\mathbf{x}(t) \\
\boldsymbol{\xi}(\tau)
\end{bmatrix} - \begin{bmatrix}
\hat{\mathbf{x}}(t \mid t) \\
\hat{\boldsymbol{\xi}}(t)
\end{bmatrix}.
\tag{34}
$$

Differentiating (34) and using $z(t) = C(t)x(t) + v(t)$ gives

$$\begin{aligned} \begin{bmatrix} \dot{\tilde{\mathbf{x}}}(t \mid t) \\ \dot{\tilde{\boldsymbol{\xi}}}(t) \end{bmatrix} &= \begin{bmatrix} \dot{\mathbf{x}}(t) \\ 0 \end{bmatrix} - \begin{bmatrix} \dot{\hat{\mathbf{x}}}(t \mid t) \\ \dot{\hat{\boldsymbol{\xi}}}(t) \end{bmatrix} \\ &= \begin{bmatrix} A(t) - K(t)C(t) & 0 \\ -\underline{K}(t)C(t) & 0 \end{bmatrix} \begin{bmatrix} \tilde{\mathbf{x}}(t \mid t) \\ \tilde{\boldsymbol{\xi}}(t) \end{bmatrix} + \begin{bmatrix} B(t) & -K(t) \\ 0 & -\underline{K}(t) \end{bmatrix} \begin{bmatrix} w(t) \\ v(t) \end{bmatrix}. \end{aligned} \tag{35}$$

Denote $P^{(a)}(t) = \begin{bmatrix} P(t) & \Sigma^{\top}(t) \\ \Sigma(t) & \Omega(t) \end{bmatrix}$, where $P(t) = E\{[x(t) - \hat{x}(t \mid t)][x(t) - \hat{x}(t \mid t)]^{\top}\}$, $\Sigma(t) = E\{[\xi(t) - \hat{\xi}(t)][x(t) - \hat{x}(t \mid t)]^{\top}\}$ and $\Omega(t) = E\{[\xi(t) - \hat{\xi}(t)][\xi(t) - \hat{\xi}(t)]^{\top}\}$. Applying Lemma 2 of Chapter 3 to (35) yields the Lyapunov differential equation

$$
\begin{bmatrix}
\dot{P}(t) & \dot{\Sigma}^{\top}(t) \\
\dot{\Sigma}(t) & \dot{\Omega}(t)
\end{bmatrix} =
\begin{bmatrix}
A(t) - K(t)C(t) & 0 \\
-\underline{K}(t)C(t) & 0
\end{bmatrix}
\begin{bmatrix}
P(t) & \Sigma^{\top}(t) \\
\Sigma(t) & \Omega(t)
\end{bmatrix} +
\begin{bmatrix}
P(t) & \Sigma^{\top}(t) \\
\Sigma(t) & \Omega(t)
\end{bmatrix}
\begin{bmatrix}
A^{\top}(t) - C^{\top}(t)K^{\top}(t) & -C^{\top}(t)\underline{K}^{\top}(t) \\
0 & 0
\end{bmatrix} +
\begin{bmatrix}
B(t) & -K(t) \\
0 & -\underline{K}(t)
\end{bmatrix}
\begin{bmatrix}
Q(t) & 0 \\
0 & R(t)
\end{bmatrix}
\begin{bmatrix}
B^{\top}(t) & 0 \\
-K^{\top}(t) & -\underline{K}^{\top}(t)
\end{bmatrix}.
$$

Simplifying the above differential equation yields

$$\dot{P}(t) = A(t)P(t) + P(t)A^\top(t) - P(t)\mathcal{C}^\top(t)R^{-1}(t)\mathcal{C}(t)P(t) + B(t)Q(t)B^\top(t),\tag{36}$$

$$
\dot{\Sigma}(t) = \Sigma(t) \left( A^{\top}(t) - \mathbb{C}^{\top}(t) K^{\top}(t) \right),
\tag{37}
$$

$$
\dot{\Omega}(t) = -\Sigma(t)\mathbf{C}^{\top}(t)R^{-1}(t)\mathbf{C}(t)\Sigma^{\top}(t)\ . \tag{38}
$$

Equations (37) – (38) can be initialised with

$$
\Sigma(\tau) = P(\tau) \,. \tag{39}
$$

Thus, the fixed-point smoother estimate is given by

$$
\dot{\hat{\boldsymbol{\xi}}}(t) = \Sigma(t)C^{\top}(t)R^{-1}(t)\left(z(t) - C(t)\hat{\mathbf{x}}(t \mid t)\right),
\tag{40}
$$

"If you don't like change, you're going to like irrelevance even less." *General Eric Shinseki*


which is initialised with $\hat{\xi}(\tau) = \hat{x}(\tau \mid \tau)$. Alternative derivations of (40) are presented in [5], [8], [15]. The smoother (40) and its associated error covariances (36) – (38) are also discussed in [16], [17].
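A minimal Euler-discretised sketch of the recursions (36) – (40) for a scalar model follows; the parameters, the prior variance and the crude sampling of the continuous measurement are assumptions made for demonstration only.

```python
import numpy as np

# Fixed-point smoother sketch: Kalman-Bucy filter with Riccati equation (36),
# plus the recursions (37), (38) and estimate (40), Euler-integrated for an
# illustrative scalar model. Parameters are placeholders.
A, B, C, Q, R = -1.0, 1.0, 1.0, 1.0, 0.5
dt, T, tau = 1e-3, 5.0, 1.0
rng = np.random.default_rng(1)

x, xhat, P = 0.0, 0.0, 1.0
Sigma = Omega = xi_hat = None
for k in range(int(T / dt)):
    z = C * x + np.sqrt(R / dt) * rng.standard_normal()
    K = P * C / R
    innov = z - C * xhat
    if Sigma is not None:
        xi_hat += Sigma * C / R * innov * dt          # estimate (40)
        Omega += -(Sigma * C) ** 2 / R * dt           # covariance (38)
        Sigma += Sigma * (A - K * C) * dt             # cross term (37)
    xhat += (A * xhat + K * innov) * dt               # Kalman-Bucy filter
    P += (2 * A * P - (P * C) ** 2 / R + B * Q * B) * dt   # Riccati (36)
    x += A * x * dt + np.sqrt(Q * dt) * rng.standard_normal()
    if Sigma is None and (k + 1) * dt >= tau:
        Sigma, Omega, xi_hat = P, P, xhat             # initialisation (39)
print("smoothed estimate of x(tau):", xi_hat, "with error variance:", Omega)
```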

#### **6.3.3 Performance**

It can be seen that the right-hand side of the smoother error covariance (38) is non-positive and therefore Ω(*t*) must be monotonically decreasing. That is, the smoothed estimates improve with time. However, since the right-hand side of (36) varies inversely with *R*(*t*), the improvement reduces with decreasing signal-to-noise ratio. It is shown below that the fixed-point smoother improves on the performance of the minimum-variance filter.

*Lemma 4: In respect of the fixed-point smoother (40),* 

$$P(t) \ge \Omega(t) \,. \tag{41}$$

*Proof: The initialisation (39) accords with condition (i) of Theorem 1. Condition (ii) of the theorem is satisfied since* 

$$
\begin{bmatrix} Q(t) & A(t) \\ A^\top(t) & -\mathbf{C}^\top(t)\mathbf{R}^{-1}(t)\mathbf{C}(t) \end{bmatrix} \ge \begin{bmatrix} 0 & 0 \\ 0 & -\mathbf{C}^\top(t)\mathbf{R}^{-1}(t)\mathbf{C}(t) \end{bmatrix}
$$

*and hence the claim (41) follows.* ∎

#### **6.4 Fixed-Lag Smoothing**

#### **6.4.1 Problem Definition**

For continuous-time estimation problems, as usual, it is assumed that the observations are modelled by $\dot{x}(t) = A(t)x(t) + B(t)w(t)$, $z(t) = C(t)x(t) + v(t)$, with $E\{w(t)w^{\top}(\tau)\} = Q(t)\delta(t-\tau)$ and $E\{v(t)v^{\top}(\tau)\} = R(t)\delta(t-\tau)$. In fixed-lag smoothing, it is desired to calculate state estimates at a fixed time lag behind the current measurements. That is, smoothed state estimates, $\hat{x}(t \mid t+\tau)$, are desired at time *t*, given data at time *t* + *τ*, where *τ* is a prescribed lag. In particular, fixed-lag smoother estimates are sought which minimise $E\{[x(t) - \hat{x}(t \mid t+\tau)][x(t) - \hat{x}(t \mid t+\tau)]^{\top}\}$. It is found in [18] that the smoother yields practically all the improvement over the minimum-variance filter when the smoothing lag equals several time constants associated with the minimum-variance filter for the problem.

#### **6.4.2 Solution Derivation**

Previously, augmented signal models together with the application of the standard Kalman filter recursions were used to obtain the smoother results. However, as noted in [19], it is difficult to derive the optimal continuous-time fixed-lag smoother in this way because an ideal delay operator cannot easily be included within an asymptotically stable state-space system. Consequently, an alternate derivation based on that in [6] is outlined in the following.

"Change is like putting lipstick on a bulldog. The bulldog's appearance hasn't improved, but now it's really angry." *Rosabeth Moss Kanter*


Recall that the gain of the minimum-variance filter is calculated as $K(t) = P(t)C^{\top}(t)R^{-1}(t)$, where *P*(*t*) is the solution of the Riccati equation (3.36). Let $\Phi(t, s)$ denote the transition matrix of the filter error system $\dot{\tilde{x}}(t \mid t) = (A(t) - K(t)C(t))\tilde{x}(t \mid t) + B(t)w(t) - K(t)v(t)$, that is,

$$
\dot{\Phi}(t,s) = \left( A(t) - K(t)C(t) \right) \Phi(t,s) \tag{42}
$$

and $\Phi(s, s) = I$. It is assumed in [6], [17], [18], [20] that a smoothed estimate $\hat{x}(t \mid t+\tau)$ of *x*(*t*) is obtained as

$$
\hat{\mathbf{x}}(t \mid t+\tau) = \hat{\mathbf{x}}(t \mid t) + P(t)\xi(t, t+\tau) \,. \tag{43}
$$

where

$$\xi(t, t + \tau) = \int_{t}^{t + \tau} \Phi^{\top}(s, t) C^{\top}(s) R^{-1}(s) \left( z(s) - C(s) \hat{\mathbf{x}}(s \mid s) \right) ds \,. \tag{44}$$

The formula (43) appears in the development of fixed-interval smoothers [21] - [22], in which case *ξ*(*t*) is often called an adjoint variable. From the use of Leibniz' rule, that is,

$$\frac{d}{dt}\int\_{a(t)}^{b(t)}f(t,s)ds = f(t,b(t))\frac{db(t)}{dt} - f(t,a(t))\frac{da(t)}{dt} + \int\_{a(t)}^{b(t)}\frac{\partial}{\partial t}f(t,s)ds$$

it can be found that

$$\begin{aligned} \dot{\xi}(t, t + \tau) &= \Phi^{\top}(t + \tau, t) C^{\top}(t + \tau) R^{-1}(t + \tau) \left( z(t + \tau) - C(t + \tau) \hat{\mathbf{x}}(t + \tau \mid t + \tau) \right) \\ &\quad - C^{\top}(t) R^{-1}(t) \left( z(t) - C(t) \hat{\mathbf{x}}(t \mid t) \right) - \left( A(t) - K(t)C(t) \right)^{\top} \xi(t, t + \tau) \,. \end{aligned} \tag{45}$$

Differentiating (43) with respect to *t* gives

$$
\dot{\hat{\mathbf{x}}}(t \mid t+\tau) = \dot{\hat{\mathbf{x}}}(t \mid t) + \dot{P}(t)\xi(t, t+\tau) + P(t)\dot{\xi}(t, t+\tau) \,. \tag{46}
$$

Substituting $\xi(t, t+\tau) = P^{-1}(t)\left(\hat{x}(t \mid t+\tau) - \hat{x}(t \mid t)\right)$ and expressions for $\dot{\hat{x}}(t \mid t)$, $\dot{P}(t)$, $\dot{\xi}(t, t+\tau)$ into (43) yields the fixed-lag smoother differential equation

$$\begin{aligned} \dot{\hat{\mathbf{x}}}(t \mid t+\tau) &= A(t)\hat{\mathbf{x}}(t \mid t+\tau) + B(t)Q(t)B^{\top}(t)P^{-1}(t)\left(\hat{\mathbf{x}}(t \mid t+\tau) - \hat{\mathbf{x}}(t \mid t)\right) \\ &\quad + P(t)\Phi^{\top}(t+\tau, t)C^{\top}(t+\tau)R^{-1}(t+\tau)\left(z(t+\tau) - C(t+\tau)\hat{\mathbf{x}}(t+\tau \mid t+\tau)\right). \end{aligned} \tag{47}$$

"An important scientific innovation rarely makes its way by gradually winning over and converting its opponents: What does happen is that the opponents gradually die out." *Max Karl Ernst Ludwig Planck*

#### **6.4.3 Performance**

*Lemma 5 [18]:* 


$$P(t) - E\{[\mathbf{x}(t) - \hat{\mathbf{x}}(t \mid t+\tau)][\mathbf{x}(t) - \hat{\mathbf{x}}(t \mid t+\tau)]^{\top}\} > 0 \,. \tag{48}$$

*Proof. It is argued from the references of [18] for the fixed-lag smoothed estimate that* 

$$E\{[\mathbf{x}(t) - \hat{\mathbf{x}}(t \mid t+\tau)][\mathbf{x}(t) - \hat{\mathbf{x}}(t \mid t+\tau)]^{\top}\} = P(t) - P(t)\int_{t}^{t+\tau} \Phi^{\top}(s, t)C^{\top}(s)R^{-1}(s)C(s)\Phi(s, t)\, ds \, P(t) \tag{49}$$

*Thus, (48) follows by inspection of (49).* ∎

That is to say, the minimum-variance filter error covariance is greater than the fixed-lag smoother error covariance. It is also argued in [18] that (48) implies that the error covariance decreases monotonically with the smoother lag *τ*.
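This behaviour can be checked numerically. For a scalar time-invariant system at steady state, $\Phi(s, t)$ reduces to an exponential and the integral in (49) has a closed form; the sketch below evaluates it for a few lags (the model parameters are illustrative assumptions).

```python
import numpy as np

# Evaluate the smoother error covariance (49) in closed form for an assumed
# scalar time-invariant system at steady state.
A, C, Q, R = -1.0, 1.0, 1.0, 0.5

P = np.roots([-C**2 / R, 2 * A, Q]).max()   # algebraic Riccati equation
K = P * C / R
F = A - K * C                               # filter error dynamics, see (42)

for lag in [0.0, 0.5, 1.0, 2.0, 4.0]:
    # integral of Phi' C' R^-1 C Phi over [t, t+lag]; Phi(s,t) = exp(F (s-t))
    integral = (C**2 / R) * (np.exp(2 * F * lag) - 1) / (2 * F)
    cov = P - P * integral * P
    print(f"lag {lag:3.1f}: smoother covariance {cov:.4f} (filter {P:.4f})")
```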

#### **6.5 Fixed-Interval Smoothing**

#### **6.5.1 Problem Definition**

Many data analyses occur off-line. In medical diagnosis, for example, ultrasound or CAT scan images are reviewed after the time of measurement. In principle, smoothing could be employed instead of filtering to improve the quality of an image sequence.

Fixed-lag smoothers are elegant – they can provide a small performance improvement over filters at a moderate increase in implementation cost. The best performance arises when the lag is sufficiently large, at the expense of increased complexity. Thus, the designer needs to trade off performance, calculation cost and delay.

Fixed-interval smoothers are a brute-force solution for estimation problems. They provide improved performance without having to fine-tune a smoothing lag, at the cost of approximately twice the filter calculation complexity. Fixed-interval smoothers involve two passes. Typically, a forward process operates on the measurements. Then a backward system operates on the results of the forward process.

The plants are again assumed to have state-space realisations of the form $\dot{x}(t) = A(t)x(t) + B(t)w(t)$ and $y(t) = C(t)x(t) + D(t)w(t)$. Smoothers are considered which operate on measurements $z(t) = y(t) + v(t)$ over a fixed interval *t* ∈ [0, *T*]. The performance criteria depend on the quantity being estimated, *viz.*,

- in input estimation, the objective is to calculate a $\hat{w}(t \mid T)$ that minimises $E\{[w(t) - \hat{w}(t \mid T)][w(t) - \hat{w}(t \mid T)]^{\top}\}$;
- in state estimation, $\hat{x}(t \mid T)$ is calculated which achieves the minimum $E\{[x(t) - \hat{x}(t \mid T)][x(t) - \hat{x}(t \mid T)]^{\top}\}$; and
- in output estimation, $\hat{y}(t \mid T)$ is produced such that $E\{[y(t) - \hat{y}(t \mid T)][y(t) - \hat{y}(t \mid T)]^{\top}\}$ is minimised.


"If you want to truly understand something, try to change it." *Kurt Lewin*


This section focuses on three continuous-time fixed-interval smoother formulations: the maximum-likelihood smoother derived by Rauch, Tung and Striebel [3], the Fraser-Potter smoother [4] and a generalisation of Wiener's optimal unrealisable solution [8] – [10]. Some additional historical background to [3] – [4] is described within [1], [2], [17].

#### **6.5.2 The Maximum Likelihood Smoother**

#### **6.5.2.1 Solution Derivation**

Rauch, Tung and Striebel [3] employed the maximum-likelihood method to develop a discrete-time smoother for state estimation and then used a limiting argument to obtain a continuous-time version. A brief outline of this derivation is set out here. Suppose that a record of filtered estimates, $\hat{x}(\tau \mid \tau)$, is available over a fixed interval *τ* ∈ [0, *T*]. Let $\hat{x}(\tau \mid T)$ denote smoothed state estimates at time 0 ≤ *τ* ≤ *T* to be evolved backwards in time from the filtered states $\hat{x}(\tau \mid \tau)$. The smoother development is based on two assumptions. First, it is assumed that $-\dot{\hat{x}}(\tau \mid T)$ is normally distributed with mean $A(\tau)\hat{x}(\tau \mid T)$ and covariance $B(\tau)Q(\tau)B^{\top}(\tau)$, that is, $-\dot{\hat{x}}(\tau \mid T) \sim \mathcal{N}(A(\tau)\hat{x}(\tau \mid T),\ B(\tau)Q(\tau)B^{\top}(\tau))$. The probability density function of $-\dot{\hat{x}}(\tau \mid T)$ is

$$\begin{split} p(-\dot{\hat{\mathbf{x}}}(\tau\mid T) \mid \hat{\mathbf{x}}(\tau\mid T)) &= \frac{1}{(2\pi)^{n/2} \left| B(\tau)Q(\tau)B^{\top}(\tau) \right|^{1/2}} \\ &\times \exp\left\{ -0.5(-\dot{\hat{\mathbf{x}}}(\tau\mid T) - A(\tau)\hat{\mathbf{x}}(\tau\mid T))^{\top}(B(\tau)Q(\tau)B^{\top}(\tau))^{-1}(-\dot{\hat{\mathbf{x}}}(\tau\mid T) - A(\tau)\hat{\mathbf{x}}(\tau\mid T)) \right\} \end{split}$$

Second, it is assumed that $\hat{x}(\tau \mid T)$ is normally distributed with mean $\hat{x}(\tau \mid \tau)$ and covariance *P*(*τ*), namely, $\hat{x}(\tau \mid T) \sim \mathcal{N}(\hat{x}(\tau \mid \tau),\ P(\tau))$. The corresponding probability density function is

$$p(\hat{\mathbf{x}}(\tau \mid T) \mid \hat{\mathbf{x}}(\tau \mid \tau)) = \frac{1}{\left(2\pi\right)^{n/2} \left| P(\tau) \right|^{1/2}} \times \exp\left\{ -0.5 (\hat{\mathbf{x}}(\tau \mid T) - \hat{\mathbf{x}}(\tau \mid \tau))^{\top} P^{-1}(\tau) (\hat{\mathbf{x}}(\tau \mid T) - \hat{\mathbf{x}}(\tau \mid \tau)) \right\}.$$

From the approach of [3] and the further details in [6],

$$0 = \frac{\partial \log \, p(-\dot{\hat{\mathbf{x}}}(\tau \mid T) \mid \hat{\mathbf{x}}(\tau \mid T))\, p(\hat{\mathbf{x}}(\tau \mid T) \mid \hat{\mathbf{x}}(\tau \mid \tau))}{\partial \hat{\mathbf{x}}(\tau \mid T)}$$

$$= \frac{\partial \log \, p(-\dot{\hat{\mathbf{x}}}(\tau \mid T) \mid \hat{\mathbf{x}}(\tau \mid T))}{\partial \hat{\mathbf{x}}(\tau \mid T)} + \frac{\partial \log \, p(\hat{\mathbf{x}}(\tau \mid T) \mid \hat{\mathbf{x}}(\tau \mid \tau))}{\partial \hat{\mathbf{x}}(\tau \mid T)}$$

results in

$$0 = \frac{\partial(-\dot{\hat{\mathbf{x}}}(\tau \mid T) - A(\tau)\hat{\mathbf{x}}(\tau \mid T))^{\top}}{\partial \hat{\mathbf{x}}(\tau \mid T)} (B(\tau)Q(\tau)B^{\top}(\tau))^{-1}(-\dot{\hat{\mathbf{x}}}(\tau \mid T) - A(\tau)\hat{\mathbf{x}}(\tau \mid T)) + P^{-1}(\tau)(\hat{\mathbf{x}}(\tau \mid T) - \hat{\mathbf{x}}(\tau \mid \tau))\,.$$

"The soft-minded man always fears change. He feels security in the status quo, and he has an almost morbid fear of the new. For him, the greatest pain is the pain of a new idea." *Martin Luther King Jr.*

Hence, the solution is given by

$$-\dot{\hat{\mathbf{x}}}(\tau \mid T) = A(\tau)\hat{\mathbf{x}}(\tau \mid T) + G(\tau)\left(\hat{\mathbf{x}}(\tau \mid T) - \hat{\mathbf{x}}(\tau \mid \tau)\right),\tag{50}$$

where


$$G(\tau) = -B(\tau)Q(\tau)B^{\top}(\tau)\frac{\partial(-\dot{\hat{\mathbf{x}}}(\tau \mid T) - A(\tau)\hat{\mathbf{x}}(\tau \mid T))^{\top}}{\partial \hat{\mathbf{x}}(\tau \mid T)}P^{-1}(\tau)\tag{51}$$

is the smoother gain. Suppose that $\hat{x}(\tau \mid T)$, *A*(τ), *B*(τ), *Q*(τ), *P*−1(τ) are sampled at integer *k* multiples of $T_s$ and are constant during the sampling interval. Using the Euler approximation $\dot{\hat{x}}(kT_s \mid T) = \big(\hat{x}((k+1)T_s \mid T) - \hat{x}(kT_s \mid T)\big)T_s^{-1}$, the sampled gain may be written as

$$\mathbf{G}(kT\_s) = \mathbf{B}(kT\_s)T\_s^{-1}\mathbf{Q}(kT\_s)\mathbf{B}^T(kT\_s)(I+AT\_s)\mathbf{P}^{-1}(kT\_s)\,. \tag{52}$$

Recognising that $T_s^{-1}Q(kT_s) = Q(\tau)$, see [23], and taking the limit as $T_s \to 0$ yields

$$G(\tau) = B(\tau)Q(\tau)B^{\top}(\tau)P^{-1}(\tau)\,. \tag{53}$$

To summarise, the above fixed-interval smoother is realised by the following two-pass procedure.

(i) In the first pass, the (forward) Kalman-Bucy filter operates on the measurements *z*(*τ*) to obtain filtered state estimates $\hat{x}(\tau \mid \tau)$.

(ii) In the second pass, the differential equation (50) operates on the filtered state estimates $\hat{x}(\tau \mid \tau)$ to obtain smoothed state estimates $\hat{x}(\tau \mid T)$. Equation (50) is integrated backwards in time from the initial condition $\hat{x}(\tau \mid T) = \hat{x}(\tau \mid \tau)$ at *τ* = *T*.
Alternative derivations of this smoother appear in [6], [20], [23], [24].
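A minimal numerical sketch of the two-pass procedure for an assumed scalar model follows; the Euler integration and the parameter values are demonstration choices only, and the backward pass uses the forward form (54) of equation (50).

```python
import numpy as np

# Two-pass fixed-interval smoother sketch per (50), (53) and (54) on an
# illustrative scalar model. All parameters are placeholders.
A, B, C, Q, R = -1.0, 1.0, 1.0, 1.0, 0.5
dt, T = 1e-3, 10.0
N = int(T / dt)
rng = np.random.default_rng(2)

x = np.zeros(N)
for k in range(N - 1):
    x[k + 1] = x[k] + A * x[k] * dt + np.sqrt(Q * dt) * rng.standard_normal()
z = C * x + np.sqrt(R / dt) * rng.standard_normal(N)

# first pass: forward Kalman-Bucy filter with its Riccati equation
xf, P = np.zeros(N), np.zeros(N)
P[0] = 1.0
for k in range(N - 1):
    K = P[k] * C / R
    xf[k + 1] = xf[k] + (A * xf[k] + K * (z[k] - C * xf[k])) * dt
    P[k + 1] = P[k] + (2 * A * P[k] - (P[k] * C) ** 2 / R + B * Q * B) * dt

# second pass: integrate backwards from xs(T) = xf(T) with gain (53)
xs = np.zeros(N)
xs[-1] = xf[-1]
for k in range(N - 1, 0, -1):
    G = B * Q * B / P[k]                         # G = B Q B' P^{-1}, eq (53)
    dxs_dt = A * xs[k] + G * (xs[k] - xf[k])     # forward form (54)
    xs[k - 1] = xs[k] - dxs_dt * dt
print("filter RMSE:  ", np.sqrt(np.mean((x - xf) ** 2)))
print("smoother RMSE:", np.sqrt(np.mean((x - xs) ** 2)))
```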

#### **6.5.2.2 Alternative Form**

For the purpose of developing an alternate form of the above smoother found in the literature, consider a fictitious forward version of (50), namely,

$$\begin{aligned} \dot{\hat{\mathbf{x}}}(t \mid T) &= A(t)\hat{\mathbf{x}}(t \mid T) + B(t)Q(t)B^\top(t)P^{-1}(t)\left(\hat{\mathbf{x}}(t \mid T) - \hat{\mathbf{x}}(t \mid t)\right) \\ &= A(t)\hat{\mathbf{x}}(t \mid T) + B(t)Q(t)B^\top(t)\xi(t \mid T) \end{aligned} \tag{54}$$

where

$$\xi(t \mid T) = P^{-1}(t)(\hat{\mathbf{x}}(t \mid T) - \hat{\mathbf{x}}(t \mid t)) \tag{55}$$

"There is a certain relief in change, even though it be from bad to worse. As I have often found in travelling in a stagecoach that it is often a comfort to shift one's position, and be bruised in a new place." *Washington Irving*


is an auxiliary variable. An expression for the evolution of $\xi(t \mid T)$ is now developed. Writing (55) as

$$
\hat{\mathbf{x}}(t \mid T) = \hat{\mathbf{x}}(t \mid t) + P(t)\xi(t \mid T) \tag{56}
$$

and taking the time differential results in

$$
\dot{\hat{\mathbf{x}}}(t \mid T) = \dot{\hat{\mathbf{x}}}(t \mid t) + \dot{P}(t)\xi(t \mid T) + P(t)\dot{\xi}(t \mid T) \,. \tag{57}
$$

Substituting $\dot{\hat{x}}(t \mid t) = A(t)\hat{x}(t \mid t) + P(t)C^{\top}(t)R^{-1}(t)\big(z(t) - C(t)\hat{x}(t \mid t)\big)$ into (57) yields

$$P(t)\dot{\xi}(t \mid T) = P(t)C^{\top}(t)R^{-1}(t)C(t)\hat{\mathbf{x}}(t \mid t) - P(t)C^{\top}(t)R^{-1}(t)z(t) + A(t)P(t)\xi(t \mid T) + B(t)Q(t)B^{\top}(t)\xi(t \mid T) - \dot{P}(t)\xi(t \mid T)\,. \tag{58}$$

Using $\hat{x}(t \mid t) = \hat{x}(t \mid T) - P(t)\xi(t \mid T)$ and $P(t)A^{\top}(t) = \dot{P}(t) - A(t)P(t) + P(t)C^{\top}(t)R^{-1}(t)C(t)P(t) - B(t)Q(t)B^{\top}(t)$ (the Riccati equation rearranged) within (58) and rearranging gives

$$-\dot{\xi}(t \mid T) = -C^{\top}(t)R^{-1}(t)C(t)\hat{\mathbf{x}}(t \mid T) + A^{\top}(t)\xi(t \mid T) + C^{\top}(t)R^{-1}(t)z(t) \,. \tag{59}$$

The filter (54) and smoother (57) may be collected together as

$$
\begin{bmatrix}
\dot{\hat{\mathbf{x}}}(t \mid T) \\
\dot{\xi}(t \mid T)
\end{bmatrix} = \begin{bmatrix}
A(t) & B(t)Q(t)B^{\top}(t) \\
C^{\top}(t)R^{-1}(t)C(t) & -A^{\top}(t)
\end{bmatrix} \begin{bmatrix}
\hat{\mathbf{x}}(t \mid T) \\
\xi(t \mid T)
\end{bmatrix} + \begin{bmatrix}
0 \\
-C^{\top}(t)R^{-1}(t)z(t)
\end{bmatrix}.\tag{60}
$$

Equation (60) is known as the Hamiltonian form of the Rauch-Tung-Striebel smoother [17].
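The structure of (60) can be made tangible by assembling the Hamiltonian matrix for a small time-invariant example; the matrices below are placeholders. A characteristic property of such matrices is that their eigenvalues occur in ± pairs, reflecting the coupled stable (filtering) and anti-stable (adjoint) dynamics.

```python
import numpy as np

# Assemble the Hamiltonian matrix appearing in (60) for an illustrative
# time-invariant model; Q and R are taken as scalars for simplicity.
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
Q = np.array([[1.0]])
Rinv = np.array([[1.0 / 0.5]])

H = np.block([[A, B @ Q @ B.T],
              [C.T @ Rinv @ C, -A.T]])
print(np.sort_complex(np.linalg.eigvals(H)))   # eigenvalues in +/- pairs
```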

#### **6.5.2.3 Performance**

In order to develop an expression for the smoothed error state, consider the backwards signal model

$$-\dot{\mathbf{x}}(\tau) = A(\tau)\mathbf{x}(\tau) + B(\tau)w(\tau) \,. \tag{61}$$

Subtracting (50) from (61) results in

$$-\dot{\mathbf{x}}(\tau) + \dot{\hat{\mathbf{x}}}(\tau \mid T) = (A(\tau) + G(\tau))(\mathbf{x}(\tau) - \hat{\mathbf{x}}(\tau \mid T)) - G(\tau)(\mathbf{x}(\tau) - \hat{\mathbf{x}}(\tau \mid \tau)) + B(\tau)w(\tau) \tag{62}$$

Let $\tilde{x}(\tau \mid T) = x(\tau) - \hat{x}(\tau \mid T)$ denote the smoothed error state and $\tilde{x}(\tau \mid \tau) = x(\tau) - \hat{x}(\tau \mid \tau)$ denote the filtered error state. Then the differential equation (62) can simply be written as

$$-\dot{\tilde{\mathbf{x}}}(\tau \mid T) = (A(\tau) + G(\tau))\tilde{\mathbf{x}}(\tau \mid T) - G(\tau)\tilde{\mathbf{x}}(\tau \mid \tau) + B(\tau)w(\tau), \tag{63}$$

Applying Lemma 3 to (63) and using $E\{\tilde{x}(\tau \mid \tau)w^{\top}(\tau)\} = 0$ gives

$$-\dot{\Sigma}(\tau \mid T) = (A(\tau) + G(\tau))\Sigma(\tau \mid T) + \Sigma(\tau \mid T)(A(\tau) + G(\tau))^T - B(\tau)Q(\tau)B^T(\tau) \tag{64}$$

"That which comes into the world to disturb nothing deserves neither respect nor patience." *Rene Char*


where $\Sigma(\tau \mid T) = E\{\tilde{x}(\tau \mid T)\tilde{x}^{\top}(\tau \mid T)\}$ is the smoother error covariance and $\dot{\Sigma}(\tau \mid T) = d\Sigma(\tau \mid T)/d\tau$. The smoother error covariance differential equation (64) is solved backwards in time from the initial condition

$$
\Sigma(\tau \mid T) = P(\tau \mid \tau) \tag{65}
$$

at *τ* = *T*, where $P(t \mid t)$ is the solution of the Riccati differential equation

$$\dot{P}(t) = (A(t) - K(t)C(t))P(t) + P(t)(A^\top(t) - \mathbb{C}^\top(t)K^\top(t)) + K(t)R(t)K^\top(t) + B(t)Q(t)B^\top(t)$$

$$= A(t)P(t) + P(t)A^\top(t) - K(t)R(t)K^\top(t) + B(t)Q(t)B^\top(t) \tag{66}$$

It is shown below that this smoother outperforms the minimum-variance filter. For the purpose of comparing the solutions of forward Riccati equations, consider a fictitious forward version of (64), namely,

$$
\dot{\Sigma}(t \mid T) = (A(t) + G(t))\Sigma(t \mid T) + \Sigma(t \mid T)(A(t) + G(t))^\top - B(t)Q(t)B^\top(t) \tag{67}
$$

initialised with

$$
\Sigma(t\_0 \mid T) = P(t\_0 \mid t\_0) > 0 \; . \tag{68}
$$

*Lemma 6: In respect of the fixed-interval smoother (50),* 

$$P(t \mid t) \ge \Sigma(t \mid T). \tag{69}$$

*Proof: The initialisation (68) satisfies condition (i) of Theorem 1. Condition (ii) of the theorem is met since* 

$$
\begin{bmatrix}
B(t)Q(t)B^{\top}(t) + K(t)R(t)K^{\top}(t) & A(t) - K(t)C(t) \\
A^{\top}(t) - C^{\top}(t)K^{\top}(t) & 0
\end{bmatrix} \ge
\begin{bmatrix}
-B(t)Q(t)B^{\top}(t) & A(t) + G(t) \\
A^{\top}(t) + G^{\top}(t) & 0
\end{bmatrix}
$$

*for all t ≥ t0, and hence the claim (69) follows.* ∎

#### **6.5.3 The Fraser-Potter Smoother**

The Central Limit Theorem states that the mean of a sufficiently large sample of independent identically distributed random variables will be approximately normally distributed [25]. The same is true of partial sums of random variables. The Central Limit Theorem is illustrated by the first part of the following lemma. A useful generalisation appears in the second part of the lemma.

*Lemma 7: Suppose that y1, y2, …, yn are independent random variables and W1, W2, …, Wn are independent positive definite weighting matrices. Let μ = E{yi}, u = y1 + y2 + … + yn and*

$$v = \left(W_1 y_1 + W_2 y_2 + \dots + W_n y_n\right)\left(W_1 + W_2 + \dots + W_n\right)^{-1}. \tag{70}$$

"Today every invention is received with a cry of triumph which soon turns into a cry of fear." *Bertolt Brecht*


*(i) If* $y_i \sim \mathcal{N}(\mu, R)$*, i = 1 to n, then* $u \sim \mathcal{N}(n\mu, nR)$*;*

*(ii) If* $y_i \sim \mathcal{N}(0, I)$*, i = 1 to n, then* $v \sim \mathcal{N}(0, I)$*.*

*Proof:* 

*(i)* $E\{u\} = E\{y_1\} + E\{y_2\} + \dots + E\{y_n\} = n\mu$. $E\{(u - n\mu)(u - n\mu)^{\top}\} = E\{(y_1 - \mu)(y_1 - \mu)^{\top}\} + E\{(y_2 - \mu)(y_2 - \mu)^{\top}\} + \dots + E\{(y_n - \mu)(y_n - \mu)^{\top}\} = nR$.

*(ii)* $E\{v\} = W_1(W_1 + W_2 + \dots + W_n)^{-1}E\{y_1\} + W_2(W_1 + W_2 + \dots + W_n)^{-1}E\{y_2\} + \dots + W_n(W_1 + W_2 + \dots + W_n)^{-1}E\{y_n\} = 0$. $E\{vv^{\top}\} = E\{(W_1^{\top} + W_2^{\top} + \dots + W_n^{\top})^{-1}W_1^{\top}y_1 y_1^{\top}W_1(W_1 + W_2 + \dots + W_n)^{-1}\} + E\{(W_1^{\top} + W_2^{\top} + \dots + W_n^{\top})^{-1}W_2^{\top}y_2 y_2^{\top}W_2(W_1 + W_2 + \dots + W_n)^{-1}\} + \dots + E\{(W_1^{\top} + W_2^{\top} + \dots + W_n^{\top})^{-1}W_n^{\top}y_n y_n^{\top}W_n(W_1 + W_2 + \dots + W_n)^{-1}\} = (W_1^{\top} + W_2^{\top} + \dots + W_n^{\top})^{-1}(W_1^{\top}W_1 + W_2^{\top}W_2 + \dots + W_n^{\top}W_n)(W_1 + W_2 + \dots + W_n)^{-1} = I$. ∎

Fraser and Potter reported a smoother in 1969 [4] that combined state estimates from forward and backward filters using a formula similar to (70) truncated at *n* = 2. The inverses of the forward and backward error covariances, which are indicative of the quality of the respective estimates, were used as weighting matrices. The combined filter and Fraser-Potter smoother equations are

$$
\dot{\hat{\mathbf{x}}}(t|t) = A(t)\hat{\mathbf{x}}(t|t) + P(t|t)\mathbf{C}^{\top}(t)R^{-1}(t)(z(t) - \mathbf{C}(t)\hat{\mathbf{x}}(t|t)) \,. \tag{71}
$$

$$-\dot{\zeta}(t \mid t) = A(t)\zeta(t \mid t) + \Sigma(t \mid t)C^{\top}(t)R^{-1}(t)(z(t) - C(t)\zeta(t \mid t))\,,\tag{72}$$

$$
\hat{\mathbf{x}}(t \mid T) = \left( P^{-1}(t \mid t) + \Sigma^{-1}(t \mid t) \right)^{-1} \left( P^{-1}(t \mid t)\hat{\mathbf{x}}(t \mid t) + \Sigma^{-1}(t \mid t)\zeta(t \mid t) \right), \tag{73}
$$

where $P(t \mid t)$ is the solution of the forward Riccati equation $\dot{P}(t \mid t) = A(t)P(t \mid t) + P(t \mid t)A^{\top}(t) - P(t \mid t)C^{\top}(t)R^{-1}(t)C(t)P(t \mid t) + B(t)Q(t)B^{\top}(t)$ and $\Sigma(t \mid t)$ is the solution of the backward Riccati equation $-\dot{\Sigma}(t \mid t) = A(t)\Sigma(t \mid t) + \Sigma(t \mid t)A^{\top}(t) - \Sigma(t \mid t)C^{\top}(t)R^{-1}(t)C(t)\Sigma(t \mid t) + B(t)Q(t)B^{\top}(t)$.

It can be seen from (72) that the backward state estimates, *ζ*(*t*), are obtained by simply running a Kalman filter over the time-reversed measurements. Fraser and Potter's approach is pragmatic: when the data is noisy, a linear combination of two filtered estimates is likely to be better than one filter alone. However, this two-filter approach to smoothing is *ad hoc* and is not a minimum-mean-square-error design.
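The sketch below illustrates the two-filter combination on an assumed stationary scalar model. Since that scalar process is time-reversible, the backward filter (72) is realised here simply by filtering the time-reversed measurements with the same model, which is itself a simplifying assumption.

```python
import numpy as np

# Fraser-Potter style two-filter smoother sketch per (71) - (73) for an
# illustrative stationary scalar model; parameters are placeholders.
A, B, C, Q, R = -1.0, 1.0, 1.0, 1.0, 0.5
dt, T = 1e-3, 10.0
N = int(T / dt)
rng = np.random.default_rng(4)

x = np.zeros(N)
for k in range(N - 1):
    x[k + 1] = x[k] + A * x[k] * dt + np.sqrt(Q * dt) * rng.standard_normal()
z = C * x + np.sqrt(R / dt) * rng.standard_normal(N)

def kb_filter(zseq):
    # Kalman-Bucy filter (71) with its forward Riccati equation
    xh, P = np.zeros(len(zseq)), np.zeros(len(zseq))
    P[0] = Q / (2 * abs(A))                  # stationary prior variance
    for k in range(len(zseq) - 1):
        K = P[k] * C / R
        xh[k + 1] = xh[k] + (A * xh[k] + K * (zseq[k] - C * xh[k])) * dt
        P[k + 1] = P[k] + (2 * A * P[k] - (P[k] * C) ** 2 / R + B * Q * B) * dt
    return xh, P

xf, Pf = kb_filter(z)                         # forward pass, eq (71)
xb, Pb = kb_filter(z[::-1])                   # backward pass, eq (72)
xb, Pb = xb[::-1], Pb[::-1]
xs = (xf / Pf + xb / Pb) / (1 / Pf + 1 / Pb)  # combination, eq (73)
print("filter RMSE:  ", np.sqrt(np.mean((x - xf) ** 2)))
print("smoother RMSE:", np.sqrt(np.mean((x - xs) ** 2)))
```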

"If there is dissatisfaction with the status quo, good. If there is ferment, so much the better. If there is restlessness, I am pleased. Then let there be ideas, and hard thought, and hard work." *Hubert Horatio Humphrey.*


#### **6.5.4 The Minimum-Variance Smoother**

#### **6.5.4.1 Problem Definition**

The previously described smoothers are focussed on state estimation. A different signal estimation problem, shown in Fig. 1, is considered here. Suppose that observations $z = y_2 + v$ are available, where $y_2 = \mathcal{G}_2 w$ is the output of a linear time-varying system and $v$ is measurement noise. A solution is desired which produces estimates $\hat{y}_1$ of a second reference system $y_1 = \mathcal{G}_1 w$ in such a way as to meet a performance objective. Let $e = y_1 - \hat{y}_1$ denote the output estimation error. The optimum minimum-variance filter can be obtained by finding the solution that minimises $\|ee^{\top}\|_2$. Here, in the case of smoothing, the performance objective is to minimise $\|ee^{H}\|_2$.

Figure 1. The general estimation problem. The objective is to produce estimates <sup>1</sup> *y*ˆ of *y*<sup>1</sup> from measurements *z*.

#### **6.5.4.2 Optimal Unrealisable Solutions**

The minimum-variance smoother is a more recent innovation [8] - [10] and arises by generalising Wiener's optimal noncausal solution for the above time-varying problem. The solution is obtained using the same completing-the-squares technique that was previously employed in the frequency domain (see Chapters 1 and 2). It can be seen from Fig. 1 that the output estimation error is generated by $e = \mathcal{R}_{ei}\, i$, where

$$\mathcal{R}_{ei} = -\begin{bmatrix} \mathcal{H} & \mathcal{H}\mathcal{G}_2 - \mathcal{G}_1 \end{bmatrix} \tag{74}$$

is a linear system that operates on the inputs $i = \begin{bmatrix} v \\ w \end{bmatrix}$.

Consider the factorisation

$$
\Delta\Delta^H = \mathcal{G}_2 Q \mathcal{G}_2^H + R \tag{75}
$$

"Restlessness and discontent are the first necessities of progress. Show me a thoroughly satisfied man and I will show you a failure." *Thomas Alva Edison*


in which the time-dependence of *Q*(*t*) and *R*(*t*) is omitted for notational brevity. Suppose that $\Delta : \mathbb{R}^{p} \to \mathbb{R}^{p}$ is causal, namely Δ and its inverse, Δ−1, are bounded systems that proceed forward in time. The system Δ is known as a Wiener-Hopf factor.

*Lemma 8: Assume that the Wiener-Hopf factor inverse, Δ-1, exists over t ∈ [0, T]. Then the smoother solution*

$$\mathcal{H} = \mathcal{G}_1 Q \mathcal{G}_2^H (\Delta\Delta^H)^{-1} = \mathcal{G}_1 Q \mathcal{G}_2^H \Delta^{-H} \Delta^{-1} \tag{76}$$

*minimises* $\|\mathcal{R}_{ee}\|_2^2 = \|\mathcal{R}_{ei}\mathcal{R}_{ei}^H\|_2^2$.

*Proof: It follows from (74) that* $\mathcal{R}_{ei}\mathcal{R}_{ei}^H = \mathcal{G}_1 Q \mathcal{G}_1^H - \mathcal{G}_1 Q \mathcal{G}_2^H \mathcal{H}^H - \mathcal{H}\mathcal{G}_2 Q \mathcal{G}_1^H + \mathcal{H}\Delta\Delta^H\mathcal{H}^H$. *Completing the square leads to* $\mathcal{R}_{ei}\mathcal{R}_{ei}^H = \mathcal{R}_{ei1}\mathcal{R}_{ei1}^H + \mathcal{R}_{ei2}\mathcal{R}_{ei2}^H$*, where*

$$\mathcal{R}_{ei2}\mathcal{R}_{ei2}^H = (\mathcal{H}\Delta - \mathcal{G}_1 Q \mathcal{G}_2^H \Delta^{-H})(\mathcal{H}\Delta - \mathcal{G}_1 Q \mathcal{G}_2^H \Delta^{-H})^H \tag{77}$$

*and* 

$$\mathcal{R}_{ei1}\mathcal{R}_{ei1}^H = \mathcal{G}_1 Q \mathcal{G}_1^H - \mathcal{G}_1 Q \mathcal{G}_2^H (\Delta\Delta^H)^{-1} \mathcal{G}_2 Q \mathcal{G}_1^H. \tag{78}$$

*By inspection of (77), the solution (76) achieves* 

$$\left\| \mathcal{R}_{ei2}\mathcal{R}_{ei2}^H \right\|_2 = 0. \tag{79}$$

*Since* $\mathcal{R}_{ei1}\mathcal{R}_{ei1}^H$ *excludes the estimator solution* $\mathcal{H}$*, this quantity defines the lower bound for* $\left\|\mathcal{R}_{ei}\mathcal{R}_{ei}^H\right\|_2$. □

*Example 3.* Consider the output estimation case where $\mathcal{G}_1 = \mathcal{G}_2$ and

$$\mathcal{H}\_{\rm OE} = \mathcal{G}\_2 \mathbf{Q} \mathcal{G}\_2^H \left( \mathcal{G}\_2 \mathbf{Q} \mathcal{G}\_2^H + \mathcal{R} \right)^{-1},\tag{80}$$

which is of order $n^4$ complexity. Using $\mathcal{G}_2 Q \mathcal{G}_2^H = \Delta\Delta^H - R$ leads to the $n^2$-order solution

$$\mathcal{H}\_{\rm OE} = I - R(\Delta \Delta^H)^{-1} \,. \tag{81}$$


It is interesting to note from (81) and

$$\mathcal{R}_{ei1}\mathcal{R}_{ei1}^H = \mathcal{G}_2 Q \mathcal{G}_2^H - \mathcal{G}_2 Q \mathcal{G}_2^H (\mathcal{G}_2 Q \mathcal{G}_2^H + R)^{-1} \mathcal{G}_2 Q \mathcal{G}_2^H \tag{82}$$

that $\lim_{R \to 0} \mathcal{H}_{OE} = I$ and $\lim_{R \to 0} \mathcal{R}_{ei1}\mathcal{R}_{ei1}^H = 0$. That is, output estimation is superfluous when measurement noise is absent. Let $\{\mathcal{R}_{ei}\mathcal{R}_{ei}^H\}_+ = \{\mathcal{R}_{ei1}\mathcal{R}_{ei1}^H\}_+ + \{\mathcal{R}_{ei2}\mathcal{R}_{ei2}^H\}_+$ denote the causal part of $\mathcal{R}_{ei}\mathcal{R}_{ei}^H$. It is shown below that the minimum-variance filter solution can be found using the above completing-the-squares technique and taking causal parts.

*Lemma 9: The filter solution* 

$$\{\mathcal{H}\}_+ = \{\mathcal{G}_1 Q \mathcal{G}_2^H \Delta^{-H} \Delta^{-1}\}_+ = \{\mathcal{G}_1 Q \mathcal{G}_2^H \Delta^{-H}\}_+ \Delta^{-1} \tag{83}$$

*minimises* $\|\{\mathcal{R}_{ee}\}_+\|_2^2 = \|\{\mathcal{R}_{ei}\mathcal{R}_{ei}^H\}_+\|_2^2$*, provided that the inverses exist.*

*Proof: It follows from (77) that* 

$$\{\mathcal{R}_{ei2}\mathcal{R}_{ei2}^H\}_+ = \{(\mathcal{H}\Delta - \mathcal{G}_1 Q \mathcal{G}_2^H \Delta^{-H})(\mathcal{H}\Delta - \mathcal{G}_1 Q \mathcal{G}_2^H \Delta^{-H})^H\}_+ \,. \tag{84}$$

*By inspection of (84), the solution (83) achieves*

$$\left\| \{\mathcal{R}_{ei2}\mathcal{R}_{ei2}^H\}_+ \right\|_2 = 0. \tag{85}$$

□

It is worth pausing at this juncture to comment on the significance of the above results.

• The formulation (76) is an optimal solution for the time-varying smoother problem, since it can be seen from (79) that it achieves the best-possible performance. Similarly, (83) is termed an optimal solution because it achieves the best-possible filter performance (85).

• By inspection of (79) and (85), it follows that the minimum-variance smoother outperforms the minimum-variance filter.

• In general, these optimal solutions are not very practical because of the difficulty in realising an exact Wiener-Hopf factor.

Practical smoother (and filter) solutions that make use of an approximate Wiener-Hopf factor are described below.

#### **6.5.4.3 Optimal Realisable Solutions**

#### **Output Estimation**

The Wiener-Hopf factor is modelled on the structure of the spectral factor described in Section 3.4.4. Suppose that $R(t) > 0$ for all $t \in [0, T]$ and that there exists an $R^{1/2}(t) > 0$ such


that $R(t) = R^{1/2}(t)R^{1/2}(t)$. An approximate Wiener-Hopf factor $\hat{\Delta}: \mathbb{L}^p \to \mathbb{L}^p$ is defined by the system

$$
\begin{bmatrix}
\dot{\mathbf{x}}(t) \\
\boldsymbol{\delta}(t)
\end{bmatrix} = \begin{bmatrix}
A(t) & \mathbf{K}(t)\mathbf{R}^{1/2}(t) \\
\mathbf{C}(t) & \mathbf{R}^{1/2}(t)
\end{bmatrix} \begin{bmatrix}
\mathbf{x}(t) \\
\mathbf{z}(t)
\end{bmatrix} \tag{86}
$$

where $K(t) = P(t)C^{\top}(t)R^{-1}(t)$ is the Kalman gain, in which $P(t)$ is the solution of the Riccati differential equation

$$\dot{P}(t) = A(t)P(t) + P(t)A^{\top}(t) - P(t)C^{\top}(t)R^{-1}(t)C(t)P(t) + B(t)Q(t)B^{\top}(t). \tag{87}$$
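The gain within (86) is defined by the Riccati solution $P(t)$. As a minimal illustration of how (87) might be integrated numerically (this sketch is not part of the original text; the forward Euler scheme, step size and function names are assumptions, and the scalar parameters anticipate Example 4 below):

```python
import numpy as np

def riccati_rhs(P, A, B, C, Q, R):
    # Right-hand side of (87): A P + P A' - P C' R^{-1} C P + B Q B'.
    return (A @ P + P @ A.T
            - P @ C.T @ np.linalg.solve(R, C @ P)
            + B @ Q @ B.T)

def integrate_riccati(P0, A, B, C, Q, R, T, dt=1e-3):
    # Forward Euler integration of the Riccati differential equation on [0, T].
    P = P0.copy()
    for _ in range(int(T / dt)):
        P = P + dt * riccati_rhs(P, A, B, C, Q, R)
    return P

# Scalar check: a = -1, b = sqrt(2), c = 1, q = r = 1 should converge to the
# steady state p = sqrt(3) - 1 quoted in Example 4.
A, B = np.array([[-1.0]]), np.array([[np.sqrt(2.0)]])
C, Q, R = np.array([[1.0]]), np.array([[1.0]]), np.array([[1.0]])
P = integrate_riccati(np.zeros((1, 1)), A, B, C, Q, R, T=10.0)
K = P @ C.T @ np.linalg.inv(R)   # Kalman gain K(t) = P(t) C'(t) R^{-1}(t)
print(P, K)                      # both approach sqrt(3) - 1 = 0.732...
```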

The output estimation smoother (81) can be approximated as

$$\begin{aligned} \mathcal{H}\_{\text{OE}} &= I - R \big( \hat{\Delta} \hat{\Delta}^H \big)^{-1} \\ &= I - R \hat{\Delta}^{-H} \hat{\Delta}^{-1} \big. \end{aligned} \tag{88}$$

An approximate Wiener-Hopf factor inverse, $\hat{\Delta}^{-1}$, within (88) is obtained from (86) and the Matrix Inversion Lemma, namely,

$$
\begin{bmatrix} \dot{\hat{x}}(t) \\ \alpha(t) \end{bmatrix} = \begin{bmatrix} A(t) - K(t)C(t) & K(t) \\ -R^{-1/2}(t)C(t) & R^{-1/2}(t) \end{bmatrix} \begin{bmatrix} \hat{x}(t) \\ z(t) \end{bmatrix} \tag{89}
$$

where $\hat{x}(t) \in \mathbb{R}^n$ is an estimate of the state within $\hat{\Delta}^{-1}$. From Lemma 1, the adjoint of $\hat{\Delta}^{-1}$, which is denoted by $\hat{\Delta}^{-H}$, has the realisation

$$
\begin{bmatrix} \dot{\xi}(t) \\ \beta(t) \end{bmatrix} = \begin{bmatrix} A^{\top}(t) - C^{\top}(t)K^{\top}(t) & -C^{\top}(t)R^{-1/2}(t) \\ K^{\top}(t) & R^{-1/2}(t) \end{bmatrix} \begin{bmatrix} \xi(t) \\ \alpha(t) \end{bmatrix}. \tag{90}
$$

where $\xi(t) \in \mathbb{R}^n$ is an estimate of the state within $\hat{\Delta}^{-H}$. Thus, the smoother (88) is realised by (89), (90) and

$$
\hat{y}(t \mid T) = z(t) - R(t)\beta(t) \,. \tag{91}
$$

*Procedure 1.* The above output estimator can be implemented via the following three steps.


Step 1. Operate $\hat{\Delta}^{-1}$ on the measurements $z(t)$ using (89) to obtain $\alpha(t)$.

Step 2. In lieu of the adjoint system (90), operate (89) on the time-reversed transpose of $\alpha(t)$. Then take the time-reversed transpose of the result to obtain $\beta(t)$.

Step 3. Calculate the smoothed output estimate from (91).

*Example 4.* Consider an estimation problem parameterised by $a = -1$, $b = \sqrt{2}$, $c = 1$, $d = 0$, $\sigma_w^2 = \sigma_v^2 = 1$, which leads to $p = k = \sqrt{3} - 1$ [26]. Smoothed output estimates may be obtained by evolving

$$
\dot{\hat{x}}(t) = -\sqrt{3}\,\hat{x}(t) + (\sqrt{3} - 1)\,z(t) \,, \quad \alpha(t) = -\hat{x}(t) + z(t) \,,
$$

time-reversing $\alpha(t)$ and evolving

$$
\dot{\xi}(t) = -\sqrt{3}\,\xi(t) + (\sqrt{3} - 1)\,\alpha(t) \,, \quad \beta(t) = -\xi(t) + \alpha(t) \,,
$$

then time-reversing $\beta(t)$ and calculating

$$
\hat{y}(t \mid T) = z(t) - \beta(t) \,.
$$
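The following sketch applies Procedure 1 to this example, assuming a simple forward Euler discretisation of (89) and discrete samples of the continuous-time noises (variance $q/dt$ and $r/dt$ per step); the step size, seed and helper name `delta_inv` are illustrative rather than part of the original development.

```python
import numpy as np

rng = np.random.default_rng(0)
dt, T = 1e-3, 50.0
N = int(T / dt)
a, b, c, r = -1.0, np.sqrt(2.0), 1.0, 1.0
k = np.sqrt(3.0) - 1.0                 # steady-state gain, p = k = sqrt(3) - 1

# Simulate dx = a x dt + b dw and z = c x + v; sampled continuous-time
# white noise is approximated by discrete samples of variance 1/dt.
x = np.zeros(N)
for i in range(N - 1):
    x[i + 1] = x[i] + dt * a * x[i] + b * np.sqrt(dt) * rng.standard_normal()
z = c * x + np.sqrt(r / dt) * rng.standard_normal(N)

def delta_inv(u):
    """Euler discretisation of (89): xh' = (a - k c) xh + k u, out = -xh + u."""
    xh = 0.0
    out, states = np.empty_like(u), np.empty_like(u)
    for i in range(len(u)):
        states[i] = xh
        out[i] = -xh + u[i]
        xh += dt * ((a - k * c) * xh + k * u[i])
    return out, states

alpha, xf = delta_inv(z)             # Step 1: forward pass; xf is x(t|t) of (93)
beta, _ = delta_inv(alpha[::-1])     # Step 2: run (89) on time-reversed alpha...
beta = beta[::-1]                    # ...and time-reverse the result
y_smooth = z - r * beta              # Step 3: smoothed output estimate (91)

burn = N // 10                       # discard transients at either end
print("filtered MSE ", np.mean((c * xf - c * x)[burn:-burn] ** 2))
print("smoothed MSE ", np.mean((y_smooth - c * x)[burn:-burn] ** 2))
```

Note that the forward pass is exactly the recursion (93), so its internal state also furnishes the filtered estimate of the next subsection; the smoothed error variance should be observed to fall below the filtered one, in keeping with (79) and (85).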

#### **Filtering**

The causal part $\{\mathcal{H}_{OE}\}_+$ of the minimum-variance smoother (88) is given by

$$\{\mathcal{H}_{OE}\}_+ = I - R\{\hat{\Delta}^{-H}\}_+\hat{\Delta}^{-1} = I - R R^{-1/2} \hat{\Delta}^{-1} = I - R^{1/2} \hat{\Delta}^{-1}. \tag{92}$$

Employing (89) within (92) leads to the standard minimum-variance filter, namely,

$$
\dot{\hat{\mathbf{x}}}(t|t) = (A(t) - K(t)\mathbf{C}(t))\hat{\mathbf{x}}(t|t) + K(t)z(t) \tag{93}
$$

$$
\hat{y}(t|\,t) = \mathbf{C}(t)\hat{\mathbf{x}}(t|\,t) \,. \tag{94}
$$
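For multivariable problems, (87) and (93) – (94) may be propagated jointly. The fragment below is a sketch under an assumed Euler discretisation; the function name and interface are illustrative, not from the original text.

```python
import numpy as np

def kalman_bucy(z, A, B, C, Q, R, dt):
    """Joint Euler propagation of (87) and (93) - (94); z has shape (N, p)."""
    n = A.shape[0]
    P = np.zeros((n, n))
    xh = np.zeros(n)
    y = np.empty((len(z), C.shape[0]))
    for i, zi in enumerate(z):
        K = P @ C.T @ np.linalg.inv(R)                  # Kalman gain
        xh = xh + dt * ((A - K @ C) @ xh + K @ zi)      # state recursion (93)
        P = P + dt * (A @ P + P @ A.T
                      - P @ C.T @ np.linalg.solve(R, C @ P)
                      + B @ Q @ B.T)                    # Riccati ODE (87)
        y[i] = C @ xh                                   # filtered output (94)
    return y

# e.g. the scalar model of Example 4 with placeholder measurements:
z = np.zeros((100, 1))
y = kalman_bucy(z, np.array([[-1.0]]), np.array([[np.sqrt(2.0)]]),
                np.array([[1.0]]), np.array([[1.0]]), np.array([[1.0]]), 1e-3)
```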

#### **Input Estimation**

As discussed in Chapters 1 and 2, input estimates can be found using $\mathcal{G}_1 = I$; substituting $\hat{\Delta}$ for $\Delta$ within (76) yields the solution

$$\mathcal{H}_{IE} = Q\mathcal{G}_2^H(\hat{\Delta}\hat{\Delta}^H)^{-1} = Q\mathcal{G}_2^H\hat{\Delta}^{-H}\hat{\Delta}^{-1}. \tag{95}$$

As expected, the low-measurement-noise asymptote of this equaliser is given by $\lim_{R \to 0} \mathcal{H}_{IE} = \mathcal{G}_2^{-1}$. That is, at high signal-to-noise ratios the equaliser approaches $\mathcal{G}_2^{-1}$, provided the inverse exists.

The development of a differential equation for the smoothed input estimate, $\hat{w}(t \mid T)$, makes use of the following formula [27] for the cascade of two systems. Suppose that two linear


systems $\mathcal{G}_1$ and $\mathcal{G}_2$ have state-space parameters $\{A_1, B_1, C_1, D_1\}$ and $\{A_2, B_2, C_2, D_2\}$, respectively.

Then $\mathcal{G}_2\mathcal{G}_1$ is parameterised by $\begin{bmatrix} A_1 & 0 & B_1 \\ B_2 C_1 & A_2 & B_2 D_1 \\ D_2 C_1 & C_2 & D_2 D_1 \end{bmatrix}$. It follows that $\hat{w}(t \mid T) = Q\mathcal{G}_2^H\hat{\Delta}^{-H}\alpha(t)$, which

is realised by

$$
\begin{bmatrix} \dot{\xi}(t) \\ \dot{\gamma}(t) \\ \hat{w}(t \mid T) \end{bmatrix} = \begin{bmatrix} A^{\top}(t) - C^{\top}(t)K^{\top}(t) & 0 & -C^{\top}(t)R^{-1/2}(t) \\ C^{\top}(t)K^{\top}(t) & A^{\top}(t) & C^{\top}(t)R^{-1/2}(t) \\ Q(t)D^{\top}(t)K^{\top}(t) & Q(t)B^{\top}(t) & Q(t)D^{\top}(t)R^{-1/2}(t) \end{bmatrix} \begin{bmatrix} \xi(t) \\ \gamma(t) \\ \alpha(t) \end{bmatrix} \tag{96}
$$

in which $\gamma(t) \in \mathbb{R}^n$ is an auxiliary state.
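The cascade formula quoted above is straightforward to mechanise. The helper below is an illustrative sketch (the function name and tuple interface are assumptions) that assembles the series connection $\mathcal{G}_2\mathcal{G}_1$ of two state-space systems.

```python
import numpy as np

def cascade(sys1, sys2):
    """Series connection G2 G1 via the block parameterisation quoted above."""
    A1, B1, C1, D1 = sys1
    A2, B2, C2, D2 = sys2
    n1, n2 = A1.shape[0], A2.shape[0]
    A = np.block([[A1, np.zeros((n1, n2))],
                  [B2 @ C1, A2]])
    B = np.vstack([B1, B2 @ D1])
    C = np.hstack([D2 @ C1, C2])
    D = D2 @ D1
    return A, B, C, D

# e.g. two scalar systems:
g1 = (np.array([[-1.0]]), np.array([[1.0]]), np.array([[1.0]]), np.array([[0.0]]))
g2 = (np.array([[-2.0]]), np.array([[1.0]]), np.array([[1.0]]), np.array([[0.5]]))
A, B, C, D = cascade(g1, g2)
```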

*Procedure 2.* Input estimates can be calculated via the following two steps.

Step 1. Operate $\hat{\Delta}^{-1}$ on the measurements $z(t)$ using (89) to obtain $\alpha(t)$.

Step 2. Operate the system (96) on $\alpha(t)$ to obtain the smoothed input estimate $\hat{w}(t \mid T)$.
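A scalar sketch of Procedure 2 follows. Rather than realising (96) directly, it reuses the device of Step 2 of Procedure 1: for this time-invariant example the adjoint $Q\mathcal{G}_2^H\hat{\Delta}^{-H}$ can be applied by running the forward cascade $\hat{\Delta}^{-1}\mathcal{G}_2 Q$ on time-reversed data and reversing the result. The Euler discretisation, parameters and helper names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
dt, N = 1e-3, 20000
a, b, c, q, r = -1.0, np.sqrt(2.0), 1.0, 1.0, 1.0
k = np.sqrt(3.0) - 1.0

w = rng.standard_normal(N) / np.sqrt(dt)    # white input to be estimated
x = np.zeros(N)
for i in range(N - 1):
    x[i + 1] = x[i] + dt * (a * x[i] + b * w[i])
z = c * x + rng.standard_normal(N) * np.sqrt(r / dt)

def run(sa, sb, sc, sd, u):
    """One forward Euler pass through the scalar system (sa, sb, sc, sd)."""
    xs = 0.0
    y = np.empty_like(u)
    for i in range(len(u)):
        y[i] = sc * xs + sd * u[i]
        xs += dt * (sa * xs + sb * u[i])
    return y

alpha = run(a - k * c, k, -1.0, 1.0, z)      # Step 1: (89) with r = 1
# Step 2: w(t|T) = Q G2^H Dhat^{-H} alpha; apply the adjoint by running the
# forward cascade Dhat^{-1} G2 Q on time-reversed alpha and reversing back.
g2_out = run(a, b, c, 0.0, q * alpha[::-1])  # G2 Q on reversed alpha
w_hat = run(a - k * c, k, -1.0, 1.0, g2_out)[::-1]
# w_hat now holds the smoothed input estimate w(t|T)
```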
#### **State Estimation**

Smoothed state estimates can be obtained by defining the reference system $\mathcal{G}_1$ within (76) as the system that maps the process noise $w$ to the state $x$, in which case the smoothed state estimate may be obtained by evolving

$$
\dot{\hat{\mathbf{x}}}(t \mid T) = A(t)\hat{\mathbf{x}}(t \mid T) + B(t)\hat{w}(t \mid T) \,. \tag{97}
$$

That is, a smoother for state estimation is given by (89), (96) and (97). In frequency-domain estimation problems, minimum-order solutions are found by exploiting pole-zero cancellations, see Example 1.13 of Chapter 1. Here in the time-domain, (89), (96), (97) is not a minimum-order solution and some numerical model order reduction may be required.

Suppose that $C(t)$ is of rank $n$ and $D(t) = 0$. In this special case, an $n^2$-order solution for state estimation can be obtained from (91) and

$$
\hat{x}(t \mid T) = C^{\#}(t)\,\hat{y}(t \mid T), \tag{98}
$$

where

$$C^{\#}(t) = \left(C^{\top}(t)C(t)\right)^{-1}C^{\top}(t) \tag{99}$$

denotes the Moore-Penrose pseudoinverse.
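A small illustration of (98) – (99) is given below (an assumed helper, not from the original text).

```python
import numpy as np

def state_from_output(C, y_smooth):
    """x(t|T) = C# y(t|T), with C# = (C' C)^{-1} C' as in (98) - (99)."""
    C_sharp = np.linalg.solve(C.T @ C, C.T)   # Moore-Penrose pseudoinverse
    return y_smooth @ C_sharp.T               # rows of y_smooth are y(t|T)'

# e.g. p = 3 outputs, n = 2 states, C of full column rank:
C = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
x_smooth = state_from_output(C, np.ones((5, 3)))
```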


#### **6.5.4.4 Performance**

An analysis of minimum-variance smoother performance requires an identity which is described after introducing some additional notation. Let $\alpha = \mathcal{G}_0 w$ denote the output of the linear time-varying system having the realisation

$$\dot{x}(t) = A(t)x(t) + w(t), \tag{100}$$

$$\alpha(t) = x(t), \tag{101}$$

where $w(t) \in \mathbb{R}^n$ and $A(t) \in \mathbb{R}^{n \times n}$. By inspection of (100) – (101), the output of the inverse system $w = \mathcal{G}_0^{-1}\alpha$ is given by

$$w(t) = \dot{\alpha}(t) - A(t)\alpha(t). \tag{102}$$

Similarly, let $\beta = \mathcal{G}_0^H u$ denote the output of the adjoint system $\mathcal{G}_0^H$, which from Lemma 1 has the realisation

$$\dot{\xi}(t) = -A^{\top}(t)\xi(t) - u(t), \tag{103}$$

$$\beta(t) = \xi(t). \tag{104}$$

It follows that the output of the inverse system $u = \mathcal{G}_0^{-H}\beta$ is given by

$$u(t) = -\dot{\beta}(t) - A^{\top}(t)\beta(t). \tag{105}$$

The following identity is required in the characterisation of smoother performance,

$$\mathcal{G}_0\big(P(t)A^{\top}(t) + A(t)P(t) - \dot{P}(t)\big)\mathcal{G}_0^H = -\,\mathcal{G}_0 P(t) - P(t)\mathcal{G}_0^H, \tag{106}$$

where $P(t)$ is an arbitrary matrix of compatible dimensions. The above equation can be verified by using (102) and (105) within (106). Using the above notation, the exact Wiener-Hopf factor satisfies

$$\Delta\Delta^H = C(t)\,\mathcal{G}_0 B(t)Q(t)B^{\top}(t)\mathcal{G}_0^H\, C^{\top}(t) + R(t). \tag{107}$$

It is observed below that the approximate Wiener-Hopf factor (86) approaches the exact Wiener-Hopf factor whenever the problem is locally stationary, that is, whenever $A(t)$, $B(t)$, $C(t)$, $Q(t)$ and $R(t)$ change sufficiently slowly, so that $\dot{P}(t)$ of (87) approaches the zero matrix.

*Lemma 10 [8]: In respect of the signal model (1) – (2) with $D(t) = 0$, $E\{w(t)\} = E\{v(t)\} = 0$, $E\{w(t)w^{\top}(t)\} = Q(t)$, $E\{v(t)v^{\top}(t)\} = R(t)$, $E\{w(t)v^{\top}(t)\} = 0$ and the quantities defined above,*

$$\hat{\Delta}\hat{\Delta}^H = \Delta\Delta^H + C(t)\,\mathcal{G}_0 \dot{P}(t)\mathcal{G}_0^H\, C^{\top}(t). \tag{108}$$