**Stochastic Differential Equations and Related Inverse Problems**

### **2.1 Concepts in Stochastic Calculus**

Computational Modelling of Multi-Scale Non-Fickian Dispersion in Porous Media - An Approach Based on Stochastic Calculus

the North Fork of the New River that flows through the City of Fort Lauderdale, Florida, USA, and how the storm water drainage from sewers affects the groundwater. Other ANN applications in water resources can be found in Aly and Peralta (1999), Mukhopadhyay (1999), Freeze and Gorelick (2000), Johnson and Rogers (2000), Hassan and Hamed (2001), Beaudeau et al. (2001), and Lindsay et al. (2002).

As we discussed in Chapter 1, the deterministic mathematical formulation of solute transport through a porous medium introduces the dispersivity, which is a measure of the distance a solute tracer would travel when the mean velocity is normalized to be one. One would expect such a measure to be a mechanical property of the porous medium under consideration, but the evidence shows that dispersivity depends on the scale of the experiment for a given porous medium. One of the challenges in modelling the phenomenon is to discard the Fickian assumptions, through which dispersivity is defined, and to develop a mathematical description containing the fluctuations associated with the mean velocity of a physical ensemble of solute particles. To this end, we require a sophisticated mathematical framework, and the theory of stochastic processes and differential equations is a natural mathematical setting. In this chapter we review some essential concepts in stochastic processes and stochastic differential equations in order to understand stochastic calculus in a more applied context.

A deterministic variable expressed as a function of time uniquely determines the value of the variable at a given time. A stochastic variable *Y*, on the other hand, is one that does not have a unique value; it can have any one out of a set of values. We assign a unique label ω to each possible value of the stochastic variable, and write Ω to denote the set of all such values. When *Y* represents, for example, the outcome of throwing dice, Ω may be a finite set of discrete numbers, and when *Y* is the instantaneous position of a fluid particle, Ω may be a continuous range of real numbers. If a particular value *y* is observed for *Y*, this is called an event *F*. In fact, this is only the simplest prototype of an event; other possibilities might be that the value of *Y* is observed not to be *y* (the complementary event), or that a value within a certain range of values is observed. The set of all possible events is denoted by *F*. Even though the outcome of a particular observation of *Y* is unpredictable, the probability of observing *y* must be determined by a probability function *P*(ω). By using the standard methods of probability calculus, this implies that a probability *P*(*F*) can also be assigned to compound events *F*, e.g. by appropriate summation or integration over ω values. For this to work, *F* must satisfy the criteria that for any event *F* in *F* its complement *Fc* must also belong to *F*, and that for any countable collection of events in *F* their union must also belong to *F*. The explanation above of what it means to call *Y* a stochastic variable is encapsulated in formal mathematical language by saying "*Y* is defined on a probability space (Ω, *F*, *P*)".
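To make the triple (Ω, *F*, *P*) concrete, here is a small computational sketch (our own toy example, not from the text) for a fair die: Ω is the set of faces, events are subsets of Ω, and *P* assigns to each event the sum of the probabilities of its outcomes.

```python
from fractions import Fraction

# Toy probability space (Omega, F, P) for a fair six-sided die.
# Omega: the sample space; F: here, all subsets of Omega; P: an additive measure.
omega = {1, 2, 3, 4, 5, 6}

def P(event):
    """Probability of an event (a subset of omega) under the uniform measure."""
    assert event <= omega, "an event must be a subset of the sample space"
    return Fraction(len(event), len(omega))

even = {2, 4, 6}              # the event 'an even face is observed'
complement = omega - even     # its complementary event, also in F

print(P(even))                # 1/2
print(P(complement))          # 1/2
print(P(even | complement))   # the union of events is again an event: P(Omega) = 1
```

Note how the closure requirements on *F* (complements and unions of events are again events) are exactly what makes the last two calls well defined.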

In describing physical systems, deterministic variables usually depend on additional parameters such as time. Similarly, a stochastic variable may depend on an additional parameter *t* (for example, the probability may change with time, i.e. *P*(*y,t*)). The collection of stochastic variables, *Yt*, is termed a stochastic process. The word 'process' suggests temporal development and is particularly appropriate when the parameter *t* has the meaning of time, but mathematically it is equally well used for any other parameter, usually assumed to be a real number in the interval [0,∞).

The label ω is often explicitly included by writing the notation *Yt*(ω) for an individual value obtained from the set of *Y*-values at a fixed *t*. Conversely, we might keep ω fixed, and let *t* vary; a natural notation would be to write *Y*ω(*t*). In physical terms, one may think of this as the set of values obtained from a single experiment to observe the time development of the stochastic variable *Y*. When the experiment is repeated, a different set of observations is obtained; those may be labelled by a different value of ω. Each such sequence of observed *Y*-values is called a realization (or sometimes a path) of the stochastic process, and from this perspective ω may be considered as labelling the realizations of the process. It is seen that it is somewhat arbitrary which of ω and *t* is considered to be a label, and which is an independent variable; this is sometimes expressed by writing the stochastic process as *Y*(*t,*ω).

In standard calculus, we deal with differentiable functions, which are continuous except perhaps at certain locations of the domain under consideration. To understand the continuity of functions better we make use of the definitions of limits. We call a function *f* continuous at the point *t = t0* if $\lim\_{t \to t\_0} f(t) = f(t\_0)$ regardless of the direction from which *t* approaches *t0*. A right-continuous function at *t0* has a limiting value only when *t* approaches *t0* from the right, i.e. *t* is larger than *t0* in the vicinity of *t0*. We will denote this as

$$f(t\_0+) = \lim\_{t \downarrow t\_0} f(t) = f(t\_0) \quad .$$

Similarly a left-continuous function at *t0* can be represented as

$$f(t\_0-) = \lim\_{t \uparrow t\_0} f(t) = f(t\_0) \ .$$

These statements imply that a continuous function is both right-continuous and left-continuous at a given point *t*. Often we encounter functions having discontinuities; hence the need for the above definitions. To measure the size of a discontinuity, we define the "jump" at any point *t* to be a discontinuity where both *f*(*t+*) and *f*(*t-*) exist, with the size of the jump given by $\Delta f(t) = f(t+) - f(t-)$. Jumps are the discontinuities of the first kind, and any other discontinuity is called a discontinuity of the second kind. A function can have only a countable number of jumps in a given range. From the mean value theorem of calculus, a function can be differentiated throughout an interval only if it is continuous on that interval; in particular, it cannot be differentiated at a jump. Stochastic calculus is the calculus dealing with often non-differentiable functions having jumps but no discontinuities of the second kind. One example of such a function is the Wiener process (Brownian motion). One realization of the standard Wiener process is given in Figure 2.1.


Figure 2.1. A realization of the Wiener process; this is a continuous but non-differentiable function.

The increments of the function shown in Figure 2.1 are irregular, and a derivative cannot be defined according to the mean value theorem. This is because the function changes erratically within small intervals, however small those intervals may be. Therefore we have to devise new mathematical tools for dealing with these irregular, non-differentiable functions.
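A realization like the one in Figure 2.1 can be generated numerically from independent Gaussian increments. The following sketch is our own illustration (the grid size and random seed are arbitrary choices); it constructs a discrete approximation to a standard Wiener path on [0, 1].

```python
import numpy as np

# Simulate one realization of the standard Wiener process on [0, 1]
# by summing independent Gaussian increments, each with variance dt.
rng = np.random.default_rng(42)   # fixed seed so the sketch is reproducible
n = 10_000
dt = 1.0 / n
t = np.linspace(0.0, 1.0, n + 1)

increments = rng.normal(loc=0.0, scale=np.sqrt(dt), size=n)
W = np.concatenate(([0.0], np.cumsum(increments)))  # W(0) = 0

# The path is continuous but extremely rough: refining the grid does not
# make the difference quotients (W[i+1] - W[i]) / dt settle down.
print(W.shape)    # (10001,)
print(W[0])       # 0.0
```

Plotting `t` against `W` reproduces the qualitative picture of Figure 2.1: a continuous but erratically fluctuating trace.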

Variation of a function *f* on [*a,b*] is defined as

$$V\_f([a, b]) = \lim\_{\delta\_n \to 0} \sum\_{i=1}^n \left| f(t\_i^n) - f(t\_{i-1}^n) \right| \tag{2.1.1}$$

where $\delta\_n = \max\_{1 \le i \le n}(t\_i^n - t\_{i-1}^n)$.

If *Vf*([*a,b*]) is finite, as it is for continuously differentiable functions, then *f* is called a function of finite variation on [*a,b*]. The variation of a function is a measure of the total change in the function value within the interval considered. An important result (Theorem 1.7, Klebaner (1998)) is that a function of finite variation can have only a countable number of jumps. Furthermore, if *f* is a continuous function, *f*′ exists, and $\int\_a^b |f'(t)|\,dt < \infty$, then *f* is a function of finite variation. A corollary is that a function of infinite variation cannot be continuously differentiable. Another mathematical construct that plays a major role in stochastic calculus is the quadratic variation. In stochastic calculus, the quadratic variation of a function *f* over the interval [*0,t*] is given by

$$[f](t) = \lim\_{\delta\_n \to 0} \sum\_{i=1}^{n} \left( f(t\_i^n) - f(t\_{i-1}^n) \right)^2 \tag{2.1.2}$$

where the limit is taken over the partitions:

$$0 = t\_0^n < t\_1^n < \dots < t\_n^n = t \,,$$

with $\delta\_n = \max\_{1 \le i \le n}(t\_i^n - t\_{i-1}^n) \to 0$.

It can be proved that the quadratic variation of a continuous function with finite variation is zero. However, functions having zero quadratic variation may still have infinite variation, for example the zero energy processes (Klebaner, 1998). If a function or process has a finite positive quadratic variation within an interval, then its variation is infinite, and the function therefore cannot be differentiable.

Variation and quadratic variation of a function are very important tools in the development of stochastic calculus, even though we do not use quadratic variation in standard calculus.
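The contrast between variation and quadratic variation can be seen numerically. The sketch below is our own illustration, using *f*(*t*) = sin *t* on [0, 2π], whose variation is exactly 4: the partition sums for *Vf* stabilise while those for [*f*] shrink towards zero as the mesh is refined.

```python
import numpy as np

def variation(f_vals):
    """Sum of |f(t_i) - f(t_{i-1})| over a partition (approximates V_f, eq. 2.1.1)."""
    return np.abs(np.diff(f_vals)).sum()

def quadratic_variation(f_vals):
    """Sum of (f(t_i) - f(t_{i-1}))^2 over a partition (approximates [f], eq. 2.1.2)."""
    return (np.diff(f_vals) ** 2).sum()

# Smooth function f(t) = sin(t) on [0, 2*pi]: finite variation (= 4),
# and quadratic variation tending to 0 as the mesh shrinks.
for n in (100, 1000, 10000):
    t = np.linspace(0.0, 2 * np.pi, n + 1)
    f = np.sin(t)
    print(n, variation(f), quadratic_variation(f))
```

Running the same partition sums on a simulated Brownian path would show the opposite behaviour: the quadratic variation stabilises near the interval length while the variation grows without bound as the mesh is refined.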

We also define quadratic covariation of functions *f* and *g* on [0*,t*] by extending equation (2.1.2):

$$[f, g](t) = \lim\_{\delta\_n \to 0} \sum\_{i=0}^{n-1} (f(t\_{i+1}^n) - f(t\_i^n))\,(g(t\_{i+1}^n) - g(t\_i^n)) \tag{2.1.3}$$

where the limit is taken over partitions $\{t\_i^n\}$ of [0*,t*] with $\delta\_n = \max\_{1 \le i \le n}(t\_i^n - t\_{i-1}^n) \to 0$. It can be shown that if both functions are continuous and one is of finite variation, then the quadratic covariation is zero.

Quadratic covariation of two functions, *f* and *g,* has the following properties:

### *1. Polarization identity*

Polarization identity expresses the quadratic covariation, [*f,g*](*t*) , in terms of quadratic variation of individual functions.

$$[f, g](t) = \frac{1}{2}([f + g, f + g](t) - [f, f](t) - [g, g](t))\tag{2.1.4}$$

### *2. Symmetry*

$$[f, g](t) = [g, f](t) \,. \tag{2.1.5}$$

### *3. Linearity*


Using polarization identity and symmetry one can show that covariation is linear for any constants *a* and *b*,

$$[a\,f+b\,\mathrm{g},h](t) = a[f,h](t) + b[\,\mathrm{g},h](t) \,. \tag{2.1.6}$$

Quadratic variation of a function [*f*](*t*) and covariation [*f,g*](*t*) are measures of change in the functional values over a given range [*0,t*].
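Before the limit is taken, the polarization identity (2.1.4) and linearity (2.1.6) hold exactly at the level of the partition sums, so they can be checked numerically on arbitrary discretised functions. This is our own illustration; the sample paths are arbitrary.

```python
import numpy as np

def cov_sum(f_vals, g_vals):
    """Partition sum approximating the covariation [f, g] (eq. 2.1.3)."""
    return (np.diff(f_vals) * np.diff(g_vals)).sum()

rng = np.random.default_rng(0)
n = 1000
# Two arbitrary discretised functions on a common partition of [0, t].
f = np.cumsum(rng.normal(size=n))
g = np.cumsum(rng.normal(size=n))

# Polarization identity (2.1.4), exact for the sums up to float rounding.
lhs = cov_sum(f, g)
rhs = 0.5 * (cov_sum(f + g, f + g) - cov_sum(f, f) - cov_sum(g, g))
print(np.isclose(lhs, rhs))   # True

# Linearity (2.1.6) also holds exactly for the partition sums.
a, b = 2.0, -3.0
h = np.cumsum(rng.normal(size=n))
print(np.isclose(cov_sum(a * f + b * g, h), a * cov_sum(f, h) + b * cov_sum(g, h)))   # True
```

Since both identities are algebraic facts about the sums themselves, they survive the passage to the limit whenever the individual covariations exist.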

In many situations where stochastic processes are involved, we would like to know the limiting values of useful random variables, i.e. whether they approach some sort of "steady state" or asymptotic behaviour. It is natural to define the steady state of a random variable within a probabilistic context. Therefore, for stochastic processes, we define the convergence of random variables using the following four different criteria:

### *1. Almost sure convergence*

The random variables {*Xn*} converge to *X* with probability one:

$$P(\{\omega \in \Omega : \lim\_{n \to \infty} \left| X\_n(\omega) - X(\omega) \right| = 0\}) = 1 \,.$$

*2. Mean-square convergence* 

{*Xn*} converges to *X* in such a way that $E(X\_n^2) < \infty$ for *n* = 1, 2, ..., $E(X^2) < \infty$, and

$$\lim\_{n \to \infty} E(\left| X\_n - X \right|^2) = 0 \,.$$

### *3. Convergence in probability*

{*Xn*} converges to *X* if the probability of an appreciable difference between the two processes tends to zero:

$$\lim\_{n \to \infty} P(\{\omega \in \Omega : \left| X\_n(\omega) - X(\omega) \right| \ge \varepsilon\}) = 0 \,, \quad \text{for all } \varepsilon > 0.$$

Convergence in probability is called stochastic convergence as well.

Note that we adopt the notation E(·) or E[·] to denote the expected value (mean value) of a stochastic variable. In the physics literature, this is denoted by "⟨·⟩".

### *4. Convergence in distribution*

The distribution function of {*Xn*} converges to that of *X* at every point of continuity of the limiting distribution (i.e. the distribution function of *X*).

These four criteria add another dimension to our discussion of the asymptotic behaviour of a process. These arguments can be extended to the comparison of stochastic processes with each other.
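As a numerical illustration of criterion 2 (our own example, not from the text), the sample mean of *n* independent N(μ, σ²) draws converges to μ in mean square, with E|mean − μ|² = σ²/*n* shrinking like 1/*n*.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 0.0, 1.0
trials = 2000   # Monte Carlo repetitions used to estimate the expectation

def mean_square_error(n):
    """Estimate E|X_n - mu|^2, where X_n is the mean of n draws from N(mu, sigma^2)."""
    samples = rng.normal(mu, sigma, size=(trials, n))
    X_n = samples.mean(axis=1)
    return np.mean((X_n - mu) ** 2)

errors = [mean_square_error(n) for n in (10, 100, 1000)]
print(errors)   # roughly sigma^2 / n: about 0.1, 0.01, 0.001
```

Here the limiting "steady state" is the degenerate random variable μ; almost sure convergence also holds in this example (the strong law of large numbers), illustrating that the criteria are related but logically distinct.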


Unlike deterministic variables, whose asymptotic behaviour can be clearly identified either graphically or numerically, stochastic variables require adherence to one of the convergence criteria mentioned above, which are called the "criteria for strong convergence". There are also weakly converging stochastic processes, but we do not discuss the weak convergence criteria as they are not relevant to the development of the material in this book.

In standard calculus we have functions that are continuous except at finitely many points, and we integrate them using the definition of the Riemann integral of a function *f*(*t*) over the interval [*a,b*]:

$$\int\_{a}^{b} f(t)dt = \lim\_{\delta \to 0} \sum\_{i=1}^{n} f(\xi\_{i}^{n}) \left(t\_{i}^{n} - t\_{i-1}^{n}\right),\tag{2.1.7}$$

where the $t\_i^n$'s represent a partition of the interval,

$$a = t\_0^n < t\_1^n < t\_2^n < \dots < t\_n^n = b \,,$$

$$\delta = \max\_{1 \le i \le n}(t\_i^n - t\_{i-1}^n) \,, \quad \text{and} \quad t\_{i-1}^n \le \xi\_i^n \le t\_i^n \,.$$

The Riemann integral is used extensively in standard calculus, where continuous functions are the main concern. The integral converges regardless of the chosen $\xi\_i^n$ within $[t\_{i-1}^n, t\_i^n]$.

A generalization of the Riemann integral is the Stieltjes integral, which is defined as the integral of *f*(*t*) with respect to a monotone function *g*(*t*) over the interval [*a,b*]:

$$\int\_{a}^{b} f(t) \, dg(t) = \lim\_{\delta \to 0} \sum\_{i=1}^{n} f(\xi\_i^n)(g(t\_i^n) - g(t\_{i-1}^n)) \tag{2.1.8}$$

with the same definitions as above for δ and the $t\_i^n$'s. It can be shown that for the Stieltjes integral to exist for any continuous function *f*(*t*), *g*(*t*) must be a function of finite variation on [*a,b*]. This means that if *g*(*t*) has infinite variation on [*a,b*], then integration with respect to such a function has to be defined differently. This is the case for continuous stochastic processes of infinite variation, which therefore cannot be integrated using the Stieltjes integral. Before we discuss alternative forms of integration that can be applied to functions of positive quadratic variation, i.e. functions of infinite variation, we introduce a fundamentally important stochastic process, the Wiener process, and its properties.
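A Riemann–Stieltjes sum as in (2.1.8) is easy to form when *g* has finite variation. The integrand and integrator below are our own choices, with $\int\_0^1 t \, d(t^2) = \int\_0^1 2t^2 \, dt = 2/3$ as a check.

```python
import numpy as np

def stieltjes_sum(f, g, a, b, n):
    """Riemann-Stieltjes sum  sum_i f(xi_i) * (g(t_i) - g(t_{i-1}))  (eq. 2.1.8),
    on a uniform partition, taking xi_i as the left endpoint of each subinterval."""
    t = np.linspace(a, b, n + 1)
    xi = t[:-1]
    return np.sum(f(xi) * np.diff(g(t)))

# For smooth g of finite variation the sums converge:
# int_0^1 t d(t^2) = int_0^1 2t^2 dt = 2/3.
approx = stieltjes_sum(lambda t: t, lambda t: t ** 2, 0.0, 1.0, 100_000)
print(approx)   # close to 0.6667
```

For an integrator of infinite variation, such as a Brownian path, these sums depend on where $\xi\_i^n$ is placed inside each subinterval and do not converge in the Stieltjes sense; this is precisely why the alternative integrals mentioned above fix the evaluation point (the Itô integral uses the left endpoint).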

### **2.2 Wiener Process**

The botanist Robert Brown first observed that pollen grains suspended in liquid undergo irregular motion. Nearly a century later, it was realised that the physical explanation of this is that the pollen grain is continually bombarded by molecules of the liquid travelling with different speeds in different directions. Over a time scale that is large compared with the intervals between molecular impacts, these will average out and no net force is exerted on the grain. However, this will not happen over a small time interval; and if the mass of the grain is small enough to undergo appreciable displacement in the small time interval as the result of molecular impacts, an observable erratic motion results. The crucial point to notice in the present context is that while the impacts and therefore the individual


displacements suffered by the grain can be considered independent at different times, the actual position of the grain can only change continuously.

In the physical Brownian motion, there are small but nevertheless finite intervals between the impulses of molecules colliding with the pollen grain. Consequently, the path that the grain follows, consists of a sequence of straight segments forming an irregular but continuous line – a so-called random walk. Each straight segment can be considered an increment of the momentary position of the grain.

The mathematical idealisation of the Brownian motion lets the interval between increments approach zero. The resulting process, called the Wiener process after N. Wiener, is difficult to conceptualise: for example, careful consideration shows that the resulting position is everywhere continuous but nowhere differentiable. This means that while the particle has a position at any moment, and since this position is changing it is moving, no velocity can be defined. Nevertheless, as discussed by Stroock and Varadhan (1979), a consistent mathematical description is obtained by defining the position as a stochastic process $B(t,\omega)$ with the following properties, which are suggested as a mathematical model for the Brownian motion, i.e. a Wiener process:

**P1:** $B(0,\omega) = 0$, i.e. we choose the position of the particle at the arbitrarily chosen initial time $t = 0$ as the coordinate origin;

**P2:** $B(t,\omega)$ has independent increments, i.e. $B(t_1,\omega),\ \{B(t_2,\omega) - B(t_1,\omega)\},\ \dots,\ \{B(t_k,\omega) - B(t_{k-1},\omega)\}$ are independent for all $0 \le t_1 \le t_2 \le \dots \le t_k$;

**P3:** $B(t_{i+1},\omega) - B(t_i,\omega)$ is normally distributed with mean 0 and variance $(t_{i+1} - t_i)$;

**P4:** The stochastic variation of $B(t,\omega)$ at fixed time $t$ is determined by a Gaussian probability distribution;

**P5:** The Gaussian has a zero mean, $E[B(t,\omega)] = 0$ for all values of $t$;

**P6:** $B(t,\omega)$ are continuous functions of $t$ for $t \ge 0$;

**P7:** The covariance of Brownian motion is determined by a correlation between the values of $B(t,\omega)$ at times $t_i$ and $t_j$ (for fixed $\omega$), given by

$$E\left[B(t_i,\omega)\, B(t_j,\omega)\right] = \min(t_i, t_j). \tag{2.2.1}$$

When applied to $t_i = t_j = t$, P7 reduces to the statement that

$$\operatorname{Var}\left[B(t,\omega)\right] = t, \tag{2.2.2}$$

where '*Var*' means statistical variance. For the Brownian motion this can be interpreted as the statement that the mean squared displacement of the particle from the origin grows proportionally to time.

Because the Wiener process is defined by the independence of its increments, it is for some purposes convenient to reformulate the variance of a Wiener process in terms of the variance of the increments:


From P3, for *ti* < *tj* :

$$Var\left[B(t_j,\omega) - B(t_i,\omega)\right] = t_j - t_i . \tag{2.2.3}$$

Bearing in mind that the statistical definition of the variance of a quantity $X$ reduces to the expectation value expression $Var[X] = E[X^2] - (E[X])^2$, and that the expectation value or mean of a Wiener process is zero, we can rewrite this as,

$$E\left[\left\{B(t_2,\omega) - B(t_1,\omega)\right\}^2\right] = Var\left[B(t_2,\omega) - B(t_1,\omega)\right], \quad \text{i.e.} \quad E[\Delta B \cdot \Delta B] = \Delta t , \tag{2.2.4}$$

where $\Delta t$ is defined to mean the time increment for a fixed realization $\omega$.

The connection between the two formulations is established by similarly rewriting equation (2.2.3) and then applying equation (2.2.1):

$$\begin{aligned} Var[B(t_1, \omega) - B(t_2, \omega)] &= E[\{B(t_1, \omega) - B(t_2, \omega)\}^2] \\ &= E[B^2(t_1, \omega) + B^2(t_2, \omega) - 2B(t_1, \omega)B(t_2, \omega)] \\ &= t_1 + t_2 - 2\min(t_1, t_2) \\ &= t_1 - t_2 \quad \text{for } t_1 > t_2 \,. \end{aligned}$$
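These two formulations can be checked numerically. The sketch below (an illustrative aside, not from the original text; plain Python using the standard `random` module) samples pairs $(B(t_1), B(t_2))$ via the independent-increments property P2 and estimates both $Var[B(t_2,\omega) - B(t_1,\omega)]$ and $E[B(t_1,\omega)B(t_2,\omega)]$:

```python
import random

random.seed(42)

t1, t2, n = 1.0, 2.5, 200_000

sum_sq_diff = 0.0   # accumulates (B(t2) - B(t1))^2
sum_prod = 0.0      # accumulates B(t1) * B(t2)
for _ in range(n):
    b1 = random.gauss(0.0, t1 ** 0.5)               # B(t1) ~ N(0, t1)
    b2 = b1 + random.gauss(0.0, (t2 - t1) ** 0.5)   # independent increment (P2)
    sum_sq_diff += (b2 - b1) ** 2
    sum_prod += b1 * b2

var_diff = sum_sq_diff / n   # should approach t2 - t1 = 1.5
cov = sum_prod / n           # should approach min(t1, t2) = 1.0
```

Both sample estimates agree with equations (2.2.1) and (2.2.3) to within Monte Carlo error.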

### **2.3 Further Properties of Wiener Process and their Relationships**

Consider a stochastic process $X(t,\omega)$ having a stationary joint probability distribution and $E(X(t,\omega)) = 0$, i.e. the mean value of the process is zero. The Fourier transform of $Var(X(\tau,\omega))$ can be written as,

$$S(\lambda, \omega) = \frac{1}{2\pi} \int_{-\infty}^{\infty} Var(X(\tau, \omega))\, e^{-i\lambda\tau}\, d\tau . \tag{2.3.1}$$

$S(\lambda,\omega)$ is called the spectral density of the process $X(t,\omega)$ and is a function of the angular frequency $\lambda$. The inverse of the Fourier transform is given by

$$Var(X(\tau, \omega)) = \int_{-\infty}^{\infty} S(\lambda, \omega)\, e^{i\lambda\tau}\, d\lambda . \tag{2.3.2}$$

Therefore the variance of $X(0,\omega)$ is the area under a graph of the spectral density $S(\lambda,\omega)$ against $\lambda$:

$$Var(X(0,\omega)) = E(X^2(0,\omega)) = \int_{-\infty}^{\infty} S(\lambda,\omega)\, d\lambda , \quad \text{because} \quad E(X(t,\omega)) = 0 .$$

The spectral density $S(\lambda,\omega)$ is considered as the "average power" per unit frequency at $\lambda$, which gives rise to the variance of $X(t,\omega)$ at $\tau = 0$. If the average power is a constant, so that the power is distributed uniformly across the frequency spectrum, as is the case for white light, then $X(t,\omega)$ is called white noise. White noise is often used to model independent random disturbances in engineering systems, and the increments of the Wiener process have the same characteristics as white noise. Therefore white noise $(\zeta(t))$ is defined as

$$
\zeta(t) = \frac{dB(t)}{dt} , \quad \text{and} \quad dB(t) = \zeta(t)\, dt . \tag{2.3.3}
$$

We will use this relationship to formulate stochastic differential equations.
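The white-noise character of Wiener increments can be illustrated numerically. The sketch below (an illustrative aside using only Python's standard library, not part of the original text) generates discretized increments $\Delta B$ over steps of length $\Delta t$ and checks that they have mean close to 0, variance close to $\Delta t$, and essentially no correlation between successive increments:

```python
import random

random.seed(7)

# Discretized "white noise": Wiener increments dB over steps of length dt.
dt, n = 0.01, 100_000
dB = [random.gauss(0.0, dt ** 0.5) for _ in range(n)]

mean_inc = sum(dB) / n                         # should be ~ 0
var_inc = sum(x * x for x in dB) / n           # should be ~ dt
# Sample correlation between successive increments: ~ 0 by independence (P2)
corr1 = (sum(dB[i] * dB[i + 1] for i in range(n - 1)) / (n - 1)) / var_inc
```

A flat spectrum is equivalent to this vanishing correlation between non-overlapping increments.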

As mentioned before, the relationships among the properties listed above can be derived starting from P1 to P7. For example, let us evaluate the covariance of the Wiener process values $B(t_i,\omega)$ and $B(t_j,\omega)$:

$$\text{Cov}(B(t_i,\omega)B(t_j,\omega)) = E(B(t_i,\omega)B(t_j,\omega))\,. \tag{2.3.4}$$

Assuming $t_i \le t_j$, we can express:

$$B(t_j,\omega) = B(t_i,\omega) + B(t_j,\omega) - B(t_i,\omega) \,. \tag{2.3.5}$$

Therefore,


$$\begin{aligned} E(B(t_i,\omega)B(t_j,\omega)) &= E(B(t_i,\omega)(B(t_i,\omega) + B(t_j,\omega) - B(t_i,\omega))) \\ &= E(B^2(t_i,\omega) + B(t_i,\omega)(B(t_j,\omega) - B(t_i,\omega))) \\ &= E(B^2(t_i,\omega)) + E(B(t_i,\omega)(B(t_j,\omega) - B(t_i,\omega))) \,. \end{aligned} \tag{2.3.6}$$

From P2, $B(t_i,\omega)$ and $(B(t_j,\omega) - B(t_i,\omega))$ are independent processes, and therefore we can write,

$$E(B(t\_{\rangle}, o)(B(t\_{\rangle}, o) - B(t\_{\rangle}, o))) = E(B(t\_{\rangle}, o))E(B(t\_{\rangle}, o) - B(t\_{\rangle}, o))\quad. \tag{2.3.7}$$

According to P3 and P5, $E(B(t_i,\omega)) = 0$ and $E(B(t_j,\omega) - B(t_i,\omega)) = 0$.

Therefore, from equation (2.3.7),

$$E(B(t_i,\omega)(B(t_j,\omega) - B(t_i,\omega))) = 0 \,.$$

This reduces equation (2.3.6) to $E(B(t_i,\omega)B(t_j,\omega)) = E(B^2(t_i,\omega))$, and

$$E(B^2(t_i,\omega)) = E(\{B(t_i,\omega) - 0\}^2)\,. \tag{2.3.8}$$

From P3, $\{B(t_i,\omega) - B(0,\omega)\}$ is normally distributed with variance $(t_i - 0)$, and equation (2.3.8) becomes $E(B^2(t_i,\omega)) = t_i$; therefore, $\text{Cov}(B(t_i,\omega)B(t_j,\omega)) = t_i$.

Using a similar approach it can be shown that if $t_j \le t_i$,

$$\text{Cov}(B(t_i,\omega)B(t_j,\omega)) = t_j \,. \tag{2.3.9}$$

This leads to P7: $E(B(t_i,\omega)B(t_j,\omega)) = \min(t_i, t_j)$.

The above derivations show the relatedness of the variance of an independent increment, $Var\{B(t_1,\omega) - B(t_2,\omega)\}$, to the properties of the Wiener process given by P1 to P7. The fact that


$\{B(t_i,\omega) - B(t_{i-1},\omega)\}$ is a Gaussian random variable with zero mean and variance $\{t_i - t_{i-1}\}$ can be used to construct Wiener process paths on a computer. If we divide the time interval $[0,t]$ into $n$ equidistant parts of length $\Delta t$, then at the end of each segment we can randomly generate a Brownian increment using the normal distribution with mean 0 and variance $\Delta t$. This increment is simply added to the value of the Wiener process at the point considered, and we move on to the next point. When we repeat this procedure from $t = \Delta t$ up to $t$, taking into account the fact that $B(0,\omega) = 0$, we can generate a realization of the Wiener process. We can expect these Wiener process realizations to have properties quite distinct from other continuous functions of $t$. We will briefly discuss some important characteristics of Wiener process realizations next, as these results enable us to utilise this very useful stochastic process effectively.
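The path construction just described can be sketched in a few lines of code. This is an illustrative implementation (the function name and parameters are our own choices, not from the text), using only Python's standard library:

```python
import random

def wiener_path(t_final, n, seed=None):
    """One realization of B(t) on [0, t_final], built from n equidistant
    Gaussian increments as described in the text."""
    rng = random.Random(seed)
    dt = t_final / n
    path = [0.0]                               # property P1: B(0) = 0
    for _ in range(n):
        # increment ~ N(0, dt); add it to the current value and move on
        path.append(path[-1] + rng.gauss(0.0, dt ** 0.5))
    return path

# The endpoint B(1) should be ~ N(0, 1); check the sample variance over many paths.
endpoints = [wiener_path(1.0, 100, seed=k)[-1] for k in range(5000)]
var_end = sum(b * b for b in endpoints) / len(endpoints)
```

The sample variance of the endpoints agrees with equation (2.2.2), $Var[B(t,\omega)] = t$.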

Some useful characteristics of Wiener process realizations $B(t,\omega)$ are:

1. $B(t,\omega)$ is a continuous, nondifferentiable function of $t$.

2. The quadratic variation of $B(t,\omega)$, $[B(t,\omega), B(t,\omega)](t)$, over $[0,t]$ is $t$.

Using the definition of covariation of functions,

$$\begin{aligned} [B(t,\omega),B(t,\omega)](t) &= [B(t,\omega),B(t,\omega)]([0,t])\\ &= \lim_{\delta_n \to 0} \sum_{i=1}^n [B(t_i^n) - B(t_{i-1}^n)]^2 , \end{aligned} \tag{2.3.10}$$

where $\delta_n = \max_i (t_i^n - t_{i-1}^n)$ and $\{t_i^n\}$ is a partition of $[0,t]$; as $n \to \infty$, $\delta_n \to 0$.

Taking the expectation of the summation,

$$E\left(\sum_i \left(B(t_i^n) - B(t_{i-1}^n)\right)^2\right) = \sum_i E\left(\left(B(t_i^n) - B(t_{i-1}^n)\right)^2\right). \tag{2.3.11}$$

$E((B(t_i^n) - B(t_{i-1}^n))^2)$ is the variance of the independent increment $\{B(t_i^n) - B(t_{i-1}^n)\}$.

As seen before, $Var[B(t_i^n) - B(t_{i-1}^n)] = (t_i^n - t_{i-1}^n)$.

Therefore,

$$\begin{split} E\left(\sum_i (B(t_i^n) - B(t_{i-1}^n))^2\right) &= \sum_i Var[B(t_i^n) - B(t_{i-1}^n)] \\ &= \sum_{i=1}^n (t_i^n - t_{i-1}^n) = t - 0 = t. \end{split} \tag{2.3.12}$$

Let us take the variance of $\sum_i (B(t_i^n) - B(t_{i-1}^n))^2$:

$$Var\left(\sum_i (B(t_i^n) - B(t_{i-1}^n))^2\right) \le 3\sum_i (t_i^n - t_{i-1}^n)^2 \le 3\max_i (t_i^n - t_{i-1}^n) \sum_i (t_i^n - t_{i-1}^n) = 3\, \delta_n\, t ,$$

and as $n \to \infty$, $\delta_n \to 0$, so $Var\left(\sum_i (B(t_i^n) - B(t_{i-1}^n))^2\right) \to 0$.

Summarizing the results,

$$E\left(\sum_i (B(t_i^n) - B(t_{i-1}^n))^2\right) = t ,$$

and


$$\operatorname{Var}\left(\sum_i (B(t_i^n) - B(t_{i-1}^n))^2\right) \to 0 \quad \text{as} \quad n \to \infty .$$

This implies, according to the monotone convergence theorem, that $\sum_i (B(t_i^n) - B(t_{i-1}^n))^2 \to t$ almost surely as $n \to \infty$.

Therefore, the quadratic variation of Brownian motion $B(t,\omega)$ is $t$:

$$[B(t,\omega), B(t,\omega)](t) = t \,. \tag{2.3.13}$$

Omitting $t$ and $\omega$, $[B,B](t) = t$.
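The convergence of the quadratic variation to $t$ can be observed directly on a simulated path. The sketch below (an illustrative aside, stdlib only, not from the original text) accumulates $\sum_i (\Delta B)^2$ along one realization on $[0, 2]$:

```python
import random

random.seed(3)

# Accumulate the quadratic variation sum Σ (B(t_i) - B(t_{i-1}))^2 on [0, t_final]
t_final, n = 2.0, 200_000
dt = t_final / n
b, qv = 0.0, 0.0
for _ in range(n):
    db = random.gauss(0.0, dt ** 0.5)   # increment ~ N(0, dt)
    b += db
    qv += db * db                        # squared increment

# qv should be close to t_final = 2.0, even though this single
# realization of B(t) is itself random.
```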

3. Wiener process $(B(t,\omega))$ is a martingale.

A stochastic process $\{X(t)\}$ is a martingale when the expected future value of $\{X(t)\}$, given the information available at time $t$, is equal to $X(t)$. In mathematical notation, $E(X(t+s)\,|\,F_t) = X(t)$, where $F_t$ is the information about $\{X(t)\}$ up to time $t$. We do not give the proof of the martingale characteristics of Brownian motion here, but it is easy to show that $E(B(t+s)\,|\,F_t) = B(t)$.

It can also be shown that $\{B^2(t,\omega) - t\}$ and $\{\exp(\sigma B(t,\omega) - \tfrac{\sigma^2}{2} t)\}$ are also martingales. These martingales can be used to characterize the Wiener process as well, and more details can be found in Klebaner (1998).
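The exponential martingale can be probed by Monte Carlo: since its expectation equals its initial value $\exp(0) = 1$ for every $t$, a large sample average of $\exp(\sigma B(t) - \sigma^2 t/2)$ should stay near 1. A minimal sketch (the parameter choices are illustrative, not from the text):

```python
import math
import random

random.seed(11)

# Sample mean of exp(sigma * B(t) - sigma^2 * t / 2); should be ~ 1 for any t.
sigma, t, n = 0.5, 1.0, 200_000
total = 0.0
for _ in range(n):
    b = random.gauss(0.0, t ** 0.5)          # B(t) ~ N(0, t)
    total += math.exp(sigma * b - 0.5 * sigma * sigma * t)
mart_mean = total / n
```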

### 4. Wiener process has the Markov property

Markov property simply states that the future of a process depends only on the present state. In other words, a stochastic process having Markov property does not "remember" the past and the present state contains all the information required to drive the process into the future states.

This can be expressed as

$$P(X(t+s)\le y \mid F\_t) = P(X(t+s)\le y \mid X(t))\,. \tag{2.3.14}$$


From the very definition of the increments of the Wiener process for the discretized intervals of $[0,t]$, the Wiener process increment $\{B(t_{i+1}^n) - B(t_i^n)\}$ behaves independently of its immediate predecessor $\{B(t_i^n) - B(t_{i-1}^n)\}$.


In other words, $\{B(t_{i+1}^n) - B(t_i^n)\}$ does not remember the behaviour of $\{B(t_i^n) - B(t_{i-1}^n)\}$, and the only element common to both increments is $B(t_i^n)$.

One can now see intuitively why the Wiener process should behave as a Markov process. This can be expressed as

$$P(B(t_i + s) \le x_i \mid \{B(t_i), B(t_{i-1}), \dots, 0\}) = P(B(t_i + s) \le x_i \mid B(t_i)) \,, \tag{2.3.15}$$

which is another way of expressing the previous equation (2.3.14).

### 5. Generalized form of Wiener process

The Wiener process as defined above is sometimes called the standard Wiener process, to distinguish it from that obtained by the following generalization:

$$E\left[B(t_i,\omega)\, B(t_j,\omega)\right] = \int_0^{\min(t_i, t_j)} q(\tau)\, d\tau \,.$$

The integral kernel $q(\tau)$ is called the correlation function and determines the correlation between stochastic process values at different times. The standard Wiener process is the simple case $q(\tau) = 1$, i.e. full correlation over any time interval; the generalised Wiener process includes, for example, the case where $q$ decreases, so that there is progressively less correlation between the values in a given realization as the time interval between them increases.
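A generalized Wiener process with a given correlation function can be simulated by drawing each increment from $N(0,\, q(s)\Delta s)$, so that $Var\, B(t) = \int_0^t q(\tau)\, d\tau$. In the sketch below (an illustrative construction; the kernel $q(\tau) = e^{-\tau}$ and all names are our own example choices, not from the text) the covariance formula is checked empirically at $t_i = 1$, $t_j = 2$:

```python
import math
import random

def gen_wiener_path(grid, q, rng):
    """Generalized Wiener path: each increment ~ N(0, q(s) * ds),
    so that Var B(t) ~ integral of q from 0 to t."""
    b, path = 0.0, [0.0]
    for s0, s1 in zip(grid[:-1], grid[1:]):
        b += rng.gauss(0.0, (q(s0) * (s1 - s0)) ** 0.5)
        path.append(b)
    return path

q = lambda tau: math.exp(-tau)       # hypothetical example kernel, q(0) = 1
steps, n_paths = 200, 20_000
grid = [2.0 * i / steps for i in range(steps + 1)]
i1 = steps // 2                      # grid index of t_i = 1.0; t_j = 2.0 is the endpoint
rng = random.Random(5)

acc = 0.0
for _ in range(n_paths):
    p = gen_wiener_path(grid, q, rng)
    acc += p[i1] * p[-1]
emp_cov = acc / n_paths              # should approach integral of e^{-τ} over [0,1] = 1 - 1/e
```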

### **2.4 Stochastic Integration**

At this point of our discussion, we need to define the integration of a stochastic process with respect to the Wiener process $(B(t,\omega))$, so that we understand the conditions under which this integral exists and what kind of processes can be integrated using it. The Stieltjes integral cannot be used to integrate functions of infinite variation, and therefore there is a need to define integrals for stochastic processes such as the Wiener process. There are two choices available: the Ito definition of integration and the Stratonovich integration. These two definitions produce entirely different integral stochastic processes.

The Ito definition is popular among mathematicians, while physicists tend to use the Stratonovich integral. The Ito integral has the martingale property among many other useful technical properties (Keizer, 1987), and in addition, Stratonovich integrals can be reduced to Ito integrals (Klebaner, 1998). In this monograph, we confine ourselves to the Ito definition of integration:

$$I[X](\omega) = \int_S^T X(t,\omega)\, dB(t,\omega) \,.$$

$I[X](\omega)$ implies that the integration of $X(t,\omega)$ is along a realization $\omega$ and with respect to the Wiener process (a.k.a. Brownian motion), which is a function of $t$. $I[X](\omega)$ is also a stochastic process in its own right and has properties originating from the definition of the integral. It is natural to expect $I[X](\omega)$ to be equal to $c(B(t,\omega) - B(s,\omega))$ when $X(t,\omega)$ is a


constant $c$. If $X(t)$ is a deterministic process, which can be expressed as a sequence of constants over small intervals, we can define the Ito integral as follows:

$$\begin{split} I[X] &= \int_S^T X(t)\, dB(t) \\ &= \sum_{i=0}^{n-1} c_i (B(t_{i+1}) - B(t_i)) , \end{split} \tag{2.4.1}$$

where $c_i = X(t)$ for $t \in [t_i, t_{i+1})$, $i = 0, \dots, n-1$. The time interval $[S,T]$ has been discretized into $n$ intervals: $S = t_0 < t_1 < \dots < t_n = T$.

Using the property of independent increments of Brownian motion, we can show that the mean of $I[X](\omega)$ is zero and,

$$Var(I[X]) = \sum_{i=0}^{n-1} c_i^2 (t_{i+1} - t_i) \,. \tag{2.4.2}$$
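Equation (2.4.2) can be verified by simulation for a concrete step function. In the sketch below (illustrative; the values $c = (1, -2, 0.5)$ on $[0,3]$ are our own example, not from the text), each sample of $I[X]$ is built from independent Gaussian increments:

```python
import random

random.seed(9)

# Step function X(t) = c_i on [t_i, t_{i+1}); hypothetical example values.
c = [1.0, -2.0, 0.5]
t = [0.0, 1.0, 2.0, 3.0]
exact_var = sum(ci * ci * (t[i + 1] - t[i]) for i, ci in enumerate(c))  # 5.25

n = 200_000
total, total_sq = 0.0, 0.0
for _ in range(n):
    s = 0.0
    for i, ci in enumerate(c):
        # c_i multiplied by an independent increment B(t_{i+1}) - B(t_i)
        s += ci * random.gauss(0.0, (t[i + 1] - t[i]) ** 0.5)
    total += s
    total_sq += s * s
mean_I = total / n        # should be ~ 0
var_I = total_sq / n      # should be ~ exact_var
```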

It turns out that if $X(t,\omega)$ is a continuous stochastic process whose future values depend only on the information about the process up to $t$, the Ito integral $I[X](\omega)$ exists. If the future states of a stochastic process $X(t,\omega)$ depend only on $F_t$, it is called an adapted process. A left-continuous adapted process $X(t,\omega)$ is defined as a predictable process, and it satisfies the condition $\int_0^T X^2(t,\omega)\, dt < \infty$ almost surely.

As we are only concerned with continuous processes driven by past events, we limit our discussion to predictable processes. The reader may refer to Øksendal (1998) and Klebaner (1998) for a more rigorous discussion of these concepts.

We can now define the Ito integral $I[X](t)$ similarly to equation (2.4.1): if $X(t,\omega)$ is a continuous and adapted process, then $I[X](t)$ can be defined as

$$\sum_{i=0}^{n-1} X(t_i^n, \omega)\,(B(t_{i+1}^n, \omega) - B(t_i^n, \omega))\,,\tag{2.4.3}$$

and this sum converges in probability.

Dropping $\omega$ for convenience and adhering to the same discretization of the interval [*S*, *T*] as in equation (2.4.1),

$$I[X] = \int_S^T X(t)\,dB(t) = \sum_{i=0}^{n-1} X(t_i^n)\left(B(t_{i+1}^n) - B(t_i^n)\right) \tag{2.4.4}$$

Equation (2.4.4) expresses an approximation of $\int_S^T X(t)\,dB(t)$ based on the convergence in probability. We take equation (2.4.3) as the definition of the Ito integral for the purpose of this


book. As stated earlier, $I[X](t)$ is a stochastic process and it has the following properties (see, for example, Øksendal (1998) for more details):

1. Linearity

If *X*(*t*) and *Y*(*t*) are predictable processes and $\alpha$ and $\beta$ are some constants, then

$$I[\alpha X + \beta Y](t) = \alpha\,I[X](t) + \beta\,I[Y](t)\,. \tag{2.4.5}$$

2. Zero mean Property

$$E(I[X](t)) = 0\,. \tag{2.4.6}$$

3. Isometry Property

$$E[\left(\int\_{S}^{T} X(t) \, dB(t)\right)^{2}] = \int\_{S}^{T} E(X^{2}(t)) \, dt \, \,. \tag{2.4.7}$$

The isometry property says that the expected value of the square of the Ito integral is the integral with respect to *t* of the expectation of the square of the process *X*(*t*). Since $E\left[\int_S^T X(t)\,dB(t)\right] = 0$ from the zero mean property, we can express the left-hand side of equation (2.4.7) as

$$\begin{aligned} &E(\left(\int\_{S}^{T}X(t)dB(t)\right)^{2} - E(\int\_{S}^{T}X(t)dB(t))) \\ &= E[\int\_{S}^{T}X(t)dB(t) - E(\int\_{S}^{T}X(t)dB(t))]^{2} = Var\left(\int\_{S}^{T}X(t)dB(t)\right) \end{aligned} \tag{2.4.8}$$

Therefore the variance of the Ito integral process is $\int_S^T E(X^2(t))\,dt$, and this result will be useful to us in understanding the behaviour of the Ito integral process. We say that the Ito integral is square integrable. According to Fubini's theorem, for a stochastic process *X*(*t*) with continuous realizations,

$$E\left(\int_S^T X(t)\,dt\right) = \int_S^T E(X(t))\,dt\,, \tag{2.4.9}$$

and

$$E\left(\int_S^T X^2(t)\,dt\right) = \int_S^T E(X^2(t))\,dt\,. \tag{2.4.10}$$
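The isometry property (2.4.7) can be checked numerically for the simple choice *X*(*t*) = *B*(*t*) on [0, 1], where the right-hand side is $\int_0^1 E(B^2(t))\,dt = \int_0^1 t\,dt = 1/2$. A minimal sketch (grid size, sample count, and seed are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

n_paths, n_steps = 100_000, 500
dt = 1.0 / n_steps
dB = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
B_left = np.cumsum(dB, axis=1) - dB        # B(t_i) at the left endpoint of each step

I = (B_left * dB).sum(axis=1)              # Ito sums approximating ∫_0^1 B dB
lhs = (I**2).mean()                        # estimate of E[(∫_0^1 B dB)^2]
rhs = 0.5                                  # ∫_0^1 E(B^2(t)) dt = ∫_0^1 t dt
print(lhs, rhs)
```

Note that the integrand must be sampled at the *left* endpoint of each subinterval; this is what makes the sum an Ito sum.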

4. Ito integral is a martingale

It can be shown that $E(I[X](t)\,|\,F_s) = I[X](s)$ for $s \le t$. Strictly speaking, *X*(*t*) should satisfy $\int_S^T X^2(t)\,dt < \infty$ and $\int_S^T E(X^2(t))\,dt < \infty$ for the martingale property to be true. Therefore, Ito integrals are square integrable martingales.

5. The Ito integral of a deterministic function *X*(*t*) is a Gaussian process with zero mean and covariance function,

$$\text{Cov}(I[X(t)], I[X(t+t_0)]) = \int_0^t X^2(s)\,ds\,, \quad t_0 \ge 0. \tag{2.4.11}$$

*I* [*X* (*t* ) ] is a square integrable martingale.

6. Quadratic variation of Ito integral,


$$[I[X], I[X]](t) = \int_0^t X^2(s)\,ds\,. \tag{2.4.12}$$

We see that the Ito integral has a positive quadratic variation, making it a process of infinite variation, i.e., a nondifferentiable continuous function of *t*.

7. The quadratic covariation of the Ito integral with respect to processes $X_1(t)$ and $X_2(t)$ is given by

$$[I[X_1], I[X_2]](t) = \int_0^t X_1(s)X_2(s)\,ds\,. \tag{2.4.13}$$

Armed with these properties we can proceed to discuss the machinery of stochastic calculus such as stochastic chain rule, which is also known as Ito formula.

### **2.5 Stochastic Chain Rule (Ito Formula)**

As we have seen previously, the quadratic variation of Brownian motion, $[B(t,\omega), B(t,\omega)](t)$, is the limit in probability over the interval [0, *t*]:

$$[B(t,\omega),B(t,\omega)](t) = \lim_{\delta_n \to 0} \sum_{i=0}^{n-1} (B(t_{i+1}^n) - B(t_i^n))^2 \tag{2.5.1}$$

where $\delta_n = \max_i\,(t_{i+1}^n - t_i^n)$. Using the differential notation $dB$ for the increment $B(t_{i+1}^n) - B(t_i^n)$, and writing the summation as an integral,

$$[B(t,\omega),B(t,\omega)](t) = \int_0^t (dB(s))^2\,. \tag{2.5.2}$$

We have shown that [*B*,*B*](*t*) *= t*, and therefore, $\int_0^t (dB(s))^2 = t$.

For our convenience and also to make the notation similar to the one in standard differential calculus, we denote

$$\int\_{0}^{t} \left( dB(\mathbf{s}) \right)^{2} = t \tag{2.5.3}$$

as
$$(dB(t))^{2} = dt\,. \tag{2.5.4}$$

This equation does not have a meaning outside the integral equation (2.5.3) and should not be interpreted in any other way.
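This behaviour of the quadratic variation is easy to observe numerically: for a Brownian path on [0, *t*], the sum of squared increments approaches *t* as the partition is refined. A minimal sketch (partition sizes and seed are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

t = 2.0
qv = {}
for n in (100, 10_000, 1_000_000):
    # n independent increments over [0, t], each ~ N(0, t/n)
    dB = rng.normal(0.0, np.sqrt(t / n), size=n)
    qv[n] = (dB**2).sum()              # sum of squared increments
    print(n, qv[n])                    # approaches t = 2 as n grows
```

Each term $(\Delta B_i)^2$ has mean $t/n$, so the sum has mean *t*, while its standard deviation shrinks like $t\sqrt{2/n}$, which is the convergence in probability behind (2.5.2).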

Similarly, for any other continuous function *g*(*t*),

$$\int_0^t g(s)\,(dB(s))^2 = \int_0^t g(s)\,ds\,, \tag{2.5.5}$$


which means,

$$g(t)\,(dB(t))^2 = g(t)\,dt\,. \tag{2.5.6}$$

 This equation is an expression of the approximation, converging in probability, of

$$\lim_{\delta_n \to 0} \sum_{i=0}^{n-1} g(t_i^n)\,(B(t_{i+1}^n) - B(t_i^n))^2 = \int_0^t g(s)\,ds\,. \tag{2.5.7}$$

As the quadratic variation of a continuous and differentiable function is zero,

$$[t, t](t) = 0\,. \tag{2.5.8}$$

This equation in integral notation,

$$\int\_0^t \left(dt\right)^2 = 0$$

and in differential notation,

$$\left(dt\right)^{2} = 0.\tag{2.5.9}$$

Similarly, the quadratic covariation of *t* (a continuous and differentiable function) and Brownian motion,

$$[t,B](t) = 0 \,. \tag{2.5.10}$$

This relationship can be proved by expressing quadratic covariation as

$$\begin{aligned} [t,B](t) &= \lim_{\delta_n \to 0} \sum_{i=0}^{n-1} (t_{i+1}^n - t_i^n)(B(t_{i+1}^n) - B(t_i^n)), \qquad \delta_n = \max_i\,(t_{i+1}^n - t_i^n), \\ \left|[t,B](t)\right| &\le \delta_n \left|\sum_{i=0}^{n-1} (B(t_{i+1}^n) - B(t_i^n))\right| \le \delta_n \left|B(t)\right|. \end{aligned}$$

Therefore, as $n \to \infty$, $\delta_n \to 0$ (because *t* is a function of finite variation), and

$$[t,B](t) \to 0 \quad \text{as} \quad n \to \infty.$$

Hence, $[t,B](t) = 0$ and, in integral notation, $\int_0^t dt\,dB = 0$.

This can be written in differential notation,

$$dt \cdot dB = 0\,. \tag{2.5.11}$$
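The vanishing covariation (2.5.11) can also be observed numerically: on a uniform partition of [0, *t*], $\sum_i \Delta t\,\Delta B_i = \Delta t\,B(t)$, which shrinks like $1/n$. A small sketch (partition sizes and seed are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(7)

t = 1.0
cov = {}
for n in (100, 10_000, 1_000_000):
    dt = t / n
    dB = rng.normal(0.0, np.sqrt(dt), size=n)
    cov[n] = abs((dt * dB).sum())      # |sum of dt_i * dB_i|, bounded by dt * |B(t)|
    print(n, cov[n])                   # shrinks toward 0 as n grows
```

This contrasts with the sums of $(\Delta B_i)^2$, which do *not* vanish but converge to *t*.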


Therefore, we can summarize the following rules in differential notation as follows,

$$dt \cdot dt = 0\,; \quad dt \cdot dB = 0\,; \quad dB \cdot dt = 0\,; \quad dB \cdot dB = dt\,. \tag{2.5.12}$$

In order to come to grips with the interpretation of the differential properties of $dB_t$, it is useful to consider the chain rule of differentiation. This will also lead us to formulas that are often more useful in calculating Ito integrals than the basic definition as the limit of a sum. Consider first the case in ordinary calculus of a function *g*(*x*,*t*), where *x* is also a function of *t*. We can write the change in *g* as *t* changes, as follows:

$$\Delta \mathbf{g} = \frac{\partial \mathbf{g}(t, \mathbf{x})}{\partial t} \Delta \mathbf{t} + \frac{\partial \mathbf{g}(t, \mathbf{x})}{\partial \mathbf{x}} \Delta \mathbf{x} + \frac{1}{2} \frac{\partial^2 \mathbf{g}(t, \mathbf{x})}{\partial \mathbf{x}^2} (\Delta \mathbf{x})^2 + \dots$$

From this, an expression for *dg/dt* is obtained by taking the limit $\Delta t \to 0$ of the ratio $(\Delta g/\Delta t)$.

Since $\Delta x = (dx/dt)\,\Delta t$, when $\Delta t \to 0$ the second-derivative term shown is of order $(\Delta t)^2$ and falls away together with all higher derivatives, and the well-known chain rule formula for the total derivative (*dg/dt*) is obtained. However, if, instead of *x*, we have a Wiener process $B_t$, we get

$$
\Delta \mathbf{g} = \frac{\partial \mathbf{g}(t, B\_t)}{\partial t} \Delta t + \frac{\partial \mathbf{g}(t, B\_t)}{\partial \mathbf{x}} \Delta B\_t + \frac{1}{2} \frac{\partial^2 \mathbf{g}(t, B\_t)}{\partial \mathbf{x}^2} (\Delta B\_t \cdot \Delta B\_t) + \dots
$$

If the expectation value of this expression over all realizations is taken, the above shows that the second-derivative term is now only of order $\Delta t$ and cannot be ignored. Since this holds for the expectation value, for consistency we also cannot neglect the term if the limit $\Delta t \to 0$ is taken without considering the expectation value. Unlike the case of ordinary calculus, where all expressions containing products of differentials of order higher than one are neglected, in Ito calculus we therefore have different rules.

Recall that in standard calculus chain rule is applied to composite functions.

For example, if *Y* = *f*(*t*), then *g*(*Y*) = *g*(*f*(*t*)) is a composite function of *t*.

$$\text{Then } \frac{dg}{dt} = \frac{dg}{dY} \cdot \frac{dY}{dt}\,.$$

In differential notation,

$$d\mathbf{g} = \frac{d\mathbf{g}}{dY} \, df \, \, .$$

By integrating

$$\mathcal{g}(f(t)) = \mathcal{g}(0) + \int\_0^t \mathcal{g}'(f(t))df\ .$$

Suppose *f*(*t*) *= B*(*t*) (Brownian motion) and *g*(*x*) is a twice continuously differentiable function. Then, by using the stochastic Taylor series expansion,

$$\mathbf{g}(B(t)) = \mathbf{g}(0) + \int\_0^t \mathbf{g}'(B(s)) dB(s) + \frac{1}{2} \int\_0^t \mathbf{g}''(B(s)) ds \,. \tag{2.5.13}$$
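Equation (2.5.13) can be verified pathwise in a simulation. A minimal sketch (the choice $g(x) = \sin x$, the grid, and the seed are illustrative assumptions, not from the book):

```python
import numpy as np

rng = np.random.default_rng(8)

n_steps, T = 1_000_000, 1.0
dt = T / n_steps
dB = rng.normal(0.0, np.sqrt(dt), size=n_steps)
B_left = np.cumsum(dB) - dB                # B(s) at the left endpoint of each step

# g(x) = sin x, g'(x) = cos x, g''(x) = -sin x in equation (2.5.13)
lhs = np.sin(dB.sum()) - np.sin(0.0)       # g(B(T)) - g(B(0)), with B(0) = 0
rhs = (np.cos(B_left) * dB).sum() + 0.5 * (-np.sin(B_left) * dt).sum()
print(lhs, rhs)
```

Without the second-derivative correction term, the two sides would disagree by an amount of order *T*/2; this is exactly the difference between the ordinary and stochastic chain rules.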


Comparing equation (2.5.13) and the corresponding stochastic chain rule, we can see that the second derivative term of the Taylor series plays a significant role in changing the chain rule in the standard calculus to the stochastic one.

For example, let $g(x) = e^x$.

$$\text{Therefore,}\quad e^{B(t)} = e^{B(0)} + \int_0^t e^{B(s)}\,dB(s) + \frac{1}{2}\int_0^t e^{B(s)}\,ds\,. \tag{2.5.14}$$

In differential notation (which is only a convention),

$$d\left(e^{B(t)}\right) = e^{B(t)}\,dB(t) + \frac{1}{2}e^{B(t)}\,dt\,. \tag{2.5.15}$$

As another example, let $g(x) = x^2$.

Therefore, from the chain rule

$$(B(t))^2 = (B(0))^2 + 2\int_0^t B(s)\,dB(s) + \frac{1}{2}\int_0^t 2\,ds$$

$$\int\_0^t B(s)dB(s) = \frac{1}{2}(B(t))^2 - \frac{1}{2}t \,. \tag{2.5.16}$$

This is quite a different result from the standard integration. In differential convention,

$$B(t)\,dB(t) = \frac{1}{2}d\left((B(t))^2\right) - \frac{1}{2}dt\,. \tag{2.5.17}$$

In other words, the stochastic process $\int_0^t B(s)\,dB(s)$ can be calculated by evaluating $\frac{1}{2}(B(t))^2 - \frac{1}{2}t$. We will show how this process behaves using computer simulations in section 2.6.
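Ahead of the simulations in section 2.6, a quick numerical sketch of this result (grid size and seed are arbitrary assumptions): along a single Brownian path, the Ito sums for $\int_0^T B\,dB$ land on $\frac{1}{2}B(T)^2 - \frac{1}{2}T$.

```python
import numpy as np

rng = np.random.default_rng(3)

n_steps, T = 1_000_000, 1.0
dt = T / n_steps
dB = rng.normal(0.0, np.sqrt(dt), size=n_steps)
B_left = np.cumsum(dB) - dB                # B(t_i) at the left endpoints
B_T = dB.sum()                             # B(T)

ito_sum = (B_left * dB).sum()              # Ito sum approximating ∫_0^T B dB
closed_form = 0.5 * B_T**2 - 0.5 * T       # equation (2.5.16)
print(ito_sum, closed_form)
```

The residual between the two values is $\frac{1}{2}(T - \sum_i (\Delta B_i)^2)$, which vanishes as the partition is refined: the $-\frac{1}{2}t$ correction *is* the quadratic variation term.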

We can write Ito integral as

$$Y(t) = \int\_0^t \sigma(\mathbf{s}) dB(\mathbf{s}) \,. \tag{2.5.18}$$

Then we can add a "drift term" to the "diffusion term" given by equation (2.5.18):

$$Y(t) = Y(0) + \int\_0^t \mu(s)ds + \int\_0^t \sigma(s)dB(s)\,. \tag{2.5.19}$$

We recall that $\sigma(s)$ should be a predictable process, subject to the condition that $\int_0^T \sigma^2(t)\,dt$ converges almost surely. $\mu(t)$ is, on the other hand, an adapted continuous process of finite variation. In equation (2.5.19), $\int_0^t \sigma(s)\,dB(s)$ represents the diffusion part of the process, and $\int_0^t \mu(s)\,ds$ does not contain the noise; therefore it represents the drifting of the process. *Y*(*t*) is called an Ito process, and in differential notation we can write,

$$dY(t) = \mu(t)dt + \sigma(t)dB(t) \,. \tag{2.5.20}$$

Equation (2.5.20) can be quite useful in practical applications where the main driving force is perturbed by an irregular noise. A particle moving through a porous medium is such an example. In this case, advection gives rise to the drift term, and hydrodynamic dispersion and microdiffusion give rise to the "diffusive" term. In population dynamics, the diffusive term is a direct result of noise in the proportionality constant. It is therefore important to study the Ito process further in order to apply it in modelling situations. $\mu(t)$ is called the drift coefficient and $\sigma(t)$ the diffusion coefficient, and they can depend on *Y*(*t*) and/or *B*(*t*). For example, we can write the previous result (equation (2.5.17)) as

$$d(B(t)^2) = dt + 2B(t)\,dB(t)\,. \tag{2.5.21}$$

This is an Ito process with a drift coefficient of 1 and a diffusion coefficient of 2*B*(*t*). The quadratic variation of the Ito process on [0, *T*],

$$Y(t) = Y(0) + \int\_0^t \mu(\mathbf{s})d\mathbf{s} + \int\_0^t \sigma(\mathbf{s})dB(\mathbf{s}).\tag{2.5.22}$$

is given by


$$[Y, Y](t) = \int\_0^t \sigma^2(s)ds \quad . \tag{2.5.23}$$

This can be deduced from the fact that $\int_0^t \mu(s)\,ds$ is a continuous function with finite variation, and by using the quadratic variation of the Ito integral. In differential notation,

$$\begin{aligned} (dY(t))^{2} &= dY(t)\cdot dY(t),\\ &= \mu^{2}(t)\,(dt)^{2} + 2\mu(t)\sigma(t)\,dt\cdot dB + \sigma^{2}(t)\,(dB)^{2},\\ &= \sigma^{2}(t)\,dt. \end{aligned} \tag{2.5.24}$$
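Equations (2.5.20) and (2.5.23) can be illustrated with a simple Euler–Maruyama simulation. The coefficients $\mu(t) = 2t$ and $\sigma(t) = 1 + t$ below are hypothetical choices for illustration; the sampled quadratic variation of the path should approach $\int_0^T \sigma^2(s)\,ds$:

```python
import numpy as np

rng = np.random.default_rng(4)

n_steps, T = 500_000, 1.0
t = np.linspace(0.0, T, n_steps + 1)[:-1]  # left endpoints t_i
dt = T / n_steps
dB = rng.normal(0.0, np.sqrt(dt), size=n_steps)

# Increments of the Ito process dY = mu dt + sigma dB (equation (2.5.20))
dY = 2 * t * dt + (1 + t) * dB
qv = (dY**2).sum()                         # sampled quadratic variation [Y, Y](T)
theory = ((1 + t) ** 2 * dt).sum()         # ≈ ∫_0^1 (1+s)^2 ds = 7/3
print(qv, theory)
```

The drift contributes terms of order $(dt)^2$ to the squared increments, so only the diffusion coefficient survives in the quadratic variation, exactly as (2.5.24) states.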

The chain rule given in equation (2.5.12) gives us a way to compute the behaviour of a function of Brownian motion. It is also useful to know the chain rule to compute a function of a given Ito process. Suppose an Ito process is given by a general form,

$$dX(t) = \mu dt + \sigma dB(t) \,. \tag{2.5.25}$$

where $\mu$ is the drift coefficient, $\sigma$ is the diffusion coefficient, and *g*(*t*, *x*) is a twice continuously differentiable function. Let *Y*(*t*) = *g*(*t*, *X*(*t*)). Here *Y*(*t*) is a function of *t* and the Ito process *X*(*t*), and is also a stochastic process. *Y*(*t*) can also be expressed as an Ito process. Then the Ito formula states,

$$dY(t) = \frac{\partial g}{\partial t}(t, X(t))\,dt + \frac{\partial g}{\partial x}(t, X(t))\,dX(t) + \frac{1}{2}\frac{\partial^2 g}{\partial x^2}(t, X(t))\,(dX(t))^2. \tag{2.5.26}$$

$$\text{where}, \ (dX(t))^2 = d(X(t)).d(X(t))\,, \tag{2.5.27}$$


and is evaluated according to the rules given by equation (2.5.12).

As an example, consider the Ito process

$$dX(t) = dt + 2B(t)\,dB(t) \tag{2.5.28}$$

where $\mu = 1$ and $\sigma = 2B(t)$.

Assume $g(t,x) = x^2$; therefore,

$$\frac{\partial g}{\partial t} = 0\,; \quad \frac{\partial g}{\partial x} = 2x\,; \quad \frac{\partial^2 g}{\partial x^2} = 2\,. \tag{2.5.29}$$

Substituting into the Ito formula above,

$$\begin{split} dg &= 2X(t)\,dX(t) + dX(t)\cdot dX(t), \\ &= 2X(t)(dt + 2B(t)\,dB(t)) + 4B^2(t)\,dt. \end{split} \tag{2.5.30}$$

$$dX^2(t) = (2X(t) + 4B^2(t))dt + 4X(t)B(t)dB(t). \tag{2.5.31}$$

As seen above, $dX^2(t)$ is also an Ito process with $u = 2X(t) + 4B^2(t)$ (drift coefficient), a function of *X*(*t*) and *B*(*t*), and $v = 4X(t)B(t)$ (diffusion coefficient), also a function of *X*(*t*) and *B*(*t*).

Substituting $X(t) = B^2(t)$ into equation (2.5.31),

$$\begin{split}d(B^4(t)) &= (2B^2(t) + 4B^2(t))\,dt + 4B^2(t)B(t)\,dB(t) \\ &= 6B^2(t)\,dt + 4B^3(t)\,dB(t)\,. \end{split} \tag{2.5.32}$$

We can derive this from the chain rule for a function of *B*(*t*) as well.

Let $g(x) = x^4$, and from the Ito formula:

$$\begin{split} dg &= g'(B(t))\,dB(t) + \frac{1}{2}g''(B(t))\,dt \\ &= 4B^3(t)\,dB(t) + \frac{1}{2}\cdot 4\cdot 3\,B^2(t)\,dt, \end{split} \tag{2.5.33}$$

$$d(B^4(t)) = 6B^2(t)dt + 4B^3(t)dB(t)\,\,. \tag{2.5.34}$$

This is the same Ito process as in equation (2.5.32). Let us consider another example which will be useful. Consider the function $g(x) = \ln x$ and the Ito process

$$dX(t) = \frac{1}{2}X(t)\,dt + X(t)\,dB(t)\,. \tag{2.5.35}$$

For this Ito process $\mu(t) = \frac{1}{2}X(t)$ and $\sigma(t) = X(t)$. From the Ito formula (equation (2.5.26)),


$$\begin{split}d(\ln X(t)) &= \frac{1}{X(t)}dX(t) + \frac{1}{2} \Big(-\frac{1}{X^2(t)}\Big) \left(X^2(t)\,dt\right), \\ &= \frac{1}{X(t)} \Big(\frac{1}{2}X(t)dt + X(t)dB(t)\Big) - \frac{1}{2}dt, \\ &= \frac{1}{2}dt + dB(t) - \frac{1}{2}dt, \\ &= dB(t). \end{split} \tag{2.5.36}$$

By convention, the above stochastic differential is given by the following integral equation:

$$
\ln X(t) = \ln X(0) + \int_0^t dB(t)\,. \tag{2.5.37}
$$

$$\ln\left[\frac{X(t)}{X(0)}\right] = B(t)\,, \tag{2.5.38}$$

$$X(t) = X(0) \, e^{B(t)}\,\,.$$

We can show that $X(t) = X(0)e^{B(t)}$ satisfies $dX(t) = \frac{1}{2}X(t)dt + X(t)dB(t)$. In other words, $X(t) = X(0)e^{B(t)}$ is a "solution" to the stochastic differential $dX(t) = \frac{1}{2}X(t)dt + X(t)dB(t)$.

This idea of having a solution to a stochastic differential is similar to having a solution to differential equations in standard calculus.
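As an illustrative aside (our own sketch, not part of the original text), this can be checked numerically: integrating the stochastic differential with the Euler–Maruyama scheme along a simulated Wiener path should track the closed-form solution $X(t) = X(0)e^{B(t)}$ evaluated on the same path. NumPy and all names below are our assumptions.

```python
import numpy as np

# Illustrative sketch: Euler-Maruyama integration of
# dX = (1/2) X dt + X dB, compared with the exact solution
# X(t) = X(0) e^{B(t)} evaluated on the same Wiener increments.
rng = np.random.default_rng(0)
n_steps = 100_000
dt = 1.0 / n_steps
dB = rng.normal(0.0, np.sqrt(dt), n_steps)   # Wiener increments ~ N(0, dt)
B = np.concatenate(([0.0], np.cumsum(dB)))   # Wiener path with B(0) = 0

# Euler-Maruyama: X_{k+1} = X_k (1 + dt/2 + dB_k), with X(0) = 1
X = np.concatenate(([1.0], np.cumprod(1.0 + 0.5 * dt + dB)))

exact = np.exp(B)                            # X(0) e^{B(t)} with X(0) = 1
max_rel_err = float(np.max(np.abs(X - exact) / exact))
```

The maximum relative error shrinks as the step size decreases, consistent with $X(0)e^{B(t)}$ being a solution of the stochastic differential.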

Suppose 1 *X t*( ) and 2 *X t*( ) are Ito processes given by the following differentials:

$$dX\_1(t) = \mu\_1(t)dt + \sigma\_1(t)dB(t) \quad , \tag{2.5.39}$$

$$dX\_2(t) = \mu\_2(t)dt + \sigma\_2(t)dB(t) \quad . \tag{2.5.40}$$

Quadratic covariation is given by

$$\begin{aligned} d[X_1, X_2] &= dX_1(t).dX_2(t), \\ &= \mu_1 \mu_2(dt)^2 + \mu_1 \sigma_2\, dt.dB(t) + \mu_2 \sigma_1\, dt.dB(t) + \sigma_1 \sigma_2 \,(dB(t))^2. \end{aligned}$$

And $(dt)^2 = dt \cdot dB(t) = 0$.

$$\begin{split}d[X_1, X_2] &= \sigma_1(t)\sigma_2(t)(dB(t))^2\\ &= \sigma_1(t)\sigma_2(t)\,dt\end{split} \tag{2.5.41}$$
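A brief numerical illustration of this rule (our own construction; the two processes and the parameter values are arbitrary choices, not from the text): for $X_1(t) = B(t)$ with $\sigma_1 = 1$ and $X_2(t) = t + 2B(t)$ with $\sigma_2 = 2$, the quadratic covariation over $[0,1]$ should be $\int_0^1 \sigma_1\sigma_2\,dt = 2$.

```python
import numpy as np

# Illustrative check of d[X1, X2] = sigma_1 sigma_2 dt.
# With X1 = B(t) and X2 = t + 2 B(t), the covariation over [0, 1] is 2.
rng = np.random.default_rng(1)
n = 200_000
dt = 1.0 / n
dB = rng.normal(0.0, np.sqrt(dt), n)

dX1 = dB                  # sigma_1 = 1, mu_1 = 0
dX2 = dt + 2.0 * dB       # sigma_2 = 2, mu_2 = 1
covariation = float(np.sum(dX1 * dX2))   # discrete sum of dX1 . dX2
```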

The stochastic product rule is given by,

$$\begin{aligned} \mathbf{X}\_1(t)\mathbf{X}\_2(t) - \mathbf{X}\_1(0)\mathbf{X}\_2(0) \\ = \int\_0^t \mathbf{X}\_1(s)dX\_2(s) + \int\_0^t \mathbf{X}\_2(s)dX\_1(s) + [\mathbf{X}\_1, \mathbf{X}\_2](t) \end{aligned} \tag{2.5.42}$$

This is again an Ito process.


If at least one of *X*1 and *X*2 is a continuous function with finite variation, then 1 2 [ , ]( ) 0 *XX t* and equation (2.5.42) reduces to the integration by parts formula in the standard calculus.

Stochastic product rule can be expressed in differential form:

$$\begin{aligned} d(X\_1(t)X\_2(t)) \\ = X\_1(t)dX\_2(t) + X\_2(t)dX\_1(t) + \sigma\_1(t)\sigma\_2(t)dt. \end{aligned} \tag{2.5.43}$$

 As an example, consider *Y*(*t*) = *t B*(*t*),

$$Y(t) = X_1(t)X_2(t)\,,$$

where $X_1(t) = t$, a continuous function with finite variation and $\sigma_1 = 0$, and $X_2(t) = B(t)$, Brownian motion with infinite variation and $\sigma_2 = 1$.

From the product rule,

$$d(Y(t)) = tdB(t) + B(t)dt + \text{(0)}(1)dt$$

$$d(tB(t)) = tdB(t) + B(t)dt\tag{2.5.44}$$

This is the same result we obtain if we use the standard product rule. The reason for this is that the quadratic covariation of a continuous function with finite variation and a function with infinite variation is zero, as we have mentioned previously.

Suppose $dX_1(t) = t\,dB(t) + B(t)\,dt$, and

$$\begin{aligned} dX\_2(t) &= \frac{1}{2} X\_2(t)dt + X\_2(t)dB(t) \text{ , where } \\\\ \mu\_1(t) &= B(t) \; ; \; \sigma\_1(t) = t \; ; \; \sigma\_2(t) = X\_2(t) \; ; \; \text{and } \mu\_2(t) = \frac{1}{2} X\_2 \; . \end{aligned}$$

From the product rule,

$$\begin{split} d(X\_1(t)X\_2(t)) &= X\_1(t)dX\_2(t) + X\_2(t)dX\_1(t) + \sigma\_1 \sigma\_2 dt, \\ &= X\_1(t)dX\_2(t) + X\_2(t)dX\_1(t) + tX\_2(t)dt. \end{split} \tag{2.5.45}$$

By substitution,

$$\begin{split}d(X_1(t)X_2(t)) &= X_1(t)(\frac{1}{2}X_2dt + X_2(t)dB(t)) + X_2(t)(t\, dB(t) + B(t)dt) + tX_2(t)dt, \\ &= \left(\frac{1}{2}X_1(t)X_2(t) + X_2(t)B(t) + tX_2(t)\right)dt + (X_1(t)X_2(t) + tX_2(t))dB(t). \end{split} \tag{2.5.46}$$

This is again an Ito process.


$$\begin{split}d(X_1(t)X_2(t)) &= \left(\frac{1}{2}X_1(t)X_2(t) + tX_2(t) + X_2(t)B(t)\right)dt + \left(X_1(t) + t\right)X_2(t)\,dB(t),\\ &= X_2(t)\left(\frac{1}{2}X_1(t) + t + B(t)\right)dt + X_2(t)\left(X_1(t) + t\right)dB(t).\end{split} \tag{2.5.47}$$

 As an integral equation,

$$X\_1(t)X\_2(t) - X\_1(0)X\_2(0) = \int\_0^t X\_2(t)(\frac{1}{2}X\_1(t) + t + B(t))dt + \int\_0^t X\_2(t)(X\_1(t) + t)dB(t).$$

Suppose $g(x_1, x_2)$ is a continuous and twice differentiable function of $x_1$ and $x_2$, and we are given Ito processes of the forms $dX_1(t) = \mu_1\,dt + \sigma_1\,dB(t)$ and $dX_2(t) = \mu_2\,dt + \sigma_2\,dB(t)$.

Then $g(X_1(t), X_2(t))$ is also an Ito process, given by the following differential form:

$$\begin{split} dg(X_1(t), X_2(t)) &= \frac{\partial g}{\partial x_1}dX_1(t) + \frac{\partial g}{\partial x_2}dX_2(t) + \frac{1}{2}\frac{\partial^2 g}{\partial x_1^2}\left(dX_1(t)\right)^2 \\ &\quad + \frac{1}{2}\frac{\partial^2 g}{\partial x_2^2}\left(dX_2(t)\right)^2 + \frac{\partial^2 g}{\partial x_1 \partial x_2}\,dX_1(t)\,dX_2(t). \end{split} \tag{2.5.48}$$

Using quadratic variation and covariation of Ito processes,

$$\left(dX_1(t)\right)^2 = dX_1(t)\,.\,dX_1(t) = \sigma_1^2\,dt\,,$$

$$\left(dX_2(t)\right)^2 = dX_2(t)\,.\,dX_2(t) = \sigma_2^2\,dt\,,$$

$$dX_1(t)\,.\,dX_2(t) = \sigma_1\sigma_2\,dt\,.$$

These can be considered as a generalization of the rules on differentials given by equation (2.5.12). We use this generalized Ito formula for a function of two Ito processes in the following example.

We will express the stochastic process $X(t) = 2 + t + e^{B(t)}$ as an Ito process having the standard form $dX(t) = \mu\,dt + \sigma\,dB(t)$.

We can consider

$$X(t) = g(t, B(t)) = 2 + t + e^{B(t)}. \tag{2.5.49}$$

Therefore, $g(t, y) = 2 + t + e^{y}$, where

$$\begin{aligned} X\_1(t) &= t \\ X\_2(t) &= y = B(t) \end{aligned}$$

These equations give $dX_1 = dt$ and $dX_2 = dB(t)$, where $\mu_1 = 1$; $\sigma_1 = 0$; $\mu_2 = 0$; and $\sigma_2 = 1$.


Using equation (2.5.48),

$$\begin{aligned} \mathrm{d}\mathrm{g} &= \{\mathrm{1}\} dt + e^{\mathrm{B}(t)} dB(t) + \frac{1}{2} \{\mathrm{0}\} (dB(t))^2 + \frac{1}{2} e^{\mathrm{B}(t)} (dB(t))^2 + \{\mathrm{0}\} dt. \mathrm{dB}(t) \\ &= dt + e^{\mathrm{B}(t)} dB(t) + \frac{1}{2} e^{\mathrm{B}(t)} dt. \end{aligned}$$

Using $(dB(t))^2 = dt$,

$$d\mathbf{g} = d\mathbf{X}(t) = \left(1 + \frac{1}{2}e^{\mathbf{B}(t)}\right)dt + e^{\mathbf{B}(t)}dB(t).$$

From a previous example, $d(e^{B(t)}) = e^{B(t)}dB(t) + \frac{1}{2}e^{B(t)}dt$.

Therefore $dX(t) = dt + \frac{1}{2}e^{B(t)}dt + e^{B(t)}dB(t) = dt + d(e^{B(t)})$.

From the integral notation,

$$\begin{aligned} X(t) &= X(0) + \int_0^t dt + \int_0^t d(e^{B(t)})\,, \\ X(t) &= X(0) + t + e^{B(t)} - 1\,, \end{aligned}$$

$$X(t) = (X(0) - 1) + t + e^{\mathbb{B}(t)} \, .$$

Comparing with

$$\begin{aligned} X(t) &= 2 + t + e^{B(t)}, \\ X(0) - 1 &= 2, \\ X(0) &= 3. \end{aligned}$$

$X(t) = \text{constant} + t + e^{B(t)}$ can be considered as a solution process to the stochastic differential,

$$dX(t) = (1 + \frac{1}{2}e^{\mathbb{B}(t)})dt + e^{\mathbb{B}(t)}dB(t)\text{.}$$

As we can see in the above solution, the solution process contains the characteristics of both the drift and diffusion phenomena. In this case, the diffusion phenomenon dominates as *t* increases because the expected value of the exponential of Brownian motion grows at a faster rate in general. If we examine the drift term of the stochastic differential above, we see that the drift term is also affected by the Brownian motion, so the final solution is always a result of complex interactions between the drift term and the diffusion term.
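The derivation above can also be checked by simulation (a sketch of our own, not from the text): integrating $dX = (1 + \frac{1}{2}e^{B})dt + e^{B}dB$ with Euler–Maruyama along one Wiener path should reproduce the solution process $X(t) = (X(0) - 1) + t + e^{B(t)}$, here with $X(0) = 3$.

```python
import numpy as np

# Sketch: Euler-Maruyama for dX = (1 + e^B / 2) dt + e^B dB, compared with
# the solution X(t) = (X(0) - 1) + t + e^{B(t)} derived above, X(0) = 3.
rng = np.random.default_rng(2)
n = 100_000
dt = 1.0 / n
dB = rng.normal(0.0, np.sqrt(dt), n)
B = np.concatenate(([0.0], np.cumsum(dB)))

# one Euler-Maruyama step per Wiener increment, coefficients at left endpoints
steps = (1.0 + 0.5 * np.exp(B[:-1])) * dt + np.exp(B[:-1]) * dB
X = 3.0 + np.concatenate(([0.0], np.cumsum(steps)))

t = np.linspace(0.0, 1.0, n + 1)
exact = 2.0 + t + np.exp(B)              # (X(0) - 1) + t + e^{B(t)}
max_err = float(np.max(np.abs(X - exact)))
```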

We now discuss a population dynamics example, equipped with the knowledge of the Ito process and the Ito formula:

$$\frac{dx(t)}{dt} = \alpha(t)x(t)\,. \tag{2.5.50}$$

If the coefficient $\alpha(t)$ is "noisy", we can express it as follows:

$$\alpha(t) = r(t) + \sigma(t)W_t\,,$$

where $W_t$ denotes white noise, then

$$dX(t) = \left(r(t)dt + \sigma(t)W_t\,dt\right) X(t)\,,$$

and Brownian motion increments ( ) *<sup>t</sup> dB t W dt* .

Therefore,


$$dX(t) = \left(r(t)dt + \sigma(t)dB(t)\right)X(t),$$

$$dX(t) = r(t)X(t)dt + \sigma(t)X(t)dB(t)\,. \tag{2.5.51}$$

As seen from the above equation (2.5.51), *X*(*t*) is an Ito process.

Consider the case with $r(t) = r$, a constant, and $\sigma(t) = \sigma$, a constant; then the process $X(t)$ can be written in the differential form:

$$dX(t) = rX(t)dt + \sigma X(t)dB(t)\,.$$

Assume *g*(*x*)=*ln x*,

Then using the Ito formula,

$$dg(X(t)) = \frac{\partial(\ln x)}{\partial x} dX(t) + \frac{1}{2} \frac{\partial^2 g}{\partial x^2} (dX(t))^2\,,$$

$$\begin{split}d(\ln X(t)) &= \frac{dX(t)}{X(t)} + \frac{1}{2} \Big(\frac{-1}{X(t)^2}\Big) \sigma^2 X(t)^2\,dt, \\ &= \frac{dX(t)}{X(t)} - \frac{\sigma^2}{2} dt, \\ &= \frac{1}{X(t)} (rX(t)dt + \sigma X(t)dB(t)) - \frac{\sigma^2}{2} dt, \\ \end{split}$$

$$\begin{split}d(\ln X(t)) &= r\,dt + \sigma\,dB(t) - \frac{\sigma^2}{2} dt, \\ &= \Big(r - \frac{\sigma^2}{2}\Big)dt + \sigma\,dB(t). \end{split}$$


Converting back to the integral form,

$$
\ln(X(t)) = \ln(X(0)) + \int\_0^t \left(r - \frac{\sigma^2}{2}\right) dt + \int\_0^t \sigma dB(t) \,, \tag{2.5.52}
$$

$$
\ln\left(\frac{X(t)}{X(0)}\right) = \left(r - \frac{\sigma^2}{2}\right)t + \sigma B(t) \,, \tag{2.5.53}
$$

$$
X(t) = X(0) \exp\left(\sigma B(t)\right) \exp\left(\left(r - \frac{\sigma^2}{2}\right)t\right) \,. \tag{2.5.54}
$$

The *X*(*t*) process, therefore, satisfies the Ito process,

$$dX(t) = rX(t)dt + \sigma X(t)dB(t)\,,$$

and the expression above can be considered as a solution to the stochastic differential equation. As discussed earlier, this solution is significantly different from its deterministic counterpart.
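One illustrative Monte Carlo check of this solution (our own sketch; the values of $r$, $\sigma$, and $X(0)$ are arbitrary choices, not from the text): since $X(t) = X(0)\exp(\sigma B(t))\exp((r - \sigma^2/2)t)$ and $E[e^{\sigma B(t)}] = e^{\sigma^2 t/2}$, the mean of the process is $E[X(t)] = X(0)e^{rt}$.

```python
import numpy as np

# Monte Carlo sketch: sample B(t) ~ N(0, t) directly, form the solution
# process at time t, and compare its sample mean with X(0) e^{r t}.
rng = np.random.default_rng(3)
r, sigma, x0, t = 0.05, 0.3, 1.0, 1.0
n_paths = 200_000
B_t = rng.normal(0.0, np.sqrt(t), n_paths)                 # B(t) ~ N(0, t)
X_t = x0 * np.exp(sigma * B_t) * np.exp((r - 0.5 * sigma**2) * t)
mc_mean = float(X_t.mean())
expected = x0 * np.exp(r * t)                              # E[X(t)]
```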

In this section we have reviewed the essentials of stochastic calculus and presented results which could be useful in developing models and solving stochastic differential equations. While analytical expressions are quite helpful for understanding stochastic processes, computer simulation provides us with an intuitive "feel" for the simulated phenomena. Sometimes it is revealing to simulate a number of realizations of a process and visualize them on computers to understand the behaviour of the process.

### **2.6 Computer Simulation of Brownian Motion and Ito Processes**

In the previous section, we introduced Brownian motion (the Wiener process) as a stationary, continuous stochastic process with independent increments. This process is uniquely suited to model irregular noise, such as Gaussian white noise, in systems, and once such a process is incorporated in differential equations, the process of obtaining solutions involves stochastic calculus. Only a limited number of stochastic differential equations have analytical solutions, and some of these equations are given by Kloeden and Platen (1992). In many instances we have to resort to numerical methods. We illustrate the behaviour of the Wiener process and Ito processes through computer simulations so that the reader can appreciate the variable nature of individual realizations.

For the numerical implementation, it is most convenient to use the variance specification of the Wiener process *B*(*t*). The time span of the simulation, [0,1], is discretised into small equal time increments *delt*, and the corresponding independent Wiener increments are selected randomly from a normal distribution with zero mean and variance *delt*.
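The procedure just described can be sketched in a few lines (our own minimal sketch; NumPy and the variable names are our choices, with *delt* kept from the text):

```python
import numpy as np

# Discretise [0, 1] into equal steps delt and draw independent Wiener
# increments from N(0, delt); their cumulative sum is one realization of B(t).
rng = np.random.default_rng(4)
n_steps = 1000
delt = 1.0 / n_steps
increments = rng.normal(0.0, np.sqrt(delt), n_steps)  # dB ~ N(0, delt)
B = np.concatenate(([0.0], np.cumsum(increments)))    # B(0) = 0
t = np.linspace(0.0, 1.0, n_steps + 1)
```

Plotting `t` against `B` (for example with matplotlib) produces realizations like those shown in the figures of this section.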

Figure 2.2 shows the Wiener process increments as a single stochastic process. Since Gaussian white noise is the derivative of the Wiener process and the time interval is constant, Figure 2.2 depicts a realization of an approximation of the white noise process.

The Wiener process is very irregular (Figure 2.1), and the only discernible pattern is that as time progresses, the position tends to wander away from the starting position at the origin. In other words, if the statistical variance over realisations for a fixed time is evaluated, this increases gradually – a property referred to as time varying variance. The use of the Wiener process in a modelling situation to represent the noise in the system should be carefully thought through. If the noise can be represented as white noise, then the Wiener process enters into the equation because of the relationship between the white noise and the Wiener process. It is also important to realize that the Ito integral is a stochastic process dependent on the Wiener process. This is analogous to integration in standard calculus because an indefinite integral is a function of the independent, deterministic variable.

Figure 2.2. A realization of the Wiener process increment.

Given the Wiener process realization depicted in Figure 2.1, we compute the Ito integral of the Wiener process, $\int_0^t B(s,\alpha)\,dB(s,\alpha)$.

As we have previously seen, this integral can be evaluated by using the following stochastic relationship, converging in probability; it is shown in Figure 2.4.

$$\int\_0^t B(s,\alpha) \, dB(s,\alpha) = \frac{1}{2} B^2(t,\alpha) - \frac{1}{2}t \,.$$

Figure 2.3. The realization of the Wiener process used in the calculation of the Ito Integral shown in Figure 2.4.
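The identity above is easy to verify numerically (an illustrative sketch of our own): the Ito integral is the limit of left-endpoint sums $\sum_k B(t_k)\left(B(t_{k+1}) - B(t_k)\right)$, which should approach $\frac{1}{2}B^2(t) - \frac{1}{2}t$.

```python
import numpy as np

# Left-endpoint Riemann sum for the Ito integral of B against dB on [0, 1],
# compared with the closed form (1/2) B^2(1) - (1/2)(1).
rng = np.random.default_rng(5)
n = 100_000
dB = rng.normal(0.0, np.sqrt(1.0 / n), n)     # Wiener increments on [0, 1]
B = np.concatenate(([0.0], np.cumsum(dB)))

ito_sum = float(np.sum(B[:-1] * dB))          # left-endpoint evaluation
identity = 0.5 * B[-1] ** 2 - 0.5 * 1.0       # (1/2) B^2(1) - (1/2) t at t = 1
```

Note that evaluating the integrand at the left endpoints is essential: a midpoint rule would instead converge to the Stratonovich value $\frac{1}{2}B^2(t)$.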


Figure 2.7. Three realizations of $\int_0^t 6B^2(t)\,dt + \int_0^t 4B^3(t)\,dB(t)$.

Figure 2.4. A realization of $\int_0^t B(s,\alpha)\,dB(s,\alpha)$.

Let us consider the following Ito process which we have derived in section 2.5. In differential notation,

$$d(B^4(t)) = 6B^2(t)dt + 4\, B^3(t)dB(t) \,,$$

which means,

$$B^4(t) = B^4(0) + \int\_0^t 6B^2(t)dt + \int\_0^t 4B^3(t)dB(t)\text{, and}$$

$$B^4(t) = \int\_0^t 6B^2(t)dt + \int\_0^t 4B^3(t)dB(t)\text{.}\tag{2.6.1}$$

The Ito process given in equation (2.6.1) is simulated in Figure 2.6 for the Wiener realization depicted in Figure 2.5.

Figure 2.5. Wiener realization used in evaluating the Ito process $B^4(t)$.
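A simulation of equation (2.6.1) can be sketched as follows (our own illustration; names and step counts are arbitrary): the two Ito integrals, accumulated along one Wiener realization with left-endpoint sums, should reproduce $B^4(t)$.

```python
import numpy as np

# Evaluate int_0^t 6 B^2 dt + int_0^t 4 B^3 dB on one Wiener path and
# compare with B^4(t), per equation (2.6.1).
rng = np.random.default_rng(6)
n = 200_000
dt = 1.0 / n
dB = rng.normal(0.0, np.sqrt(dt), n)
B = np.concatenate(([0.0], np.cumsum(dB)))

drift = np.cumsum(6.0 * B[:-1] ** 2 * dt)        # int_0^t 6 B^2 dt
diffusion = np.cumsum(4.0 * B[:-1] ** 3 * dB)    # int_0^t 4 B^3 dB
process = np.concatenate(([0.0], drift + diffusion))

max_err = float(np.max(np.abs(process - B ** 4)))
```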


Figure 2.6. Ito process $\int_0^t 6B^2(t)\,dt + \int_0^t 4B^3(t)\,dB(t)$.

Even for a decreasing and erratic Wiener process, the Ito process $\int_0^t 6B^2(t)\,dt + \int_0^t 4B^3(t)\,dB(t)$ in general has a smoother realization with an overall growth in the positive direction. The effect of Ito integration tends to smooth the erratic behaviour of the Wiener process. We have evaluated the above Ito process for three different realizations of the standard Wiener process, and they are shown in Figure 2.7.

As seen in Figure 2.7, individual realizations of the Ito process $\int_0^t 6B^2(t)\,dt + \int_0^t 4B^3(t)\,dB(t)$ are distinct from each other, and they show the complexity of stochastic integration as opposed to integration in standard calculus.

### **2.7 Solving Stochastic Differential Equations**

Let us consider an ordinary differential equation which relates the derivative of the dependent variable (*y*(*t*)) to the independent variable (*t*) through a function $\phi(y(t), t)$, with the initial condition $y(0) = y_0$:

$$\frac{dy}{dt} = \phi \left( y, t \right),\tag{2.7.1}$$


$$\text{and}\quad dy = \phi(y, t)dt\,. \tag{2.7.2}$$

In many natural systems, this rate of change can be influenced by random noise caused by a combination of factors which could be difficult to model. As a model of these random fluctuations, white noise ($\xi(t)$) is a suitable candidate. Therefore we can write, in general, the increments of the noise process as $\sigma(y,t)\xi(t)$, where $\sigma$ is an amplitude function modifying the white noise.

Hence,

$$\frac{dy}{dt} = \phi\left(y\_\prime t\right) + \sigma\left(y\_\prime t\right)\xi\left(t\right). \tag{2.7.3}$$

As we have seen before (equation (2.3.3)),

$$
\sigma(y, t)\,\xi(t) = \sigma(y, t)\frac{dB}{dt}\,, \tag{2.7.4}
$$

where, *B*(*t*) = the standard Wiener process.

Therefore,

$$\frac{dy}{dt} = \phi\left(y, t\right) + \sigma\left(y, t\right)\frac{dB}{dt}\tag{2.7.5}$$

$$dy = \phi(y, t)\,dt + \sigma(y, t)\,dB\,. \tag{2.7.6}$$

In general, $\phi(y,t)$ and $\sigma(y,t)$ could be stochastic processes. This equation is called a stochastic differential equation (SDE) driven by the Wiener process. Once the Wiener process enters into equation (2.7.4), *y* becomes a stochastic process, $Y(t,\omega)$, and in differential notation the SDE is written as

$$d\mathcal{Y}(t) = \phi\left(\mathcal{Y}(t), t\right)dt + \sigma\left(\mathcal{Y}(t), t\right)dB(t)\,. \tag{2.7.7}$$

This actually means,

$$Y(t) = Y(0) + \int_0^t \phi\left(Y(t), t\right) dt + \int_0^t \sigma\left(Y(t), t\right) dB(t)\,. \tag{2.7.8}$$

If we can find a function of the Wiener process, in the form of an Ito process, that satisfies the above integral equation (2.7.8), we call that function a strong solution of the SDE.

Strong solutions do not depend on individual realizations of Brownian motion. In other words, all possible realizations of an Ito process which is a strong solution of an SDE satisfy the SDE under consideration. Not all SDEs have strong solutions. The other class of solutions is called weak solutions, where the solution for each individual realization differs from the others. In this section we focus only on strong solutions. In many situations, finding analytical solutions to SDEs is impossible, and therefore we review a minimal set of SDEs and their solutions in order to facilitate the discussion in the subsequent chapters.
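Even when no analytical strong solution is available, the integral form (2.7.8) suggests the simplest numerical scheme: the Euler-Maruyama method, which advances *Y* by a drift step *φ*Δ*t* plus a noise step *σ*Δ*B*, with Δ*B* ~ N(0, Δ*t*). A minimal sketch follows; the drift and diffusion functions used in the example are illustrative choices, not taken from the text.

```python
import math
import random

def euler_maruyama(phi, sigma, y0, T, n, rng=None):
    """Simulate dY = phi(Y, t) dt + sigma(Y, t) dB on [0, T] with n steps."""
    rng = rng or random.Random(0)
    dt = T / n
    t, y = 0.0, y0
    path = [y0]
    for _ in range(n):
        dB = rng.gauss(0.0, math.sqrt(dt))  # Wiener increment ~ N(0, dt)
        y = y + phi(y, t) * dt + sigma(y, t) * dB
        t += dt
        path.append(y)
    return path

# Illustrative check with the noise switched off (sigma = 0): the scheme
# reduces to the ordinary Euler method for dy/dt = -y, so the endpoint
# approaches y(1) = e^{-1} as the step size shrinks.
path = euler_maruyama(lambda y, t: -y, lambda y, t: 0.0, 1.0, 1.0, 10_000)
```

One call produces one realization; a strong solution is approximated pathwise, so different seeds give different sample paths driven by different Brownian motions.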

If *X* (*t*) is a stochastic process and another stochastic process Y(*t*) is related to X(*t*) through the stochastic differential,

$$dY(t) = Y(t) \, dX(t) \quad , \tag{2.7.9}$$

with *Y*(0) = 1.



Thus *Y*(*t*) is called the stochastic exponential of *X*(*t*). If *X*(*t*) is a stochastic process of finite variation, the solution to equation (2.7.9) is

$$Y(t) = e^{X(t) - X(0)},\tag{2.7.10}$$

and, for any process *X*(*t*)*,*

*Y*(*t*) = *e*<sup>*ξ*(*t*)</sup> satisfies the stochastic differential given above when

$$
\xi(t) = X(t) - X(0) - \frac{1}{2}[X, X](t)\,. \tag{2.7.11}
$$

[*X*, *X*](*t*) is the quadratic variation of *X*(*t*), and for a continuous function with finite variation, [*X*, *X*](*t*) *=* 0*.*

For example, consider the following stochastic differential equation in differential form,

$$dX(t) = X(t) \, dB(t) \,. \tag{2.7.12}$$

This SDE does not have a drift term and the diffusion term is an Ito integral.

We know, [*B, B*](*t*) *= t*.

Therefore from the above result,

$$\begin{aligned} \xi(t) &= B(t) - B(0) - \frac{1}{2}t, \\ &= B(t) - \frac{1}{2}t. \end{aligned} \tag{2.7.13}$$

 Then the solution to the SDE is

$$X(t) = e^{B(t) - \frac{1}{2}t}.\tag{2.7.14}$$
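This solution can be checked numerically: driving an Euler-Maruyama discretisation of *dX*(*t*) = *X*(*t*)*dB*(*t*) and the closed form *e*<sup>*B*(*t*) − *t*/2</sup> with the *same* simulated Brownian increments yields paths that agree as the step size shrinks. A sketch; the seed, step count and tolerance here are arbitrary choices.

```python
import math
import random

rng = random.Random(42)
n, T = 100_000, 1.0
dt = T / n

x_euler, b = 1.0, 0.0
for _ in range(n):
    dB = rng.gauss(0.0, math.sqrt(dt))
    x_euler += x_euler * dB        # Euler-Maruyama step for dX = X dB
    b += dB                        # accumulate the same Brownian path B(t)

x_exact = math.exp(b - 0.5 * T)    # closed form X(t) = e^{B(t) - t/2} at t = T
rel_err = abs(x_euler - x_exact) / x_exact
```

Because the Euler-Maruyama scheme has strong order 0.5, the relative error shrinks roughly like the square root of the step size.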


Now let us consider a similar SDE with a drift term:

$$dX(t) = \alpha\, X(t)\, dt + \beta\, X(t)\, dB(t)\,, \tag{2.7.15}$$

where *α* and *β* are constants.

Dividing it by *X* (*t*),

$$\frac{dX(t)}{X(t)} = \alpha\, dt + \beta\, dB(t)\,. \tag{2.7.16}$$

This differential represents,

$$\begin{split} \int_{0}^{t} \frac{dX(s)}{X(s)} &= \int_{0}^{t} \alpha\, ds + \int_{0}^{t} \beta\, dB(s),\\ &= \alpha\, t + \beta (B(t) - B(0)),\\ &= \alpha\, t + \beta\, B(t). \end{split} \tag{2.7.17}$$

The second term on the right hand side comes from Ito integration.

Now let us assume

$$
\phi\left(t\right) = \alpha\, t + \beta\, B(t)\,. \tag{2.7.18}
$$

Then the SDE becomes,

$$\int_{0}^{t} \frac{dX(s)}{X(s)} = \phi\left(t\right),$$

and

$$\begin{aligned} \xi(t) &= \phi(t) - \phi(0) - \frac{1}{2}[\phi, \phi](t)\,. \\ [\phi, \phi](t) &= [(\alpha\, t + \beta\, B(t)), (\alpha\, t + \beta\, B(t))](t), \\ &= [\alpha\, t, \alpha\, t](t) + 2\alpha\beta\, [t, B(t)](t) + \beta^2\, [B, B](t), \\ &= 0 + 0 + \beta^2\, t. \end{aligned}$$

Therefore,

$$\xi(t) = \alpha\, t + \beta\, B(t) - 0 - \frac{1}{2}\beta^2\, t\,.$$

Then the solution to the SDE is

$$X(t) = \exp\left( (\alpha - \frac{1}{2}\beta^{2})t + \beta\, B(t) \right).$$

Let us examine whether the stochastic process

$$X(t) = \exp((\alpha - \frac{1}{2}\beta^2)t + \beta\, B(t))\tag{2.7.19}$$

is a strong solution to the differential equation

$$dX(t) = \alpha X(t)dt + \beta \, X(t) \, dB(t) \, .$$

We will define a function,


$$f(x,t) = \exp((\alpha - \frac{1}{2}\beta^2)t + \beta\, x)\,.$$

$$\begin{split} X(t) &= f(B(t), t), \\ &= \exp((\alpha - \tfrac{1}{2}\beta^2)t + \beta\, B(t)). \end{split}$$

We need to apply the Ito formula for the two Ito processes *X*<sub>1</sub>(*t*) and *X*<sub>2</sub>(*t*):

*X*<sub>1</sub>(*t*) = *B*(*t*); *X*<sub>2</sub>(*t*) = *t* (a continuous function with finite variation);

$$dX_1\, dX_2 = d[X_1, X_2](t) = 0\;;\quad (dX_1)^2 = dt\;;\quad (dX_2)^2 = 0\,.$$

$$\frac{\partial f}{\partial x} = \beta \exp((\alpha - \tfrac{1}{2}\beta^2)t + \beta x)\,;$$

$$\frac{\partial^2 f}{\partial x^2} = \beta^2 \exp((\alpha - \tfrac{1}{2}\beta^2)t + \beta x)\,;$$

$$\frac{\partial f}{\partial t} = (\alpha - \tfrac{1}{2}\beta^2) \exp((\alpha - \tfrac{1}{2}\beta^2)t + \beta x)\,.$$

From the Ito formula,

$$\begin{split} d(f(X_1, X_2)) &= \frac{\partial f}{\partial x}\, dB(t) + \frac{\partial f}{\partial t}\, dt + \frac{1}{2} \frac{\partial^2 f}{\partial x^2}\, dt + \frac{1}{2} \frac{\partial^2 f}{\partial t^2}(0) + \frac{1}{2} \frac{\partial^2 f}{\partial x \partial t}(0), \\ &= \beta \exp((\alpha - \tfrac{1}{2}\beta^2)t + \beta B(t))\, dB(t) + (\alpha - \tfrac{1}{2}\beta^2) \exp((\alpha - \tfrac{1}{2}\beta^2)t + \beta B(t))\, dt \\ &\quad + \tfrac{1}{2}\beta^2 \exp((\alpha - \tfrac{1}{2}\beta^2)t + \beta B(t))\, dt. \end{split}$$

$$\begin{split} d(X(t)) &= d(f(B(t), t)) \\ &= \alpha \exp((\alpha - \tfrac{1}{2}\beta^2)t + \beta B(t))\, dt + \beta \exp((\alpha - \tfrac{1}{2}\beta^2)t + \beta B(t))\, dB(t), \\ &= \alpha X(t)dt + \beta X(t)dB(t). \end{split}$$

This proves that *X*(*t*) = *f*(*B*(*t*), *t*) is a strong solution of the SDE *dX*(*t*) = *αX*(*t*)*dt* + *βX*(*t*)*dB*(*t*). We can see that if we can find such a function *f*(*x*, *t*), then for a given Wiener process *B*(*t*), *X*(*t*) = *f*(*B*(*t*), *t*) is a solution to the SDE of the form

$$dX(t) = \mu\left(X(t), t\right)dt + \sigma\left(X(t), t\right)dB(t)\,. \tag{2.7.20}$$


*X*(*t*) should also satisfy,

$$X(t) = X(0) + \int_0^t \mu(X(s), s)\, ds + \int_0^t \sigma(X(s), s)\, dB(s)\,,$$

provided that the integrals $\int_0^t \mu(X(s), s)\, ds$ and $\int_0^t \sigma^2(X(s), s)\, ds$ exist.
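As a sanity check on the exponential solution derived above: for *dX* = *αX dt* + *βX dB*, taking expectations kills the Ito-integral term and gives E[*X*(*t*)] = *X*(0)*e*<sup>*αt*</sup>, so a Monte Carlo average of the strong solution over sampled Brownian endpoints should reproduce this. A sketch; the parameter values and sample count are arbitrary.

```python
import math
import random

rng = random.Random(1)
alpha, beta, t, x0 = 0.1, 0.2, 1.0, 1.0
n_samples = 200_000

# B(t) ~ N(0, t); average the strong solution
# X(t) = x0 exp((alpha - beta^2/2) t + beta B(t)) over many samples.
total = 0.0
for _ in range(n_samples):
    b_t = rng.gauss(0.0, math.sqrt(t))
    total += x0 * math.exp((alpha - 0.5 * beta**2) * t + beta * b_t)

mc_mean = total / n_samples
exact_mean = x0 * math.exp(alpha * t)   # E[X(t)] = X(0) e^{alpha t}
```

The −β²/2 correction in the exponent is exactly what makes the expectation come out as *X*(0)*e*<sup>*αt*</sup> rather than growing with the noise amplitude.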

The solution to the general linear SDE of the form,

$$dX(t) = (\alpha(t) + \beta(t)X(t))dt + (\gamma\ (t) + \delta(t)X(t))\,d\mathcal{B}(t). \tag{2.7.21}$$

where *α*, *β*, *γ* and *δ* are given adapted processes and continuous functions of *t*, can be quite useful in applications.

The solution can be expressed as a product of two Ito processes (Klebaner, 1998):

$$X(t) = u(t)\, v(t)\,, \quad \text{where}$$

$$du(t) = \beta(t)\, u(t)\, dt + \delta(t)\, u(t)\, dB(t)\,, \quad \text{and}$$

$$dv(t) = a(t)\, dt + b(t)\, dB(t)\,.$$

*u*(*t*) can be solved by using a stochastic exponential as shown above, and once we have a solution, we can obtain *a*(*t*) and *b*(*t*) by solving the following two equations:

$$\begin{array}{l} b(t)\, u(t) = \gamma(t)\,, \text{ and} \\ a(t)\, u(t) = \alpha(t) - \delta(t)\, \gamma(t)\,. \end{array}$$

Then the solution to the general linear SDE is given by (Klebaner, 1998) :

$$X(t) = u(t)\Big(X(0) + \int_0^t \frac{\alpha(s) - \delta(s)\,\gamma(s)}{u(s)}ds + \int_0^t \frac{\gamma(s)}{u(s)}dB(s)\Big)\,. \tag{2.7.22}$$

 As an example let us solve the following linear SDE:

$$dX(t) = a\, X(t)\, dt + dB(t)\,,\tag{2.7.23}$$

where *a* is a constant.

Here *β*(*t*) = *a*, *γ*(*t*) = 1, *α*(*t*) = 0, and *δ*(*t*) = 0.

Using the general solution with

$$\begin{aligned} du(t) &= a\, u(t)\, dt + (0)\, dB(t),\\ &= a\, u(t)\, dt. \end{aligned}$$

From stochastic exponential,

$$u(t) = \exp(a\, t)\,.$$

Therefore,


$$X(t) = \exp(at)\Big(X(0) + \int_0^t \exp(-as)\, dB(s)\Big)\,.$$

This is also the solution of the SDE given by equation (2.7.23).

The integral in the solution given above is an Ito integral and should be calculated according to Ito integration. For non-linear stochastic differential equations, appropriate substitutions may be found to reduce them to linear ones.
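The closed-form solution of (2.7.23) can likewise be checked path-by-path: discretising the Ito integral of exp(−*as*) *dB*(*s*) as a left-point sum over the *same* Wiener increments used by an Euler-Maruyama run of *dX* = *aX dt* + *dB* gives matching endpoints. A sketch; the value *a* = −0.5, the seed and the tolerance are arbitrary.

```python
import math
import random

rng = random.Random(7)
a, x0, T, n = -0.5, 1.0, 1.0, 100_000
dt = T / n

x_euler = x0
ito_sum = 0.0   # left-point (Ito) Riemann sum of exp(-a s) dB(s)
s = 0.0
for _ in range(n):
    dB = rng.gauss(0.0, math.sqrt(dt))
    x_euler += a * x_euler * dt + dB      # Euler-Maruyama step
    ito_sum += math.exp(-a * s) * dB      # same increment feeds the Ito sum
    s += dt

# X(t) = e^{at} (X(0) + \int_0^t e^{-as} dB(s)), evaluated at t = T
x_exact = math.exp(a * T) * (x0 + ito_sum)
abs_err = abs(x_euler - x_exact)
```

Note that the left-point evaluation of the integrand is essential: it is what makes the sum an Ito (rather than Stratonovich) approximation.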

### **2.8 The Estimation of Parameters for Stochastic Differential Equations Using Neural Networks**

Stochastic differential equations (SDEs) offer an attractive way of modelling random system dynamics, but the estimation of the drift and diffusion coefficients remains a challenging problem in many situations. There are various statistical methods that are used to estimate the parameters in differential equations driven by Wiener processes. In this section we offer an alternative approach based on artificial neural networks to estimate the parameters in an SDE. Readers who are familiar with neural networks may skip this section. Artificial Neural Networks (ANNs), as discussed in chapter 1, are universal function approximators that can map any nonlinear function, and they have been used in a variety of fields, such as prediction, pattern recognition, classification and forecasting. ANNs are less sensitive to error term assumptions and they can tolerate noise and chaotic behaviour better than most other methods. Other advantages include greater fault tolerance, robustness and adaptability due to ANNs' large number of interconnected processing elements that can be trained to learn new patterns (Bishop, 1995). The Multilayer Perceptron (MLP) network is among the most common ANN architectures in use. It is one type of feed-forward network wherein connections are only allowed from the nodes in layer *i* to the nodes in layer *i*+1. There are other more complex neural network architectures available, such as recurrent networks and stochastic networks; however, MLP networks are always sufficient for dealing with most of the recognition and classification problems if enough hidden layers and hidden neurons are used (Samarasinghe, 2006). We show how to use the output values from the solutions of the SDEs to train neural networks, and use the trained networks to estimate the SDE parameters for given output data. MLP networks will be used to solve this type of mapping problem.

The general form of SDE can be expressed by

$$dy(t) = \mu(y, t, \theta) \, dt + \sigma(y, t, \theta) \, dw(t) \tag{2.8.1}$$

where *y*(*t*) = the state variable of interest, *θ* = a set of parameters (known and unknown), and *w*(*t*) = a standard Wiener process. In practice, to determine the parameter set *θ***,** the system output variable *y* is usually observed at discrete time intervals, *t*, where 0 ≤ *t* ≤ *T*, at *M* independent points: **y** = {*y*<sub>1</sub>, *y*<sub>2</sub>, ..., *y*<sub>*M*</sub>}**.** Observed data are recorded in discrete time intervals, regardless of whether the model is best described by a continuous or discrete formulation.


Figure 2.8. Basic structure of a MLP with backpropagation algorithm.

The MLP shown in Figure 2.8 has one hidden layer ( *m*<sup>1</sup> ) and one output layer ( *m*<sup>2</sup> ), and all the layers are fully connected to their subsequent layer. Connections are only allowed from the input layer to the hidden layer, and then from the hidden layer to the output layer.

Rumelhart (1986) developed the backpropagation learning algorithm, and it is commonly used to train ANNs due to its ability to generalize across a wide variety of problems. A typical backpropagation learning algorithm is based on an architecture that contains a layer of input neurons, a layer of output neurons, and one or more hidden layers; these neurons are all interconnected with different weights. In the backpropagation training algorithm, the error information is passed backward from the output layer to the input layer (Figure 2.8). Weights are adjusted with the gradient descent method.

The ANN is trained by first setting up the network with all its units and connections, and then initialising it with arbitrary weights. Then the network is trained by presenting examples. During the training phase the weights on connections change, enabling the network to learn. When the network performs well on all training examples, it should be validated on some other examples that it has not seen before. If the network can produce reasonable output values which are similar to the validation targets and contain only small errors, it is then considered to be ready to use for problem solving.
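The structure in Figure 2.8 and the training cycle just described can be illustrated with a minimal one-hidden-layer MLP using logistic activations, trained by backpropagation with plain gradient descent. This is an illustrative toy sketch, not the NeuroShell2 implementation used in the study; the layer sizes, learning rate and training data are arbitrary choices.

```python
import math
import random

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

class TinyMLP:
    """One hidden layer, logistic activations, trained by backpropagation."""
    def __init__(self, n_in, n_hidden, n_out, rng):
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        self.b2 = [0.0] * n_out

    def forward(self, x):
        # forward pass: input -> hidden -> output, logistic at every node
        self.h = [logistic(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        self.o = [logistic(sum(w * hi for w, hi in zip(row, self.h)) + b)
                  for row, b in zip(self.w2, self.b2)]
        return self.o

    def train_step(self, x, target, lr=0.5):
        o = self.forward(x)
        # error terms passed backward from the output layer (logistic slope: o(1-o))
        d_out = [(oi - ti) * oi * (1.0 - oi) for oi, ti in zip(o, target)]
        d_hid = [hi * (1.0 - hi) * sum(d * self.w2[k][j] for k, d in enumerate(d_out))
                 for j, hi in enumerate(self.h)]
        # gradient-descent weight updates
        for k, d in enumerate(d_out):
            self.b2[k] -= lr * d
            for j, hj in enumerate(self.h):
                self.w2[k][j] -= lr * d * hj
        for j, d in enumerate(d_hid):
            self.b1[j] -= lr * d
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * d * xi
        return sum((oi - ti) ** 2 for oi, ti in zip(o, target))

# Toy mapping: two input patterns to two target outputs.
net = TinyMLP(1, 3, 1, random.Random(0))
data = [([0.0], [0.2]), ([1.0], [0.8])]
first = sum(net.train_step(x, t) for x, t in data)
for _ in range(2000):
    last = sum(net.train_step(x, t) for x, t in data)
```

The squared error summed over the examples falls as training proceeds, which is the behaviour the validation step then checks on unseen data.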

Both linear and nonlinear SDEs are examined in this section. The linear SDE (Eq. (2.8.2)) is expressed by a one-dimensional diffusion equation. Its drift term has a linear relationship to the output variable of the model, and the diffusion term represents the noise in the model. Eq. (2.8.3) is arbitrarily chosen as a representative nonlinear SDE:

$$dX(t) = \alpha X(t)dt + \gamma X(t)dw(t) \text{, and } \tag{2.8.2}$$

$$dX(t) = aX(t)dt + \beta X^2(t)dt + \gamma dw(t),\tag{2.8.3}$$

where *α, β* = constant coefficients to be estimated as parameters, and *γ* = a constant coefficient to adjust the noise level (amplitude).

For each particular parameter *α* or the combination of parameters *α* and *β*, we can generate one realisation of SDE output through Eq. (2.8.2) or (2.8.3). The range of *α* and *β* used in


the experiments is assumed to be [1, 2]. In addition, *γ* is used to adjust the proportion of the contribution of the diffusion term, and the range of *γ* is [0.01, 0.1] in the linear case and [0.5, 3] in the nonlinear case.

The discrete observations *X*(*t*) of these two equations are obtained at the sampling instants. Supposing the number of samples to be *n*<sub>*t*</sub>, we consider the first *n*<sub>*t*</sub> time steps starting from *X*<sub>0</sub> = 1 with a sampling interval of Δ*t* = 0.001. All the values come from the solution of the SDEs. It has been shown that, using the Ito formula, Eq. (2.8.2) has an analytical solution (section 2.7),

$$X(t) = X\_0 \exp\left[ (\alpha - \frac{\gamma^2}{2})t + \gamma w(t) \right]. \tag{2.8.4}$$

For Eq. (2.8.4), we use the Euler method for the numerical solution. The numerical solution has been compared with the analytical solution of the equation.

Before we describe the neural network data sets, we clarify the terminology about "training", "validation" and "test" data sets. In the literature of the machine learning and neural networks communities, different meanings of the words "validation" and "test" are employed. We restrict ourselves to the definitions given by Ripley (1996): a training set is used for learning, a validation set is used to tune the network parameters, and a test set is used only to assess the performance of the network.

We generate a number of SDE realisations for a specified range of parameters with some patterns of Wiener processes to train the ANN. These data sets are called training data sets and validation sets are randomly chosen from the training sets. In order to test the prediction capability of the ANN, test data sets are generated with different patterns of Wiener processes within the same range of parameters as the training data sets.

Obviously if the test data sets were generated from SDEs which contain only a single Wiener process, the result would be biased if this Wiener process was coincidently similar to the one used to generate the training data sets. To fairly assess the performance of networks, five different patterns of Wiener processes are used to generate the test data sets.
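The data-generation procedure described above can be sketched as follows: for each *α* drawn from [1, 2] and *γ* from [0.01, 0.1], one realisation of Eq. (2.8.2) is produced from the analytical solution (2.8.4) on a fixed grid, and test sets reuse the same parameter ranges with fresh Wiener paths. The function names, grid sizes and seeds here are illustrative, not the ones used in the study.

```python
import math
import random

def realisation(alpha, gamma, n_steps=50, dt=0.001, x0=1.0, rng=None):
    """One path of X(t) = x0 exp((alpha - gamma^2/2) t + gamma w(t)) at n_steps instants."""
    rng = rng or random.Random()
    w, path = 0.0, []
    for k in range(1, n_steps + 1):
        w += rng.gauss(0.0, math.sqrt(dt))   # Wiener path value at t = k*dt
        path.append(x0 * math.exp((alpha - 0.5 * gamma**2) * k * dt + gamma * w))
    return path

def make_dataset(alphas, gammas, seed):
    rng = random.Random(seed)
    # inputs: 50 sampled values of X(t); target: the alpha that generated them
    return [(realisation(a, g, rng=rng), a) for a in alphas for g in gammas]

alphas = [1.0 + 0.1 * i for i in range(11)]      # alpha in [1, 2]
gammas = [0.01 + 0.01 * j for j in range(10)]    # gamma in [0.01, 0.1]
train = make_dataset(alphas, gammas, seed=0)     # training Wiener patterns
test = make_dataset(alphas, gammas, seed=99)     # different Wiener patterns
```

Using a different seed for the test set mimics the requirement that test realisations be driven by Wiener patterns the network has never seen.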

To determine the number of time steps *n*<sub>*t*</sub>, we have taken different *n*<sub>*t*</sub> values, from *n*<sub>*t*</sub> = 10 to *n*<sub>*t*</sub> = 200 in increments of 10, to generate the training and test data sets. We found that 50 values were sufficient to represent the pattern of the SDEs in order to train the neural networks. Further increase in *n*<sub>*t*</sub> did not increase the neural networks' performance in parameter estimation. Therefore 50 time steps are used in our computational experiments.

All the experiments are carried out on a personal computer running the Microsoft Windows XP operating system. We use commercial ANN software, namely NeuroShell2, for the neural network computations. It is recommended for academic users only, or those users who are concerned with classic neural network paradigms like backpropagation. Users interested in solving real problems should consider the NeuroShell Predictor, NeuroShell Classifier, or the NeuroShell Trader (Group, W.S., 2005).

There are two parameters, *α* and *β*, in the drift term of the nonlinear SDE. We define *P*<sub>*a*</sub> to determine the strength of the linear term (i.e. *α dt x*(*t*)). Similarly, *P*<sub>*b*</sub> indicates the strength of the nonlinear term. They can be defined as

$$P_a = \frac{1}{n}\sum_{i=1}^{n}\frac{\alpha\, dt\, x(i)}{\alpha\, dt\, x(i) + \beta\, dt\, x^2(i)}\,, \tag{2.8.6}$$

$$P_b = \frac{1}{n}\sum_{i=1}^{n}\frac{\beta\, dt\, x^2(i)}{\alpha\, dt\, x(i) + \beta\, dt\, x^2(i)}\,. \tag{2.8.7}$$

*R*<sup>2</sup> (the coefficient of multiple determination) is a statistical indication of data sets which is determined by multiple regression analysis, and it is an important indicator of the ANN performance used in NeuroShell2 (Triola, 2004). *R*<sup>2</sup> compares the results predicted by the ANN with actual values, and it is defined by

$$R^2 = 1 - \frac{\sum_{i=1}^{m}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{m}(y_i - \bar{y})^2}\,, \tag{2.8.8}$$

where *y* = actual value, *ŷ* **=** predicted value of *y*, *ȳ* = mean of *y*, and *m* = number of data patterns. In the case of parameter estimation for the linear SDE, one *R*<sup>2</sup> value is obtained for determining the accuracy of the predicted parameter *α*. For the nonlinear SDE, two *R*<sup>2</sup> values are calculated for determining the accuracy of the predicted parameters *α* and *β*. If the ANN predicts all the values correctly as the actual values, a perfect fit would result in an *R*<sup>2</sup> value of 1. In a very poor fit, the *R*<sup>2</sup> value would be close to 0. If the ANN predictions are worse than the mean of the samples, the *R*<sup>2</sup> value will be less than 0.

In addition to *R*<sup>2</sup>, the Average Absolute Percentage Error (*AAPE*) is also used for evaluating the prediction performance where needed (Triola, 2004). The *AAPE* can be defined as

$$AAPE = \frac{1}{m}\sum_{i=1}^{m}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \times 100\,, \tag{2.8.9}$$

where *y* = target value, *ŷ* = predicted value of *y*, and *m* = number of data patterns.
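The *R*<sup>2</sup> and *AAPE* measures can be computed directly from their definitions; a minimal sketch:

```python
def r_squared(y, y_hat):
    """Coefficient of multiple determination: 1 - SS_res / SS_tot."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def aape(y, y_hat):
    """Average Absolute Percentage Error over m data patterns."""
    m = len(y)
    return sum(abs((yi - yh) / yi) for yi, yh in zip(y, y_hat)) * 100.0 / m
```

A perfect fit gives *R*<sup>2</sup> = 1 and *AAPE* = 0; predictions worse than the sample mean drive *R*<sup>2</sup> below zero.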

The performance of the ANN is evaluated by assessing the accuracy of the estimated parameters. Different ANN architectures, including various combinations of hidden layers, neurons, and training epochs, are used to obtain the optimum neural network. Further, a range of the diffusion term is used to evaluate the effect of different levels of stochasticity on the performance of the ANN. We do not have *a priori* knowledge of the optimal ANN architecture at first; therefore we choose the default parameters in NeuroShell2 for the one hidden layer MLP network.

*y y AAPE m y*

where *y* = target value, *y*ˆ = predicted value of y*,* and *m* = number of data patterns.

<sup>1</sup> <sup>ˆ</sup> <sup>100</sup> *<sup>m</sup> i i i i*

(2.8.9)

*m*

*i m i i*

 

*n i dt x i <sup>P</sup> n dt dt x i*

 

*n dt dt x i*

Stochastic Differential Equations and Related Inverse Problems 59

1 ( )

<sup>2</sup> *R* (coefficient of multiple determinations) is a statistical indication of data sets which is determined by multiple regression analysis, and it is an important indicator of the ANN performance used in NeuroShell2 (Triola, 2004). <sup>2</sup> *R* compares the results predicted by

 

 

*n i dt <sup>P</sup>*

1

to determine the strength of the linear term (i.e.

, in the drift term of the nonlinear SDE. We define *Pa*

, (2.8.6)

. (2.8.7)

*dt x t*( ) ). Similarly, *P*

, (2.8.8)

indicates the

( )

( )

2

( ) ˆ

*y y*

*i i*

( )

*y y*

2

Among all the parameters in an MLP, the numbers of input and output neurons are the easiest to determine, because each independent variable is represented by its own input neuron. The number of inputs is determined by the number of sampling instants in the SDE's solution, and the number of outputs by the number of parameters to be predicted. For the number of hidden layers and hidden-layer neurons, we start with a network with one layer and a few neurons, and then test different numbers of hidden layers and neurons to achieve the best ANN performance. In the following experiments, (X – Y – Z) denotes a network in which X is the number of input nodes, Y the number of hidden nodes and Z the number of output nodes.

We found that the logistic function was always superior, as the input, output and hidden-layer function, to the other transfer functions available in NeuroShell2 (linear, hyperbolic tangent, sine and Gaussian) because of its nonlinear and continuously differentiable properties, which are desirable for learning complex problems. In addition to the logistic function, we use the NeuroShell2 default value of 0.1 for both the learning rate and the momentum, as we found it to be consistently appropriate.

The number of training epochs plays an important role in determining the performance of the ANN. An epoch is one presentation of the entire training set to the network. ANNs need sufficient training epochs to learn complex input-output relationships; however, excessive training epochs require unnecessarily long training times and cause overfitting, where the network performs very well during training but fails in testing (Caruana, 2001). To monitor overfitting, we set aside 20% of the training set as a validation set, and the ANN monitors the error on the validation set during training. The profile of the error plots for the training and validation sets during the training procedure indicates whether further training epochs are needed: we can stop training when the training-set error keeps decreasing but the validation-set error flattens or begins to rise at the end.
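The stopping rule described above can be sketched in a few lines. This is a minimal illustration, not NeuroShell2's internal procedure: it fits a simple linear model by gradient descent, holds out 20% of the data as a validation set, and keeps the weights that achieved the lowest validation error.

```python
import numpy as np

def train_with_early_stopping(X, y, lr=0.1, max_epochs=500, patience=10):
    """Gradient-descent fit of a linear model with early stopping:
    hold out 20% as a validation set and stop once the validation
    error has not improved for `patience` consecutive epochs."""
    n_val = max(1, len(X) // 5)               # 20% validation split
    Xt, yt, Xv, yv = X[:-n_val], y[:-n_val], X[-n_val:], y[-n_val:]
    w = np.zeros(X.shape[1])
    best_w, best_err, wait = w.copy(), np.inf, 0
    for epoch in range(max_epochs):
        grad = Xt.T @ (Xt @ w - yt) / len(Xt)
        w -= lr * grad                        # one training epoch
        val_err = np.mean((Xv @ w - yv) ** 2)
        if val_err < best_err:                # validation error still falling
            best_w, best_err, wait = w.copy(), val_err, 0
        else:                                 # flat or rising: start counting
            wait += 1
            if wait >= patience:
                break
    # the training error may still be decreasing, but we return the
    # weights that generalised best to the validation set
    return best_w, best_err
```

The training loop itself is a stand-in; the point is the bookkeeping around `best_err` and `patience`, which mirrors the "stop when the validation curve flattens or rises" rule.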

In order to test the robustness of the neural networks, we need to measure the level of noise contributed by the diffusion term of an SDE relative to its drift term. The diffusion parameter *γ* is used to adjust the noise level: a higher *γ* value indicates greater noise and increases the contribution of the diffusion term to the entire solution. As one would expect, increased noise levels raise the difficulty of estimation. To measure it, we define *P*<sub>γ</sub> for the linear equation (2.8.2) as

$$P\_{\gamma} \equiv \frac{1}{n} \sum\_{i=1}^{n} \left| \frac{\gamma \, dw(i)}{\alpha \, dt} \right|, \tag{2.8.4}$$

and for nonlinear equation (2.8.3),

$$P\_{\gamma} = \frac{1}{n} \sum\_{i=1}^{n} \left| \frac{\gamma \, dw(i)}{\alpha \, dt \, \mathbf{x}(i) + \beta \, dt \, \mathbf{x}(i)^2} \right|,\tag{2.8.5}$$

where *n* = the number of time steps, and *dt* = time differential.

There are two parameters, *α* and *β*, in the drift term of the nonlinear SDE. We define *P*<sub>α</sub> to determine the strength of the linear term (i.e. *α dt x*(*t*)); similarly, *P*<sub>β</sub> measures the strength of the nonlinear term. They are defined as

$$P\_{\alpha} \equiv \frac{1}{n} \sum\_{i=1}^{n} \left| \frac{\alpha \, dt}{\alpha \, dt + \beta \, dt \, \mathbf{x}(i)} \right| \,\tag{2.8.6}$$

and


$$P\_{\beta} \equiv \frac{1}{n} \sum\_{i=1}^{n} \left| \frac{\beta \, dt \, \mathbf{x}(i)}{\alpha \, dt + \beta \, dt \, \mathbf{x}(i)} \right|. \tag{2.8.7}$$
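These ratios can be computed directly from a simulated path. The sketch below is an assumption-laden illustration: it takes the nonlinear drift to be *α x dt* + *β x*²*dt* with diffusion *γ dw*, as implied by (2.8.5)–(2.8.7), generates one Euler–Maruyama realisation, and accumulates the three ratios along it.

```python
import numpy as np

def noise_strength_ratios(alpha, beta, gamma, x0=1.0, n=50, dt=0.01, seed=0):
    """Simulate one Euler-Maruyama path of the assumed nonlinear SDE
    dx = (alpha*x + beta*x**2) dt + gamma dw
    and return (P_gamma, P_alpha, P_beta) as in Eqs. (2.8.5)-(2.8.7)."""
    rng = np.random.default_rng(seed)
    dw = rng.normal(0.0, np.sqrt(dt), n)      # Wiener increments
    x = np.empty(n + 1)
    x[0] = x0
    p_gamma = p_alpha = p_beta = 0.0
    for i in range(n):
        drift = alpha * dt * x[i] + beta * dt * x[i] ** 2
        p_gamma += abs(gamma * dw[i] / drift)
        p_alpha += abs(alpha * dt / (alpha * dt + beta * dt * x[i]))
        p_beta += abs(beta * dt * x[i] / (alpha * dt + beta * dt * x[i]))
        x[i + 1] = x[i] + drift + gamma * dw[i]
    return p_gamma / n, p_alpha / n, p_beta / n
```

Note that for positive *α*, *β* and *x*(*i*), *P*<sub>α</sub> + *P*<sub>β</sub> = 1 by construction, which is consistent with the *P*<sub>α</sub> ≈ 0.12, *P*<sub>β</sub> ≈ 0.88 pairs reported later in Table 2.3.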

*R*<sup>2</sup> (the coefficient of multiple determination) is a statistical measure obtained by multiple regression analysis, and it is the main indicator of ANN performance used in NeuroShell2 (Triola, 2004). *R*<sup>2</sup> compares the results predicted by the ANN with the actual values, and it is defined by

$$R^2 = 1 - \frac{\sum\_{i=1}^{m} (y\_i - \hat{y}\_i)^2}{\sum\_{i=1}^{m} (y\_i - \overline{y})^2}, \tag{2.8.8}$$

where *y* = actual value, *ŷ* = predicted value of *y*, *ȳ* = mean of *y*, and *m* = number of data patterns. In the case of parameter estimation for the linear SDE, one *R*<sup>2</sup> value is obtained, measuring the accuracy of the predicted parameter *α*. For the nonlinear SDE, two *R*<sup>2</sup> values are calculated, measuring the accuracy of the predicted parameters *α* and *β*. If the ANN predicted every value exactly, the perfect fit would give an *R*<sup>2</sup> value of 1; in a very poor fit, the *R*<sup>2</sup> value is close to 0; and if the ANN predictions are worse than the mean of the samples, the *R*<sup>2</sup> value is less than 0.

In addition to *R*<sup>2</sup>, the Average Absolute Percentage Error (*AAPE*) is used to evaluate the prediction performance where needed (Triola, 2004). The *AAPE* is defined as

$$AAPE = \frac{1}{m} \sum\_{i=1}^{m} \frac{\left| y\_i - \hat{y}\_i \right|}{y\_i} \times 100, \tag{2.8.9}$$

where *y* = target value, *ŷ* = predicted value of *y*, and *m* = number of data patterns.
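As a sanity check on the two metrics, (2.8.8) and (2.8.9) can be implemented in a few lines. Note that *R*<sup>2</sup> as defined here can be negative, and *AAPE* assumes non-zero (here, positive) targets.

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination as in Eq. (2.8.8):
    1 - SSE/SST; negative when predictions are worse than
    simply using the mean of the targets."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    sse = np.sum((y - y_hat) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    return 1.0 - sse / sst

def aape(y, y_hat):
    """Average absolute percentage error, Eq. (2.8.9);
    targets are assumed positive, as in the SDE parameter ranges here."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return np.mean(np.abs(y - y_hat) / y) * 100.0

# A perfect fit gives R^2 = 1 and AAPE = 0:
y = np.array([1.0, 1.5, 2.0])
print(r_squared(y, y))   # 1.0
print(aape(y, y))        # 0.0
```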

The performance of the ANN is evaluated by assessing the accuracy of the estimated parameters. Different ANN architectures, including various combinations of hidden layers, neurons and training epochs, are used to obtain the optimum neural network. Further, a range of diffusion terms is used to evaluate the effect of different levels of stochasticity on the performance of the ANN.

We do not have *a priori* knowledge of the optimal ANN architecture at first; therefore we choose the default parameters in NeuroShell2 for a one-hidden-layer MLP network, which has 68 hidden neurons and the logistic transfer function for the hidden layers and the output layer. In addition, *α* = [1, 2] (varied in steps of 0.01) and *γ* = 0.03 are used for the parameters of the SDE.

We use the same SDE parameters except *γ* = 0.07 to create the training and test data sets, and 100 Wiener processes are used to produce the training data sets. All the *R*<sup>2</sup> values are obtained by using early stopping. The results in Table 2.1 suggest that when there is only one hidden layer and the number of neurons in it is very small, the performance of the network is poor, because the network does not have enough "power" to learn the input-output relationship. When the number of neurons in the hidden layer is close to half the number of input neurons, the performance reaches a higher accuracy. Further increases in the number of hidden layers and neurons do not improve the performance.

| 1 hidden layer | Test set *R*<sup>2</sup> | 2 hidden layers | Test set *R*<sup>2</sup> | 3 hidden layers | Test set *R*<sup>2</sup> |
|---|---|---|---|---|---|
| 50-3-1 | 0.2728 | 50-3-3-1 | 0.5103 | 50-3-3-3-1 | 0.4920 |
| 50-10-1 | 0.5222 | 50-5-5-1 | 0.4916 | 50-5-5-5-1 | 0.4576 |
| 50-30-1 | 0.5392 | 50-15-15-1 | 0.5125 | 50-10-10-10-1 | 0.5075 |
| 50-50-1 | 0.5151 | 50-25-25-1 | 0.4986 | 50-20-20-20-1 | 0.4987 |
| 50-100-1 | 0.4980 | 50-50-50-1 | 0.4936 | 50-30-30-30-1 | 0.5072 |
| 50-200-1 | 0.4969 | 50-100-100-1 | 0.4669 | 50-100-100-100-1 | 0.4892 |

Table 2.1. *R*<sup>2</sup> variation on the test set with different numbers of hidden neurons and hidden layers.


The experiments show that when the training data set is developed using only one Wiener process, the overfitting problem is obvious. The average error on the training set continues to decrease during the training process; because of the powerful mapping capability of neural networks, the average error between the targets and the network outputs approaches zero as training continues. During the first four epochs, the average error on the test set drops significantly, reaching its lowest value at epoch 8. After that, the validation-set error starts rising, although the training-set error keeps getting smaller. The reason for this increasingly poor generalisation is that the neural network tends to track every individual point in the training set created by one particular pattern of Wiener process, rather than learning the whole character of the equation.

When the training data set is produced by more than one Wiener process, overfitting decreases significantly. The average error on the validation set continues to drop and remains stable after a certain number of epochs. We examine the relationship between the number of Wiener processes and the ANN's prediction ability.

Figure 2.9. *R*<sup>2</sup> on the training and test sets against the number of Wiener processes used to produce the training sets, for γ = 0.03 (A) and γ = 0.07 (B).

The same ANN architecture and SDE parameters as in the previous section are used here. Additionally, we test a set of noisier data with *γ* = 0.07. The results are obtained with the same numbers of training epochs. Figure 2.9 shows the influence of the number of Wiener processes used to produce the training data sets: as the number of Wiener processes in the training sets increases, the network produces higher *R*<sup>2</sup> values on the test sets. It should be noted that the size of the training data set expands as more Wiener processes are employed, and this expansion slows training. Therefore, although there is a marginal improvement in the *R*<sup>2</sup> value when more than 80 Wiener processes are used, we limit the number of Wiener processes to 100 in further investigations.
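The construction of such training sets can be sketched as follows. For illustration, the linear SDE is assumed here to have drift *α x dt* and diffusion *γ dw*; each row of the input matrix is one 50-point Euler–Maruyama path, and the corresponding target is the *α* value that generated it. Each Wiener pattern is reused across the whole *α* range, mirroring how the training sets are assembled above.

```python
import numpy as np

def make_training_set(n_wiener=100, alphas=np.arange(1.0, 2.005, 0.01),
                      n_steps=50, dt=0.01, gamma=0.03, x0=1.0, seed=0):
    """Build (inputs, targets) for an assumed linear SDE
    dx = alpha*x dt + gamma dw, solved by Euler-Maruyama.
    Inputs: one 50-sample path per row; target: the generating alpha."""
    rng = np.random.default_rng(seed)
    X, y = [], []
    for _ in range(n_wiener):
        dw = rng.normal(0.0, np.sqrt(dt), n_steps)  # one Wiener pattern
        for a in alphas:                            # reuse it for every alpha
            x = np.empty(n_steps + 1)
            x[0] = x0
            for i in range(n_steps):
                x[i + 1] = x[i] + a * x[i] * dt + gamma * dw[i]
            X.append(x[1:])     # 50 sampled values -> 50 input neurons
            y.append(a)
    return np.array(X), np.array(y)
```

With the defaults, more Wiener patterns means proportionally more rows, which is exactly the training-set growth (and slower training) noted above.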


The ANN performance is investigated for different combinations of drift and diffusion terms. We use three different MLP architectures, 50-30-1, 50-15-15-1, and 50-10-10-10-1, to train and test the data sets, and record the best performance.


Figure 2.10A demonstrates that the ANN performance decreases as the magnitude of the diffusion term increases, and Figure 2.10B shows that the targets and the network outputs on the test sets are in good agreement when *γ* = 0.01. Because the test set is created with five Wiener processes, it contains five repetitive sub-data sets, each of which covers the range of *α* values from 1 to 2 with one pattern of Wiener process. By observing the sub-data sets separately, we can gain a better understanding of how noise influences the estimation of the parameter. As *γ* reaches 0.05 (Figure 2.10C) and the ratio of the diffusion term to the drift term reaches 0.67 (shown in Figure 2.10A), the 2nd, 3rd and 5th sub-data sets show more accurate predictions than the 1st and 4th sets. Figure 2.10D demonstrates that the network-generated outputs merely tend towards the average of the targets in most of the sub-data sets when *γ* = 0.10, where the weight of the diffusion term is greater than that of the drift term (*P*<sub>γ</sub> = 1.39, as shown in Figure 2.10A).


Figure 2.10. A: The neural network performance decreases as the diffusion term in the SDEs increases. B, C and D: Target values and network outputs when *γ* = 0.01, 0.05 and 0.10; the x-axis represents the index in the testing datasets, where five Wiener processes were used.

We investigate the nonlinear SDE as well. Moreover, because the nonlinear SDE contains two parameters, we investigate how the accuracy of estimation varies for different combinations of parameters. The parameter values and ranges of the SDE are as follows: *α* = [1, 2] in steps of 0.05, *β* = [1, 2] in steps of 0.05, and *γ* = 0.5. We use early stopping to find the best results. From Table 2.2, the different network architectures result in very similar performance. The *R*<sup>2</sup> values for *α* are very close to zero, while the *R*<sup>2</sup> values for *β* are all above 0.9. According to the statistical meaning of *R*<sup>2</sup> given previously, we consider that the neural networks fail to predict *α* but succeed in predicting *β*. We explore the reason for the difference between *α* and *β* later.

| Network architecture | *R*<sup>2</sup> (*α*) | *R*<sup>2</sup> (*β*) | Network architecture | *R*<sup>2</sup> (*α*) | *R*<sup>2</sup> (*β*) |
|---|---|---|---|---|---|
| 50-10-2 | 0.0256 | 0.9161 | 50-10-10-2 | -0.0135 | 0.9209 |
| 50-30-2 | 0.0349 | 0.9453 | 50-20-20-2 | -0.0183 | 0.9299 |
| 50-60-2 | 0.0395 | 0.9406 | 50-50-50-2 | 0.0134 | 0.9310 |
| 50-100-2 | 0.0296 | 0.9209 | 50-10-10-10-2 | 0.0128 | 0.9198 |
| 50-200-2 | 0.0354 | 0.9299 | 50-50-50-50-2 | -0.0164 | 0.9257 |

Table 2.2. Network performance in the nonlinear SDE as the network architecture changes.

We use three network architectures, 50-30-2, 50-60-2 and 50-50-50-2, to estimate the parameters for different SDEs and record the best results. The results in Table 2.3 indicate that the accuracy of the network decreases as the strength of the diffusion term in the SDE increases, as in the linear case. Figure 2.11 shows that, compared with the results in the linear case (Figure 2.10A), the prediction capability of the networks in the nonlinear case is poorer, owing to the more complex input-output relationship in the nonlinear SDEs.

| Range of *α* | Range of *β* | *γ* | *P*<sub>α</sub> | *P*<sub>β</sub> | *P*<sub>γ</sub> | *R*<sup>2</sup> (*α*) | AAPE (*α*) | *R*<sup>2</sup> (*β*) | AAPE (*β*) |
|---|---|---|---|---|---|---|---|---|---|
| [1,2] | [1,2] | 0.5 | 0.12 | 0.88 | 0.10 | 0.0349 | 17.79 | 0.9453 | 4.02 |
| [1,2] | [1,2] | 2 | 0.11 | 0.89 | 0.40 | 0.0006 | 18 | 0.6833 | 9.92 |
| [1,2] | [1,2] | 3 | 0.11 | 0.89 | 0.60 | -0.012 | 17.59 | 0.3687 | 12.96 |

Table 2.3. Network performance in the nonlinear SDE as the diffusion term increases.

Figure 2.11. Scatter graph of *R*<sup>2</sup> values for the parameters in the linear and the nonlinear equations against their corresponding *P*<sub>γ</sub> values.

Table 2.4. Network performance for different parameters in the nonlinear SDE.

To investigate the reason for the large differences in the *R*<sup>2</sup> values for *α* and *β*, we change the magnitudes of the *α* term and the *β* term in the SDEs by altering the values of *α* and *β* while keeping the diffusion level approximately constant. Table 2.4 shows that the bigger the contribution of the term containing a particular parameter (*P*<sub>α</sub> or *P*<sub>β</sub>), the smaller the error (*AAPE*) and the better the prediction (*R*<sup>2</sup>) for that parameter. Therefore, we conclude that the accuracy of a parameter in a nonlinear SDE depends on how much its term contributes *pro rata* to the drift term.

In the data preparation stage, we use different time steps to solve the SDEs and find that 50 data points are sufficient to represent a realisation of an SDE. In addition, we emphasise the effect of the number of Wiener processes used to create the training data sets. Increasing the number of Wiener processes boosts the performance of the networks considerably and eliminates the overfitting problem. When overfitting occurs, the resulting network is accurate on the training set but performs poorly on the test set. When the number of Wiener processes used to generate the training data sets is increased, the learning procedure finds common features among the training sets that enable the network to correctly estimate the parameter(s) in the test data sets.

In the ANN training procedure, we use early stopping to obtain the optimum test results. We also employ different MLP architectures, transfer functions, learning rates and momentums; however, we find that these factors do not significantly increase the performance of the ANNs.

The diffusion level in an SDE has a significant impact on the network performance. In the linear SDE, when the ratio of the diffusion term to the drift term is below 0.40, the network can estimate the parameter accurately (*R*<sup>2</sup> > 0.93). When the ratio reaches 0.67, the network estimates the parameter accurately only when the Wiener processes in the test sets and in the training sets are similar. If the diffusion term is larger than the drift term, the network cannot predict the parameter(s) and only tends to give an average of the parameter values used for the training datasets. For nonlinear SDEs, the estimation ability of a network is generally poorer than for linear SDEs. Furthermore, the accuracy of a parameter in a nonlinear SDE depends on how much its term contributes *pro rata* to the drift term.

We can conclude that the classical neural network method (an MLP with the backpropagation algorithm) provides a simple but robust parameter estimation approach for SDEs under moderately noisy conditions, but this estimation capability is limited for SDEs having a high diffusion level. When the diffusion level is high (>10%-20%), statistical methods also fail to estimate parameters accurately.

**A Stochastic Model for Hydrodynamic Dispersion**

### **3.1 Introduction**

We have seen in chapter 1 that, in the derivation of the advection–dispersion equation, also known as the continuum transport model (Rashidi et al., 1999), the velocity fluctuations around the mean velocity enter into the calculation of the solute flux at a given point through averaging theorems. The mean advective flux and the mean dispersive flux are then related to the concentration gradients through Fickian-type assumptions. These assumptions are instrumental in defining dispersivity as a measure of solute dispersion. Dispersivity has proven to be scale dependent.

To address the issue of the scale dependence of dispersivity fundamentally, it has been argued that a more realistic mathematical framework for modelling is stochastic calculus (Holden et al., 1996; Kulasiri and Verwoerd, 1999, 2002). Stochastic calculus deals with the uncertainty in natural and other phenomena using nondifferentiable functions for which ordinary differentials do not exist (Klebaner, 1998). It is based on the premise that the differentials of nondifferentiable functions can have meaning only through certain types of integrals, such as Ito integrals, which are rigorously developed in the literature. In addition, mathematically well-defined processes such as the Wiener process aid in formulating mathematical models of complex systems. Mathematical theories aside, one needs to question the validity of using stochastic calculus in each instance. In modelling solute transport in porous media, we consider the fluid velocity to be fundamentally a random variable with respect to space and time, and continuous but irregular, i.e., nondifferentiable. In many natural porous formations, geometrical structures are irregular and, therefore, as fluid particles encounter porous structures, velocity changes are more likely to be irregular than regular. In many situations we hardly have accurate information about the porous structure, which contributes to greater uncertainty. Hence, stochastic calculus provides a more sophisticated mathematical framework with which to model advection-dispersion in porous media in practical situations, especially those involving natural porous formations. By using stochastic partial differential equations, for example, we could incorporate the uncertainty of the dispersion coefficient and the hydraulic conductivity that are present in porous structures such as underground aquifers.

The incorporation of the dispersivity as a random, irregular coefficient makes the solution of the resulting partial differential equations an interesting area of study. However, the scale dependency of the dispersivity cannot be addressed in this manner, because the dispersivity itself is not a material property but a constant that depends on the scale of the experiment.

In this chapter we develop a one-dimensional model without resorting to Fickian assumptions and discuss methods of estimating its parameters.
