### **2.3 Entropy of the mesoscopic process resulting from a deterministic, microscopic system**

Kolmogorov and other authors [19] studied the entropy and ergodic properties of the stationary mesoscopic process defined previously, following methods introduced by Shannon in the framework of signal theory [27–30]. These methods, and part of Kolmogorov's results, can be extended to the nonstationary process (11).

#### *2.3.1 The* n*-times entropy and the instantaneous entropy of the mesoscopic system*

Following Kolmogorov, we consider the Shannon entropy [27–30] of the trajectory (*i*)*<sup>n</sup>* = (*i*0, … , *in-*1) in the phase space

$$S(p\_n) = -\sum\_{i\_0,\dots,i\_{n-1}} p\_n(i\_0, 0; \dots; i\_{n-1}, n-1) \ln p\_n(i\_0, 0; \dots; i\_{n-1}, n-1). \tag{15}$$
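As a concrete illustration of (15), the following Python sketch computes *S*(*pn*) for a toy coarse-grained deterministic system. The 64-microstate affine bijection and the 4-cell partition are hypothetical choices made for the example, not taken from the chapter.

```python
import math
from collections import Counter

def trajectory_entropy(f, cell, N, n):
    """S(p_n) of Eq. (15): Shannon entropy of the length-n coarse-grained
    trajectories, with the uniform measure on the N microstates."""
    counts = Counter()
    for x in range(N):
        word, y = [], x
        for _ in range(n):
            word.append(cell(y))
            y = f(y)
        counts[tuple(word)] += 1
    return -sum((c / N) * math.log(c / N) for c in counts.values())

# Hypothetical toy system: 64 microstates, the bijection x -> 5x+1 (mod 64)
# as deterministic evolution, coarse-grained into 4 cells of 16 microstates.
N = 64
f = lambda x: (5 * x + 1) % N
cell = lambda x: x // 16
for n in range(1, 5):
    print(n, trajectory_entropy(f, cell, N, n))  # S(p_n) is non-decreasing in n
```

Since the four cells have equal measure, *S*(*p*1) = ln 4; refining the trajectory can only increase the entropy.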

*Stochastic Theory of Coarse-Grained Deterministic Systems: Martingales and Markov… DOI: http://dx.doi.org/10.5772/intechopen.95903*

On the other hand, the new information obtained by observing the system in the mesoscopic state *in* at time *tn*, knowing that it was in the respective states *i*0, … *in*-1 at the prior times 0, … *n*-1, will be called the instantaneous entropy

$$\begin{split} s\_n(p) = S\_{n+1}(p) - S\_n(p) &= -\sum\_{i\_0,\dots,i\_n} p(i\_0, 0; \dots; i\_n, n) \ln p\left(i\_n, n \mid i\_{n-1}, n-1; \dots; i\_0, 0\right) \geq 0 \\ &= \sum\_{i\_0,\dots,i\_{n-1}} p(i\_0, 0; \dots; i\_{n-1}, n-1)\, S\left(p\left(\bullet, n \mid i\_{n-1}, n-1; \dots; i\_0, 0\right)\right), \end{split} \tag{16}$$

where *p* denotes the infinite process. The properties of *S*(*pn*) and *sn*(*p*) have been extensively studied by Kolmogorov and other authors in the case of the stationary process (6) [19]; they are briefly recalled in Section 2.5. They are not necessarily valid for the nonstationary process.
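The two expressions for *sn*(*p*) in (16) can be checked numerically. The sketch below uses a hypothetical toy system (a 64-microstate bijection with a 4-cell partition, an illustrative choice not taken from the chapter) and verifies that *S*(*pn*+1) − *S*(*pn*) coincides with the averaged conditional entropy, and is nonnegative.

```python
import math
from collections import Counter

# Hypothetical toy system: 64 microstates, bijection x -> 5x+1 (mod 64),
# 4 equal cells; the mesoscopic measure is uniform over microstates.
N = 64
f = lambda x: (5 * x + 1) % N
cell = lambda x: x // 16

def word_probs(n):
    """Joint probabilities p_n(i_0, 0; ...; i_{n-1}, n-1) of cell words."""
    counts = Counter()
    for x in range(N):
        word, y = [], x
        for _ in range(n):
            word.append(cell(y))
            y = f(y)
        counts[tuple(word)] += 1
    return {w: c / N for w, c in counts.items()}

def S(n):
    return -sum(p * math.log(p) for p in word_probs(n).values())

def s_conditional(n):
    """s_n(p) via the conditional form in the second line of Eq. (16)."""
    past, full = word_probs(n), word_probs(n + 1)
    return -sum(p * math.log(p / past[w[:-1]]) for w, p in full.items())

for n in range(1, 4):
    print(n, S(n + 1) - S(n), s_conditional(n))  # the two columns agree
```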

#### *2.3.2 Maximizing the* n*-times entropy of the mesoscopic system: The "Markov scheme"*

If one knows the first two distributions *p*1 and *p*2, one can mimic the exact mesoscopic distributions *pn* by using Jaynes' principle, maximizing the entropy *S*(*qn*) of a distribution *qn* under the constraints *q*1 = *p*1 and *q*2 = *p*2. It is then found that the optimal distribution *qn* is the Markov distribution satisfying these constraints [18].

It is shown in Ref. [18] that for *n* > 2, both the *n*-times entropy *S*(*qn*) and the instantaneous entropy *sn*(*q*) are larger than the corresponding entropies *S*(*pn*) and *sn*(*p*) of the exact process *p*, except if *p* is Markov: *p* = *q*.

The Markov process *qn* is not really an approximation of the mesoscopic process *p*, because *qn* does not tend to *pn* when *n* → ∞. Approximating the exact mesoscopic process by a Markov process will be the main purpose of the next section.
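The Markov scheme can be sketched as follows: from *p*1 and *p*2 one forms the transition probabilities *p*2(*i*, *j*)/*p*1(*i*), builds the Markov distribution *qn*, and checks that its entropy dominates that of the exact process. The toy system below (a 64-microstate bijection with 4 cells) is again a hypothetical stand-in for the stationary mesoscopic process.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical toy system standing in for the stationary mesoscopic process.
N = 64
f = lambda x: (5 * x + 1) % N
cell = lambda x: x // 16

def word_probs(n):
    counts = Counter()
    for x in range(N):
        word, y = [], x
        for _ in range(n):
            word.append(cell(y))
            y = f(y)
        counts[tuple(word)] += 1
    return {w: c / N for w, c in counts.items()}

p1, p2, p3 = word_probs(1), word_probs(2), word_probs(3)

def q(word):
    """Maximum-entropy (Markov) distribution built from p1 and p2 alone."""
    prob = p1.get(word[:1], 0.0)
    for a, b in zip(word, word[1:]):
        prob *= p2.get((a, b), 0.0) / p1[(a,)]
    return prob

S_p3 = -sum(p * math.log(p) for p in p3.values())
S_q3 = -sum(q(w) * math.log(q(w))
            for w in product(range(4), repeat=3) if q(w) > 0)
print("S(q_3) =", S_q3, ">= S(p_3) =", S_p3)  # Jaynes: Markov q maximizes S
```

By the maximum-entropy property, *S*(*q*3) ≥ *S*(*p*3), with equality only when the exact process is itself Markov.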

### **2.4 Entropy and memory in the stationary situation**

#### *2.4.1 Kolmogorov entropy of the stationary process*

Here we consider the stationary process arising from the uniform initial microscopic distribution *μ*(*x*), whose *n*-times stationary probability *p*<sup>0</sup>*<sub>n</sub>* is given by (7). For the sake of simplicity we omit the index <sup>0</sup> in the present section, unless otherwise specified. It can be shown [19] that the entropy *Sn*(*p*) is an increasing, concave function of *n*:

$$s\_n \equiv S\_{n+1}(p) - S\_n(p) \geq 0, \tag{17}$$

$$s\_n - s\_{n-1} = S\_{n+1}(p) - 2S\_n(p) + S\_{n-1}(p) \leq 0. \tag{18}$$

It results from (17) and (18), and also from Section 2.5.2, that the limits

$$\lim\_{n \to \infty} \frac{1}{n} S\_n(p) = \lim\_{n \to \infty} s\_n(p) = s(p) \tag{19}$$

exist: *s*(*p*) is the Kolmogorov entropy of the evolution function *f* with respect to the partition (*i*) of the mesoscopic states [19]. More simply, we can call it the entropy of the mesoscopic process.
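Both limits in (19) can be seen explicitly on a process whose entropy rate is known in closed form. The sketch below uses a two-state stationary Markov chain as a stand-in for the mesoscopic process (an illustrative assumption, not the chapter's construction): for such a chain *sn*(*p*) equals the entropy rate exactly, while *Sn*(*p*)/*n* approaches it as 1/*n*.

```python
import itertools
import math
import numpy as np

# Illustrative 2-state stationary Markov chain; its Kolmogorov entropy is
# the entropy rate h = -sum_i pi_i sum_j P_ij ln P_ij.
P = np.array([[0.9, 0.1], [0.4, 0.6]])
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi /= pi.sum()  # stationary distribution (0.8, 0.2)

def S(n):
    """n-times entropy S_n(p) by exact summation over all length-n words."""
    total = 0.0
    for w in itertools.product(range(2), repeat=n):
        p = pi[w[0]]
        for a, b in zip(w, w[1:]):
            p *= P[a, b]
        total -= p * math.log(p)
    return total

h = -sum(pi[i] * P[i, j] * math.log(P[i, j])
         for i in range(2) for j in range(2))
for n in range(1, 6):
    print(n, S(n) / n, S(n + 1) - S(n))  # both columns converge to h
print("entropy rate h =", h)
```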

#### *2.4.2 Memory decrease in the stationary mesoscopic process*

It has been proved recently [18] that, although it is infinite, the memory of the mesoscopic process fades out with time: for *n* large enough, if *N* > *n*, the probability of *iN* at time *N* conditioned by the *n* last events is practically equal to the probability at time *N* conditioned by the *whole past* down to time 0:

$$p(i\_N, N \mid i\_{N-1}, N-1; \dots; i\_{N-n}, N-n) \approx p(i\_N, N \mid i\_{N-1}, N-1; \dots; i\_0, 0) \text{ when } n \to \infty. \tag{20}$$

More precisely, for any *ε* > 0, there exists a positive integer *n* such that for any *N* > *n*

$$0 < s\_N < s\_n < \varepsilon, \tag{21}$$


where *sn* is the instantaneous entropy given by (16). In fact, let us write

$$\Pi\_N(i\_N) = p\left(i\_N, N \mid i\_{N-1}, N-1; \dots; i\_0, 0\right) = \mu\left(f\_{-N} i\_N \mid f\_{-N+1} i\_{N-1} \cap \dots \cap i\_0\right); \tag{22}$$

$$\Pi\_N^{(n)}(i\_N) = p\left(i\_N, N \mid i\_{N-1}, N-1; \dots; i\_{N-n}, N-n\right) = \mu\left(f\_{-N} i\_N \mid f\_{-N+1} i\_{N-1} \cap \dots \cap f\_{-N+n} i\_{N-n}\right). \tag{23}$$

For a given *n*, formula (23) allows one to define a new process *p*(*n*) from the original process *p*, which can be called "the approximate process of order *n*" of *p* (see Section 2.6). It results from (21) and from the stationarity of *p* that for any *ε* > 0, there is an integer *n*(*ε*) depending only on *ε*, such that for any integers *N*, *n* > *n*(*ε*)

$$0 < s\_n(p) - s\_N(p) = \sum\_{i\_0,\dots,i\_{N-1}} \mu\left(i\_0 \cap f\_{-1} i\_1 \cap \dots \cap f\_{-N+1} i\_{N-1}\right) S\_{0,\dots,N-1}\left(\Pi\_N \mid \Pi\_N^{(n)}\right) < \varepsilon, \tag{24}$$

where *S*0,…,*N*-1(*ΠN*|*ΠN*<sup>(*n*)</sup>) is the relative entropy of *ΠN* with respect to *ΠN*<sup>(*n*)</sup>: the last right-hand member of Eq. (24) is the average of this relative entropy over the past of *N*. Because *sN*(*p*) decreases to a limit *s*(*p*) when *N* → ∞, it results that

$$0 < \delta \mathfrak{s}\_n(p) \equiv \mathfrak{s}\_n(p) - \mathfrak{s}(p) \le \varepsilon \text{ if } n > n(\varepsilon). \tag{25}$$

The total variation distance *d*(*P*,*Q*) between two distributions *Pj* and *Qj* over the states *j* of a finite set (*j*) is

$$d(P, Q) = \frac{1}{2} \sum\_{j} \left| P\_j - Q\_j \right|. \tag{26}$$

Then, the total variation distance *d*0,…,*N*-1(*ΠN*, *ΠN*<sup>(*n*)</sup>) between *ΠN* and *ΠN*<sup>(*n*)</sup> (for a given past trajectory between times 0 and *N*-1) is related to the relative entropy [18, 31], and it can be concluded that

$$\left\langle d\_{0,\ldots,N-1}\left(\Pi\_N, \Pi\_N^{(n)}\right)\right\rangle^2 \le \left\langle \left[d\_{0,\ldots,N-1}\left(\Pi\_N, \Pi\_N^{(n)}\right)\right]^2 \right\rangle < \varepsilon/2 \text{ if } n(\varepsilon) < n < N. \tag{27}$$
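The link between total variation distance and relative entropy used in (27) is Pinsker's inequality, *d*(*P*, *Q*)² ≤ ½ *S*(*P*|*Q*). A quick numerical check (on hypothetical random distributions):

```python
import numpy as np

def tv(P, Q):
    """Total variation distance of Eq. (26)."""
    return 0.5 * np.abs(P - Q).sum()

def kl(P, Q):
    """Relative entropy S(P|Q) = sum_j P_j ln(P_j / Q_j), in nats."""
    return float(np.sum(P * np.log(P / Q)))

# Pinsker's inequality d(P,Q)^2 <= S(P|Q)/2, checked on random distributions.
rng = np.random.default_rng(0)
for _ in range(1000):
    P = rng.dirichlet(np.ones(5))
    Q = rng.dirichlet(np.ones(5))
    assert tv(P, Q) ** 2 <= kl(P, Q) / 2 + 1e-12
print("Pinsker bound verified on 1000 random pairs")
```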

#### *2.4.3 Convergence properties of the approximate process*

Let us write *m* = *N*-*n* > 0. It follows [18] from (25) that for any fixed *m*, the total variation distance *d*0,…,*m*+*n*-1(*Πm*+*n*, *Πm*+*n*<sup>(*n*)</sup>) between the exact and the approximate probabilities *tends to* 0 *in probability* when *n* → ∞:


$$d\_{0,\ldots m+n-1} \left(\varPi\_{m+n}, \varPi\_{m+n}^{(n)}\right) \xrightarrow{p} \mathbf{0} \text{ if } n \to \infty. \tag{28}$$

So, the probability that this distance exceeds a given accuracy *a* > 0 can be made as small as desired by choosing *n* large enough.

Further results can be obtained by again using the stationarity of the process *p*. In fact, it can be shown [18] that *p* is a martingale [20–22]. Then, general results from martingale theory (see below) show that when *n* → ∞ the distance between the stationary conditional probability *Πm*+*n* and its approximation *Πm*+*n*<sup>(*n*)</sup> tends to 0 *almost surely* [18], as well as in probability:

$$d\_{0,\ldots m+n-1} \left(\varPi\_{m+n}, \varPi\_{m+n}^{(n)}\right) \stackrel{a.s.}{\rightarrow} \mathbf{0} \text{ if } n \rightarrow \infty. \tag{29}$$

So, the approximation *Πm*+*n*<sup>(*n*)</sup> converges to *Πm*+*n* for almost all trajectories [18]. We now sketch the derivation of this conclusion from martingale theory.

### **2.5 Martingale theory and almost sure convergence**

For convenience, we first summarize some definitions and results of martingale theory [20–22], before applying them to the mesoscopic laws of deterministic systems. We refer to [20] for addressing more general cases.

#### *2.5.1 Definitions*

i. *Simplified definition*: a (discrete time) sequence of stochastic variables *Xn* is a martingale if for all *n*:

$$\left\langle \left| X\_n \right| \right\rangle < \infty \quad \text{and} \quad \left\langle X\_{n+1} \mid X\_n, \dots, X\_1 \right\rangle = X\_n, \tag{30}$$

where ⟨*X*⟩ denotes the average (mathematical expectation) of the stochastic variable *X*.

ii. *More generally* (see the general definition, for instance, in [20]): if

• (Ω, F, *P*) is a probability space (where Ω is the state space, *P* is the probability law, and F is the set of all subspaces (*σ*-algebra) for which *P* is defined),

• F*<sup>n</sup>* is an increasing sequence of *σ*-algebras extracted from F (F*<sup>n</sup>* ⊂ F*<sup>n</sup>*<sup>+1</sup> ⊂ … ⊂ F), and

• for all *n* ≥ 0, *Xn* is a stochastic variable defined on (Ω, F*<sup>n</sup>*, *P*),

then the sequence *Xn* is a martingale if ⟨|*Xn*|⟩ < ∞ and ⟨*Xn*+1 | F*<sup>n</sup>*⟩ = *Xn*.

#### *2.5.2 Convergence theorem for martingales*

Among the remarkable properties of martingales, the following convergence theorem holds [20, 21]:

If (*Xn*) is a positive martingale, the sequence *Xn* converges almost surely to a stochastic variable *X*. So, for almost all trajectories *ω*, *Xn*(*ω*) → *X*(*ω*) with probability 1 when *n* → ∞. Stronger and more general results can be found in the references.
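A standard concrete example of a positive martingale (not from the chapter) is the Pólya urn: starting from *r* red and *b* black balls, one draws a ball and returns it together with another of the same color; the red fraction *Xn* is then a martingale and hence converges almost surely. The sketch below checks the martingale property (30) exactly with rational arithmetic, then simulates a few paths.

```python
import random
from fractions import Fraction

def next_expectation(r, b):
    """E[X_{n+1} | r red, b black] for the Polya-urn red fraction X = r/(r+b)."""
    draw_red = Fraction(r, r + b)
    return draw_red * Fraction(r + 1, r + b + 1) + (1 - draw_red) * Fraction(r, r + b + 1)

# Exact check of <X_{n+1} | F_n> = X_n over a grid of urn states.
for r in range(1, 10):
    for b in range(1, 10):
        assert next_expectation(r, b) == Fraction(r, r + b)

# A few simulated paths: each red fraction settles to its own random limit,
# illustrating the almost sure convergence of a positive martingale.
random.seed(1)
for path in range(3):
    r, b = 1, 1
    for _ in range(5000):
        if random.random() < r / (r + b):
            r += 1
        else:
            b += 1
    print(path, r / (r + b))
```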