#### **3.1 Gradient equation for rigid motion**

The general gradient equation is the first-order approximation of the assumption that image brightness is invariant before and after the relative 3-D motion between a camera and an object. Assuming that the brightness values before and after the 3-D motion are equal, the image brightness after the motion is expressed by a Taylor expansion, and terms of degree 2 and above are ignored. As a result, at each pixel $(x, y)$, the gradient equation is formulated with the partial derivatives $f_x$, $f_y$, and $f_t$ of the image brightness $f(x, y, t)$, where $t$ denotes time, and the optical flow $(v_x, v_y)$, as follows:

$$f\_t = -f\_x v\_x - f\_y v\_y. \tag{5}$$

By substituting Eqs. (2) and (3) into Eq. (5), the gradient equation representing the rigid-motion constraint can be derived explicitly:

$$f\_t = -\left(f\_x v\_x^{r} + f\_y v\_y^{r}\right) - \left(-f\_x r\_y + f\_y r\_x\right) Z\_0 d \equiv -f^{r} - f^{u} d. \tag{6}$$
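
To make the quantities in Eq. (6) concrete, the following minimal sketch (all function and variable names are our own, not from [13]) computes $f_x$, $f_y$, and $f_t$ for one frame pair by finite differences and assembles $f^r$ and $f^u$; the rotational-flow components follow the terms that appear in Eq. (13) below, and the rotation $\mathbf{r} = (r_x, r_y)$ and reference depth $Z_0$ are assumed known:

```python
import numpy as np

def rigid_motion_terms(frame0, frame1, r, Z0=1.0):
    """Evaluate the terms of Eq. (6) for one frame pair.

    frame0, frame1 : 2-D brightness arrays (consecutive frames)
    r              : rotation (r_x, r_y) between the frames
    Z0             : reference depth in focal-length units
    Returns f_t, f_r, f_u such that f_t ~ -f_r - f_u * d at each pixel.
    """
    H, W = frame0.shape
    # Pixel coordinates in focal-length units, as in Section 3.4.
    x, y = np.meshgrid(np.linspace(-0.5, 0.5, W), np.linspace(-0.5, 0.5, H))
    avg = 0.5 * (frame0 + frame1)
    # Finite-difference gradients, rescaled from per-pixel to focal-length units.
    f_x = np.gradient(avg, axis=1) * (W - 1)
    f_y = np.gradient(avg, axis=0) * (H - 1)
    f_t = frame1 - frame0
    rx, ry = r
    # Rotational flow, matching the components of w_0 in Eq. (13).
    v_xr = x * y * rx - (1.0 + x**2) * ry
    v_yr = (1.0 + y**2) * rx - x * y * ry
    f_r = f_x * v_xr + f_y * v_yr           # f^r of Eq. (6)
    f_u = (-f_x * ry + f_y * rx) * Z0       # f^u of Eq. (6)
    return f_t, f_r, f_u
```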

#### **3.2 Probabilistic model for differential-type method**

Let $M$ be the number of frame pairs and $N$ be the number of pixels. In our study, $\{f_t^{(i,j)}\}_{i=1,\cdots,N;\ j=1,\cdots,M}$ and $\{\mathbf{r}^{(j)}\}_{j=1,\cdots,M}$ are treated as random variables, and $\{d^{(i)}\}_{i=1,\cdots,N}$, corresponding to the inverse depth at each pixel, is treated as a stochastic variable and recovered individually for each pixel. Multiple frames $\{\mathbf{r}^{(j)}\}$ that vibrate due to irregular rotation are used for processing, but no pixel tracking is done. Therefore, the recovered $d^{(i)}$ at each pixel does not correspond exactly to the value at that pixel but takes the mean over the adjacent region defined by the vibration width of the image. As a result, the recovered $d^{(i)}$ correlates with the values in adjacent regions, and so $d^{(i)}$ should be treated from the beginning as a variable with such a correlation. Based on tremor, $d^{(i)}$, which is estimated so as to correlate with its neighborhood, is planned to be refined to a per-pixel $d$ when drift and microsaccade are addressed in future research.

In this study, we assume that the optical flow is very small; hence, the observation errors of $f_t$, $f_x$, and $f_y$, which are calculated by finite differences, are small. The equation error is also small, and therefore we can assume that an error having no relation to $f_t$, $f_x$, and $f_y$ is added to the gradient equation as a whole. From this consideration, we assume that the error of $f_t^{(i,j)}$ is Gaussian with mean 0 and variance $\sigma_o^2$, and that $f_x^{(i,j)}$ and $f_y^{(i,j)}$ have no error:

$$p\left(f\_t^{(ij)}\,|\,d^{(i)},\mathbf{r}^{(j)},\sigma\_o^2\right) = \frac{1}{\sqrt{2\pi}\sigma\_o} \exp\left\{-\frac{\left(f\_t^{(ij)} + f^{r(ij)} + f^{u(ij)}d^{(i)}\right)^2}{2\sigma\_o^2}\right\},\tag{7}$$

where $i = 1, \cdots, N$, $j = 1, \cdots, M$, and $\sigma_o^2$ is an unknown variance.

As mentioned above, considering the neighborhood correlation of the $d$ to be recovered, and to simplify modeling, we use the following equation as the depth model in this study:

$$p\left(\mathbf{d}|\sigma\_d^2\right) = \frac{1}{\left(\sqrt{2\pi}\sigma\_d\right)^N} \exp\left\{-\frac{\mathbf{d}^\mathrm{T}\mathbf{L}\mathbf{d}}{2\sigma\_d^2}\right\},\tag{8}$$

where $\mathbf{d}$ is an $N$-dimensional vector composed of $\{d^{(i)}\}$ and $\mathbf{L}$ indicates a matrix corresponding to the 2-D Laplacian operator with a free-end condition. By assuming this probability density, we make the recovered depth map smooth. In this study, the variance $\sigma_d^2$ is controlled heuristically in consideration of the smoothness of the recovered depth map. In the future, we are going to examine a strategy for determining $\sigma_d^2$ within a whole system that models all fixational eye movements, including microsaccade and drift. Hereafter, we use the definition $\Theta \equiv \{\sigma_o^2, \sigma_r^2\}$. In the simulation described later, random rotations are sampled according to Eq. (4) as rotations relative to the initial image, but since the rotation between successive frames is estimated during depth restoration, the estimated $\sigma_r^2$ differs from the set value.
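
For illustration, one standard way to realize $\mathbf{L}$ as the 4-neighbor grid Laplacian with free ends is the Kronecker-sum construction below (a sketch under our own conventions; the exact boundary treatment in [13] may differ):

```python
import numpy as np
import scipy.sparse as sp

def laplacian_2d(H, W):
    """Sparse 2-D Laplacian L with free ends, so that d^T L d of Eq. (8)
    sums the squared differences between 4-neighboring pixels."""
    def path_laplacian(n):
        main = np.full(n, 2.0)
        main[0] = main[-1] = 1.0            # free-end boundary condition
        off = -np.ones(n - 1)
        return sp.diags([main, off, off], [0, 1, -1])
    return (sp.kron(sp.identity(H), path_laplacian(W))
            + sp.kron(path_laplacian(H), sp.identity(W)))

def log_depth_prior(d, L, sigma_d2):
    """Log of Eq. (8), dropping the normalizing constant."""
    return -float(d @ (L @ d)) / (2.0 * sigma_d2)
```

Here `d` is the inverse-depth map flattened row by row into a vector of length `H * W`.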

#### **3.3 Algorithm for differential-type method**

By applying the MAP-EM algorithm [19], the parameters $\{\mathbf{d}, \Theta\}$ can be estimated as a MAP estimator based on $p(\mathbf{d}, \Theta \,|\, \{f_t^{(i,j)}\})$, which is formulated by marginalizing the joint probability $p(\{\mathbf{r}^{(j)}\}, \mathbf{d}, \Theta \,|\, \{f_t^{(i,j)}\})$ with respect to $\{\mathbf{r}^{(j)}\}$; the prior of $\Theta$ is formally regarded as a uniform distribution. Additionally, $\{\mathbf{r}^{(j)}\}$ can be estimated as a MAP estimator based on $p(\{\mathbf{r}^{(j)}\} \,|\, \{f_t^{(i,j)}\}, \hat{\Theta}, \hat{\mathbf{d}})$, in which $\hat{\cdot}$ denotes a MAP estimator as described above.

In the EM scheme, $(\{f_t^{(i,j)}\}, \{\mathbf{r}^{(j)}\})$ is considered the complete data, $\{\mathbf{r}^{(j)}\}$ is treated as missing data, and $\{\mathbf{d}, \Theta\}$ is treated as an unknown parameter. The E step and M step are repeated alternately until convergence. First, in the E step, the conditional expectation of the log likelihood of the complete data given the observations $\{f_t^{(i,j)}\}$ is calculated using the current MAP estimates $\{\hat{\mathbf{d}}, \hat{\Theta}\}$; this is generally called the Q function. For the MAP-EM algorithm in particular, the objective function $J(\mathbf{d}, \Theta)$ maximized in the M step is equal to the Q function augmented by the log prior densities of the parameters. In the following, values computed using $\hat{\Theta}$ are also indicated with $\hat{\cdot}$. A structural sketch of this alternation is given below; the routines it calls are sketched after the corresponding equations later in this section.
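
This driver is an illustrative skeleton only, not the authors' code: `e_step`, `m_step_variances`, and `osl_update_d` are the sketches given after Eqs. (13), (15), and (18), respectively, and the array layouts are our own convention.

```python
import numpy as np

def map_em(ft, w0, wd, d0, Z0, sigma_d2, n_iter=100):
    """MAP-EM iteration of Section 3.3 (array conventions ours).

    ft : (M, N) temporal gradients, w0/wd : (M, N, 2) from Eq. (13),
    d0 : (H, W) initial inverse-depth map with N = H * W pixels."""
    d = d0.copy()
    sigma_o2 = sigma_r2 = 1.0e-2            # arbitrary initial values (Section 3.4)
    for _ in range(n_iter):
        # w^(i,j) of Eq. (13) at the current depth estimate.
        w = w0 + Z0 * d.ravel()[None, :, None] * wd
        # E step: posterior statistics of each r^(j), Eqs. (10)-(12).
        stats = [e_step(ft[j], w[j], sigma_o2, sigma_r2) for j in range(ft.shape[0])]
        r_m = np.stack([s[0] for s in stats])
        R = np.stack([s[2] for s in stats])
        # M step: variances by Eq. (15), depth by the OSL update of Eq. (16).
        sigma_o2, sigma_r2 = m_step_variances(ft, w, r_m, R)
        d = osl_update_d(d, ft, w0, wd, r_m, R, Z0, sigma_o2, sigma_d2)
    return d, sigma_o2, sigma_r2
```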

Based on the densities defined in Section 3.2, the objective function is derived as

$$\begin{split} J(\mathbf{d}, \Theta) &= \mathrm{Const.} - \frac{MN}{2} \ln \sigma\_{o}^{2} - \frac{3M}{2} \ln \sigma\_{r}^{2} \\ &\quad - \frac{1}{2\sigma\_{o}^{2}} \sum\_{j=1}^{M} \left\{ \sum\_{i=1}^{N} f\_{t}^{(ij)2} + 2 \left( \sum\_{i=1}^{N} f\_{t}^{(ij)} \mathbf{w}^{(ij)\mathrm{T}} \right) \hat{\mathbf{r}}\_{m}^{(j)} + \mathrm{tr} \left[ \left( \sum\_{i=1}^{N} \mathbf{w}^{(ij)} \mathbf{w}^{(ij)\mathrm{T}} \right) \hat{\mathbf{R}}^{(j)} \right] \right\} \\ &\quad - \frac{1}{2\sigma\_{r}^{2}} \sum\_{j=1}^{M} \mathrm{tr}\,\hat{\mathbf{R}}^{(j)} - \frac{\mathbf{d}^{\mathrm{T}} \mathbf{L} \mathbf{d}}{2\sigma\_{d}^{2}}, \end{split} \tag{9}$$

using the following definitions, derived by formulating the posterior density $p(\mathbf{r}^{(j)} \,|\, \{f_t^{(i,j)}\}, \hat{\mathbf{d}}, \hat{\Theta})$:

$$\hat{\mathbf{r}}\_{m}^{(j)} \equiv \mathbb{E}\left[\mathbf{r}^{(j)} | \left\{ f\_{t}^{(ij)} \right\}, \hat{\mathbf{d}}, \hat{\Theta} \right] = -\frac{1}{\hat{\sigma}\_{o}^{2}} \hat{\mathbf{V}}\_{r}^{(j)} \sum\_{i=1}^{N} f\_{t}^{(ij)} \,\hat{\mathbf{w}}^{(ij)},\tag{10}$$

$$\hat{\mathbf{V}}\_r^{(j)} = \left(\frac{1}{\hat{\sigma}\_o^2} \sum\_{i=1}^N \hat{\mathbf{w}}^{(ij)} \hat{\mathbf{w}}^{(ij)\mathrm{T}} + \frac{1}{\hat{\sigma}\_r^2} \mathbf{I}\right)^{-1},\tag{11}$$

$$\hat{\mathbf{R}}^{(j)} \equiv \mathbb{E}\left[ \mathbf{r}^{(j)} \mathbf{r}^{(j)\mathrm{T}} | \left\{ f\_t^{(ij)} \right\}, \hat{\mathbf{d}}, \hat{\Theta} \right] = \hat{\mathbf{V}}\_r^{(j)} + \hat{\mathbf{r}}\_m^{(j)} \hat{\mathbf{r}}\_m^{(j)\mathrm{T}}, \tag{12}$$

$$\mathbf{w}^{(ij)} = \begin{pmatrix} f\_x^{(ij)} x^{(i)} y^{(i)} + f\_y^{(ij)}\left(1 + y^{(i)2}\right) \\ -f\_x^{(ij)}\left(1 + x^{(i)2}\right) - f\_y^{(ij)} x^{(i)} y^{(i)} \end{pmatrix} + Z\_0 d^{(i)} \begin{pmatrix} f\_y^{(ij)} \\ -f\_x^{(ij)} \end{pmatrix} \equiv \mathbf{w}\_0^{(ij)} + Z\_0 d^{(i)} \mathbf{w}\_d^{(ij)}. \tag{13}$$
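
Eqs. (10)-(12) translate directly into a few lines per frame pair; a minimal sketch (names and layouts ours), with the $\hat{\mathbf{w}}^{(i,j)}$ stacked as rows:

```python
import numpy as np

def e_step(ft_j, w_j, sigma_o2, sigma_r2):
    """Posterior statistics of r^(j): Eqs. (10)-(12).

    ft_j : (N,) temporal gradients f_t^(i,j) for frame pair j
    w_j  : (N, 2) vectors w^(i,j) of Eq. (13) at the current d."""
    # Eq. (11): posterior covariance V_r^(j).
    V_r = np.linalg.inv(w_j.T @ w_j / sigma_o2 + np.eye(2) / sigma_r2)
    # Eq. (10): posterior mean r_m^(j).
    r_m = -V_r @ (w_j.T @ ft_j) / sigma_o2
    # Eq. (12): posterior second moment R^(j).
    R = V_r + np.outer(r_m, r_m)
    return r_m, V_r, R
```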

In the M step, $\{\mathbf{d}, \Theta\}$ is updated so that Eq. (9) is maximized. Ignoring constant terms, Eq. (9) can be rewritten as

$$J(\mathbf{d}, \Theta) = -\frac{MN}{2} \ln \sigma\_o^2 - M \ln \sigma\_r^2 - \frac{1}{2\sigma\_o^2} \hat{F}\left(\left\{d^{(i)}\right\}\right) - \frac{1}{2\sigma\_r^2} \hat{G} - \frac{\mathbf{d}^{\mathrm{T}} \mathbf{L} \mathbf{d}}{2\sigma\_d^2}. \tag{14}$$

From this representation, $\sigma_o^2$ and $\sigma_r^2$ can be updated as

$$
\sigma\_o^2 = \frac{\hat{F}\left(\left\{d^{(i)}\right\}\right)}{MN}, \quad \sigma\_r^2 = \frac{\hat{G}}{2M}.\tag{15}
$$
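
As a check, the updates in Eq. (15) follow by setting the derivatives of Eq. (14) to zero; for $\sigma_o^2$, for example,

$$\frac{\partial J}{\partial \sigma\_o^2} = -\frac{MN}{2\sigma\_o^2} + \frac{\hat{F}\left(\left\{d^{(i)}\right\}\right)}{2\sigma\_o^4} = 0 \quad\Rightarrow\quad \sigma\_o^2 = \frac{\hat{F}\left(\left\{d^{(i)}\right\}\right)}{MN}.$$

The sketch below (our own array conventions; $\hat{F}$ and $\hat{G}$ are read off from the braces of Eq. (9) and the $\mathrm{tr}\,\hat{\mathbf{R}}^{(j)}$ sum) implements these two updates:

```python
import numpy as np

def m_step_variances(ft, w, r_m, R):
    """Variance updates of Eq. (15) from the statistics of the E step.

    ft : (M, N), w : (M, N, 2), r_m : (M, 2), R : (M, 2, 2)."""
    M, N = ft.shape
    F = 0.0
    for j in range(M):
        S = w[j].T @ w[j]                       # sum_i w w^T
        F += (ft[j] @ ft[j]                     # sum_i f_t^2
              + 2.0 * (ft[j] @ w[j]) @ r_m[j]   # 2 (sum_i f_t w^T) r_m
              + np.trace(S @ R[j]))             # tr[(sum_i w w^T) R]
    G = sum(np.trace(R[j]) for j in range(M))   # sum_j tr R^(j)
    return F / (M * N), G / (2.0 * M)
```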

For $\mathbf{d}$, the partial derivative with respect to each $d^{(i)}$ of the last term of Eq. (9) contains the adjacent surrounding $d$s. Therefore, updating $\mathbf{d}$ would require solving simultaneous equations. To avoid this, we use the One-Step-Late (OSL) method [20]: we regard the surrounding $d$s as constants and evaluate them with their current estimates $\hat{d}$. This allows each $d^{(i)}$ to be updated individually as follows:

$$d^{(i)} = \frac{\hat{\sigma}\_{o}^{2}\,\bar{d}^{(i)} - \sigma\_{d}^{2} Z\_{0} \sum\_{j=1}^{M} \left\{ f\_{t}^{(i,j)}\, \mathbf{w}\_{d}^{(i,j)\mathrm{T}}\, \hat{\mathbf{r}}\_{m}^{(j)} + \mathrm{tr}\left(\mathbf{B}^{(i,j)} \hat{\mathbf{R}}^{(j)}\right) \right\}}{\sigma\_{d}^{2} Z\_{0}^{2} \sum\_{j=1}^{M} \mathrm{tr}\left(\mathbf{A}^{(i,j)} \hat{\mathbf{R}}^{(j)}\right) + \hat{\sigma}\_{o}^{2}}, \tag{16}$$

where $\sigma_o^2$ is also evaluated with the current estimate, and the matrices $\mathbf{A}^{(i,j)}$ and $\mathbf{B}^{(i,j)}$ are defined as

$$\mathbf{A}^{(i,j)} \equiv \mathbf{w}\_d^{(i,j)} \mathbf{w}\_d^{(i,j)\mathrm{T}},\tag{17}$$

$$\mathbf{B}^{(i,j)} \equiv \left( \mathbf{w}\_d^{(i,j)} \mathbf{w}\_0^{(i,j)\mathrm{T}} + \mathbf{w}\_0^{(i,j)} \mathbf{w}\_d^{(i,j)\mathrm{T}} \right) / 2,\tag{18}$$

and $\bar{d}^{(i)}$ indicates the local mean computed over the four-neighbor system that does not include $d^{(i)}$ itself.
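
Putting Eqs. (16)-(18) together, each pixel needs only $\mathrm{tr}(\mathbf{A}^{(i,j)}\hat{\mathbf{R}}^{(j)}) = \mathbf{w}_d^{\mathrm{T}}\hat{\mathbf{R}}\mathbf{w}_d$ and, for symmetric $\hat{\mathbf{R}}$, $\mathrm{tr}(\mathbf{B}^{(i,j)}\hat{\mathbf{R}}^{(j)}) = \mathbf{w}_d^{\mathrm{T}}\hat{\mathbf{R}}\mathbf{w}_0$, so the matrices never have to be formed explicitly. A vectorized sketch (our conventions; the edge padding used for the boundary local mean may differ in detail from [13]):

```python
import numpy as np

def osl_update_d(d_hat, ft, w0, wd, r_m, R, Z0, sigma_o2, sigma_d2):
    """One-Step-Late update of Eq. (16) for all pixels at once.

    d_hat : (H, W) current estimates; ft : (M, H*W);
    w0/wd : (M, H*W, 2) from Eq. (13); r_m : (M, 2); R : (M, 2, 2)."""
    H, W = d_hat.shape
    # Local 4-neighbor mean d-bar (boundary pixels reuse their own value).
    pad = np.pad(d_hat, 1, mode='edge')
    d_bar = 0.25 * (pad[:-2, 1:-1] + pad[2:, 1:-1]
                    + pad[1:-1, :-2] + pad[1:-1, 2:]).ravel()
    num = sigma_o2 * d_bar
    den = np.full(d_bar.shape, sigma_o2)
    for j in range(ft.shape[0]):
        # tr(A R) = w_d^T R w_d and tr(B R) = w_d^T R w_0 per pixel.
        trAR = np.einsum('ia,ab,ib->i', wd[j], R[j], wd[j])
        trBR = np.einsum('ia,ab,ib->i', wd[j], R[j], w0[j])
        num -= sigma_d2 * Z0 * (ft[j] * (wd[j] @ r_m[j]) + trBR)
        den += sigma_d2 * Z0**2 * trAR
    return (num / den).reshape(H, W)
```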

#### **3.4 Numerical experiments of differential-type method**

In order to confirm the function of the proposed method, we conducted numerical experiments using artificial images. **Figure 2(a)** shows the original image generated using the depth map shown in **Figure 2(b)**. The image size is $128 \times 128$ pixels, which is equivalent to $-0.5 \le x, y \le 0.5$ measured in focal-length units. In **Figure 2(b)**, the vertical axis shows the depth $Z$ in units of focal length, and the horizontal axes show the pixel position on the image plane.

In our model, successive image pairs are used in turn to calculate $f_t$. This study ignores the drift component of fixational eye movements and assumes that there is no temporal divergence of the range of motion at each image position. Therefore, each rotation, sampled as an independent Gaussian random variable, is considered to be relative to the initial image shown in **Figure 2(a)**, and we can think of a gradient equation that holds between the resulting image and the initial image. Additionally, in order to first validate our algorithm under the assumed statistical models, we computed $f_t$ using Eq. (6) with the true values of $\{\mathbf{r}\}$ and $\{d\}$ and used them for depth recovery.

**Figure 2.** *Example of the data used in the experiments: (a) artificial image; (b) true depth map (reprinted from [13]).*


The performance evaluation for $f_t$, $f_x$, and $f_y$ actually measured from the images is presented as part of the real-image experiment in Section 5.

We executed the proposed algorithm using $\sigma_r^2 = 10^{-4}$. Under this condition, the average size of the optical flow is about one pixel, which is sufficiently small compared to the size of the intensity pattern in **Figure 2(a)** and thus meets the conditions of the gradient method. A Gaussian random value with zero mean and a deviation corresponding to 1% of the mean of $f_t$ was added as observation noise to the $f_t$ that exactly satisfies Eq. (6). We evaluated the effectiveness of the smoothness constraint introduced by Eq. (8) by performing depth recovery while varying the value of $\sigma_d^2$. The initial values of both $\sigma_o^2$ and $\sigma_r^2$ were arbitrarily set to $1.0 \times 10^{-2}$, and $\{d\}$ was initialized as a plane at $Z = 9.0$. Examples of the results with $M = 100$ are shown in **Figure 3**. In addition, we varied $M$ for each $\sigma_d^2$ to examine the effectiveness of jointly using the many observations produced by small camera rotations. The RMSE of the recovered depth for each pair of $\sigma_d^2$ and $M$ is organized in **Table 1**. From these results, we can see that the smoothness constraint is important for reducing the degrees of freedom of $\mathbf{d}$. However, over-applying this constraint increases the recovery error because the recovered depth map becomes too smooth. Note that the scales of the $Z$ axes in **Figure 3(a)-(c)** are different. We can also see that as the number of camera rotations increases, collecting observations works well to improve recovery accuracy. In the future, it is desirable that $\sigma_d^2$ be estimated adaptively for each pixel or local region. We plan to treat it as a stochastic unknown variable and formulate it in the framework of a variational Bayes scheme.

**Figure 3.** *Results of recovered depth maps with $M = 100$: $\sigma_d^2$ is (a) $\sigma_o^2 \times 10^{-1}$; (b) $\sigma_o^2 \times 10^{-3}$; (c) $\sigma_o^2 \times 10^{-5}$ (reprinted from [13]).*


| $\sigma_d^2/\sigma_o^2$ | $10^{-1}$ | $10^{-2}$ | $10^{-3}$ | $10^{-4}$ | $10^{-5}$ |
|---|---|---|---|---|---|
| $M = 50$ | 0.6983 | 0.3948 | 0.2097 | 0.0945 | 0.3699 |
| $M = 100$ | 0.4741 | 0.3079 | 0.1841 | 0.0769 | 0.2124 |

**Table 1.** *RMSE of recovered depth.*
