**5. Real image experiments of differential-type method**

### **5.1 Selective use of image pairs to improve accuracy**

When we applied the differential-type method to real images and examined its practical performance, accuracy improved when the image pairs used for depth recovery were selected carefully. We therefore adopted a scheme that, on a pixel-by-pixel basis, excludes image pairs expected to have large approximation errors in the gradient equation. The inner product of the spatial gradient vectors of consecutive image pairs can be used to select image pairs that do not suffer from aliasing.

For each pixel, the image pairs for which the sign of the inner product $f_s^{(i,j)\mathsf{T}} f_s^{(i,j-1)}$ is negative are discarded. Note that $f_s^{(i,j)} = \left[ f_x^{(i,j)}, f_y^{(i,j)} \right]^{\mathsf{T}}$.

In the next step, from the image pairs that remain after the above test, we further select the suitable image pairs at each pixel by estimating the magnitude of the higher-order terms included in the observation of $f_t$. $f_t$ is exactly represented as follows:

$$f_t = -f_x v_x - f_y v_y - \frac{1}{2}\left\{ f_{xx} v_x^2 + f_{yy} v_y^2 + 2 f_{xy} v_x v_y \right\} - \cdots \tag{28}$$

After discarding bad image pairs with the sign test, the higher-order terms of the remaining pairs can be considered small. In this case, the quadratic term in Eq. (28) can be estimated for each pixel $i$ as follows:

$$-\frac{1}{2}\left\{ \left( f_x^{(i,j)} - f_x^{(i,j-1)} \right) v_x^{(i,j)} + \left( f_y^{(i,j)} - f_y^{(i,j-1)} \right) v_y^{(i,j)} \right\}. \tag{29}$$

We can define a measure for estimating the equation error as the ratio of this higher-order term to the first-order term:

$$J = \frac{\left| \left( f_x^{(i,j)} - f_x^{(i,j-1)} \right) v_x^{(i,j)} + \left( f_y^{(i,j)} - f_y^{(i,j-1)} \right) v_y^{(i,j)} \right|}{2 \left| f_x^{(i,j)} v_x^{(i,j)} + f_y^{(i,j)} v_y^{(i,j)} \right|}. \tag{30}$$

This measure depends on the direction of the optical flow but is invariant to its amplitude. However, computing $J$ requires the true value of the optical flow. Examining $J$ in detail, even if the difference of the spatial gradients $f_s^{(i,j)} - f_s^{(i,j-1)}$ is large, the equation error becomes small when the direction of $f_s^{(i,j)} - f_s^{(i,j-1)}$ is perpendicular to that of the optical flow. Therefore, the value $|f_s^{(i,j)} - f_s^{(i,j-1)}| / |f_s^{(i,j)}|$, which does not require the flow, can be used as a worst-case measure. In the following, the image pairs for which $|f_s^{(i,j)} - f_s^{(i,j-1)}| / |f_s^{(i,j)}|$ is less than a certain threshold are selected at each pixel and used for depth recovery.
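To make the selection procedure concrete, the following Python/NumPy sketch applies both per-pixel tests to one pair of consecutive frames. The function names, the central-difference gradient, and the `eps` guard are our own illustrative choices, not part of the original implementation.

```python
import numpy as np


def spatial_gradients(frame):
    """Central-difference spatial gradients f_x, f_y of one grayscale frame."""
    fy, fx = np.gradient(frame.astype(np.float64))  # np.gradient returns d/drow, d/dcol
    return fx, fy


def pair_selection_mask(frame_prev, frame_curr, threshold, eps=1e-8):
    """Per-pixel mask of usable observations for the image pair (j-1, j).

    A pixel is kept when
      (1) the inner product f_s^(j)^T f_s^(j-1) is non-negative (sign test), and
      (2) the worst-case ratio |f_s^(j) - f_s^(j-1)| / |f_s^(j)| is below `threshold`.
    """
    fx_c, fy_c = spatial_gradients(frame_curr)
    fx_p, fy_p = spatial_gradients(frame_prev)

    # (1) sign test on the inner product of consecutive spatial gradients
    inner = fx_c * fx_p + fy_c * fy_p
    sign_ok = inner >= 0.0

    # (2) worst-case equation-error measure |delta f_s| / |f_s|
    diff_norm = np.hypot(fx_c - fx_p, fy_c - fy_p)
    grad_norm = np.hypot(fx_c, fy_c)
    ratio = diff_norm / np.maximum(grad_norm, eps)

    return sign_ok & (ratio < threshold), ratio
```

Only pixels where the returned mask is true contribute the gradient equation of this image pair to depth recovery; how the threshold is chosen is discussed in Section 5.3.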

### **5.2 Camera system implementation**

We built a camera hardware system to examine the practical performance of the camera model shown in **Figure 1**. The implemented camera system is shown in **Figure 9**.

The camera system can be rotated around the horizontal axis (the *X* axis) and around the vertical axis (the *Y* axis). Rotation around the optical axis (the *Z* direction) is not possible, but it is not needed to obtain depth information. The system parameters are as follows: the focal length is 2.8 – 5.0 mm, the image size is 1,200 × 1,600 pixels, the movable ranges are 360 deg. around the *X* axis and (−10, +10) deg. around the *Y* axis, and the minimum drive units are 1 pulse = 0.01 deg. for the *X* axis and 1 pulse = 0.00067 deg. for the *Y* axis.
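As a small illustration of the drive resolution, the following sketch converts a commanded rotation angle into motor pulses using the minimum units quoted above; the helper and its interface are hypothetical, not the actual controller API.

```python
# Minimal sketch of converting commanded rotation angles into drive pulses,
# assuming the stated resolutions (1 pulse = 0.01 deg for the X axis,
# 1 pulse = 0.00067 deg for the Y axis). The interface is hypothetical.
DEG_PER_PULSE = {"X": 0.01, "Y": 0.00067}


def angle_to_pulses(axis, angle_deg):
    """Nearest pulse count for a requested rotation around the given axis."""
    return round(angle_deg / DEG_PER_PULSE[axis])


# Example: a tremor-sized rotation of 0.02 deg around the Y axis
print(angle_to_pulses("Y", 0.02))  # -> 30 pulses (0.0201 deg actually driven)
```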

**Figure 9.**
*Camera system implemented for tremor rotations.*

### **5.3 Experimental results**

In this section, we explain the results of the experiments using real images captured by the developed camera system [22]. Our camera system also has a parallel stereo function; that is, the camera can be moved laterally by the slide system. Prior to the experiments, we calibrated the camera's internal parameters, including the focal length and $Z_0$, using the method in [23] together with stereo calculations. The images used in the experiments are grayscale, 256 × 256 pixels, and digitized to 8 bits. An example is shown in **Figure 10(a)**. The true inverse depth of the target object, measured by the parallel stereo mentioned above using a two-plane model, is shown in **Figure 10(b)**. In this figure, the horizontal axis shows the position in the image plane, and the vertical axis shows the inverse depth in units of the focal length. We captured 100 images.

**Figure 10.**
*Data for experiments: (a) example of captured image, (b) true inverse depth of object (reprinted from [22]).*

The maximum number of iterations of the MAP-EM algorithm was set to 600; within this limit, the iterations of almost all experiments converged. $\sigma_d^2$ was determined heuristically. The average value of $|f_s^{(i,j)} - f_s^{(i,j-1)}| / |f_s^{(i,j)}|$ explained in the previous section, taken over all pixels, was defined for each image pair as the standard magnification (×1) of the threshold for selecting suitable image pairs. By decreasing the threshold magnification, we discard more image pairs; conversely, by increasing it, more image pairs are used for recovery. Because of space limitations, we show only the results with $\sigma_r^2 = 2.64 \times 10^{-2}$, for which the average amplitude of the optical flow approximately coincides with $\lambda/4$.
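Under our reading of this procedure, the reference (×1) threshold of each image pair is the pixel average of the ratio for that pair, and the magnification scales it. The sketch below illustrates this, reusing the hypothetical `pair_selection_mask` helper defined after Section 5.1.

```python
import numpy as np


def select_with_magnification(frame_prev, frame_curr, magnification=1.0, eps=1e-8):
    """Select usable pixels of one image pair relative to its own reference threshold.

    The reference (x1) threshold is the pixel average of
    |f_s^(j) - f_s^(j-1)| / |f_s^(j)| for this pair; the applied threshold is
    `magnification` times that average.
    """
    # First pass with an infinite threshold, used only to obtain the ratio map.
    _, ratio = pair_selection_mask(frame_prev, frame_curr, threshold=np.inf, eps=eps)
    reference = ratio.mean()

    mask, _ = pair_selection_mask(frame_prev, frame_curr,
                                  threshold=magnification * reference, eps=eps)
    return mask, mask.mean()  # mask and fraction of pixels retained
```

Lowering `magnification` retains fewer observations per pair, which corresponds to the decreasing percentages reported in the caption of Figure 11.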

**Figure 11** shows the recovered depth for each threshold, set as a constant multiple of the reference value; the result obtained using all image pairs is also shown. These results confirm that, by reducing the magnification, inappropriate image pairs are discarded and the accuracy of depth recovery improves. The percentages given in the figure caption indicate the fraction of image pairs used for recovery, which is determined jointly with the change in threshold.

**Figure 11.**
*Profiles of cross-sections of recovered inverse depth: (a) all image pairs used (100%), (b) threshold ×1.5 (94% of image pairs used), (c) threshold ×1.25 (86%), (d) threshold ×1 (68%), (e) threshold ×0.75 (62%), (f) threshold ×0.5 (62%) (reprinted from [22]).*

… multi-resolution processing proposed in [12]. In another scheme that deals with the textureless region in stereo vision, the region where the depth value is constant or changes smoothly, called the support region, is determined adaptively [25]. We will also consider whether the relationship between the image changes caused by tremor and by microsaccade can be used for the adaptive determination of this support region.

In recent years, many realizations of stereoscopic vision and motion-stereo vision by deep learning have been reported [26–28], and their relationship to conventional methods based on mathematical formulas is often questioned. Deep-learning methods are hampered by the need for a large number of images and the annotations attached to them. Although unsupervised learning is often devised, the resulting solutions are often limited. Therefore, even if a conventional method is rather complicated and time-consuming, a method capable of more precise depth recovery can be used to compute annotations for deep learning. This can be understood as copying the conventional method into a deep neural network (DNN); a DNN takes time to train but has the advantage of fast inference. In this way, it is important that both schemes develop in a two-sided relationship.

**Appendix**

Here, the method of calibrating the rotation axis is explained using **Figure 12**. Let a point in 3-D space be $X_1 = [X_1, Y_1, Z_1]^{\mathsf{T}}$ in the coordinate system before the camera rotation and $X_2 = [X_2, Y_2, Z_2]^{\mathsf{T}}$ in the coordinate system after the rotation, and let the coordinates of the corresponding points on the image be $x_1 = [x_1, y_1, z_1]^{\mathsf{T}}$ and $x_2 = [x_2, y_2, z_2]^{\mathsf{T}}$, respectively. Similarly, the optical axes before and after the rotation are $z_1 = [0, 0, 1]^{\mathsf{T}}$ and $z_2 = [0, 0, 1]^{\mathsf{T}}$, respectively. If the rotation is taken around the X-axis, the rotation matrix is given by the following equation:

$$R = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{bmatrix}. \tag{31}$$

The translation $T$ of the lens center generated by this rotation is given by the following equation in the coordinate system before rotation.

**Figure 12.**
*Explanation of rotation axis calibration.*
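The chapter's explicit expression for $T$ is not reproduced here; as a hedged illustration, the sketch below builds the rotation matrix of Eq. (31) and uses the standard rigid-motion identity $T = (I - R)\,c$ for a rotation axis passing through a point $c$ (a hypothetical offset) rather than through the lens center.

```python
import numpy as np


def rot_x(theta):
    """Rotation matrix around the X axis, as in Eq. (31)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1.0, 0.0, 0.0],
                     [0.0,   c,  -s],
                     [0.0,   s,   c]])


# If the physical rotation axis passes through a point `c` (expressed in the
# camera coordinate system before rotation) rather than through the lens
# center, the lens center is translated by T = (I - R) c.
theta = np.deg2rad(0.02)           # a tremor-sized rotation (assumed value)
c = np.array([0.0, 0.015, 0.030])  # hypothetical axis offset in meters
R = rot_x(theta)
T = (np.eye(3) - R) @ c
print(T)  # small translation of the lens center induced by the rotation
```

This makes explicit why the calibration matters: any offset between the mechanical rotation axis and the lens center couples the intended pure rotation with a small translation.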
