Volumetric methods usually reconstruct 3D models of external anatomical structures from 2D images. They represent the final volume using a finite set of 3D geometric primitives. Then, from an image sequence acquired around the object to be reconstructed, the images are calibrated and the 3D models of the referred object are built using different volumetric approaches. These methods work in the object's volumetric space and do not require a matching process between the images used. Thus, typically, the 3D models are built from a sequence of images acquired using a turntable device and an off-the-shelf camera (Teresa et al., 2008). However, in some real applications, we do not need to reconstruct the 3D model of objects, because depth is enough to understand the 3D relationships of a scene.

DFS estimates depth from two images of the same scene captured by cameras at different positions and with different postures (Wu, 1999). Because it needs to extract and match feature points in these images, its computational cost is high. DFF, in contrast, uses a mapping relation between focus and depth to estimate depth. It obtains a sequence of images at different depths, measures the degree of focus using a measurement operator (Bove, 1993; Nayar, 1992), and attains the desired depth when the measurement value is maximal or minimal. Compared to DFS, DFF is simple in principle, but its estimation accuracy is highly related to the number of images.

DFD was first introduced by Pentland in 1987 (Pentland, 1987). It has been proved to be an effective depth reconstruction method that uses the blurring degree of region images under a limited depth of field (Girod & Scherock, 1989; Pentland et al., 1994; Nayar et al., 1996). Usually, a DFD algorithm captures two images obtained with different camera parameters, measures the blurring degree of every point, and estimates depth using the point spread function. During the past years, DFD has become attractive because 1) it requires only two images; 2) it avoids matching and masking problems; 3) it is effective both in the frequency domain and in the spatial domain (Gokstorp, 1994; Subbarao & Surya, 1994). However, since all the above DFD methods need to capture two defocused images with changed camera parameters, they cannot be used in applications with high-magnification microscopes, such as micro/nano manipulation, because in these situations it is destructive to change camera parameters. This is the main reason why DFD has not been used in micro/nano manipulation until now.

**2. 2D motion measurement**

Computer vision is one of the most important techniques used in motion measurement, especially 2D motion measurement, because the instruments used in computer vision are comparatively cheap, the measurement process is simple, and the result is direct. In recent years, with the improvement of the resolution and sensitivity of visual sensors, the measurement scale of computer vision has reached the micro/nano scale.

The block matching algorithm (BMA) is one of the most widely applied methods to compute visual 2D motion from images, i.e. to estimate the 2D motion projected onto the image plane by objects moving in the 3D scene, as it is less susceptible to random error sources than edge-based or image-moment methods.

### **2.1 BMA method**

The foundational principle of the BMA is to find a matching block from an image X in some other image Y, which may appear before or after X, and, by comparing the two, to measure the difference, such as distance or similarity, between the two images. Therefore, selecting a criterion to determine whether a given block in image Y matches the search block in image X, i.e. the objective function, is of primary importance.

BMA-based techniques can usually be divided into two classes according to the measurement criterion: minimal difference and maximal similarity. The widely used objective functions based on difference measurements include the Sum-of-Squared-Differences (SSD) and the Sum-of-Absolute-Differences (SAD), which can be transformed into the Local-SAD (LSAD) when the intensity is locally scaled, and into the Zero-SAD (ZSAD) when the average gray-level difference is set to zero. If the difference minimum is replaced by the maximum of a correlation measurement, other objective functions are obtained, such as Normalized-Cross-Correlation (NCC) (Qi & Michale, 1987) and Approximate-Maximum-Direct-Correlation (AMDC) (Kim & Meng, 2007), as well as other variations that are all approximate maximum-likelihood estimators (Robinson & Milanfar, 2004).

Fig. 1. Motion of the continuous image *F*(*i*,*j*) with respect to the pixel grid

Here, SSD, LSAD, ZSAD and NCC are all adopted to estimate the motion between two neighboring images in the same image sequence. The setting is shown in Fig. 1. Generally, *X*(*i*,*j*) and *Y*(*i*,*j*) are referred to as the model image and the target image respectively, *F*(*i*,*j*) is the continuous image function, *εx* and *εy* represent additive noise, and *s* = (*sx*, *sy*) is the shift between the model image and the target image.

$$X\_{i,j} = F(i,j) + \varepsilon\_x \tag{1}$$

$$Y\_{i,j} = F(i - s\_x, j - s\_y) + \varepsilon\_y \tag{2}$$

The objective functions for SSD, LSAD, ZSAD and NCC are respectively defined as follows,

$$R(u,v)\_{SSD} = \frac{1}{m \times n} \sum\_{i=0}^{m-1} \sum\_{j=0}^{n-1} \left[x\_{i,j} - y\_{i+u,j+v}\right]^2 \tag{3}$$

$$R(u,v)\_{LSAD} = \frac{1}{m \times n} \sum\_{i=0}^{m-1} \sum\_{j=0}^{n-1} \left| x\_{i,j} - \frac{\overline{X\_{i,j}}}{\overline{Y\_{i+u,j+v}}}\, y\_{i+u,j+v} \right| \tag{4}$$

Applications of Computer Vision in Micro/Nano Observation 531


$$R(u,v)\_{ZSAD} = \frac{1}{m \times n} \sum\_{i=0}^{m-1} \sum\_{j=0}^{n-1} \left| x\_{i,j} - \overline{X\_{i,j}} - y\_{i+u,j+v} + \overline{Y\_{i+u,j+v}} \right| \tag{5}$$

$$R(u,v)\_{NCC} = \frac{\frac{1}{n \times m} \sum\_{i=0}^{m-1} \sum\_{j=0}^{n-1} X\_{i,j} Y\_{i+u,j+v}}{\sqrt{\frac{1}{n \times m} \sum\_{i=0}^{m-1} \sum\_{j=0}^{n-1} X\_{i,j}^2} \sqrt{\frac{1}{n \times m} \sum\_{i=0}^{m-1} \sum\_{j=0}^{n-1} Y\_{i+u,j+v}^2}} \tag{6}$$

$$X\_{i,j} = \mathbf{x}\_{i,j} - \frac{1}{n \times m} \sum\_{i=0}^{m-1} \sum\_{j=0}^{n-1} \mathbf{x}\_{i,j} = \mathbf{x}\_{i,j} - \overline{X\_{i,j}} \tag{7}$$

$$\begin{split} \mathcal{Y}\_{i+u,j+v} &= \mathcal{Y}\_{i+u,j+v} - \frac{1}{n \times m} \sum\_{i=0}^{m-1} \sum\_{j=0}^{n-1} \mathcal{Y}\_{i+u,j+v} \\ &= \mathcal{Y}\_{i+u,j+v} - \overline{\mathcal{Y}\_{i+u,j+v}} \end{split} \tag{8}$$

where $x_{i,j}$ and $y_{i,j}$ are the original gray intensities of each point in the model and target images respectively, $\overline{X_{i,j}}$ and $\overline{Y_{i+u,j+v}}$ are the means of each image inside their respective block, *u*, *v* are the coordinates of the model image block, and *R*(*u*, *v*) is the objective function between the model block and the target block. The shift *s* = (*sx*, *sy*) is estimated by finding the peak of the objective function.
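To make the definitions above concrete, here is a minimal sketch of three of these objective functions (SSD, ZSAD and NCC) evaluated on blocks flattened to plain Python lists; the function names and the flat-list representation are our own choices, not from the original text.

```python
import math

def ssd(x, y):
    """Sum of squared differences, normalized by block size (Eq. 3 style)."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)

def zsad(x, y):
    """Zero-mean sum of absolute differences: each block's mean is removed first (Eq. 5 style)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    return sum(abs((a - mx) - (b - my)) for a, b in zip(x, y)) / len(x)

def ncc(x, y):
    """Normalized cross-correlation of the mean-centred blocks (Eqs. 6-8 style)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    xc = [a - mx for a in x]
    yc = [b - my for b in y]
    num = sum(a * b for a, b in zip(xc, yc))
    den = math.sqrt(sum(a * a for a in xc)) * math.sqrt(sum(b * b for b in yc))
    return num / den if den else 0.0

# Identical blocks: zero difference, perfect correlation.
block = [10, 20, 30, 40]
print(ssd(block, block))   # 0.0
print(ncc(block, block))   # 1.0

# A constant gray-level offset fools SSD but not the zero-mean ZSAD.
print(ssd([1, 2, 3, 4], [11, 12, 13, 14]))   # 100.0
print(zsad([1, 2, 3, 4], [11, 12, 13, 14]))  # 0.0
```

The last two calls illustrate why the zero-mean variants are preferred under uneven illumination: a uniform brightness change between the two images leaves ZSAD (and NCC) unaffected.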

The shift between the model image and the target image can be denoted as,

$$s = s\_n + s\_{\Delta} \tag{9}$$

where $s_n = (n_x, n_y)$ is the integer shift and $s_{\Delta} = (\varepsilon_x, \varepsilon_y)$ is the sub-pixel shift. If the evaluation step is one pixel, $s_n$ can be obtained from,

$$R(n\_x, n\_y) = \max\{R(u,v)\} \tag{10}$$
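A brute-force implementation of the search in Eq. (10) can be sketched as follows; since SSD is a difference measure, the "peak" of the objective surface is its minimum. The synthetic image pattern, block size and function names are our own assumptions for illustration.

```python
def block_ssd(model, target, u, v, m, n):
    """SSD between the m-by-n model block and the target block offset by (u, v)."""
    total = 0.0
    for i in range(m):
        for j in range(n):
            d = model[i][j] - target[i + u][j + v]
            total += d * d
    return total / (m * n)

def integer_shift(model, target, m, n, max_shift):
    """Exhaustive search for the integer shift (n_x, n_y): the extremum of the
    objective surface (a minimum here, because SSD is a difference measure)."""
    best, best_uv = None, (0, 0)
    for u in range(max_shift + 1):       # non-negative shifts only, for brevity
        for v in range(max_shift + 1):
            r = block_ssd(model, target, u, v, m, n)
            if best is None or r < best:
                best, best_uv = r, (u, v)
    return best_uv

# Tiny synthetic scene: the 4x4 model block is cut out of G at offset (1, 2),
# so matching it against G should recover exactly that shift.
G = [[(3 * a + 5 * b + a * b) % 17 for b in range(10)] for a in range(10)]
model = [[G[i + 1][j + 2] for j in range(4)] for i in range(4)]
print(integer_shift(model, G, 4, 4, 5))  # (1, 2)
```

Every candidate offset in the search region is visited once, which is exactly the cost that the reduced search region of section 2.2 cuts down.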

However, the peak position above can only be solved with pixel-level accuracy, which is not sufficient in many applications, especially in micro/nano manipulation. In order to attain sub-pixel resolution, a quadratic curve fitted around the peak *sn* is usually used to estimate the sub-pixel shift *s*Δ as follows,

$$f(\mathbf{x}) = a\_{\mathbf{x}}\mathbf{x}^2 + b\_{\mathbf{x}}\mathbf{x} + c\_{\mathbf{x}} \tag{11}$$

$$f(y) = a\_y y^2 + b\_y y + c\_y \tag{12}$$

where *ax*, *bx*, *cx*, *ay*, *by*, *cy* are the coefficients of the quadratic curves along the *x* and *y* axes, which are the parameters to be estimated. The sub-pixel shift *s*Δ is estimated by finding the peaks of the fitted *f*(*x*) and *f*(*y*).

$$\widetilde{\varepsilon\_x} = \arg\max f(x) = -\frac{b\_x}{2a\_x} \tag{13}$$

$$\widetilde{\varepsilon\_y} = \arg\max f(y) = -\frac{b\_y}{2a\_y} \tag{14}$$

If three points are used, the estimates of the shifts can be denoted as,

$$\widetilde{\varepsilon\_x} = \frac{R'(-1,0) - R'(1,0)}{2[R'(-1,0) + R'(1,0) - 2R'(0,0)]} \tag{15}$$

$$\widetilde{\varepsilon\_y} = \frac{R'(0, -1) - R'(0, 1)}{2[R'(0, -1) + R'(0, 1) - 2R'(0, 0)]} \tag{16}$$

where *R'*(*u* - *nx*, *v* - *ny*)=*R*(*u*, *v*).
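Eqs. (15) and (16) amount to placing a parabola through the three samples around the integer peak and reading off its vertex. A one-dimensional sketch (the function names are ours):

```python
def subpixel_offset(r_minus, r_zero, r_plus):
    """Three-point parabola vertex (Eqs. 15-16): sub-pixel offset of the
    extremum relative to the integer peak, given R'(-1), R'(0), R'(1)."""
    denom = 2.0 * (r_minus + r_plus - 2.0 * r_zero)
    if denom == 0.0:
        return 0.0  # flat surface: no sub-pixel refinement is possible
    return (r_minus - r_plus) / denom

# Exact check on a synthetic quadratic R(x) = (x - 0.3)^2, whose minimum is at 0.3.
R = lambda x: (x - 0.3) ** 2
print(subpixel_offset(R(-1), R(0), R(1)))  # approx. 0.3
```

Because a parabola is exact for a quadratic surface, the recovered offset is 0.3 up to floating-point rounding; on real objective surfaces the fit is only an approximation, which motivates the higher-order fitting discussed below.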

530 Mechanical Engineering

However, the precision of the method mentioned above is usually low, as no additional points are used. Thus, one simple and effective way to improve the estimation precision is to properly increase the number of interpolated points and the order of the fitting polynomials.
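As one illustration of using additional points, a quadratic can be fitted to five symmetric samples by least squares. The closed-form normal equations below are our own derivation for the abscissae -2..2 (where the odd power sums vanish) and are not taken from the original text.

```python
def subpixel_lsq5(r):
    """Least-squares quadratic fit f(x) = a*x^2 + b*x + c through the five
    samples r = [R(-2), R(-1), R(0), R(1), R(2)]; returns the vertex -b/(2a).
    For these symmetric abscissae the power sums are S0=5, S1=S3=0, S2=10,
    S4=34, so the 3x3 normal equations decouple into closed form."""
    xs = [-2, -1, 0, 1, 2]
    T0 = sum(r)                                   # sum of R
    T1 = sum(x * v for x, v in zip(xs, r))        # sum of x*R
    T2 = sum(x * x * v for x, v in zip(xs, r))    # sum of x^2*R
    a = (T2 - 2.0 * T0) / 14.0                    # from 34a + 10c = T2, 10a + 5c = T0
    b = T1 / 10.0                                 # from 10b = T1
    return -b / (2.0 * a)

# Sanity check on an exact quadratic with its minimum at 0.3.
R = lambda x: (x - 0.3) ** 2
print(subpixel_lsq5([R(x) for x in (-2, -1, 0, 1, 2)]))  # approx. 0.3
```

With noisy samples, the two extra points average out some of the noise that a three-point fit would pass straight into the estimate; higher-order polynomials and splines extend the same idea.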

#### **2.2 Improved block matching algorithm**

#### **2.2.1 The searching region**

As is known, the searching region is the main factor that influences the computational cost and the performance of the BMA. Thus, in this section, with respect to an image sequence, an improved method is proposed to reduce the searching region effectively.

Since our aim here is to estimate the shift in an image sequence, and the motion between two neighboring images is generally small, it is not necessary to calculate *R*(*u*, *v*) with blocks throughout the whole image. Moreover, searching the whole image not only increases the computational burden but also adds opportunities for wrong matching when the texture or gray level of the target image is very similar across regions.

Assuming that the largest shift is known, our proposed new BMA can be described by the following steps,


It is clear that, using the improved algorithm, the computational burden can be greatly reduced because of the reduced searching region. Besides, if the whole image is similar in texture and gray level, or the block is very small, the improved method can also effectively decrease accidental matching errors.
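The reduced-search idea for an image sequence can be sketched as follows: each frame is searched only in a small window centred on the previous frame's estimate, under the assumption that the inter-frame motion is bounded by `max_step`. All names and the synthetic data are our own.

```python
def track_sequence(model, frames, m, n, max_step):
    """Track the integer shift through an image sequence: for each frame, search
    only a (2*max_step+1)^2 window centred on the previous estimate instead of
    the whole image (a sketch of the reduced-searching-region idea)."""
    shifts = []
    prev_u, prev_v = 0, 0
    for target in frames:
        H, W = len(target), len(target[0])
        best, best_uv = None, (prev_u, prev_v)
        for u in range(max(0, prev_u - max_step), min(H - m, prev_u + max_step) + 1):
            for v in range(max(0, prev_v - max_step), min(W - n, prev_v + max_step) + 1):
                r = sum((model[i][j] - target[i + u][j + v]) ** 2
                        for i in range(m) for j in range(n))
                if best is None or r < best:
                    best, best_uv = r, (u, v)
        prev_u, prev_v = best_uv
        shifts.append(best_uv)
    return shifts

# Synthetic sequence: a 4x4 model block drifts by at most one pixel per frame.
G = [[(3 * a + 5 * b + a * b) % 17 for b in range(20)] for a in range(20)]
model = [[G[i + 4][j + 4] for j in range(4)] for i in range(4)]
true_shifts = [(1, 0), (1, 1), (2, 2)]
frames = [[[G[a + 4 - u][b + 4 - v] for b in range(12)] for a in range(12)]
          for (u, v) in true_shifts]
print(track_sequence(model, frames, 4, 4, 1))  # [(1, 0), (1, 1), (2, 2)]
```

With `max_step = 1` only nine candidates are evaluated per frame, versus the full-image search of the basic BMA, and candidates far from the previous position, where similar textures could cause false matches, are never considered.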


Fig. 3. Estimation errors with different block size (estimation error in pixels, 0 to 0.2, versus model size from 20×20 to 120×120, for NCC, SSD, LSAD and ZSAD)

#### **2.2.3 The sub-pixel fitting precision**

Third, it is well known that the estimation error of curve fitting may become smaller when the order of the fitting polynomial increases. In this experiment, the sub-pixel shift estimation error was compared among different fitting polynomials, including the quadratic curve, cubic curve, quartic curve and spline curve. Here, the same five fitting points were used, where the middle point is the peak and the other four points lie evenly on both sides of it. The main results are shown in Fig. 5 to Fig. 8, from which the following conclusions can be obtained,

1. No matter which objective function is used, the estimation error of the quadratic curve is always larger than that of the original method because of redundancy, while, due to its high smoothness, the spline curve can achieve the optimal result for all objective functions.
2. For NCC and SSD, the sub-pixel estimation is already good enough when the fitting function is the cubic curve. That is, if a higher-order fitting function is selected, the improvement in estimation precision is unclear compared with the introduced computational burden. As far as LSAD and ZSAD are concerned, the precision improvement of the quartic fitting function is obvious. Thus, it can be concluded that the objective function is the main factor in deciding the order of the fitting equation.
3. Since the computational burden has been greatly reduced by using the new BMA proposed in section 2.2.1, much higher precision can be achieved by using higher-order fitting functions and a larger number of fitting points, which can be properly selected based on the preceding conclusions in real applications.
