1. Introduction

One of the challenges in video processing is moving object detection and tracking. Some tasks require only detection of motion, while others require extraction of the moving object or of the boundary of the motion area. The most demanding task is estimating the trajectory parameters of a moving object in a video sequence. The quality of the solution depends largely on how accurately the area of the moving object is detected, since all the information needed to determine the motion parameters and trajectory of the object is extracted from the image.

There are various approaches to identifying the area of a moving object, based on inter-frame difference [1, 2], background subtraction [2, 3], the use of statistics [2, 4], block estimation [5], and optical flow analysis [6]. The processing can be presented as estimation of the inter-frame geometric deformations of two images, one of which can be considered the reference image $Z^s = \{z^s_{i,j}\}$ and the second the deformed image $Z^d = \{z^d_{i,j}\}$, where $z_{i,j}$ is the value in node $(i, j)$ of the image. The field $H = \{\vec{h}_{i,j}\}$ of inter-frame shift vectors $\vec{h}_{i,j}$ of all points of the reference image corresponding to the nodes of the sample grid will be called the deformation field or the field of shift vectors.

An important task in estimating the trajectory of a moving object is to increase the spatial accuracy of detecting the area of motion in an image sequence. The solution of this problem depends on the quality of estimation of the deformation field. For finding estimates of the shifts for each node of the deformed image, stochastic gradient estimation [7–9] is a promising approach. It finds the coordinates $(x, y)$ of points of the reference image on the deformed image. That is, it evaluates the vectors $\vec{h}_{i,j} = \left(h_{(i,j)x}, h_{(i,j)y}\right)^T$ of inter-frame shifts of points $(i, j)$ (Figure 1) of the reference image.

In view of the insignificant change in brightness on adjacent frames of a video sequence, it is advisable to use mean square inter-frame difference as the objective function of estimation. Then, for the projections $h_{(i,j)x}$ and $h_{(i,j)y}$ of the shift vector, we get [10]

$$\hat{\vec{h}}_{i,j+1} = \hat{\vec{h}}_{i,j} - \lambda_h \, \mathrm{sign} \, \overline{\beta}_h \left( z_{i,j+1}^d, \hat{\vec{h}}_{i,j} \right), \quad i = \overline{1, N_x}, \; j = \overline{1, N_y - 1}, \tag{1}$$

where $\lambda_h$ is the learning rate, which determines the rate of change of the estimated parameters, $\overline{\beta}_h = \left(\beta_{hx}, \beta_{hy}\right)^T$ is the gradient estimation of an objective function, and $N_x \times N_y$ is the image size.
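As an illustration, one step of the sign-based update in Eq. (1) can be sketched as follows. This is a minimal sketch that assumes the gradient estimate has already been computed elsewhere; the function name `sign_update` and the default learning rate are hypothetical and are not taken from the cited works.

```python
import numpy as np

def sign_update(h_prev, beta, lam=0.1):
    """One step of Eq. (1): h_next = h_prev - lam * sign(beta).

    h_prev : current estimate (h_x, h_y) of the shift vector
    beta   : gradient estimate (beta_hx, beta_hy) for the next node
    lam    : learning rate lambda_h (the default value is an assumption)
    """
    h_prev = np.asarray(h_prev, dtype=float)
    beta = np.asarray(beta, dtype=float)
    # Only the sign of each gradient projection is used, so every
    # component of the estimate changes by at most lam per step.
    return h_prev - lam * np.sign(beta)

# Hypothetical usage: the estimate from node (i, j) seeds node (i, j + 1).
h_next = sign_update([0.0, 0.0], [2.5, -0.3], lam=0.1)
```

Because the previous estimate is carried over to the next node, the step size $\lambda_h$ bounds how quickly the estimate can follow changes in the true shift along a row.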

The projections of stochastic gradient can be represented as [11]

$$\begin{aligned} \beta_{hx} &= \left(\tilde{z}^{s}_{x+\Delta x, y} - \tilde{z}^{s}_{x-\Delta x, y}\right) \left(\tilde{z}^{s}_{x,y} - z^{d}_{i,j}\right), \\ \beta_{hy} &= \left(\tilde{z}^{s}_{x, y+\Delta y} - \tilde{z}^{s}_{x, y-\Delta y}\right) \left(\tilde{z}^{s}_{x,y} - z^{d}_{i,j}\right), \end{aligned}$$

where $(x, y)$ are the coordinates of the reference image point on the deformed image, $\tilde{z}^{s}_{x,y}$ is the brightness of the continuous image $\tilde{Z}^s$ obtained from $Z^s$ by means of interpolation, and $\Delta x$, $\Delta y$ are the steps for finding estimates of the derivatives $d\tilde{z}^{s}_{x,y}/dx$ and $d\tilde{z}^{s}_{x,y}/dy$ [12].
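The gradient projections above can be sketched in code as follows. The text only says that the continuous image is obtained "by means of interpolation"; bilinear interpolation is used here as one possible choice. The function names `bilinear` and `gradient_projections`, the `img[x, y]` indexing order, and the clamping of coordinates at the image border are all assumptions made for this illustration.

```python
import numpy as np

def bilinear(img, x, y):
    """Brightness of the interpolated image at real-valued (x, y).

    Assumes img is indexed as img[x, y] and has at least 2 x 2 nodes.
    Coordinates outside the grid are clamped to the border (an assumption).
    """
    nx, ny = img.shape
    x = float(np.clip(x, 0, nx - 1))
    y = float(np.clip(y, 0, ny - 1))
    x0 = min(int(np.floor(x)), nx - 2)
    y0 = min(int(np.floor(y)), ny - 2)
    fx, fy = x - x0, y - y0
    return ((1 - fx) * (1 - fy) * img[x0, y0]
            + fx * (1 - fy) * img[x0 + 1, y0]
            + (1 - fx) * fy * img[x0, y0 + 1]
            + fx * fy * img[x0 + 1, y0 + 1])

def gradient_projections(z_ref, z_def_ij, x, y, dx=1.0, dy=1.0):
    """Projections (beta_hx, beta_hy) of the stochastic gradient.

    z_ref    : reference image Z^s as a 2-D array
    z_def_ij : brightness z^d_{i,j} of the deformed image at node (i, j)
    x, y     : current coordinate estimate of the reference image point
    dx, dy   : steps of the central-difference derivative estimates
    """
    diff = bilinear(z_ref, x, y) - z_def_ij
    beta_x = (bilinear(z_ref, x + dx, y) - bilinear(z_ref, x - dx, y)) * diff
    beta_y = (bilinear(z_ref, x, y + dy) - bilinear(z_ref, x, y - dy)) * diff
    return beta_x, beta_y
```

Each projection is the product of a central brightness difference in the reference image and the inter-frame brightness difference, so it vanishes when the interpolated reference brightness matches the deformed image at the node.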

Figure 1. Representation of shift of reference image point $(i, j)$.

Formation of Inter-Frame Deformation Field of Images Using Reverse Stochastic Gradient… DOI: http://dx.doi.org/10.5772/intechopen.83489

Note that the inter-frame shift vector $\vec{h}_{i,j}$ can also be represented in polar form: $\vec{h}_{i,j} = \left(\rho_{i,j}, \varphi_{i,j}\right)$ (Figure 1), where $\rho_{i,j}$ is the length of the vector and $\varphi_{i,j}$ is its angle with respect to the $x$ axis. Functionally, these representations are equivalent. However, due to the inertia of the estimates of the shift vector field $H$, when using stochastic gradient descent, the use of parameters $\rho_{i,j}$ and $\varphi_{i,j}$ does not give estimates $\hat{\vec{h}}_{i,j}$ equivalent to those obtained with $h_{(i,j)x}$ and $h_{(i,j)y}$. This is because the two sets of parameters have different physical meanings [13]. Which set is preferable for detecting the area of a moving object is not obvious and requires research.
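The functional equivalence of the two representations can be illustrated with a short conversion sketch. It only shows the change of coordinates and does not implement estimation in polar parameters; the function names `to_polar` and `to_cartesian` are hypothetical, and the angle range $(-\pi, \pi]$ follows from the use of `numpy.arctan2`, which is an assumption of this example.

```python
import numpy as np

def to_polar(hx, hy):
    """Convert shift projections (h_x, h_y) to (rho, phi)."""
    rho = np.hypot(hx, hy)      # length of the shift vector
    phi = np.arctan2(hy, hx)    # angle with respect to the x axis
    return rho, phi

def to_cartesian(rho, phi):
    """Convert (rho, phi) back to the projections (h_x, h_y)."""
    return rho * np.cos(phi), rho * np.sin(phi)

# Hypothetical usage: the round trip recovers the original projections.
rho, phi = to_polar(3.0, 4.0)
hx, hy = to_cartesian(rho, phi)
```

Although the mapping is one-to-one for nonzero shifts, a fixed step in $\rho$ or $\varphi$ corresponds to a different displacement in $(h_x, h_y)$ than a fixed step in the projections themselves, which is one way to see why iterative estimates in the two parameter sets need not coincide.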
