#### 4. Comparison of the developed algorithms with the MVFAST algorithm

For the block approach [18, 19], one of the most effective algorithms for forming the field of shift vectors is the MVFAST (Motion Vector Field Adaptive Search Technique) algorithm [20]. Therefore, we compare the effectiveness of the proposed algorithms with that of the MVFAST algorithm.

#### Formation of Inter-Frame Deformation Field of Images Using Reverse Stochastic Gradient… DOI: http://dx.doi.org/10.5772/intechopen.83489

In the MVFAST algorithm, the deformed image is divided into nonoverlapping square blocks of a predetermined size. In most cases the block sizes are 2 × 2, 4 × 4, etc., but the algorithm allows blocks of any size, starting from one pixel. The shift vector of the current block is estimated from the motion vector estimates of the neighboring blocks. The maximum length of the motion vectors of the neighboring blocks is compared with two thresholds, $L_1$ and $L_2$, whose values depend on the type of video sequence. As a result, motion activity is classified as low, medium, or high. The matching criterion is the minimum of the sum of absolute differences (SAD). For a block, this is the sum of the absolute values of the differences between the nodes of the current block and the corresponding nodes of the image in which the block location is searched. To improve performance and reduce the influence of noise, the MVFAST algorithm uses an early stop of the search: the search is terminated if the SAD is less than a specified threshold $T$. Early stopping can be disabled by setting $T$ to zero.
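The SAD criterion, the early stop, and the activity classification described above can be illustrated with a minimal sketch. It is not the MVFAST search itself: for simplicity it scans a small square window exhaustively, starting from the zero vector, instead of using adaptive search patterns. The function names, the city-block vector length, and the inclusive threshold comparisons are assumptions made for illustration.

```python
def sad(cur, ref, top, left, dy, dx, size):
    """SAD between the size x size block of `cur` at (top, left) and the
    block of `ref` displaced by (dy, dx)."""
    return sum(
        abs(cur[top + i][left + j] - ref[top + dy + i][left + dx + j])
        for i in range(size)
        for j in range(size)
    )


def classify_activity(neighbor_vectors, l1, l2):
    """Classify motion activity from the longest neighboring vector.

    Assumption: vector length is the city-block length |dy| + |dx|,
    and the comparisons with the thresholds l1 and l2 are inclusive.
    """
    longest = max((abs(dy) + abs(dx) for dy, dx in neighbor_vectors), default=0)
    if longest <= l1:
        return "low"
    if longest <= l2:
        return "medium"
    return "high"


def search_block(cur, ref, top, left, size, radius, threshold):
    """Return (best_vector, best_sad) for one block.

    Candidates are visited from the zero vector outward. The search stops
    as soon as a SAD falls below `threshold`; threshold=0 disables this.
    """
    height, width = len(ref), len(ref[0])
    candidates = sorted(
        ((dy, dx) for dy in range(-radius, radius + 1)
                  for dx in range(-radius, radius + 1)),
        key=lambda v: abs(v[0]) + abs(v[1]),
    )
    best_vec, best_sad = None, None
    for dy, dx in candidates:
        # Skip displacements that would move the block outside `ref`.
        if not (0 <= top + dy <= height - size and 0 <= left + dx <= width - size):
            continue
        value = sad(cur, ref, top, left, dy, dx, size)
        if best_sad is None or value < best_sad:
            best_vec, best_sad = (dy, dx), value
        if value < threshold:  # early stop
            break
    return best_vec, best_sad
```

Because the comparison with the threshold is strict, passing `threshold=0` never triggers the early stop, which matches the way early stopping is disabled in the description above.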

Let us compare the results of estimation of the deformation field by algorithms D and MVFAST. For correctness of the comparison, below are the results obtained for the images in Figure 2 with the parameters of Algorithm D, $\lambda_h = \lambda_\rho = 0.1$, $\lambda_\varphi = \pi/180$, $\Delta x = \Delta y = 1$, $\Delta_h = \Delta_\rho = 0.2$, and $\Delta_\varphi = 5\pi/180$, and the parameters of the MVFAST algorithm, $L_1 = 1$, $L_2 = 2$, $T = 1$, and block size 1 × 1. Figure 11a shows the estimates of the shift magnitude of the image points corresponding to the nodes of a single row of the reference image and formed by the MVFAST algorithm. The results of Algorithm D with the sets of parameters $(h_x, h_y)$ and $(\rho, \varphi)$ are shown in Figure 6c and d, respectively.

It can be seen from the figure that the MVFAST algorithm forms fairly accurate estimates of node shifts but, unlike Algorithm D, makes errors at the boundaries of the object image. Errors also occur in low-contrast areas inside the object, because the SAD at the center of the search there often does not exceed the threshold $T$. Setting the threshold $T = 0$ with a block size of 1 pixel does not solve this problem either. Algorithm D, owing to the inertia of its estimates, is free of this drawback.

Figure 11. Example of estimates of the shift magnitude for a single row and visualization of the deformation field generated by the MVFAST algorithm.

Figure 12. Example of estimates of the shift magnitude for a single row and visualization of the deformation field formed by the MVFAST algorithm for complex motion.

The values of the mathematical expectation and variance of the estimation errors of the MVFAST algorithm for a row and for the entire image are shown in Table 1. The table shows that the mathematical expectation of the error of the MVFAST algorithm in the motion area is almost the same as for Algorithm D with the set of parameters $(h_x, h_y)$ but several times higher (about five times for a row and eight times for the entire image) than for Algorithm D with the set of parameters $(\rho, \varphi)$. The variance of the estimation error in the motion area for the MVFAST algorithm significantly exceeds the variance for Algorithm D (13 times with the set of parameters $(h_x, h_y)$ and 27 times with the set of parameters $(\rho, \varphi)$). In the absence of noise, for the area without motion, the MVFAST algorithm shows slightly better results than Algorithm D with the set of parameters $(h_x, h_y)$, but the results of Algorithm D with the set of parameters $(\rho, \varphi)$ are better. This is well illustrated by Figure 11b, which shows the visualization of the deformation field estimates for the entire image. The results of Algorithm D with the sets of parameters $(h_x, h_y)$ and $(\rho, \varphi)$ are shown in Figure 7b and d, respectively.
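The error statistics compared above (mathematical expectation and variance, reported separately for the motion area and the area without motion) could be computed along the following lines. This is a sketch under assumptions: the error is taken as the signed difference between the estimated and true shift magnitudes, the variance is the population variance, and the function name `error_stats` and the boolean `motion_mask` are hypothetical.

```python
from statistics import fmean, pvariance


def error_stats(estimated, true, motion_mask):
    """Return {region: (mean, variance)} of the estimation errors.

    `estimated` and `true` hold shift magnitudes per node; `motion_mask`
    marks the nodes that belong to the motion area.
    """
    errors = {"motion": [], "static": []}
    for est, ref, moving in zip(estimated, true, motion_mask):
        errors["motion" if moving else "static"].append(est - ref)
    # Regions without any nodes are omitted from the result.
    return {
        region: (fmean(values), pvariance(values))
        for region, values in errors.items()
        if values
    }
```

The same function can be applied to a single row or to all nodes of the image flattened into one sequence.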

Let us also compare the efficiency of the algorithms with the complex motion for the same conditions and parameters of inter-frame deformations ($h = (2, 3)^T$, $\varphi = 4^\circ$, $\kappa = 1$) which were used in the analysis of algorithms B and D.

Figure 12a shows the estimates of shift magnitudes for a single row of the reference image (the same row that was selected in the analysis of Algorithm D), when the MVFAST algorithm is used. The results of Algorithm D with the sets of parameters $(h_x, h_y)$ and $(\rho, \varphi)$ are shown in Figure 9c and d, respectively. The visualization of deformation field for the MVFAST algorithm is shown in Figure 12b. The deformation fields obtained by Algorithm D with the sets of parameters $(h_x, h_y)$ and $(\rho, \varphi)$ are shown in Figure 10c and d, respectively.

It can be seen from the figure that when estimating a deformation field for complex motion, the MVFAST algorithm gives significantly worse results in terms of spatial accuracy compared to the developed algorithms, and it is practically inapplicable for estimating the parameters of the motion trajectory.

Note that increasing the block size in the MVFAST algorithm eliminates detection gaps inside the object image but still does not solve the problem of errors at its boundaries. At the same time, a larger block size reduces the accuracy of moving object area detection. The proposed algorithms, in contrast to the MVFAST algorithm, provide subpixel accuracy in estimating the shift parameters, which is fundamentally impossible for the MVFAST algorithm.

#### 5. Analysis of computational costs

Along with the accuracy of estimation, computational complexity of the algorithms is also important. In this case, the required amount of computations per pixel of the reference image with coordinates $(i, j)$ consists of two components:



The second component depends on many factors: complexity of the background, noise, number and speed of moving objects, etc. It can be evaluated with the given limitations of the particular problem being solved. Therefore, in the framework of this work, we restrict ourselves to analyzing the first component of computational costs with two sets of parameters for the estimated shift vectors, $(h_x, h_y)$ and $(\rho, \varphi)$, for the three algorithms discussed above:


A detailed analysis of the computational cost must take into account not only the algorithm itself but also a large number of other factors (type of the computing device, memory access time, time of other operations, etc.). Many of these factors depend on the particular imaging device and computing resources. Therefore, we analyze only the computational complexity of the calculation formulas of the algorithms. We consider the number of operations of addition (+), subtraction (−), multiplication (×), division (÷), taking the square root ($\sqrt{\ }$), trigonometric functions (sin, cos), and interpolations for a node ($\tilde{z}$). Table 2 shows the number of such operations needed to process one node of an image.

The total computational cost is defined as

$$N = c_{\pm}N_{\pm} + c_{\times}N_{\times} + c_{\div}N_{\div} + c_{\tilde{z}}N_{\tilde{z}} + c_{\sqrt{}}N_{\sqrt{}} + c_{\sin}N_{\sin} \tag{4}$$

where $c_{\pm}$, $c_{\times}$, $c_{\div}$, $c_{\tilde{z}}$, $c_{\sqrt{}}$, $c_{\sin}$ are the coefficients characterizing the time needed to perform the corresponding operations and $N_{\pm}$, $N_{\times}$, $N_{\div}$, $N_{\tilde{z}}$, $N_{\sqrt{}}$, $N_{\sin}$ are the numbers of operations.

Table 2. The number of operations needed to process a single node of an image.

Equation (4) allows estimating the computational cost of the algorithms for particular computing facilities. For example, for a PC with an AMD Turion II X2 M500 processor: $c_{\pm} = 1.4$, $c_{\times} = 1.4$, $c_{\div} = 13.6$, $c_{\tilde{z}} = 63$, $c_{\sqrt{}} = 135$, and $c_{\sin} = 212$ ns. Then for the set of parameters $(h_x, h_y)$, the average time needed to process one node of an image is 0.53 μs for Algorithm A, 1.06 μs for Algorithm C, and 3.74 μs for Algorithm D. The average time for parameters $(\rho, \varphi)$ is 1.4 μs for Algorithm A, 2.8 μs for Algorithm C, and 4.62 μs for Algorithm D.
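As an illustration of how Equation (4) is evaluated, the sketch below sums the per-operation times quoted above, weighted by operation counts. The coefficient values come from the text; the dictionary keys and the function name `node_cost_ns` are hypothetical. The operation counts in the usage example are placeholders chosen only to show the arithmetic. They are not the entries of Table 2.

```python
# Per-operation times in nanoseconds, as quoted in the text for the
# AMD Turion II X2 M500. The key names are illustrative.
COST_NS = {
    "add_sub": 1.4,
    "mul": 1.4,
    "div": 13.6,
    "interp": 63.0,
    "sqrt": 135.0,
    "sin_cos": 212.0,
}


def node_cost_ns(counts, cost=COST_NS):
    """Evaluate Equation (4): the sum over operation types of c_op * N_op.

    `counts` maps an operation type to the number of such operations per
    node. An unknown operation type raises KeyError.
    """
    return sum(cost[op] * n for op, n in counts.items())


# Placeholder counts for illustration only (not taken from Table 2).
example_counts = {"add_sub": 10, "mul": 5, "div": 1, "interp": 2}
example_total_ns = node_cost_ns(example_counts)
```

With these placeholder counts the two interpolations contribute 126 ns of the total, which shows why the interpolation, square root, and trigonometric terms dominate the cost even when they are few.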

Let us note the features of the computational complexity of algorithms D and MVFAST. Studies have shown that the amount of computation of the latter is largely determined by the nature of the processed images. If the adjacent frames of a video sequence contain no motion, the computational complexity of the algorithm is determined only by computing one SAD for each block. In the presence of motion, depending on its intensity, the computational complexity may increase by a factor of tens or even hundreds. In particular, when two frames of a video sequence without movement were processed by the MVFAST algorithm with a block size of 1 pixel on the same PC, the average time spent on one node was 0.64 μs. This is the lowest possible computational cost. The average processing time for a single node for the images shown in Figure 2 was 8.91 μs. For similar images with a complex type of motion, for example, with $\alpha = ((2, 3)^T, 4^\circ, 1)$, the processing time was 21.84 μs. Note that the time expenses of Algorithm D with the set of parameters $(h_x, h_y)$ were 7.44, 8.8, and 10.75 μs under similar conditions, while with the parameters $(\rho, \varphi)$ they were 8.13, 10.27, and 11.8 μs.

From the above results, it can be seen that the computational complexity of Algorithm D depends only weakly on the presence or absence of moving objects, their size, and the type of motion. The computational complexity of the MVFAST algorithm is largely determined by these factors. When the inter-frame deformations of the images of a moving object correspond only to a parallel shift, the computation time for algorithms D and MVFAST is approximately the same. When the motion is complex, the time of the MVFAST algorithm increases several-fold.
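The dependence on image content can be made concrete by taking ratios of the per-node times quoted above. The sketch assumes that the three values given for each algorithm correspond, in order, to the same three cases (no motion, the images of Figure 2, complex motion). The names `TIMES_US` and `spread` are illustrative.

```python
# Average per-node processing times in microseconds, as quoted in the text.
# Assumed order of cases: no motion, Figure 2 images, complex motion.
TIMES_US = {
    "MVFAST": (0.64, 8.91, 21.84),
    "D (hx, hy)": (7.44, 8.80, 10.75),
    "D (rho, phi)": (8.13, 10.27, 11.80),
}


def spread(times):
    """Ratio of the slowest case to the fastest case."""
    return max(times) / min(times)


for name, times in TIMES_US.items():
    print(f"{name}: slowest / fastest = {spread(times):.2f}")
```

For the MVFAST algorithm the ratio is about 34, whereas for both variants of Algorithm D it stays below 1.5.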

## 6. An example of estimating the trajectory of a moving object using deformation field

Let us give an example of estimating the trajectory of a moving object using the developed algorithms. Figure 13a and b show two frames from the processed video sequence, in which an aircraft lands on an aircraft carrier.

Figure 13. An example of estimation of moving object trajectory.

A complicating factor is the uneven movement of the camera toward the landing plane. Therefore, to estimate the position of the moving object relative to the scene (not to the camera), it was necessary not only to detect and identify the moving object area but also to stabilize the image. First, the distortion caused by the movement of the camera is compensated. To do this, the inter-frame deformation of adjacent frames, represented by the similarity model $\alpha = (h_x, h_y, \varphi, \kappa)$, is estimated using deformation fields [21]. Then the area of the moving object is detected. Using the estimates of the deformation field in the area of the moving object, the parameters of the inter-frame motion are found; for 34 frames of the video sequence they are presented isometrically in Figure 13c. These parameters are used to estimate the three-dimensional trajectory of the moving object. Figure 13d shows the result in relative coordinates XYZ. The camera position at the initial moment of the shooting is taken as the origin.
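For reference, a similarity model with parameters $(h_x, h_y, \varphi, \kappa)$ can be applied to a point as sketched below. The conventions are assumptions and may differ from those of [21]: the rotation is counterclockwise about the coordinate origin, scaling is applied after rotation, and the shift is added last. The function names are hypothetical, and the sketch covers only the two-dimensional inter-frame model, not the reconstruction of the three-dimensional trajectory.

```python
from math import cos, sin


def apply_similarity(point, alpha):
    """Map (x, y) with alpha = (hx, hy, phi, kappa); phi is in radians.

    Assumed convention: rotate about the origin, scale by kappa,
    then shift by (hx, hy).
    """
    x, y = point
    hx, hy, phi, kappa = alpha
    x_rot = x * cos(phi) - y * sin(phi)
    y_rot = x * sin(phi) + y * cos(phi)
    return (kappa * x_rot + hx, kappa * y_rot + hy)


def track_point(point, alphas):
    """Positions of a point after applying successive inter-frame models."""
    path = [point]
    for alpha in alphas:
        path.append(apply_similarity(path[-1], alpha))
    return path
```

Chaining the per-frame models in `track_point` shows how a sequence of inter-frame parameter estimates accumulates into a path in image coordinates.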

### 7. Conclusion

The effectiveness of motion area detection in a sequence of images using pixel-by-pixel stochastic gradient estimation of the inter-frame shift vectors for all points of the reference image corresponding to its nodes (the deformation field) is investigated. It is shown that stochastic gradient estimation of shift vectors does not give equivalent results when the vectors are represented by their projections on the basic axes of the image and when they are represented by polar parameters. This is because these sets of parameters have different physical meanings. The analysis of variants of estimation algorithms for the deformation field has shown that the use of polar parameters provides higher estimation accuracy.

Two approaches to estimating the parameters of the deformation field are considered. In the first approach, the stochastic gradient procedure sequentially processes all rows of an image to find estimates of shifts for all points of the reference image. It processes each row bidirectionally, i.e., from left to right and from right to left. Joint processing of the results compensates for the inertia of the stochastic estimation. However, this approach does not take into account inter-row correlation, and images are processed as one-dimensional signals. In the second approach, to improve the accuracy of estimation, the correlation of image rows is taken into account. The algorithm processes the rows one after another, changing direction after each row, and from the values obtained for each node of the reference image forms the resulting estimate of the shift vector. Studies have shown that the second approach, with approximately the same computational complexity, gives significantly higher accuracy in estimating the field of shift vectors.

As a criterion for forming the resulting estimate, the minimum of the gradient estimate of the objective function and the maximum of the correlation between local neighborhoods of the deformed and reference images were investigated. The correlation maximum criterion showed the best results: not only were the mean value and the variance of the estimation errors smaller, but the effect of "oscillations" outside the motion area was also reduced. However, the use of this criterion requires a higher computational cost.
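The idea of the correlation maximum criterion can be sketched for a single image row. Given several candidate shift estimates for a node (for example, from passes in opposite directions), the candidate whose neighborhood in the deformed row correlates best with the neighborhood of the node in the reference row is selected. The sketch is a simplification under assumptions: shifts are integers, so no interpolation is needed, the neighborhood is one-dimensional, and the function names and the `half_width` parameter are hypothetical.

```python
from math import sqrt


def correlation(a, b):
    """Sample correlation coefficient of two equal-length sequences.

    Returns 0.0 if either sequence is constant.
    """
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def pick_by_correlation(reference, deformed, node, candidates, half_width=2):
    """Return the candidate shift with the maximum local correlation.

    Candidates whose neighborhood leaves the deformed row are skipped;
    None is returned if no candidate is usable.
    """
    lo, hi = node - half_width, node + half_width + 1
    if lo < 0 or hi > len(reference):
        raise ValueError("neighborhood of the node leaves the reference row")
    ref_patch = reference[lo:hi]
    best_shift, best_corr = None, None
    for shift in candidates:
        if lo + shift < 0 or hi + shift > len(deformed):
            continue
        value = correlation(ref_patch, deformed[lo + shift:hi + shift])
        if best_corr is None or value > best_corr:
            best_shift, best_corr = shift, value
    return best_shift
```

Each candidate requires a full neighborhood correlation, which is consistent with the higher computational cost of this criterion noted above.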

The computational complexity of the proposed algorithms is investigated. Under equal conditions, the estimation of the projections of the shift vectors requires a slightly smaller amount of computations than the estimation of polar parameters. Using reverse processing of a row is less time-consuming compared with processing of the adjacent rows. The use of the correlation maximum criterion implies a larger amount of calculations than the use of the gradient estimation minimum criterion. The algorithms with large computational costs provide less error; therefore the choice of algorithm depends on the specific task.

The experiments have confirmed the effectiveness of using the developed algorithms to find the motion parameters and the trajectories of objects in a video sequence. In particular, an example of estimating three-dimensional trajectory of an aircraft in relative coordinates using 34 frames of a video sequence is considered. The parameters of the similarity model (parallel shift, rotation angle, and scale factor) were used to describe the inter-frame motion.

Comparison of the developed algorithms with the block algorithm MVFAST (Motion Vector Field Adaptive Search Technique) showed that under identical conditions, the latter shows worse results in accuracy of detecting the boundaries of motion. In addition, in contrast to the MVFAST algorithm, the proposed algorithms allow obtaining subpixel estimation accuracy.

Thus, the conducted study allows us to conclude that pixel-by-pixel inter-frame estimation of the shift vectors of the reference image points corresponding to its nodes is a promising approach to detecting the area of a moving object in a series of images.

#### Acknowledgements

The reported study was funded by the RFBR and Government of Ulyanovsk Region according to the research projects 18-41-730006 and 18-41-730011.

