#### **1.1.5 2D logarithmic search**

The 2D logarithmic search (Alois, 2009) is another algorithm that tests a limited set of candidates, and it is similar to the three-step search. During the first iteration, a total of five candidates are tested, centred on the current block location in a diamond shape. The step size for the first iteration is set equal to half the search range. For the second iteration, the centre of the diamond is shifted to the best-matching candidate. The step size is halved only if the best candidate happens to be the centre of the diamond; otherwise, the same step size is used for the second iteration. In that case, some of the diamond candidates have already been evaluated during the first iteration, so no block-matching calculation is needed for them; the results from the first iteration can be reused. The process continues until the step size becomes equal to one pixel. For this final iteration, all eight surrounding candidates are evaluated, and the best-matching candidate from this iteration is selected for the current block. The number of evaluated candidates is variable for the 2D logarithmic search; however, the worst-case and best-case candidate counts can be calculated.

Fig. 2. Fast Search Algorithms

#### **1.1.6 One at a time search algorithm**

The one at a time search algorithm estimates the x-component and the y-component of the motion vector independently. The candidate search is first performed along the x-axis. During each iteration, a set of three neighbouring candidates along the x-axis is tested (Fig. 2). The three-candidate set is shifted towards the best-matching candidate, with the best-matching candidate forming the centre of the set for the next iteration. The process stops when the best-matching candidate happens to be the centre of the candidate set; the location of this candidate on the x-axis is used as the x-component of the motion vector. The search then continues parallel to the y-axis, and a procedure similar to the x-axis search is followed to estimate the y-component of the motion vector. The one at a time search tests fewer candidates on average; however, its motion vector accuracy is poor.
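A minimal sketch of the one at a time search may make the two-pass structure concrete. SAD is assumed as the matching criterion, and the function name, block size and search range are illustrative choices, not from the text:

```python
import numpy as np

def sad(cur, ref):
    """Sum of absolute differences between two equal-sized blocks."""
    return int(np.abs(cur.astype(np.int64) - ref.astype(np.int64)).sum())

def one_at_a_time_search(cur_frame, ref_frame, bx, by, n=16, search_range=7):
    """Estimate the motion vector of the n x n block at (bx, by):
    refine the x-component first, then the y-component."""
    cur = cur_frame[by:by + n, bx:bx + n]

    def cost(dx, dy):
        x, y = bx + dx, by + dy
        if x < 0 or y < 0 or y + n > ref_frame.shape[0] or x + n > ref_frame.shape[1]:
            return float("inf")          # candidate falls outside the frame
        return sad(cur, ref_frame[y:y + n, x:x + n])

    def refine(axis_cost):
        # Slide a three-candidate set along one axis until its centre wins.
        best = 0
        while True:
            neighbours = [d for d in (best - 1, best + 1) if abs(d) <= search_range]
            better = min(neighbours, key=axis_cost)
            if axis_cost(better) >= axis_cost(best):
                return best              # centre of the set is the best match
            best = better

    mvx = refine(lambda d: cost(d, 0))   # search parallel to the x-axis
    mvy = refine(lambda d: cost(mvx, d)) # then parallel to the y-axis
    return mvx, mvy
```

Because only one axis is refined at a time, the search can stop in a local minimum that a full search would avoid, which is the accuracy penalty noted above.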

#### **1.1.7 Sub-pixel motion estimation (Fractional Pel Motion Estimation)**

Integer-pixel motion estimation (also called the full search method) is carried out during motion estimation mainly to reduce the redundancy among adjacent frames. In practice, however, the real motion displacement is not always an integer multiple of the sampling interval; the actual motion in the video sequence can be much finer. Hence, the matching object might not lie on the integer-pixel grid (Iain Richardson, 2010). To get a better match, motion estimation needs to be performed on a sub-pixel grid, at either half-pixel or quarter-pixel resolution.

It is therefore advantageous to use sub-pixel motion estimation to achieve high compression together with a high PSNR for the reconstructed image. The motion vector can be calculated at 1/2-, 1/4- or 1/8-pixel positions (Young et al., 2010); a motion vector calculated at the 1/4-pixel position gives more detailed information than one at the 1/2-pixel position. Since the reference image is effectively enlarged, interpolation must be used to generate the pixel values at the sub-pixel positions.
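As an illustration of the idea, a half-pel refinement step can be sketched as follows. Bilinear averaging of the integer neighbours and a SAD criterion are assumed here, and the helper names are illustrative; H.264 itself uses the longer interpolation filters described below.

```python
import numpy as np

def half_pel_block(ref, x2, y2, n=8):
    """Extract an n x n block whose top-left corner is at half-pel
    position (x2/2, y2/2), using bilinear averaging of integer pixels."""
    y0, x0 = y2 // 2, x2 // 2
    fy, fx = y2 % 2, x2 % 2
    patch = ref[y0:y0 + n + 1, x0:x0 + n + 1].astype(np.int32)
    a = patch[:n, :n]                                  # top-left integer pixel
    b = patch[:n, 1:n + 1] if fx else a                # right neighbour
    c = patch[1:n + 1, :n] if fy else a                # lower neighbour
    d = patch[1:n + 1, 1:n + 1] if (fx and fy) else (c if fy else b)
    return (a + b + c + d + 2) // 4                    # rounded average

def half_pel_refine(cur, ref, bx, by, mvx, mvy, n=8):
    """Refine an integer MV (mvx, mvy) by testing the 8 surrounding
    half-pel candidates; returns the best MV in half-pel units."""
    blk = cur[by:by + n, bx:bx + n].astype(np.int32)
    best, best_cost = (2 * mvx, 2 * mvy), None
    for dy2 in (-1, 0, 1):
        for dx2 in (-1, 0, 1):
            x2, y2 = 2 * (bx + mvx) + dx2, 2 * (by + mvy) + dy2
            cand = half_pel_block(ref, x2, y2, n)
            c = int(np.abs(blk - cand).sum())          # SAD criterion
            if best_cost is None or c < best_cost:
                best, best_cost = (2 * mvx + dx2, 2 * mvy + dy2), c
    return best
```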

Fig. 3. 6-tap Directional interpolation filter for luma

H.264 Motion Estimation and Applications 63


Fig. 3 shows the support pixels for each quarter-pixel position (Young et al., 2010) in different colours. For instance, the blue integer pixels support the interpolation of the three horizontal fractional pixels a, b and c; the light-blue integer pixels the three vertical fractional pixels d, h and n; the deep-yellow integer pixels the two down-right fractional pixels e and r; the light-yellow integer pixels the two down-left fractional pixels g and p; and the purple integer pixels the central fractional pixel j.

For each of the three horizontal fractional positions a, b and c, and the three vertical fractional positions d, h and n, which are aligned with full-pixel positions, a single 6-tap filter is used (Zhibo et al., 2007); the corresponding equations are given in (2). The filter coefficients of the DIF are {3, -15, 111, 37, -10, 2}/128 for the ¼ position (mirrored for the ¾ position) and {3, -17, 78, 78, -17, 3}/128 for the ½ position.

$$\begin{aligned}
a &= (3H - 15I + 111J + 37K - 10L + 2M + 64) \gg 7\\
b &= (3H - 17I + 78J + 78K - 17L + 3M + 64) \gg 7\\
c &= (2H - 10I + 37J + 111K - 15L + 3M + 64) \gg 7\\
d &= (3B - 15E + 111J + 37O - 10S + 2W + 64) \gg 7\\
h &= (3B - 17E + 78J + 78O - 17S + 3W + 64) \gg 7\\
n &= (2B - 10E + 37J + 111O - 15S + 3W + 64) \gg 7
\end{aligned} \tag{2}$$

For the four innermost quarter-pixel positions e, g, p and r, the 6-tap filters at +45 degree and -45 degree angles are used respectively.

$$\begin{aligned}
e &= (3A - 15D + 111J + 37P - 10U + 2X + 64) \gg 7\\
g &= (3C - 15G + 111K + 37O - 10R + 2V + 64) \gg 7\\
p &= (2C - 10G + 37K + 111O - 15R + 3V + 64) \gg 7\\
r &= (2A - 10D + 37J + 111P - 15U + 3X + 64) \gg 7
\end{aligned} \tag{3}$$

For another 4 innermost quarter-pixel positions, f, i, k, and q, a combination of the 6-tap filters at +45 degree and -45 degree angles, which is equivalent to a 12-tap filter, is used.

$$\begin{aligned}
f &= (e + g + 1) \gg 1 = ((3A - 15D + 111J + 37P - 10U + 2X) + (3C - 15G + 111K + 37O - 10R + 2V) + 128) \gg 8\\
i &= (e + p + 1) \gg 1 = ((3A - 15D + 111J + 37P - 10U + 2X) + (2C - 10G + 37K + 111O - 15R + 3V) + 128) \gg 8\\
k &= (g + r + 1) \gg 1 = ((3C - 15G + 111K + 37O - 10R + 2V) + (2A - 10D + 37J + 111P - 15U + 3X) + 128) \gg 8\\
q &= (p + r + 1) \gg 1 = ((2C - 10G + 37K + 111O - 15R + 3V) + (2A - 10D + 37J + 111P - 15U + 3X) + 128) \gg 8
\end{aligned} \tag{4}$$

$$j = ((5E + 5F) + (5I + 22J + 22K + 5L) + (5N + 22O + 22P + 5Q) + (5S + 5T) + 64) \gg 7 \tag{5}$$

The exception is the central position, j, where a 12-tap non-separable filter is used. The filter coefficients of DIF are {0, 5, 5, 0; 5, 22, 22, 5; 5, 22, 22, 5; 0, 5, 5, 0}/128 for the central position.
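As a sketch, the horizontal filters of equation (2) translate directly into code. The clipping of the result to the 8-bit range is an assumption added here; the equations in the text do not show it:

```python
def dif_horizontal(H, I, J, K, L, M):
    """Quarter-, half- and three-quarter-pel samples a, b, c between
    integer pixels J and K, using the 6-tap DIF filters of equation (2).
    Results are clipped to [0, 255] (an assumption, not shown in the text)."""
    def clip(v):
        return max(0, min(255, v))
    a = clip((3*H - 15*I + 111*J + 37*K - 10*L + 2*M + 64) >> 7)  # 1/4 position
    b = clip((3*H - 17*I + 78*J + 78*K - 17*L + 3*M + 64) >> 7)   # 1/2 position
    c = clip((2*H - 10*I + 37*J + 111*K - 15*L + 3*M + 64) >> 7)  # 3/4 position
    return a, b, c
```

The vertical filters for d, h and n are identical, applied to the column of pixels B, E, J, O, S, W instead of the row H, I, J, K, L, M. Note that each coefficient set sums to 128, so a flat region interpolates to its own value.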

#### **1.1.8 Hierarchical block matching**

Hierarchical block matching (Alan Bovik, 2009) is a more sophisticated motion estimation technique which provides consistent motion vectors by successively refining the motion vector at different resolutions. A pyramid of reduced-resolution video frames (Fig. 4) is formed from the source video: the original video frame forms the highest-resolution image, and the other images in the pyramid are formed by down-sampling the original image; simple bi-linear down-sampling can be used. The block size of NxN at the highest resolution is reduced to (N/2) x (N/2) at the next resolution level, and the search range is similarly reduced. The motion estimation process starts at the lowest resolution, where full search motion estimation is typically performed for each block; since the block size and the search range are reduced, this does not require large computations. The motion vectors from the lowest resolution are scaled and passed on as candidate motion vectors for each block at the next level, where they are refined over a smaller search area. A simpler motion estimation algorithm and a small search range are enough close to the highest resolution, since the motion vectors are already close to the accurate motion vectors.

Fig. 4. Hierarchical block matching
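A sketch of the scheme follows, assuming bilinear 2x2 averaging for down-sampling, SAD matching, full search at the lowest level and a ±1 refinement at the finer levels (all parameter choices are illustrative):

```python
import numpy as np

def downsample(frame):
    """Bilinear 2:1 down-sampling: average each 2x2 neighbourhood."""
    h, w = frame.shape[0] // 2 * 2, frame.shape[1] // 2 * 2
    f = frame[:h, :w].astype(np.int32)
    return (f[0::2, 0::2] + f[0::2, 1::2] + f[1::2, 0::2] + f[1::2, 1::2] + 2) // 4

def hierarchical_search(cur, ref, bx, by, n=16, levels=3, base_range=4):
    """Estimate the MV of the n x n block at (bx, by): full search at the
    lowest resolution, then +/-1 refinement at each finer level."""
    pyr = [(cur, ref)]
    for _ in range(levels - 1):
        c, r = pyr[-1]
        pyr.append((downsample(c), downsample(r)))
    mvx = mvy = 0
    for lvl in range(levels - 1, -1, -1):
        c, r = pyr[lvl]
        s = 2 ** lvl                          # scale factor of this level
        x, y, m = bx // s, by // s, n // s    # block position/size at this level
        rng = base_range if lvl == levels - 1 else 1
        blk = c[y:y + m, x:x + m].astype(np.int32)
        best = None
        for dy in range(mvy - rng, mvy + rng + 1):
            for dx in range(mvx - rng, mvx + rng + 1):
                yy, xx = y + dy, x + dx
                if yy < 0 or xx < 0 or yy + m > r.shape[0] or xx + m > r.shape[1]:
                    continue                  # candidate outside the frame
                cost = int(np.abs(blk - r[yy:yy + m, xx:xx + m].astype(np.int32)).sum())
                if best is None or cost < best[0]:
                    best = (cost, dx, dy)
        mvx, mvy = best[1], best[2]
        if lvl:                               # scale the MV up for the finer level
            mvx, mvy = 2 * mvx, 2 * mvy
    return mvx, mvy
```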

#### **1.1.9 Global motion estimation**

There is another type of motion estimation known as global motion estimation. The techniques discussed so far are useful for estimating local motion (i.e. the motion of objects within the video frame). However, a video sequence can also contain global motion. For some applications, such as video stabilization, it is more useful to find the global motion (Alan Bovik, 2009) than the local motion. In global motion, the same type of motion applies to every pixel in the video frame; some examples are panning, tilting and zooming in/out. In all these cases, each pixel moves according to the same global motion model. The motion vector for each pixel or pixel block can be described using the following parametric model with four parameters; the global motion vector for a pixel or pixel block is given by (6) and (7). For pan and tilt global motion, only q0 and q1 are non-zero, i.e. the motion vector is constant over the entire video frame. For pure zoom in/out, only p0 and p1 will be non-zero.

$$G_x = p_0\,x + q_0 \tag{6}$$
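As a sketch of the four-parameter model of (6) and (7): the helper below evaluates the global motion vector at a pixel, and a least-squares fit recovers (p0, q0, p1, q1) from sample local motion vectors. The function names are illustrative, not from the text:

```python
import numpy as np

def global_mv(p0, q0, p1, q1, x, y):
    """Global motion vector at pixel (x, y) under the four-parameter
    model Gx = p0*x + q0, Gy = p1*y + q1 (equations (6) and (7))."""
    return p0 * x + q0, p1 * y + q1

def fit_global_motion(samples):
    """Least-squares fit of (p0, q0, p1, q1) from local motion vectors.
    samples: iterable of (x, y, gx, gy) measured at several block positions
    (at least four samples at distinct locations, as noted in the text)."""
    xs = np.array([s[0] for s in samples], dtype=float)
    ys = np.array([s[1] for s in samples], dtype=float)
    gx = np.array([s[2] for s in samples], dtype=float)
    gy = np.array([s[3] for s in samples], dtype=float)
    # Two independent 1-D linear fits: gx against x and gy against y.
    p0, q0 = np.polyfit(xs, gx, 1)
    p1, q1 = np.polyfit(ys, gy, 1)
    return p0, q0, p1, q1
```

With more than four samples the least-squares fit averages out noisy local motion vectors, which is the "more processing" mentioned below for getting a good parameter estimate.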


$$G_y = p_1\,y + q_1 \tag{7}$$

However, a combination of all the parameters is usually present. Global motion estimation involves calculating the four parameters of the model (p0, p1, q0, q1). The parameters can be computed by treating them as four unknowns; hence, ideally, sample motion vectors at four different locations can be used to calculate the four unknown parameters. In practice, though, more processing is needed to get a good estimate of the parameters. Note also that local motion estimation, at least at four locations, is still essential for calculating the global motion estimation parameters, although there are algorithms for global motion estimation which do not rely on local motion estimation. The above parametric model with four parameters cannot fit rotational global motion; for rotational motion a six-parameter model is needed. However, the same four-parameter model concepts can be extended to the six-parameter model.

Motion estimation therefore aims to find a 'match' to the current block or region that minimizes the energy in the motion-compensated residual (the difference between the current block and the reference area). An area in the reference frame centred on the current macroblock position (Iain Richardson, 2010), the search area, is searched, and the 16 × 16 region within the search area that minimizes a matching criterion is chosen as the 'best match'. The choice of matching criterion is important, since the distortion measure used for the 'residual energy' affects both the computational complexity and the accuracy of the motion estimation process. Therefore, all attempts to establish suitable criteria for motion estimation require further implicit or explicit modelling of the image sequence of the video. If all matching criteria resulted in compressed video of the same quality then, of course, the least complex of these would always be used for block matching.

However, matching criteria (IEG Richardson, 2003) often differ on the choice of substitute for the target block, with consequent variation in the quality of the coded frame. The MSD, for example, requires many multiplications, whereas the MAD primarily uses additions. While multiplication might not have too great an impact on a software coder (Romuald, 2006), a hardware coder using MSD could be significantly more expensive than a hardware implementation of the SAD/MAD function. Equations (8), (9) and (10) describe the three energy measures MSD, MAD and SAD. The motion compensation block size is *N* × *N* samples; Cur and Ref denote the current and reference area samples respectively. Fig. 6 shows the image in macroblock form for the current video frame.

Fig. 6. Macroblock view of the Frame

#### **1.2.1 Mean squared difference**

$$MSD = \frac{1}{N^2}\sum_{i=0}^{N-1}\sum_{j=0}^{N-1}\left(Cur_{i,j} - Ref_{i,j}\right)^2 \tag{8}$$
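The energy measures MSD, MAD and SAD discussed above can be sketched directly; MSD follows equation (8), and MAD and SAD are its standard absolute-difference variants:

```python
import numpy as np

def msd(cur, ref):
    """Mean squared difference over an N x N block, equation (8)."""
    d = cur.astype(np.int64) - ref.astype(np.int64)
    return float((d * d).mean())            # one multiply per sample

def mad(cur, ref):
    """Mean absolute difference: like MSD but multiply-free."""
    return float(np.abs(cur.astype(np.int64) - ref.astype(np.int64)).mean())

def sad(cur, ref):
    """Sum of absolute differences: MAD without the division by N^2,
    so block comparisons rank identically at even lower cost."""
    return int(np.abs(cur.astype(np.int64) - ref.astype(np.int64)).sum())
```

Since the division by N² is the same for every candidate, SAD and MAD always pick the same best match, which is why hardware coders favour the SAD/MAD family over MSD.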
