## **5.2 Sub-pixel precision**

Block-based motion estimation assumes that every block has an integer-pixel displacement, which is not true in practice. To improve the motion estimation and increase the accuracy of the prediction, we moved to sub-pixel precision using a bilinear interpolation process. A line is interposed between every two lines of the image I and a column between every two columns (see Figure 7). ME is then applied to the new image O.

Fig. 7. Bilinear Interpolation for 1/2 pixel precision

The values of the pixels at the 1/2-pixel positions are determined from their neighbouring pixels at the integer positions as follows:

$$\mathbf{O} \ (2\mathbf{x}, 2\mathbf{y}) = \mathbf{I}(\mathbf{x}, \mathbf{y}) \tag{2}$$

Wavelet Transform Based Motion Estimation and Compensation for Video Coding 33

$$\mathbf{O} \ (2\mathbf{x} + 1, 2\mathbf{y}) = (\mathbf{I}(\mathbf{x}, \mathbf{y}) + \mathbf{I}(\mathbf{x} + 1, \mathbf{y})) / 2 \tag{3}$$

$$\mathbf{O} \ (2\mathbf{x}, 2\mathbf{y} + 1) = (\mathbf{I}(\mathbf{x}, \mathbf{y}) + \mathbf{I}(\mathbf{x}, \mathbf{y} + 1)) / 2 \tag{4}$$

$$\mathbf{O} \ (2\mathbf{x} + 1, 2\mathbf{y} + 1) = (\mathbf{I}(\mathbf{x}, \mathbf{y}) + \mathbf{I}(\mathbf{x} + 1, \mathbf{y}) + \mathbf{I}(\mathbf{x}, \mathbf{y} + 1) + \mathbf{I}(\mathbf{x} + 1, \mathbf{y} + 1)) / 4 \tag{5}$$
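Equations (2) to (5) can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the function name `half_pixel_interpolate` is hypothetical, the image is assumed to be a list of rows indexed as `image[y][x]`, and the output is assumed to have size (2H-1)×(2W-1) so that no pixel beyond the image border is needed.

```python
def half_pixel_interpolate(image):
    """Upsample a 2-D image to half-pixel resolution with Eqs. (2)-(5).

    `image` is a list of rows (image[y][x]); the result is indexed the
    same way, so O(2x+1, 2y) in the text is out[2*y][2*x + 1] here.
    """
    h, w = len(image), len(image[0])
    # Assumption: one interpolated line/column between each pair of originals.
    out = [[0.0] * (2 * w - 1) for _ in range(2 * h - 1)]
    for y in range(h):
        for x in range(w):
            # Eq. (2): integer positions are copied unchanged.
            out[2 * y][2 * x] = float(image[y][x])
            if x + 1 < w:
                # Eq. (3): horizontal half-pixel position.
                out[2 * y][2 * x + 1] = (image[y][x] + image[y][x + 1]) / 2
            if y + 1 < h:
                # Eq. (4): vertical half-pixel position.
                out[2 * y + 1][2 * x] = (image[y][x] + image[y + 1][x]) / 2
            if x + 1 < w and y + 1 < h:
                # Eq. (5): diagonal half-pixel position, mean of 4 neighbours.
                out[2 * y + 1][2 * x + 1] = (
                    image[y][x] + image[y][x + 1]
                    + image[y + 1][x] + image[y + 1][x + 1]
                ) / 4
    return out
```

Applying the same function to its own output would give a 1/4-pixel grid under the same assumptions.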

With this technique, a motion vector can point to a half-pixel or quarter-pixel position, or to an even finer one. A block whose real location falls at a fraction of a pixel is then better predicted. Sub-pixel precision not only increases the accuracy of the motion vectors and reduces errors, but also filters the image, which attenuates noise and rapid changes. The results of conducting the ME on some standard video sequences, shown in Table 1, demonstrate the efficiency of the sub-pixel precision technique.


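| Precision | Tennis | Susie | Foreman |
|---|---|---|---|
| Integer pixel | 31.7586 | 33.1613 | 31.2889 |
| 1/2 of pixel | 34.2206 | 37.8811 | 33.6719 |
| 1/4 of pixel | 34.7099 | 40.0285 | 36.6072 |
| 1/8 of pixel | 31.5650 | 37.4465 | 37.7870 |

Table 1. PSNR (dB) of the reconstructed image with different sub-pixel precision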

Using the sub-pixel technique as a pre-processing step improves the motion estimation process. Taking the Tennis sequence in Table 1, the Peak Signal-to-Noise Ratio (PSNR), a criterion that compares the original frame with the reconstructed frame after motion compensation, rises from 31.7586 dB without the sub-pixel technique to 34.2206 dB with 1/2-pixel precision and to 34.7099 dB with 1/4-pixel precision. This confirms the need for this technique in motion estimation. Note that raising the sub-pixel precision further (to 1/8 pixel or more) is not always beneficial, since in most cases it can perturb the estimation: for Tennis, the PSNR falls to 31.5650 dB at 1/8-pixel precision.
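For readers who want to compute a comparable figure, the usual definition is PSNR = 10·log10(peak² / MSE). The sketch below assumes 8-bit frames (peak value 255) stored as lists of rows. This section does not state the exact formula used, so the function name `psnr` and the peak value are assumptions.

```python
import math


def psnr(original, reconstructed, peak=255.0):
    """Peak Signal-to-Noise Ratio in dB between two equally sized frames.

    Assumption: frames are lists of rows of pixel values and `peak` is the
    maximum possible pixel value (255 for 8-bit images).
    """
    squared_error = 0.0
    count = 0
    for row_o, row_r in zip(original, reconstructed):
        for a, b in zip(row_o, row_r):
            squared_error += (a - b) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical frames: the ratio is unbounded
    return 10 * math.log10(peak ** 2 / mse)
```

A higher value means the motion-compensated frame is closer to the original frame.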

It is true that this technique doubles the image size, but this is not a major problem since we conduct the motion estimation on the DWT approximation, which has a reduced size. Furthermore, this technique saves time since it allows a quick search for the BMA by shortening the path to the corresponding block. For all these reasons, the sub-pixel technique is becoming crucial in block-based ME methods.

## **5.3 Shifting technique**

The DWT offers the advantages of a multiresolution domain, which makes this spatial-frequency transform very useful for ME. However, the shift-variant property of the DWT, caused by the decimation process, makes ME/MC less efficient in the wavelet domain. In other words, there is a large difference between the DWT of an image and the DWT of the same image shifted by even one pixel, as shown in Figure 8. This property mainly affects the high frequencies at the image's edges and has less effect on the low frequencies.

Fig. 8. Example of DWT coefficients (*Haar* wavelet) for a 1-D signal s(n) and the signal s(n+1) shifted by one pixel. (a) original signal s(n), (b) shifted signal s(n+1), (c) low-pass subband of s(n), (d) high-pass subband of s(n), (e) low-pass subband of s(n+1), (f) high-pass subband of s(n+1).

In Figure 8, s(n+1) is the 1-D signal s(n) shifted by one pixel (shifting to the right). As the figure illustrates, the difference between the high-pass subbands before and after shifting is much larger than the difference between the low-pass subbands. This is a simple 1-D example, but the same holds for 2-D signals. This reinforces our choice to conduct ME in the approximation (low-pass subband) of the DWT.
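The shift variance itself is easy to reproduce on a toy signal. The sketch below is only an illustration under stated assumptions: it uses a one-level orthonormal Haar transform (scaling 1/√2, since the normalisation of Figure 8 is not given), a step edge as the test signal, and a right shift that replicates the first sample. The names `haar_dwt_1d` and `shift_right` are hypothetical.

```python
import math


def haar_dwt_1d(signal):
    """One-level Haar DWT: returns (low-pass, high-pass) subbands.

    Assumption: even-length signal, orthonormal scaling by 1/sqrt(2).
    The step of 2 in the loop is the decimation that causes shift variance.
    """
    s = 1 / math.sqrt(2)
    low = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    high = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    return low, high


def shift_right(signal):
    """Shift by one sample to the right, replicating the first sample."""
    return [signal[0]] + signal[:-1]


step = [0, 0, 0, 0, 8, 8, 8, 8]          # edge aligned with the sample pairs
low, high = haar_dwt_1d(step)
low_s, high_s = haar_dwt_1d(shift_right(step))  # edge now splits a pair
```

With the edge aligned to the sample pairs, the high-pass subband is entirely zero. After a one-pixel shift, the edge falls inside a pair and a non-zero high-pass coefficient appears, so the subbands of the shifted signal are not a shifted copy of the original subbands.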

To overcome the shift-variant property of the DWT, a shifting technique is used, which increases the prediction quality (Yuan, 2002). Before applying ME, we shift the frame in the spatial domain by one pixel in all directions. The shifted frames are then transformed to the wavelet domain to obtain a more precise and more realistic motion estimation. After calculating a motion vector for the block in every direction, we generate the final motion vector as the mean of all calculated vectors.
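The averaging step can be sketched as follows. Several details are assumptions, because the text does not fix them: "all directions" is taken to mean the four axis-aligned one-pixel shifts, both frames are shifted by the same amount, border pixels are replicated, and `estimate_mv` is a hypothetical caller-supplied function standing in for the wavelet-domain block matching (DWT followed by ME).

```python
def shift_frame(frame, dx, dy):
    """Shift a frame by (dx, dy) pixels, replicating border pixels (assumption)."""
    h, w = len(frame), len(frame[0])
    return [
        [frame[min(max(y - dy, 0), h - 1)][min(max(x - dx, 0), w - 1)]
         for x in range(w)]
        for y in range(h)
    ]


def shifted_motion_vector(reference, current, estimate_mv,
                          shifts=((1, 0), (-1, 0), (0, 1), (0, -1))):
    """Mean of the motion vectors estimated on one-pixel-shifted frames.

    `estimate_mv(reference, current)` is a hypothetical callable returning
    one (dx, dy) vector; in the chapter's setting it would transform both
    frames to the wavelet domain and run block matching there.
    """
    vectors = [
        estimate_mv(shift_frame(reference, dx, dy), shift_frame(current, dx, dy))
        for dx, dy in shifts
    ]
    n = len(vectors)
    # Final vector: component-wise mean of all calculated vectors.
    return (sum(v[0] for v in vectors) / n, sum(v[1] for v in vectors) / n)
```

Because each shift changes how the frame content aligns with the decimation grid, the individual estimates differ slightly, and their mean is less sensitive to any single alignment.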

This technique improves the estimation results by smoothing the predicted vectors and reducing the aliasing effect. Adding it to the ME process remarkably improved the estimation, as shown in Table 2. For example, it raised the PSNR of the reconstructed image after MC for the Tennis sequence from 31.7586 dB to 32.3164 dB.


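| Sequence | Tennis | Susie | Foreman |
|---|---|---|---|
| ME without shifting | 31.7586 | 33.1613 | 31.2889 |
| ME with shifting | 32.3164 | 35.5236 | 32.6301 |

Table 2. PSNR (dB) of the reconstructed image without/with the shifting technique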

## **5.4 Blocks overlapping technique**

A supplementary technique for improving the motion estimation is to overlap neighbouring blocks in order to smooth the motion vectors and obtain a more realistic prediction (as shown in Figure 9). Each motion vector becomes the weighted average of itself and the motion vectors of its direct neighbours, where the block's own MV receives a larger weight than the MVs of the neighbouring blocks.

Fig. 9. Correcting the MVs with blocks overlapping technique

The blocks overlapping technique overcomes false predictions, especially the discontinuities at block edges, which introduce high frequencies in the estimated image. It does so by averaging the possible candidates for each pixel, which corrects a probable false estimation. Hence, this technique makes the visual quality clearer and sharper.
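The weighted averaging of motion vectors described in this section can be sketched as below. The weights are not given in the text, so the values used here (2 for the block's own MV, 1 for each neighbour) are placeholders, and "direct neighbours" is assumed to mean the four blocks above, below, left and right. The function name `smooth_motion_field` is hypothetical.

```python
def smooth_motion_field(field, center_weight=2.0, neighbour_weight=1.0):
    """Weighted average of each MV with the MVs of its direct neighbours.

    `field[r][c]` is the (dx, dy) vector of the block in row r, column c.
    Assumptions: 4-neighbourhood, and center_weight > neighbour_weight so
    that a block's own vector dominates, as the text requires.
    """
    rows, cols = len(field), len(field[0])
    smoothed = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            sum_x = center_weight * field[r][c][0]
            sum_y = center_weight * field[r][c][1]
            total = center_weight
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:  # skip missing neighbours
                    sum_x += neighbour_weight * field[rr][cc][0]
                    sum_y += neighbour_weight * field[rr][cc][1]
                    total += neighbour_weight
            out_row.append((sum_x / total, sum_y / total))
        smoothed.append(out_row)
    return smoothed
```

An isolated outlier vector is pulled towards its neighbours, which is the smoothing effect the technique relies on.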

## **5.5 Refinement techniques**

The basic idea of the BMA is to divide the frame into blocks of a fixed size N×N, which means that all the pixels of a block share the same displacement. This is not true in most cases, since a single block may contain different movements (intra-block movements).

For this reason, we developed two techniques that take this problem into consideration and give each image pixel an MV representing its real movement.

The first technique consists of dividing the poorly predicted blocks and conducting a re-estimation on the new sub-blocks. This adapts the block sizes to the movements and yields a variable block size ME system (see Figure 10), as developed by Arvanitidou et al. (Arvanitidou, 2009).

Fig. 10. Motion estimation with different block size strategies

This technique is very powerful since it corrects the motion vectors through a hierarchical procedure based on modifying the block sizes. It provides a good estimation and minimizes the error by taking the intra-block movements into account.
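A minimal sketch of this block-splitting refinement is given below. It is not the authors' implementation: for brevity it works on plain pixel arrays rather than on DWT subbands, uses a full search with the sum of absolute differences (SAD) as the matching criterion, splits a block into four equal sub-blocks, and treats a block as poorly predicted when its mean absolute error per pixel exceeds `threshold`. All names and parameters are hypothetical.

```python
def sad(reference, current, bx, by, size, dx, dy):
    """Sum of absolute differences between a current block and a displaced
    reference block (displacement (dx, dy) applied to the reference)."""
    return sum(
        abs(current[by + y][bx + x] - reference[by + y + dy][bx + x + dx])
        for y in range(size) for x in range(size)
    )


def best_vector(reference, current, bx, by, size, search):
    """Full search in a +/- `search` window; returns (error, dx, dy)."""
    height, width = len(reference), len(reference[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = (0 <= bx + dx and bx + dx + size <= width
                      and 0 <= by + dy and by + dy + size <= height)
            if inside:
                err = sad(reference, current, bx, by, size, dx, dy)
                if best is None or err < best[0]:
                    best = (err, dx, dy)
    return best  # (0, 0) is always inside the frame, so never None


def estimate_block(reference, current, bx, by, size, threshold,
                   min_size=2, search=2):
    """Return a list of (bx, by, size, dx, dy), splitting poorly predicted
    blocks into four sub-blocks and re-estimating each of them."""
    err, dx, dy = best_vector(reference, current, bx, by, size, search)
    if err / (size * size) <= threshold or size // 2 < min_size:
        return [(bx, by, size, dx, dy)]
    half = size // 2
    result = []
    for oy in (0, half):
        for ox in (0, half):
            result += estimate_block(reference, current, bx + ox, by + oy,
                                     half, threshold, min_size, search)
    return result
```

Blocks that are already well predicted keep their original size, so the extra search cost is spent only where a single vector cannot describe the motion.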

A second refinement technique is also used in our method. It consists of moving the estimation to a lower level (higher resolution) of the DWT. This process is not performed for all blocks; it runs only on poorly predicted blocks, re-estimating the motion of the blocks whose error is greater than a certain threshold. This technique gives a more accurate estimation and a better prediction quality.


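| Sequence | Tennis | Susie | Foreman |
|---|---|---|---|
| ME without refinement | 31.7586 | 32.5908 | 17.7091 |
| ME + refinement with changing the block size | 32.0609 | 33.0652 | 17.6722 |
| ME + refinement by moving to a lower DWT level | 32.6278 | 34.3762 | 17.9133 |

Table 3. PSNR (dB) of the reconstructed image with different refinement techniques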

As presented in Table 3, the second refinement technique gives better results, which encouraged us to use it in our method.

observe that when the motion estimation is applied in the DCT domain, block effects appear. With the classical DWT domain there are also block effects, although it is superior to the DCT domain. Our method gives a better visual quality, which resembles the quality of the frame reconstructed by the spatial domain based ME/MC system.

Fig. 11. The ME/MC results on the 129th frame of the "foreman" sequence. (a) The original image. The estimated frame: (b) ME/MC in the DCT domain, (c) ME/MC in the DWT domain, (d) with our method.

The efficiency of our motion estimation method is confirmed by the visual quality of the reconstructed frames obtained by applying the ME/MC to the Tennis sequence in several domains. The results shown in Figure 12 support the fact that our motion estimation method outperforms other motion estimation methods conducted in different domains.

All these techniques combine to make our method fast, efficient and accurate. In addition, we can exploit the human visual system and remove the small variations between the two frames that the human eye does not perceive. After being transformed by the DWT, the motion vectors and the prediction error can be encoded using the Embedded Zerotree Wavelet algorithm (EZW) developed by Shapiro (Shapiro, 1993) or the Set Partitioning in Hierarchical Trees algorithm (SPIHT) developed by Said and Pearlman (Said, 1996), both of which exploit the wavelet structure for efficient coding.
