**3. Motion estimation and compensation**

With the continuous growth in the volume of video data in multimedia databases, it has become crucial to reduce the quantity of data to be transmitted and stored through video compression and coding. Motion estimation (ME) addresses this need by eliminating the temporal redundancy between adjacent frames of an image sequence: it exploits the temporal correlation between successive frames, which reduces the amount of data to be transmitted or stored while maintaining sufficient quality. Motion estimation and motion compensation (ME/MC) are fundamental parts of video coding systems and form the core of many video processing applications. ME extracts temporal motion information from the video sequence, while MC uses this information for efficient interframe coding.

The motion estimation process predicts the motion between two successive frames and produces the motion vectors (MVs), which represent the displacements between them. Consequently, instead of transmitting two frames, we send only the reference frame, the motion vectors, and the residue, which is the difference between the current frame and the frame reconstructed by motion compensation. The MVs and the prediction error are thus transmitted instead of the frame itself. With this information, the decoder can faithfully reproduce the frame sequence. The combination of motion estimation and motion compensation is a key part of video coding.
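As a rough illustration of the transmission scheme described above, the following Python sketch builds a motion-compensated prediction from a reference frame and one motion vector per block, then forms the residue. The function name `motion_compensate`, the `(dy, dx)` vector layout, the clamping at the frame border, and the toy frames are assumptions made for this example; the frame dimensions are assumed to be multiples of the block size.

```python
import numpy as np

def motion_compensate(reference, motion_vectors, block_size):
    """Predict the current frame by copying displaced blocks of the
    reference frame (one (dy, dx) vector per block)."""
    height, width = reference.shape
    predicted = np.zeros_like(reference)
    for by in range(0, height, block_size):
        for bx in range(0, width, block_size):
            dy, dx = motion_vectors[by // block_size, bx // block_size]
            # Clamp so the displaced block stays inside the reference frame.
            sy = int(np.clip(by + dy, 0, height - block_size))
            sx = int(np.clip(bx + dx, 0, width - block_size))
            predicted[by:by + block_size, bx:bx + block_size] = \
                reference[sy:sy + block_size, sx:sx + block_size]
    return predicted

# Toy frames: the current frame is the reference shifted right by 2 columns.
rng = np.random.default_rng(0)
reference = rng.integers(0, 256, size=(16, 16)).astype(np.int16)
current = np.roll(reference, shift=2, axis=1)

# Assumed motion field: every 8x8 block has the vector (dy, dx) = (0, -2),
# i.e. its content is found 2 columns to the left in the reference frame.
motion_vectors = np.tile(np.array([0, -2]), (2, 2, 1))

predicted = motion_compensate(reference, motion_vectors, block_size=8)
residue = current - predicted        # sent together with the MVs
reconstructed = predicted + residue  # what the receiving side rebuilds
```

In this toy case the right-hand blocks are predicted exactly, so their residue is zero, while the left-hand blocks (where the displaced block would fall outside the frame) carry the whole prediction error in the residue.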

There are many methods to achieve ME/MC. They can be divided into two classes: indirect methods, applied to image features, which include the statistical and the differential methods; and direct methods, applied to image pixels, which include the optical flow and the block-based methods. The block matching algorithm (BMA) (Gharavi, 1990) is an effective and popular technique for block-based motion estimation. It has been widely adopted in various video coding standards and is highly desirable since it maintains an acceptable prediction error.

The discrete wavelet transform (DWT), a powerful tool for signal processing, has found applications in many areas of research. Image compression is still one of its most successful applications. It is therefore natural that researchers are interested in creating new DWT-based technologies for video compression and motion estimation (Kutil, 2003).

The fact that the DWT approximation contains most of the information of the original image encouraged us to exploit this DWT property. For this reason, the motion estimation was conducted principally in this subband, which accelerates the motion estimation process.

Block-based motion estimation is the most widely used method because of its simplicity and performance, which have made it the standard approach in video coding systems. The BMA procedure divides each frame into blocks of N×N pixels, matches every block of the current frame (CF) with its most similar block inside a search window in the reference frame (RF), and generates the corresponding motion vector. Consequently, the most important parameters of this method are the block size N and the search window size P. Block matching is based on minimizing a criterion such as the Mean Absolute Difference (MAD) or the Mean Square Error (MSE), which is the most common block distortion measure for matching two blocks and provides more accurate block matching. The resulting MV applies to every pixel of the block, which reduces the computational requirement.
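The two matching criteria named above can be written in a few lines. This is a minimal Python sketch; the function names `mad` and `mse` and the sample 2×2 blocks are chosen for illustration only.

```python
import numpy as np

def mad(block_a, block_b):
    """Mean absolute difference between two equally sized blocks."""
    # Cast to float first: subtracting uint8 arrays would wrap around.
    a = block_a.astype(np.float64)
    b = block_b.astype(np.float64)
    return float(np.mean(np.abs(a - b)))

def mse(block_a, block_b):
    """Mean square error between two equally sized blocks."""
    a = block_a.astype(np.float64)
    b = block_b.astype(np.float64)
    return float(np.mean((a - b) ** 2))

current_block = np.array([[10, 12], [14, 16]], dtype=np.uint8)
candidate_block = np.array([[11, 12], [10, 16]], dtype=np.uint8)

mad_value = mad(current_block, candidate_block)  # (1 + 0 + 4 + 0) / 4 = 1.25
mse_value = mse(current_block, candidate_block)  # (1 + 0 + 16 + 0) / 4 = 4.25
```

Squaring makes the MSE penalize the single large difference more strongly than the MAD does, at the price of one multiplication per pixel.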

To identify the best matching block, the simplest way is to evaluate every candidate block in the reference frame (exhaustive search, ES). Although this method generally finds the appropriate block, it requires a high computation time. Hence, other fast search strategies (Barjatya, 2004) have been developed, in which the search is performed in a particular order: the Three Step Search (TSS), the Simple and Efficient Search (SES), the Four Step Search (4SS), the Adaptive Rood Pattern Search (ARPS), and the Diamond Search (DS), which has proved to be the best of these strategies, coming close to the ES results. The DS has consequently been improved in many variants, such as the Cross DS (CDS), the Small CDS (SCDS), and the New CDS (NCDS).
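To make the difference between the exhaustive search and a fast strategy concrete, here is a hedged Python sketch of ES and of a basic Three Step Search, both using MAD as the cost. The helper names, the synthetic Gaussian test frame, and the choice N = 8, P = 7 are assumptions for this example; the TSS shown is the textbook 9-point pattern with halving step, not a reproduction of any particular published implementation.

```python
import numpy as np

def block_cost(current, reference, by, bx, dy, dx, n):
    """MAD between the current block at (by, bx) and the reference block
    displaced by (dy, dx); infinite if the candidate leaves the frame."""
    h, w = reference.shape
    sy, sx = by + dy, bx + dx
    if sy < 0 or sx < 0 or sy + n > h or sx + n > w:
        return np.inf
    cur = current[by:by + n, bx:bx + n].astype(np.float64)
    ref = reference[sy:sy + n, sx:sx + n].astype(np.float64)
    return float(np.mean(np.abs(cur - ref)))

def exhaustive_search(current, reference, by, bx, n, p):
    """Test all (2p + 1)^2 displacements inside the search window."""
    best, best_cost = (0, 0), block_cost(current, reference, by, bx, 0, 0, n)
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            cost = block_cost(current, reference, by, bx, dy, dx, n)
            if cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best, best_cost

def three_step_search(current, reference, by, bx, n, p=7):
    """Test a 3x3 pattern around the current best, then halve the step."""
    step = max(1, (p + 1) // 2)
    center = (0, 0)
    best_cost = block_cost(current, reference, by, bx, 0, 0, n)
    while step >= 1:
        best = center
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                cand = (center[0] + dy, center[1] + dx)
                cost = block_cost(current, reference, by, bx, cand[0], cand[1], n)
                if cost < best_cost:
                    best, best_cost = cand, cost
        center = best
        step //= 2
    return center, best_cost

# Synthetic smooth frame (a Gaussian blob) and a shifted copy of it.
yy, xx = np.mgrid[0:32, 0:32]
reference = 255.0 * np.exp(-((yy - 16) ** 2 + (xx - 16) ** 2) / (2 * 6.0 ** 2))
# current[y, x] = reference[y + 2, x - 3], so the true MV is (2, -3).
current = np.roll(reference, shift=(-2, 3), axis=(0, 1))

es_mv, es_cost = exhaustive_search(current, reference, 12, 12, n=8, p=7)
tss_mv, tss_cost = three_step_search(current, reference, 12, 12, n=8, p=7)
```

With P = 7 the exhaustive search evaluates 225 candidates per block, while this TSS evaluates at most 27. The TSS result can never have a lower cost than the ES result, and on strongly textured content it may stop in a local minimum.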

In conventional coding systems such as H.261 and MPEG-1/2, the BMA is conducted directly on the frame, which requires large computing power. That is why many studies have shown that it is better to transform the frame before executing the ME techniques. With the development of new video coding standards, wavelets have received considerable interest since they have shown good and effective results. The main idea behind the wavelet transform is to generate a space-frequency representation focusing only on the spatial frequencies that are most significant to the human eye. This wavelet decomposition is a reversible procedure performed by successive approximations of the initial information (the original frame). This process improves the coding efficiency since the wavelet coefficients are highly correlated, and this representation reduces the blocking effects, especially at the edges.
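The reversibility of the decomposition, and the way the approximation gathers most of the signal energy, can be checked on a small example. The sketch below uses a one-level orthonormal Haar transform written directly in NumPy; the Haar filter, the function names, and the ramp test frame are assumptions made for brevity, and the text does not state which wavelet the authors use.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def haar_dwt2(image):
    """One level of the 2-D orthonormal Haar transform.
    Returns the approximation and a tuple of three detail subbands."""
    x = image.astype(np.float64)
    # Filter along the rows: pairwise sums (low-pass), differences (high-pass).
    lo = (x[:, 0::2] + x[:, 1::2]) / SQRT2
    hi = (x[:, 0::2] - x[:, 1::2]) / SQRT2
    # Filter along the columns of each result.
    approx = (lo[0::2] + lo[1::2]) / SQRT2
    det_a = (lo[0::2] - lo[1::2]) / SQRT2
    det_b = (hi[0::2] + hi[1::2]) / SQRT2
    det_c = (hi[0::2] - hi[1::2]) / SQRT2
    return approx, (det_a, det_b, det_c)

def haar_idwt2(approx, details):
    """Inverse of haar_dwt2: rebuilds the original frame."""
    det_a, det_b, det_c = details
    rows, cols = approx.shape
    lo = np.empty((2 * rows, cols))
    hi = np.empty((2 * rows, cols))
    lo[0::2], lo[1::2] = (approx + det_a) / SQRT2, (approx - det_a) / SQRT2
    hi[0::2], hi[1::2] = (det_b + det_c) / SQRT2, (det_b - det_c) / SQRT2
    x = np.empty((2 * rows, 2 * cols))
    x[:, 0::2], x[:, 1::2] = (lo + hi) / SQRT2, (lo - hi) / SQRT2
    return x

# Smooth 16x16 test frame (a diagonal ramp).
frame = 8.0 * np.add.outer(np.arange(16), np.arange(16))

approx, details = haar_dwt2(frame)
restored = haar_idwt2(approx, details)

total_energy = np.sum(frame ** 2)
approx_share = np.sum(approx ** 2) / total_energy
```

For a smooth frame such as this ramp, almost all of the energy ends up in the approximation, which is a quarter of the original size; frames with more texture or sharp edges leave more energy in the detail subbands.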

Initially, the DWT was used to encode the MVs and the estimation errors after conducting the motion estimation in the spatial or the frequency domain (Figure 4a). Thereafter, given that the DWT is a space-frequency representation of the image that concentrates the most important information in one subband (the DWT approximation subband), and since the different DWT subbands are hierarchically correlated, the DWT was used as a domain in which to conduct the motion estimation, and it has shown great success.

Fig. 4. (a) Conventional ME + DWT based MVs and ME errors encoding; (b) Motion estimation in the wavelet domain


The proposed method makes use of the wavelet properties to apply the motion estimation directly to the wavelet coefficients. We have adopted the fine-to-coarse motion estimation strategy, whose success has been shown in many previous works. After applying the DWT to both the CF and the RF, the motion is first estimated between the DWT approximations of the two images. This provides a better estimation since the approximation contains most of the visual information. The motion vectors of the approximation are calculated directly. We then exploit the fact that every DWT coefficient has four descendants in the lower DWT level (quadtree structure): the motion vectors of the detail subbands are deduced using the hierarchical relationship that exists between the DWT subbands, as shown in Figure 5.

Fig. 5. DWT subbands motion vectors representation (L=3)

We compute the motion vectors of the detail subbands following this formula:

$$V_{i,j}(x, y) = 2^{L-i}\, V_{L,1} + \delta_{i,j} \qquad (1)$$

Working on a three-level DWT (L=3), we have i={1, 2, 3}, which is the level, and j={1, 2, 3, 4}, which is the subband number; $V_{i,j}(x, y)$ is the motion vector for the subband "j" at the level "i", and $\delta_{i,j}$ is the refinement factor (equal to 0 if "i" is equal to L). The displacement of every subband block is double the displacement of the same subband block in the lower DWT level, to which we add a refinement factor $\delta_{i,j}$ that corrects the estimation error, as given in equation (1) and presented in Figure 5.

Moreover, by predicting the motion only in the approximation which has a small size compared to the original frame and contains the most significant information, not only the

By exploiting the hierarchical relationship between the wavelet coefficients of the different subbands at different levels, several hierarchical ME methods adapted to the wavelet transform have been developed. This relationship means that every wavelet coefficient has four descendants in the lower level of the DWT. The motion estimation is conducted hierarchically: it is first calculated at one of the DWT levels and is then corrected with the estimates obtained at the other levels.
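The quadtree relationship and the scaling rule of equation (1) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function names, the dictionary keyed by `(level, subband)`, and the idea that the caller supplies the refinement terms are all hypothetical, and the sketch assumes that the block size doubles from one level to the next finer one so that every subband holds the same number of blocks. Subband 1 is taken to be the approximation, which exists only at level L.

```python
import numpy as np

def descendants(y, x):
    """Positions, in the next finer level, of the four descendants of the
    coefficient located at (y, x) (quadtree structure)."""
    return [(2 * y, 2 * x), (2 * y, 2 * x + 1),
            (2 * y + 1, 2 * x), (2 * y + 1, 2 * x + 1)]

def propagate_motion_vectors(approx_mvs, num_levels, refinements=None):
    """Apply V[i, j] = 2^(L - i) * V[L, 1] + delta[i, j] to every subband.

    approx_mvs  : array of shape (rows, cols, 2), the field V[L, 1]
    refinements : optional dict {(i, j): array or scalar}, the delta terms
                  (hypothetical input; treated as 0 when missing)
    """
    refinements = refinements or {}
    fields = {(num_levels, 1): approx_mvs.copy()}
    for level in range(1, num_levels + 1):
        scale = 2 ** (num_levels - level)
        for subband in (2, 3, 4):  # the three detail subbands
            # The refinement factor is zero at the coarsest level L.
            delta = 0 if level == num_levels else refinements.get((level, subband), 0)
            fields[(level, subband)] = scale * approx_mvs + delta
    return fields

# One block with approximation MV (1, -2), three-level DWT (L = 3).
approx_mvs = np.array([[[1, -2]]])
fields = propagate_motion_vectors(
    approx_mvs, num_levels=3,
    refinements={(1, 4): np.array([1, 0])},  # assumed correction for V[1, 4]
)
```

Here `fields[(2, 2)]` is twice the approximation vector and `fields[(1, 4)]` is four times that vector plus the assumed correction, so only the approximation requires a full block search.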

In fact, there are two main categories of DWT-based ME approaches: forward and backward. The forward approach consists of conducting the ME in the DWT detail subbands of the low level and using it to determine the motion in the higher-level subbands (coarse-to-fine). Researchers such as Meyer et al. (Meyer, 1997) have followed the forward approach to propose an ME method with a new pyramid structure. They took into consideration the aliasing effect caused by the BMA used and built an ME system giving good perceptual quality after MC. Also, P.Y. Cheng et al. (Cheng, 1995) proposed a multiscale forward ME working on the DWT coefficients. They built a new pyramidal structure overcoming the shift-variance problem of the DWT.

Nosratinia and Orchard (Nosratinia, 1995) were the first researchers to develop a DWT-based ME system following a backward approach (coarse-to-fine), in which they estimated the motion in the finest DWT resolution (higher level) and then progressively refined the ME by incorporating the finer level. Furthermore, Conklin and Hemami (Conklin, 1997) proved the superiority of the backward ME approach over the forward one in terms of compression rate and visual quality after compensation. This has encouraged more recent researchers (Lundmark, 2000; Yuan, 2002) to follow this approach in their ME systems.

The effectiveness of the BMA and the suitability of the DWT for video coding have led us to develop a block-matching-based motion estimation method in the wavelet domain.
