The performance of image compression techniques can also be significantly improved by embedding the properties of the Human Visual System (HVS) in their compression algorithms (Bradley, 1999; Nadenau et al., 2003). Due to the space–frequency localization properties of wavelet transforms, wavelet based image codecs are most suitable for embedding the HVS model in their coding algorithm (Bradley, 1999). The HVS model can be embedded either in the quantization stage (Aili et al., 2006; Höntsch & Karam, 2000; Nadenau et al., 2003) or at the bit allocation stage (Antonini et al., 1992; Sheikh Akbari & Soraghan, 2003; Thornton et al., 2002; Voukelatos & Soraghan, 1997) of wavelet based encoders. In this chapter, HVS coefficients for the wavelet high frequency subbands are calculated and their application in improving the coding performance of the statistical encoder is investigated.

**2. Fundamentals of compression**

The main goal of all image compression techniques is to minimize the number of bits required to represent a digital image while preserving an acceptable level of image quality. Image data are amenable to compression because of the spatial redundancies they exhibit and because they contain information that, from a perceptual point of view, can be considered irrelevant. Many standard and non-standard image compression techniques have been developed to compress digital images. These techniques exploit some or all of these image properties to improve the quality of the decoded images at higher compression ratios. Some of these image coding schemes are tabulated in Table 1.

Table 1. Standard and non-standard image compression techniques.

Image compression techniques can be classified into two main groups: lossless and lossy compression techniques. In a lossless compression process, the original data and the reconstructed data must be identical for each and every data sample. Lossless compression is demanded in applications such as medical imagery, e.g. cardiography, to avoid the loss of data and errors introduced into the imagery. It is also applied in cases where it is not possible to determine the acceptable loss of data.

In most image processing applications, there is no need for the reconstructed data to be identical in value to the original, so some amount of loss is permitted in the reconstructed data. Compression techniques that result in an imperfect reconstruction are called lossy compression. Lossy compression makes it possible to represent the image, with some loss, using fewer bits than a lossless compression would require.

**3. Characteristics of the Human Visual System**

Research has shown that embedding the Human Visual System (HVS) model into compression algorithms yields significant improvement in the visual quality of the reconstructed images (Aili et al., 2006; Antonini et al., 1992; Bradley, 1999; Höntsch & Karam, 2000; Nadenau et al., 2003; Sheikh Akbari & Soraghan, 2003; Thornton et al., 2002; Voukelatos & Soraghan, 1997). In particular, it has been shown in (Bradley, 1999; Nadenau et al., 2003) that the performance of image compression techniques can be significantly improved by exploiting the limitations of the HVS for compression purposes. To achieve this aim, the HVS model can be embedded in the compression algorithm to optimise the perceived visual quality.

Due to the complexity of the human visual processing system, assessments of the performance of HVS models are based on psychophysical observations. Physiologists have performed many psycho-visual experiments with the goal of understanding how the HVS works. One of the limitations of the HVS, found experimentally, is its lower sensitivity to patterns with high spatial frequencies. Exploiting this property of the HVS model, and embedding it into compression algorithms, can significantly improve the visual quality of compressed images. Natural images are composed of small details and shaped regions. Therefore, it is necessary to describe the contrast sensitivity as a function of spatial frequency. This phenomenon is known as the Contrast Sensitivity Function (CSF) (Nadenau et al., 2003; Tan et al., 2004). Figure 1 shows the CSF curves for the luminance and chrominance channels of the HVS. From Figure 1, it can be seen that the HVS is more sensitive to the luminance component than to the chrominance components. The sensitivity of the HVS to luminance is greatest around the mid-frequencies, in the region of 4 cycles per optical degree (cpd). It reduces rapidly at higher spatial frequencies and decreases slightly at lower frequencies. For the chrominance components, the HVS behaves like a low pass filter; therefore there is no decrease in its sensitivity at low frequencies.

Fig. 1. The CSF curves for the luminance and chrominance channels of the HVS (Nadenau et al., 2003).

Fig. 2. A low frequency pattern (left) and a high frequency pattern (right); the high frequency pattern appears less intense.

To give a sense of the sensitivity of the HVS to different frequencies, two black and white patterns are shown in Figure 2: a low frequency pattern on the left and a high frequency pattern on the right. In both patterns the black and white stripes have the same brightness, but those of the right-hand pattern appear less intense than those of the left-hand pattern. This is explained by the fact that the HVS is less sensitive to high frequency components.

#### **3.1 Human Visual System in compression techniques**

Wavelet-based image coding schemes have proven to be ideally suited for embedding complete HVS models, due to the space–frequency localization properties of wavelet decompositions (Bradley, 1999). The HVS model has been embedded either at the quantization stage (Aili et al., 2006; Höntsch & Karam, 2000; Nadenau et al., 2003) or at the bit allocation stage (Antonini et al., 1992; Sheikh Akbari & Soraghan, 2003; Thornton et al., 2002; Voukelatos & Soraghan, 1997) of the codec, which yields significant improvement in the visual quality of the reconstructed images. Antonini et al. (1992) developed a wavelet-based image compression scheme using Vector Quantization (VQ) and the properties of the HVS. This algorithm performs a Discrete Wavelet Transform (DWT) on the input image, and the resulting coefficients in the different subbands are then vector quantized. The bit allocation among different subbands is based on a weighted Mean Squared Error (MSE) distortion criterion, where the weights are determined based on the property of the HVS introduced in (Campbell & Robson, 1968). Voukelatos and Soraghan (1998) introduced another wavelet based image compression technique using VQ and the properties of the HVS. They first calculated the value of the Contrast Sensitivity Function (CSF) for the central spatial frequency of each subband. These values were then used to scale the threshold value for each subband, which was used in vector selection prior to the VQ process. A weighted MSE distortion criterion using perceptual weights is also employed to allocate bits among the different subbands. Voukelatos and Soraghan reported significant improvement over existing block-based image compression techniques at very low bitrates. Thornton et al. (2002) extended Voukelatos and Soraghan's algorithm (Voukelatos & Soraghan, 1998) to video for very low bitrate transmission. They incorporated the properties of the HVS to code the intra-frames and reported significant improvement in the objective visual quality of the decompressed video sequences. Sheikh Akbari and Soraghan (2003) developed another wavelet based video compression scheme using VQ and the properties of the HVS. They calculated the value of the CSF for the central spatial frequency of each subband of the Quarter Common Intermediate Format (QCIF) image size. These values were then used to scale the threshold value for each subband, which was used in vector selection prior to the VQ process and also in the bit allocation among the different subbands.

The JPEG 2000 standard image codec supports two types of visual frequency weighting: Fixed Visual Weighting (FVW) and Visual Progressive Coding, also called Visual Progressive Weighting (VPW). In FVW, only one set of CSF weights is chosen and applied in accordance with the viewing conditions; in VPW, different sets of CSF weights are used at the various stages of the embedded coding. This is because, during progressive transmission, the image is viewed at various distances: at low bitrates the image is viewed from a relatively large distance, while as more bits are received the quality of the reconstructed image increases, which implies that the viewer looks at the image from a closer distance (Skodras et al., 2001). Nadenau et al. (2003) incorporated the characteristics of the HVS into a wavelet-based image compression algorithm using a noise-shaping filtering stage prior to the quantization stage. They filtered the transformed coefficients using an "HVS filter" for each subband. This algorithm improves the compression ratio by up to 30% over the JPEG 2000 baseline for a number of test images. A new image compression method based on the HVS was proposed by Aili et al. (2006). In this codec, the input image is first decomposed using a wavelet transform, and the transformed coefficients in the different subbands are then weighted by the peak of the contrast sensitivity function (CSF) curve in the wavelet domain. Finally, the weighted wavelet coefficients are coded using the Set Partitioning in Hierarchical Trees (SPIHT) algorithm. This technique showed significantly higher visual quality and almost the same objective quality as the conventional SPIHT technique.

#### **3.2 Calculation of perceptual weights**

In this section, the perceptual weights that regulate the quantization steps in different image compression techniques are specifically calculated for a Quarter Common Intermediate Format (QCIF) image size. The derivation of the weighting factors is based on the results of subjective experimental data that was presented in (Van Dyck & Rajala, 1994).

#### **3.2.1 Calculation of spatial frequencies**

The perceptual coding model is designed for a QCIF image size, which corresponds to a physical dimension of 1.8 × 2.2 inches on a workstation monitor, i.e. a videophone display. The pixel resolution *r*, measured in pixels per inch in both the horizontal and vertical dimensions, is therefore 80 pixels/inch. Let the viewing distance *v*, measured in metres, be 0.30 metres; this distance is a good approximation of the natural viewing distance of a human using a videophone device. The sampling frequency *f<sub>S</sub>*, in pixels per degree, can then be calculated using Equation 1 (Nadenau et al., 2003):

$$f_S = \frac{2\,v\,\tan(0.5^\circ)\,r}{0.0254} \tag{1}$$


The signal is critically down-sampled at Nyquist rate to 0.5 cycle/pixel. Hence the maximum frequency represented in the signal is:

$$f_{\max} = 0.5\,f_S \tag{2}$$

Thus the maximum frequency represented in the QCIF image at the thirty centimetre viewing distance is 8.246 cycles/degree. The centre radial frequency of each subband is determined by the Euclidean distance of its centre from the origin, where the subbands lie in a square of side 8.246 cycles/degree and the baseband sits at the origin. Figure 3 shows the centre radial frequency of each subband of a three level wavelet decomposition.

Fig. 3. Subband centre spatial frequencies in cycles/degree.
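As a quick numerical check of Equations 1 and 2 under the stated assumptions (*r* = 80 pixels/inch, *v* = 0.30 m; the constant 0.0254 converts inches to metres), a short Python sketch:

```python
import math

v = 0.30   # viewing distance in metres
r = 80     # pixel resolution in pixels/inch

f_s = 2 * v * math.tan(math.radians(0.5)) * r / 0.0254   # Equation 1
f_max = 0.5 * f_s                                        # Equation 2

print(f"f_s   = {f_s:.3f} pixels/degree")     # ~16.492
print(f"f_max = {f_max:.3f} cycles/degree")   # ~8.246, the value quoted above
```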

#### **3.2.2 Mean detection threshold**

The mean detection threshold is the smallest change in a colour that is noticeable by a human observer; it is used to calculate the perceptual weighting factors. It is a function of spatial frequency, orientation, luminance and background colour. The initial data presented in (Van Dyck & Rajala, 1994) was measured in the *xyY* colour space, where *x* and *y* are the C.I.E. chromaticity coordinates and *Y* is the luminance. Table 2 gives the set of thresholds for various frequencies and orientations measured along the luminance, Red-Green and Blue-Yellow directions when the luminance value $Y_0$ is 5 cd/m² and the background colour is white. The chromaticity coordinates for white are $x_0 = 0.33$ and $y_0 = 0.35$. For a transition along the Red-Green or Blue-Yellow direction, each mean detection threshold gives two chromaticity coordinates corresponding to the maximum and minimum of the sinusoidal variation, as shown in Equations 3 and 4, respectively.

$$x_i = x_0 \pm \Delta x \cdot t \tag{3}$$

$$y_i = y_0 \pm \Delta y \cdot t \tag{4}$$

where $t$ is the mean detection threshold and $\Delta x$ and $\Delta y$ are the step sizes for the changes in the $x$ and $y$ directions. The values used for $\Delta x$ and $\Delta y$ for all three directions are given in Table 3.

| Spatial direction | Colour direction | 1.0 cyc/deg | 2.0 cyc/deg | 5.0 cyc/deg | 10.0 cyc/deg | 20.0 cyc/deg |
|---|---|---|---|---|---|---|
| Horizontal (LH) | Luminance | 6.750 | 6.330 | 7.250 | 13.500 | 65.083 |
| | R-G | 4.750 | 4.750 | 7.617 | 17.417 | 77.417 |
| | B-Y | 6.000 | 6.833 | 32.667 | 70.167 | 150.000 |
| Vertical (HL) | Luminance | 6.833 | 6.250 | 6.833 | 22.500 | 77.800 |
| | R-G | 5.583 | 7.083 | 9.250 | 23.000 | 90.375 |
| | B-Y | 6.667 | 9.417 | 31.833 | 65.700 | 150.000 |
| Left Diagonal (HH) | Luminance | 7.667 | 6.917 | 11.167 | 37.083 | 49.000 |
| | R-G | 7.917 | 7.167 | 16.083 | 37.500 | 100.750 |
| | B-Y | 12.417 | 18.500 | 45.500 | 86.500 | 150.000 |
| Right Diagonal (HH) | Luminance | 8.083 | 7.583 | 9.167 | 42.583 | 85.750 |
| | R-G | 7.750 | 6.333 | 13.833 | 35.417 | 103.500 |
| | B-Y | 13.750 | 19.750 | 47.750 | 83.000 | 114.000 |

Table 2. Mean detection thresholds in xyY space (Van Dyck & Rajala, 1994).

| Direction | ΔY | Δx | Δy |
|---|---|---|---|
| Luminance | 0.0124 | 0.0 | 0.0 |
| R-G | 0.0 | 0.000655 | -0.000357 |
| B-Y | 0.0 | 0.000283 | 0.000689 |

Table 3. Step size for changes in each direction (Van Dyck & Rajala, 1994).
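The following minimal sketch evaluates Equations 3 and 4 for one colour direction in plain Python; the example threshold is the horizontal R-G entry at 5.0 cycles/deg from Table 2, and the white-point coordinates are those quoted above.

```python
def chromaticity_pair(t, dx, dy, x0=0.33, y0=0.35):
    """Equations 3 and 4: the two (x, y) points at the maximum and
    minimum of the sinusoidal variation around the white point."""
    return [(x0 + s * dx * t, y0 + s * dy * t) for s in (+1, -1)]

# R-G direction: dx = 0.000655, dy = -0.000357 (Table 3);
# t = 7.617 is the horizontal R-G threshold at 5.0 cycles/deg (Table 2).
(x1, y1), (x2, y2) = chromaticity_pair(7.617, 0.000655, -0.000357)
```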


#### **3.2.3 Perceptual weight factors**

The perceptual weight for each subband is the reciprocal of its mean detection threshold. Hence, the mean detection thresholds for the YIQ space need to be calculated before the perceptual weights can be determined. The mean detection thresholds in the *xyY* space for the centre frequencies of the subbands shown in Figure 3 are first calculated by linearly interpolating the values in Table 2. In a wavelet decomposition the diagonal subbands (HH) do not discriminate between left and right, so the average of the two diagonal values is employed. The resulting thresholds in the *xyY* space for the centres of the high frequency subbands are listed in Table 4.

| Subband | Luminance | R-G | B-Y |
|---|---|---|---|
| LH1 | 8.731 | 9.939 | 41.554 |
| HL1 | 10.546 | 12.508 | 39.859 |
| HH1 | 32.436 | 48.890 | 75.188 |
| LH2 | 6.664 | 5.793 | 16.236 |
| HL2 | 6.462 | 7.871 | 17.576 |
| HH2 | 9.556 | 13.242 | 40.877 |
| LH3 | 6.520 | 4.750 | 6.454 |
| HL3 | 6.514 | 6.402 | 8.168 |
| HH3 | 7.431 | 7.261 | 20.839 |

Table 4. Mean detection thresholds in *xy*Y space for the subbands.

Using Equations 3 and 4, two chromaticity coordinates $(x_i, y_i, Y_0)$, where $i = 1, 2$, can be calculated for each subband. These two chromaticity coordinates are in the *xyY* space; they are therefore converted from the *xyY* space to the C.I.E. XYZ space using the equations in 5 (Ghanbari, 1999):




$$X_i = \frac{x_i\,Y_0}{y_i}, \qquad Y_i = Y_0, \qquad Z_i = \frac{(1 - x_i - y_i)\,Y_0}{y_i}, \qquad i = 1, 2 \tag{5}$$

For the luminance direction each mean detection threshold also provides two XYZ values that are calculated using the equations in 6:

$$\begin{aligned} X_i &= X_0 \pm \Delta Y \cdot \frac{X_0}{Y_0} \cdot t \\ Y_i &= Y_0 \pm \Delta Y \cdot t \\ Z_i &= Z_0 \pm \Delta Y \cdot \frac{Z_0}{Y_0} \cdot t \end{aligned} \tag{6}$$

where $\Delta Y$ is given in Table 3, $t$ is the mean detection threshold and $i = 1, 2$. The vector $(X_0, Y_0, Z_0)$ contains the coordinates of the white point, computed from Equation 5. The resulting values are then transformed into the YIQ space. The Red-Green line lies approximately in the I direction and the Blue-Yellow line lies mostly in the Q direction. The linear transformations in Equations 7 and 8 are used to give two points for each direction in the YIQ space.
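As a worked instance of Equation 5, the white point follows directly from the white chromaticity coordinates quoted in Section 3.2.2 ($x_0 = 0.33$, $y_0 = 0.35$, $Y_0 = 5$ cd/m²):

$$X_0 = \frac{0.33 \times 5}{0.35} \approx 4.714, \qquad Y_0 = 5, \qquad Z_0 = \frac{(1 - 0.33 - 0.35) \times 5}{0.35} \approx 4.571$$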

$$\begin{bmatrix} R \\ G \\ B \end{bmatrix} = \begin{bmatrix} 1.910 & -0.533 & -0.288 \\ -0.985 & 2.000 & -0.028 \\ 0.058 & -0.118 & 0.896 \end{bmatrix} \cdot \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} \tag{7}$$

$$\begin{bmatrix} Y \\ I \\ Q \end{bmatrix} = \begin{bmatrix} 0.299 & 0.587 & 0.114 \\ 0.596 & -0.274 & -0.322 \\ 0.211 & -0.523 & 0.312 \end{bmatrix} \cdot \begin{bmatrix} R \\ G \\ B \end{bmatrix} \tag{8}$$

The YIQ mean detection threshold for each direction is the Euclidean distance between these two points, and the perceptual weight is its reciprocal. The computed weighting factors for each subband of QCIF video, based on the properties of the HVS, are shown in Table 5. These values represent the perceptual weights that can be used to regulate the quantization step size for the high frequency subband coefficients of multiresolution based image/video codecs.

| Subband | Y-domain | I-domain | Q-domain |
|---|---|---|---|
| LH1 | 4.3807 | 2.0482 | 1.0502 |
| HL1 | 3.4573 | 1.6159 | 1.0992 |
| HH1 | 1.2372 | 0.6978 | 0.6065 |
| LH2 | 5.9673 | 3.6449 | 2.6340 |
| HL2 | 6.1708 | 2.7149 | 2.4728 |
| HH2 | 4.1934 | 1.6384 | 1.1331 |
| LH3 | 6.1796 | 4.5685 | 7.1443 |
| HL3 | 6.1984 | 3.3243 | 5.5495 |
| HH3 | 5.3931 | 2.9888 | 2.2339 |

Table 5. Perceptual weight factors for the YIQ colour domain.
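To make the whole chain concrete, the sketch below strings Equations 3–8 together for a single chromatic direction of one subband, assuming NumPy; the threshold is taken from Table 4 and the step sizes from Table 3 (for the luminance direction, Equation 6 would replace the chromatic steps of Equations 3–5). Because the tabulated values are rounded, the result only approximates the corresponding entry of Table 5.

```python
import numpy as np

XYZ2RGB = np.array([[ 1.910, -0.533, -0.288],
                    [-0.985,  2.000, -0.028],
                    [ 0.058, -0.118,  0.896]])    # Equation 7
RGB2YIQ = np.array([[ 0.299,  0.587,  0.114],
                    [ 0.596, -0.274, -0.322],
                    [ 0.211, -0.523,  0.312]])    # Equation 8

def perceptual_weight(t, dx, dy, x0=0.33, y0=0.35, Y0=5.0):
    """Weight = reciprocal of the YIQ detection threshold, i.e. of the
    Euclidean distance between the two YIQ points."""
    points = []
    for s in (+1, -1):
        x, y = x0 + s * dx * t, y0 + s * dy * t                 # Eqs 3, 4
        XYZ = np.array([x * Y0 / y, Y0, (1 - x - y) * Y0 / y])  # Eq 5
        points.append(RGB2YIQ @ (XYZ2RGB @ XYZ))                # Eqs 7, 8
    return 1.0 / np.linalg.norm(points[0] - points[1])

# R-G direction of subband LH1: t = 9.939 (Table 4), steps from Table 3.
w_lh1_i = perceptual_weight(9.939, 0.000655, -0.000357)  # near the Table 5 entry
```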

**4. Statistical parameters in image compression**

Statistical parameters of the image data have been used in a number of image compression techniques (Chang & Chen, 1993; Lu et al., 2000; Lu et al., 2002; Saryazdi & Jafari, 2002) and have demonstrated promising improvement in the quality of decompressed images, especially at medium to high compression ratios. A vector quantization based image compression algorithm was proposed by Chang and Chen (1993). It first generates a number of sub-codebooks from a super-codebook, and then employs the statistical parameters of the upper and left neighbour vectors to decide which codebook is to be used for vector quantization. This coding scheme was extended by Lu et al. (2000), who generated two master codebooks: one for the codewords whose variances are larger than a threshold, and another for the remaining codewords. Lu et al. exploited the current vector's statistical parameters to decide which of these two master codebooks to use, and then applied Chang and Chen's algorithm to perform the vector quantization. Lu et al. (2002) successfully developed further gradient-based vector quantization schemes and reported additional improvement at low bit rates. In these algorithms, one master codebook is first generated and the codewords are then sorted in ascending order of their gradient values. In the first algorithm, Chang and Chen's (1993) technique is used to perform vector quantization, with the difference that gradient parameters, instead of statistical parameters, are used to decide which codebook is to be used. In the second algorithm, the number of codebooks was increased, which resulted in a further bit reduction. Another statistically based image compression scheme was reported by Saryazdi and Jafari (2002). In this algorithm, the input image is divided into a number of blocks, and statistical parameters are used to classify each block as uniform or non-uniform. The uniform blocks are coded by their minimum values. The non-uniform blocks are coded by their minimum and residual values, where the residual values are vector quantized. They reported promising visual quality at high compression ratios.



#### **4.1 Distribution of wavelet transform coefficients**


The wavelet transform is one of the most popular transforms used in image-coding schemes. As each statistical distribution function has its own parameters, knowledge of the statistical behaviour of the wavelet transformed coefficients in each subband of an image can play an important role in designing an efficient compression algorithm. Studies of many non-artificial images have shown that the distribution of the wavelet-transformed coefficients in the high frequency subbands of natural images follows a Gaussian distribution (Altunbasak & Kamaci, 2004; Kilic & Yilmaz, 2003; Eude et al., 1994; Valade & Nicolas, 2004; Yovanof & Liu, 1996). In the following, the Gaussian distribution and its statistical parameters are first reviewed; a review of the study of the distribution of the wavelet transform coefficients of images is then given. A one dimensional Gaussian distribution function $f_g(x)$ is defined as follows:

$$f_g(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \tag{9}$$

where $\mu$ is the mean value of $f_g(x)$ and is calculated using Equation 10:

$$\mu = \int_{-\infty}^{+\infty} x\, f_g(x)\, dx \tag{10}$$

and $\sigma$ is known as the standard deviation, which determines the width of the distribution. The square of the standard deviation, $\sigma^2$, is called the variance and is determined as follows:

$$\sigma^2 = \int_{-\infty}^{+\infty} (x - \mu)^2\, f_g(x)\, dx \tag{11}$$

The mean value, $\mu$, and variance, $\sigma^2$, of discrete data are calculated using Equations 12 and 13, respectively:

$$\mu = \frac{1}{n} \sum_{i=1}^{n} x_i \tag{12}$$

$$\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} \left(x_i - \mu\right)^2 \tag{13}$$

where $n$ is the number of discrete data samples and $x_i$ is the $i$-th sample. Every Gaussian distribution function is defined by two parameters: the mean value, which defines the central location of the distribution, and the variance, which defines the width of the distribution. Four Gaussian distribution functions, with different mean values and variances, are shown in Figure 4.
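A minimal illustration of Equations 12 and 13 on synthetic discrete data, assuming NumPy:

```python
import numpy as np

x = np.random.default_rng(0).normal(loc=2.0, scale=3.0, size=10_000)

mu = x.sum() / x.size                   # Equation 12
var = ((x - mu) ** 2).sum() / x.size    # Equation 13

print(mu, var)   # close to the true mean 2.0 and variance 9.0
```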

Studies of the distribution of the wavelet transform coefficients in each subband have shown that the distribution of the coefficients in the detail subbands of the wavelet-transformed data of natural images is approximately Gaussian, with the coefficients in the baseband excluded (Valade & Nicolas, 2004; Kilic & Yilmaz, 2003). The distributions of the wavelet coefficients of an image after applying a three level 2D wavelet transform are shown in Figure 5. From Figure 5, it can be seen that, except for the lowest frequency coefficients, the distribution of the coefficients in the high frequency subbands is approximately Gaussian.


Fig. 4. Gaussian distribution functions (http://en.wikipedia.org/wiki/Normal_distribution).

Fig. 5. Histogram of three level wavelet transform of an image (Kilic & Yilmaz, 2003).

In summary, it can be concluded that distribution of the wavelet coefficients in high frequency subbands of natural images can be well approximated by a Gaussian distribution. Therefore, effective use of statistical parameters of the transformed image data (mean values and variances of a Gaussian distribution function) is key in estimation of the transformed data and yielding compression.
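The observation behind Figure 5 is easy to reproduce. The sketch below assumes the PyWavelets and Matplotlib packages are available and uses the 'bior4.4' (CDF 9/7) filter; the random array is a stand-in for a natural greyscale test image loaded as a 2D array.

```python
import numpy as np
import pywt
import matplotlib.pyplot as plt

image = np.random.rand(512, 512)   # replace with a natural test image
coeffs = pywt.wavedec2(image, 'bior4.4', mode='periodization', level=3)

# coeffs[0] is the baseband; each entry of coeffs[1:] holds the three
# detail subbands of one level, from the coarsest to the finest.
fig, axes = plt.subplots(len(coeffs) - 1, 3, figsize=(9, 9))
for row, level in zip(axes, coeffs[1:]):
    for ax, band in zip(row, level):
        ax.hist(band.ravel(), bins=100)   # roughly bell-shaped histograms
plt.show()
```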

#### **4.2 Statistical encoder**


In Section 4.1 the Gaussian distribution function and its statistical parameters were reviewed. It was shown that every Gaussian distribution function is defined by two parameters: the mean value, which defines the central location of the distribution, and the variance, which defines its width. The observation that the distribution of the coefficients in each detail subband of the wavelet-transformed data of natural images is approximately Gaussian has led to the development of a Statistical Encoding (SE) algorithm. The SE algorithm assumes that parts of the 2D input matrix follow a Gaussian distribution; it therefore estimates those parts through a hierarchical estimation algorithm, coding them in a lossy manner by their mean values. The SE algorithm applies a threshold to the variance of the input data to determine whether the data can be estimated by the mean value of a single Gaussian distribution function or whether it must be divided further into four sub-matrices. This hierarchical algorithm is iterated on the resulting sub-matrices until the distribution of the coefficients in every sub-matrix fulfils the above criterion. Finally, the SE algorithm takes the Gaussian mean values of the resulting sub-matrices as the estimation values for those sub-matrices. The SE algorithm generates a quadtree-like binary map along with the mean values to keep a record of the locations of the sub-matrices that are estimated by their mean values.

A block diagram of the SE algorithm is shown in Figure 6. A two dimensional matrix of size N×N, which for simplicity is called U, along with a threshold value, which represents the level of compression, are input to the SE technique. The SE algorithm performs the following process to compress the input matrix U. It first defines two empty vectors called mv (mean value vector) and q (quadtree-like vector). It then calculates the variance (var) and the mean value (m) of the matrix U and compares the resulting variance with the threshold value. If the variance is less than the threshold, the matrix is coded by its mean value (m) and a single binary 0, which are placed in the mv and q vectors, respectively. If the variance is greater than the threshold, a single binary 1 is placed in the q vector and the size of the matrix is checked. If the size of the matrix is 2×2, the four coefficients of the matrix are scanned and placed in the mv vector, and the encoding process finishes by sending the mean value vector mv and the quadtree-like vector q. If the size of the matrix is greater than 2×2, the matrix U is divided into four equal non-overlapping blocks. These four blocks are then processed from left to right, as shown in Figure 6. For simplicity, only the continuation of the coding process for the first block, U1, is discussed; the same process is repeated on the three other blocks. Processing of the first block U1 proceeds as follows. The variance (var1) and the mean value (m1) of the sub-matrix U1 are first calculated, and the resulting variance is compared with the input threshold. If it is less than the threshold, the calculated mean value (m1) is concatenated to the mean value vector mv, a binary 0 is appended to the quadtree-like vector q, and the encoding process of this sub-block terminates. Otherwise, the size of the sub-block is checked. If it is 2×2, a binary 1 is appended to the quadtree-like vector q, the four coefficients of the sub-block are scanned and concatenated to the mv vector, and the encoding process ends for this sub-block. If its size is larger than 2×2, a binary 1 is concatenated to the current quadtree-like vector q and the sub-block U1 is then divided into four equal non-overlapping blocks. These four new sub-blocks are named successor sub-blocks and are processed from left to right in the same way that their four ancestor sub-blocks were encoded. This process continues until all successor blocks are encoded. When the encoding process is finished, the two vectors **mv** and **q** represent the compressed data of the input matrix U.

Fig. 6. Block diagram of the Statistical Encoder.
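A compact sketch of the recursion in Figure 6, assuming NumPy and a square input whose side is a power of two; the names U, thr, mv and q follow the figure, and the four blocks are visited in a fixed top-left to bottom-right order as a stand-in for the figure's left-to-right convention.

```python
import numpy as np

def se_encode(U, thr, mv=None, q=None):
    """Statistical Encoder: build the mean value vector `mv` and the
    quadtree-like binary vector `q` for the 2D matrix U."""
    if mv is None:
        mv, q = [], []
    if np.var(U) < thr:
        # Statistically flat block: code it by its mean value alone.
        q.append(0)
        mv.append(float(np.mean(U)))
    elif U.shape[0] == 2:
        # A 2x2 block that is not flat: store its four coefficients.
        q.append(1)
        mv.extend(float(c) for c in U.ravel())
    else:
        # Split into four equal non-overlapping blocks and recurse.
        q.append(1)
        h = U.shape[0] // 2
        for block in (U[:h, :h], U[:h, h:], U[h:, :h], U[h:, h:]):
            se_encode(block, thr, mv, q)
    return mv, q

mv, q = se_encode(np.random.rand(8, 8), thr=0.05)   # toy usage
```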


#### **4.3 Statistical and wavelet based image codec**

A block diagram of the Multi-resolution and Statistical Based (MSB) image-coding algorithm is shown in Figure 7. A grey scale image is input to the image encoder. The MSB encoder then applies a 2D lifting based Discrete Wavelet Transform (DWT) to the input image data and decomposes it into a number of subbands. The DWT concentrates most of the image energy into the baseband; hence, the baseband is losslessly coded using a Differential Pulse Code Modulation (DPCM) algorithm, explained at the end of this section, to preserve the visually important information it carries. The coefficients in each detail subband are coded using the procedure illustrated in Figure 7, as follows: (i) the coefficients in each detail subband are first level shifted to have a minimum value (Min) of zero; (ii) the resulting level shifted coefficients are then coded using the SE algorithm, which takes the level shifted coefficients of a detail subband and a threshold value specifically designed for that subband and performs the encoding process (the procedure for generating the threshold values for the different subbands is explained in Section 4.3.1); (iii) the output of each SE encoder is a mean value vector (mv), which carries the mean values, and a quadtree-like vector (q), which carries the quadtree-like data; (iv) finally, the multiplexor combines all the resulting data and generates the compressed output bitstream.

Fig. 7. The multi-resolution and statistical based image encoder.
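A condensed sketch of the encoder in Figure 7, reusing the se_encode() sketch given after Figure 6 and assuming PyWavelets, with 'bior4.4' standing in for the lifting based 9/7 transform; the DPCM coding of the baseband and the final bitstream packing are left out, and the (LH, HL, HH) naming of PyWavelets' detail tuples is a convention assumed here.

```python
import pywt

def msb_encode(image, thresholds):
    """thresholds maps (level, band_name) -> SE threshold for that subband."""
    coeffs = pywt.wavedec2(image, 'bior4.4', mode='periodization', level=3)
    baseband, stream = coeffs[0], []
    n = len(coeffs) - 1
    for lvl, bands in zip(range(n, 0, -1), coeffs[1:]):   # coarsest level first
        for name, band in zip(('LH', 'HL', 'HH'), bands):
            shifted = band - band.min()                   # level shift: Min -> 0
            mv, q = se_encode(shifted, thresholds[(lvl, name)])
            stream.append((lvl, name, float(band.min()), mv, q))
    return baseband, stream      # the baseband is DPCM coded separately
```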


Fig. 8. Three-sample prediction neighbourhood for the DPCM method.

In the DPCM method, pixel X, with value x, is predicted from its three neighbouring pixels A, B and C, with values a, b and c respectively, as shown in Figure 8. The predicted value of pixel X, denoted Px, is calculated using Equation 14:

$$P_X = b + \frac{a - c}{2} \tag{14}$$

The predicted value of pixel X is then subtracted from the actual value of pixel X to generate an error value, and all the resulting error values are finally losslessly coded.
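As a worked illustration, the sketch below applies Equation 14 over a raster-scanned baseband. Because the layout of Figure 8 is not reproduced here, the neighbour positions are an assumption: A is taken as the pixel to the left, B as the pixel above, and C as the pixel diagonally above-left; treating missing neighbours at the borders as zero is likewise an assumption.

```python
import numpy as np

def dpcm_residuals(band):
    """Predict each pixel X from three causal neighbours and return the
    prediction errors (Equation 14: Px = b + (a - c) / 2).
    Assumed layout (Figure 8 is not reproduced here):
        a = left neighbour, b = neighbour above, c = above-left neighbour.
    Missing neighbours at the borders are treated as zero."""
    band = band.astype(np.float64)
    rows, cols = band.shape
    errors = np.zeros_like(band)
    for i in range(rows):
        for j in range(cols):
            a = band[i, j - 1] if j > 0 else 0.0
            b = band[i - 1, j] if i > 0 else 0.0
            c = band[i - 1, j - 1] if i > 0 and j > 0 else 0.0
            px = b + (a - c) / 2.0          # Equation 14
            errors[i, j] = band[i, j] - px  # error value, losslessly coded
    return errors

# Worked example: a = 100, b = 104, c = 98 gives Px = 104 + (100 - 98)/2 = 105,
# so the error at the bottom-right pixel is 106 - 105 = 1.
baseband = np.array([[98., 104.], [100., 106.]])
print(dpcm_residuals(baseband))
```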

#### **4.3.1 Threshold generation**

In this research work, perceptual weights are employed to regulate the threshold values for the different subbands. The threshold value for each detail subband is generated by dividing a uniform quality factor, which can take any positive value, by the perceptual weight at the centre of that subband. There is a direct relationship between the uniform quality factor and the resulting compression ratio. Section 3.2 gave an algorithm for calculating the perceptual weights for the detail subbands of wavelet-transformed image data. This algorithm is used to calculate the perceptual weight at the centre of each detail subband of an image of size 512×512 viewed at a distance of 40 centimetres; the results are shown in Table 6.

| DWT Level | Subband | Y-Domain | I-Domain | Q-Domain |
|-----------|---------|----------|----------|----------|
| One       | LH      | 3.0230   | 1.3251   | 0.7258   |
| One       | HL      | 2.0443   | 1.0275   | 0.7681   |
| One       | HH      | 0.8713   | 0.4273   | 0.4697   |
| Two       | LH      | 5.4726   | 2.9355   | 1.6570   |
| Two       | HL      | 5.5166   | 2.3270   | 1.6560   |
| Two       | HH      | 2.4531   | 1.0992   | 0.8321   |
| Three     | LH      | 6.1930   | 4.2479   | 4.9906   |
| Three     | HL      | 6.3060   | 2.9823   | 4.0070   |
| Three     | HH      | 4.8143   | 2.2390   | 1.6068   |

Table 6. Perceptual weights for the YIQ colour domain (512×512 image size and a viewing distance of 40 cm).
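To make the threshold rule of Section 4.3.1 concrete, the snippet below divides a single uniform quality factor by the Y-domain weights of Table 6 to obtain one threshold per detail subband. The dictionary layout and the choice of the luminance column are illustrative assumptions; the chapter applies the same rule to whichever colour component is being coded.

```python
# Y-domain perceptual weights from Table 6 (512x512 image, 40 cm viewing distance)
Y_WEIGHTS = {
    ("one", "LH"): 3.0230, ("one", "HL"): 2.0443, ("one", "HH"): 0.8713,
    ("two", "LH"): 5.4726, ("two", "HL"): 5.5166, ("two", "HH"): 2.4531,
    ("three", "LH"): 6.1930, ("three", "HL"): 6.3060, ("three", "HH"): 4.8143,
}

def subband_thresholds(quality_factor):
    """Threshold for each detail subband = uniform quality factor divided by
    the perceptual weight at the centre of that subband (Section 4.3.1).
    A larger quality factor raises every threshold and hence the compression
    ratio; a larger weight (a perceptually more important subband) lowers the
    threshold, so that subband is approximated more faithfully."""
    assert quality_factor > 0, "the quality factor can take any positive value"
    return {band: quality_factor / w for band, w in Y_WEIGHTS.items()}

# Example: the level-one HH subband (smallest weight, 0.8713) receives the
# largest threshold and is therefore coded most coarsely.
for band, t in subband_thresholds(8.0).items():
    print(band, round(t, 3))
```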

#### **4.3.2 Results**

In order to evaluate the performance of the proposed MSB codec, two sets of experiments were performed. In the first set of experiments, the performance of the MSB codec using perceptual weights to regulate the threshold values for the different subbands is compared to that of the MSB codec without perceptual weights; the results are presented in Sub-section 4.3.2.1. In the second set of experiments, the MSB codec using perceptual weights is compared to the JPEG and JPEG2000 standard image codecs; the results are presented in Sub-section 4.3.2.2.

#### **4.3.2.1 Results for the codec with and without using perceptual weights**

The performance of the MSB image codec was investigated on three greyscale test images ('Lena', 'Elaine', and 'House'), each of size 512×512 pixels with a resolution of 8 bits per pixel. These test images cover the full range of spatial frequencies, from very low-frequency smooth areas, through mid-frequency textures, to very high-frequency sharp edges. In order to evaluate the effect of the perceptual weights on the performance of the proposed codec, the three test images were compressed using the proposed codec both with and without perceptual weights regulating the uniform threshold value for the different subbands. A three-level Daubechies 9/7 wavelet transform was used to decompose the input image into ten subbands for this experiment. The PSNR criterion was used to evaluate the quality of the reconstructed images.
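For reference, the PSNR of an 8-bit image is computed from the mean squared error between the original and the reconstruction. The short function below is a standard formulation of this metric, not code from the chapter.

```python
import numpy as np

def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB for 8-bit images:
    PSNR = 10 * log10(peak^2 / MSE)."""
    err = original.astype(np.float64) - reconstructed.astype(np.float64)
    mse = np.mean(err ** 2)
    if mse == 0:
        return float("inf")  # identical images (lossless reconstruction)
    return 10.0 * np.log10(peak ** 2 / mse)

# Example: a reconstruction whose pixels are all off by 4 grey levels
ref = np.full((512, 512), 128, dtype=np.uint8)
rec = ref + 4
print(round(psnr(ref, rec), 2))  # MSE = 16, PSNR ~ 36.09 dB
```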



The PSNR measurements for the test images at different compression ratios using the MSB codec with and without perceptual weights are given in Figures 9(a) to 9(c), respectively. From these figures, it is clear that the MSB codec using perceptual weights performs significantly better than the MSB codec without perceptual weights. However, it is well known that the PSNR is an unreliable metric for measuring the visual quality of decompressed images (Kaia et al., 2005). Hence, to illustrate the true visual quality obtained using the MSB codec with and without perceptual weights, the reconstructed 'Lena', 'Elaine', and 'House' images at a compression ratio of 16 using the proposed codec are shown in Figures 10(I) to 10(III), respectively. From these figures, it can be seen that the reconstructed images, when perceptual weights are used in the encoding process, have significantly higher visual quality, with less blurred edges and better surface details. Figures 10(I) and 10(II), which show the decoded 'Lena' and 'Elaine' test images, confirm that the images decoded using the MSB codec with perceptual weights have noticeably higher quality than those decoded without them, with clearer facial details and less blurring in the faces. From Figure 10(III), it is clear that the reconstructed 'House' test image using the MSB codec with HVS weighting has significantly higher visual quality, with less blurred edges and clearer surface details.

#### **4.3.2.2 Results of the MSB, JPEG and JPEG2000 codecs**

In this section, the performance of the MSB codec with perceptual weights is compared to the JPEG and JPEG2000 (JPEG2000, 2005) standard image coding techniques. The MSB, JPEG and JPEG2000 codecs were used to compress the 'Lena', 'Elaine', and 'House' test images at different compression ratios. The PSNR measurements for the encoded images using the MSB, JPEG, and JPEG2000 image codecs at different compression ratios are shown in Figures 11(a) to 11(c), respectively. From these figures it can be seen that the MSB codec gives superior performance to JPEG and JPEG2000 at low compression ratios. From Figures 11(a) and 11(b), it can be observed that the proposed codec offers higher PSNR than JPEG and JPEG2000 in coding the 'Lena' and 'Elaine' test images at compression ratios lower than 5. From Figure 11(c), it is clear that the MSB codec outperforms JPEG and JPEG2000 in coding the 'House' test image at compression ratios of up to 4. However, since the PSNR often does not reflect the visual quality of the decoded images, a perceptual quality evaluation is also necessary. To demonstrate the visual quality achieved using the MSB, JPEG and JPEG2000 coding techniques at different compression ratios, the decoded 'Lena' and 'Elaine' test images at compression ratios of 5 and 40 are shown in Figures 12 and 13, respectively.

From Figure 12(a), it can be seen that the visual quality of the decoded Lena test image at a compression ratio of 5 using the MSB codec is high. It is also clear that the quality of the decoded Lena test image using the MSB codec is slightly higher than that of JPEG and almost the same as that of JPEG2000. The Elaine test image contains significant high-frequency detail and is more difficult to code. From Figure 12(b), which illustrates the decoded Elaine test images at a compression ratio of 5, the high visual quality of all the decoded images is evident. From Figure 13(a), which illustrates the decoded Lena test images at a compression ratio of 40, the severe blocking artefacts of the JPEG decoded image are quite obvious.


The MSB decoded image contains some blurring around the mouth and ringing artefacts around edges in the image. In terms of overall visual quality, the MSB decoded Lena test image is superior to that of JPEG but slightly inferior to that of JPEG2000. From Figure 13(b), which illustrates the decoded Elaine test images at a compression ratio of 40, it is evident that: a) the decoded JPEG image exhibits severe blocking artefacts; b) the MSB decoded image has higher visual quality but suffers from blurring in the background and ringing artefacts at its sharp edges; and c) the JPEG2000 decoded image has high visual quality, although slight blurring and ringing artefacts can be seen in some regions of the background and at sharp edges. Overall, the JPEG2000 decoded images have slightly higher visual quality than the MSB decoded images.

The results presented here demonstrate that the MSB codec outperforms the JPEG and JPEG2000 image codecs, subjectively and objectively, at low compression ratios (up to a compression ratio of 5). The results also show that at middle-range compression ratios the JPEG decoded images suffer somewhat from blocking artefacts, while the visual quality of the MSB decoded images is significantly higher.

The results at high compression ratios (around 40) indicate that: a) the JPEG decoded images suffer severely from blocking artefacts, so much so that there is no point in using JPEG to code images at high compression ratios; b) the MSB decoded images have significantly higher visual quality than the JPEG ones, although they suffer slightly from patchy blur in regions with soft texture and from ringing noise at sharp edges; and c) the MSB decoded images have significantly lower PSNR than the JPEG2000 ones, but their visual quality is only slightly inferior to that of JPEG2000.

**5. Conclusion**

In this chapter, a novel statistical encoding algorithm was first presented. The proposed SE algorithm assumes that the distribution of the coefficients in the input matrix is partly Gaussian and uses a hierarchical encoding algorithm to estimate the coefficients in the input matrix by the Gaussian mean values of multiple distributions. A multi-resolution and statistical based image-coding scheme was then developed. It applies a 2D wavelet transform to the input image data to decompose it into its frequency subbands. The baseband is losslessly coded to preserve the visually important image data. The coefficients in each detail subband are first DC level shifted to have a minimum value of zero and then coded using the SE algorithm, which takes the DC level shifted coefficients of a detail subband and a threshold value generated for that subband, and performs the encoding process. Perceptual weights were calculated for the centre of each detail subband and used to regulate the threshold value for that subband.

Experimental results showed that the proposed coding scheme provides significantly higher subjective and objective quality when perceptual weights are used to regulate the threshold values. The results also indicated that the proposed codec outperforms the JPEG and JPEG2000 coding schemes, subjectively and objectively, at low compression ratios. The results further showed that the proposed coding scheme outperforms JPEG subjectively at higher compression ratios and offers visual quality comparable to that of JPEG2000 at high compression ratios.